Apple - PowerPC G5 - Execution Core

A neo-cortex that finishes its thoughts before thinking them.

Within the brains of the PowerPC G5 is more processing power than you’ve ever experienced from a desktop chip. Its massively parallel circuits are capable of handling multiple assorted tasks at the same time. Called an execution core, it’s where your Mac does all its thinking.

The design was derived from IBM’s 64-bit POWER4 processor — recipient of the Microprocessor Report’s 2001 Analyst’s Choice Award for Best Workstation/Server Processor, which recognizes excellence in semiconductor technology innovation, design and implementation. With two double-precision floating-point units, advanced branch prediction logic and a high-bandwidth frontside bus, the POWER4 drives IBM’s successful eBusiness servers.

Apple and IBM added the Velocity Engine to that superscalar, superpipelined execution core so that every Mac OS X application would run efficiently. Additionally, the PowerPC G5 features processing innovations that optimize the flow of data and instructions, so it can pump through more than 200 in-flight instructions at a time, a whopping 71% more than the 32-bit Pentium 4.

L2 Cache
512K of L2 cache provides the execution core with ultrafast 64-GBps access to data and instructions.

L1 Cache
Instructions are prefetched from the L2 cache into a large, direct-mapped 64K L1 cache at 64 GBps. In addition, 32K of L1 data cache can prefetch up to eight active data streams simultaneously.
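To make "direct-mapped" concrete, here is a minimal sketch of how such a cache decides where an address lives. Each address maps to exactly one line, so a lookup is a single tag comparison. The 64K size comes from the text above; the 128-byte line size and all names in the code are assumptions made for illustration.

```python
# Sketch of a direct-mapped cache lookup. Not a model of the real hardware.
CACHE_BYTES = 64 * 1024                 # 64K, from the text
LINE_BYTES = 128                        # assumption: line size is not given in the text
NUM_LINES = CACHE_BYTES // LINE_BYTES   # 512 lines under these assumptions


def split_address(addr):
    """Split a byte address into (tag, index, offset)."""
    offset = addr % LINE_BYTES
    index = (addr // LINE_BYTES) % NUM_LINES
    tag = addr // CACHE_BYTES
    return tag, index, offset


class DirectMappedCache:
    def __init__(self):
        # One stored tag per line; None means the line is empty.
        self.tags = [None] * NUM_LINES

    def access(self, addr):
        """Return True on a hit; on a miss, install the line and return False."""
        tag, index, _ = split_address(addr)
        if self.tags[index] == tag:
            return True
        self.tags[index] = tag          # evict whatever was in this slot
        return False
```

Two addresses exactly 64K apart land on the same line and evict each other, which is the price a direct-mapped design pays for its very fast lookup.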

Fetch and Decode
As they are accessed from the L1 cache, up to eight instructions per clock cycle are fetched, decoded and divided into smaller, easier-to-schedule operations. This efficient preparation maximizes processing speed as instructions are dispatched into the execution core and data is loaded into the large number of registers behind the functional units.

Dispatch
Before instructions are dispatched into the functional units, they are arranged into groups of up to five. Within the core alone, the PowerPC G5 can track up to 20 groups at a time, or 100 individual instructions. This efficient group-tracking scheme enables the PowerPC G5 to manage an unusually large number of instructions “in flight”: five instructions in each of the 20 dispatch groups and an additional 100-plus instructions in the various fetch, decode and queue stages.
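The group arithmetic can be sketched in a few lines. This is a simplified illustration that assumes instructions are packed into groups strictly in order; real hardware likely applies further rules about which instruction may occupy which slot, and the text does not describe them. The function names are hypothetical.

```python
GROUP_SIZE = 5    # up to five instructions per group, from the text
MAX_GROUPS = 20   # groups tracked at once in the core, from the text


def form_groups(instructions, group_size=GROUP_SIZE):
    """Pack an ordered list of instructions into groups of at most group_size."""
    return [instructions[i:i + group_size]
            for i in range(0, len(instructions), group_size)]


def in_flight_capacity():
    """Instructions the core can track when every group is full."""
    return GROUP_SIZE * MAX_GROUPS
```

Tracking 20 groups rather than 100 separate instructions is what keeps the bookkeeping small: the core follows one entry per group instead of one per instruction.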

Queue
Once an instruction group is dispatched into the execution core, it is broken into individual instructions, which proceed to the appropriate functional unit. Each functional unit has its own dedicated queue, where multiple instructions are arranged for processing in whatever order is required.
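As a rough sketch of the idea, a dispatched group can be split up and each instruction appended to the queue of the unit that will execute it. The instruction kinds, the unit names and the `issue_group` helper are all hypothetical, chosen only for illustration.

```python
from collections import deque

# Hypothetical mapping from instruction kind to functional unit.
UNIT_FOR_KIND = {
    "int": "integer",
    "fp": "floating_point",
    "vec": "velocity_engine",
    "mem": "load_store",
}


def make_queues():
    """One dedicated queue per functional unit."""
    return {unit: deque() for unit in UNIT_FOR_KIND.values()}


def issue_group(group, queues):
    """Break a group into individual instructions and route each to its unit's queue."""
    for kind, text in group:
        queues[UNIT_FOR_KIND[kind]].append(text)
```

Because each unit drains its own queue, a slow floating-point operation need not hold up an integer add that arrived in the same group.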

Optimized Velocity Engine
The PowerPC G5 uses an optimized dual-pipelined Velocity Engine with two independent queues and dedicated 128-bit registers and data paths for efficient instruction and data flow. This vector processing unit accelerates data manipulation by applying a single instruction to multiple data at the same time, known as SIMD processing. The Velocity Engine in the PowerPC G5 uses the same set of 162 instructions as in the PowerPC G4, so it can run — and accelerate — existing Mac OS X applications already optimized for the Velocity Engine.
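To picture SIMD, treat one 128-bit register as four 32-bit lanes and apply the same add to every lane at once. The sketch below emulates that in plain Python; it illustrates the concept and does not call the Velocity Engine. The lane width is one possible choice, and the function name is made up.

```python
LANES = 4              # 128 bits / 32 bits per element
MASK32 = 0xFFFFFFFF    # keep each lane within 32 bits


def vec_add_u32(a, b):
    """Lane-wise add of two 4-element vectors, wrapping each lane modulo 2**32."""
    if len(a) != LANES or len(b) != LANES:
        raise ValueError("each vector must hold exactly four 32-bit elements")
    return [(x + y) & MASK32 for x, y in zip(a, b)]
```

A scalar unit would need four separate add instructions for the same work; the vector unit expresses it as one.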

Two Double-Precision Floating-Point Units
Two double-precision floating-point units provide the precision required for highly complex scientific computations. Although 32-bit processors are able to execute double-precision 64-bit calculations by cycling through the floating-point math unit multiple times, a double-precision math unit on a 64-bit processor can complete the same calculation in a single clock cycle. Two double-precision floating-point units let the G5 complete at least two 64-bit mathematical calculations per clock cycle. This dramatically accelerates highly complex computations critical in genome-matching applications and in many of the filters used to manipulate or render 3D graphics and video content.

Two Integer Units
Integer units perform simple integer mathematics, such as add, subtract and compare, which are commonly used in many basic computer functions, as well as in imaging, video and audio applications. The PowerPC G5 has two integer units capable of a broad range of simple and complex instructions involving either 32-bit or 64-bit calculations. What’s more, they take full advantage of the processor’s 64-bit registers and data paths to complete 64-bit integer calculations in a single pass.

Load/Store
At the same time as instructions are queued, the load/store units load the associated data from L1 cache into the data registers behind the units that will be processing the data. After the instructions manipulate the data, these units store it back to L1 cache, L2 cache or main memory. Each functional unit is generously equipped with 32 registers that are 128 bits wide on the Velocity Engine and 64 bits wide on the floating-point units and the integer units. With two load/store units, the PowerPC G5 is able to keep these registers filled with data for maximum processing efficiency.

Condition Register
This special 32-bit register summarizes the states of the floating-point and integer units. The condition register also indicates the results of comparison operations and provides a means for testing them as branch conditions. By bridging information between the branch unit and other functional units, the condition register improves the flow of data throughout the execution core.
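Here is a small sketch of how a compare result can be parked in the condition register and later tested by a branch. The layout used here, eight four-bit fields each holding less-than, greater-than, equal and summary-overflow bits, follows the PowerPC architecture and is not spelled out in the text above. The helper names are hypothetical, and the summary-overflow bit is left unused for simplicity.

```python
# Bit meanings inside one 4-bit condition register field.
LT, GT, EQ, SO = 0b1000, 0b0100, 0b0010, 0b0001


def compare(a, b):
    """Return the field value a compare of a against b would produce (SO ignored)."""
    if a < b:
        return LT
    if a > b:
        return GT
    return EQ


def set_field(cr, field, value):
    """Write a 4-bit value into field 0..7 of a 32-bit condition register."""
    shift = 4 * (7 - field)   # field 0 occupies the most significant four bits
    cleared = cr & ~(0xF << shift) & 0xFFFFFFFF
    return cleared | (value << shift)


def get_field(cr, field):
    """Read the 4-bit value of field 0..7."""
    return (cr >> (4 * (7 - field))) & 0xF


def branch_if_equal(cr, field):
    """A branch unit only needs to test one bit of one field."""
    return bool(get_field(cr, field) & EQ)
```

The compare and the branch never talk to each other directly; the register is the meeting point, which is the bridging role described above.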

Three-Component Branch Prediction Logic
The PowerPC G5 usually knows the answer before it asks the question, using branch prediction and speculative operation to increase efficiency. Like finishing someone else’s sentences, branch prediction anticipates which instruction should go next, and speculative operation causes that instruction to be executed. If the prediction is correct, the processor works more efficiently — since the speculative operation has executed an instruction before it’s required, as with a conversation that seems to be a mind meld. If the prediction is incorrect, the processor must clear the unneeded instruction and associated data, resulting in an empty space called a pipeline bubble. Pipeline bubbles reduce performance as the processor marks time waiting for the next instruction, not unlike wasting time hearing how very wrong your assumptions were. The G5 can predict branch processes with an accuracy of up to 95%, allowing the chip to efficiently use every processing cycle.
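For a feel of how a predictor learns, consider the classic two-bit saturating counter. This is a deliberately simpler textbook technique, not the G5's three-component logic, whose internals the text does not describe. The class and function names are made up for the sketch.

```python
class TwoBitPredictor:
    """Textbook two-bit saturating counter for a single branch."""

    def __init__(self):
        # 0-1 predict "not taken", 2-3 predict "taken"; start at weakly taken.
        self.counter = 2

    def predict(self):
        return self.counter >= 2

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)


def accuracy(outcomes):
    """Fraction of branch outcomes the predictor guesses correctly."""
    predictor = TwoBitPredictor()
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct / len(outcomes)
```

On a loop branch that is taken nine times and then falls through, repeated over and over, this counter mispredicts only the fall-through, so even this simple scheme guesses right nine times out of ten. Each wrong guess corresponds to one of the pipeline bubbles described above.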

Complete
When operations on the data are complete, the PowerPC G5 recombines the instructions into their original groups of up to five, and the load/store units store the data in cache or main memory for further processing.

PowerMac G5 - 64-bit breakthrough - July 2003