The Active Network


Product: Pentium 4 1.5GHz
Company: Intel
Estimated Street Price: $848.00
Review By: Julien JAY

CPU Architecture

Table Of Contents
1: Introduction
2: CPU Architecture
3: SSE2 & CPU Design
4: i82850 Chipset
5: Intel D850GB Motherboard
6: Benchmarks

7: BenchMarks' Results Analysis
8: Conclusion

    This gets complicated! Built on a P6 core engine, the Pentium 4 is the first processor based on the brand new IA-32 NetBurst micro-architecture, which allows higher performance levels and clock speeds than previous IA-32 processors. The NetBurst architecture really boosts performance, but don’t think that it’ll improve your Internet download times, transfer rates, etc. The name of the architecture has nothing to do with the Internet; it's just a marketing trick. With the NetBurst architecture, Pentium 4 processors promise to scale to clock speeds of several gigahertz without Intel having to make major changes to its manufacturing process.


Intel Pentium 4 Die


Review Quotes
  "NetBurst architecture brings a major enhancement known as the Rapid Execution Engine to the superscalar architecture "  

The NetBurst architecture is also the first to use a 20-stage pipeline, against only 10 stages for the Pentium III, and it can keep up to 126 instructions ‘in flight’. A pipeline is a series of stages that work hand in hand to process software instructions. With more stages, each stage handles a smaller task that completes in a shorter time and requires fewer transistors than before, allowing higher frequency operation.

A deeper pipeline has several advantages, but it also has a major drawback. Programs contain tests (conditional branches), and to keep the pipeline continuously fed the processor must start working on the instructions that follow a test before it knows the test result. To choose which instructions to run, the CPU uses a ‘branch prediction’ mechanism: most of the time the CPU runs code it has already run before, so it can guess the result ahead of time. The Pentium 4 has a BTB (branch target buffer) four times larger than the Pentium III's, which stores the history of previous test results in 4KB of memory. If the CPU encounters a test it has already run, it takes the same branch as before in order to speed up its work. Pentium 4 processors achieve more than 94% successful predictions (against only 90% for a Pentium III, which Intel claims is a gain of 33%). But when a prediction fails, the instructions already in the pipeline have to be thrown away so the CPU can restart from the correct branch: this obviously slows down the whole computer.

The Pentium 4 also executes instructions ‘out of order’ so that an instruction waiting for its data does not block the ALUs, as it would if instructions were run strictly in program order. Like every P6-based processor, the Pentium 4 comes with two arithmetic logic units and one floating point unit, a design known as superscalar architecture (Pentium CPUs were the first to use it).
NetBurst architecture brings a major enhancement known as the Rapid Execution Engine to the superscalar architecture: both the ALUs (Arithmetic Logic Units) and the AGUs (Address Generation Units, which compute the addresses where data is loaded and stored) run at twice the CPU frequency, so the ALUs can now handle four instructions per cycle rather than two. For those of you who don’t know, the ALU is the integer unit that handles math operations like adding, multiplying and dividing, as well as logical operations like ‘OR’, ‘AND’, ‘XOR’, etc. Just like every superscalar processor worthy of the name, the Pentium 4 still includes a ‘Micro Operation Operand’ unit that works with simple instructions directly handled by the processor: most of the time x86 instructions are converted into µOps.
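To make the idea of predicting a branch from its history more concrete, here is a minimal Python sketch. It uses a textbook scheme (one 2-bit saturating counter per branch address), which is an assumption chosen for illustration: Intel has not said in the material covered here how the Pentium 4's predictor works internally, and the names `TwoBitPredictor`, `accuracy` and `loop_trace` are hypothetical.

```python
from collections import defaultdict


class TwoBitPredictor:
    """Toy branch predictor: one 2-bit saturating counter per branch address.

    Illustrative textbook scheme only, not the Pentium 4's real design.
    """

    def __init__(self):
        # Counter values 0-1 predict "not taken", 2-3 predict "taken".
        # Every branch starts at 1 (weakly "not taken").
        self.counters = defaultdict(lambda: 1)

    def predict(self, branch_addr):
        return self.counters[branch_addr] >= 2

    def update(self, branch_addr, taken):
        # Move the counter towards the real outcome, saturating at 0 and 3.
        c = self.counters[branch_addr]
        self.counters[branch_addr] = min(c + 1, 3) if taken else max(c - 1, 0)


def accuracy(predictor, trace):
    """Fraction of correctly predicted branches in a list of (address, taken)."""
    hits = 0
    for addr, taken in trace:
        if predictor.predict(addr) == taken:
            hits += 1
        predictor.update(addr, taken)
    return hits / len(trace)


# A loop branch at a made-up address: taken 9 times, then not taken once,
# and the whole loop is run 10 times.
loop_trace = [(0x400, i % 10 != 9) for i in range(100)]
print(accuracy(TwoBitPredictor(), loop_trace))
```

On this loop-style trace the toy predictor gets 89 of the 100 branches right: it misses the very first branch and then only the loop exit on each pass. A branch that strictly alternates between taken and not taken defeats it completely, which is why real predictors keep richer history.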


Intel Pentium 4 Architecture Schema


    With the 486 DX4 and the Pentium, Intel introduced cache memory directly on the chip: it was a real first that boosted performance. The Pentium III took this concept further by integrating cache memory on the die. The Pentium 4's cache characteristics have evolved as well: the L1 cache now includes an 8KB data cache (which is quite small when you know the PIII included a 16KB one), while the L1 Instruction Cache has been renamed the Instruction Trace Cache since it has changed considerably too. The Pentium 4 L1 data cache is four-way set associative with 64-byte cache lines, and thanks to its dual-port design it can store data while loading it. The reduction in level 1 cache size (an AMD Athlon comes with a 64KB level 1 cache!) was probably driven by Intel's design goal of a low latency of two clock cycles.

    The Trace Cache stores instructions after they have been converted from x86 into micro-ops, in the order they should be run, saving processor cycles if a bad branch prediction occurs (since the alternative path is already stored in it). This also gives faster access to the most used instructions and avoids the problems the Pentium III could have with complex x86 instructions that had to go through slow decoders. The Trace Cache can store 12,000 micro-ops, which corresponds to an approximate size of 92 or 96 KB (Intel didn’t specify the exact size). Once µOps are in the trace cache, the Pentium 4 can easily check for dependencies to make its branch predictions and ensure that the pipeline is continuously supplied: the trace cache can feed the pipeline with 6 µOps every 2 clocks. The L1 cache access time is now about 1.4 nanoseconds (twice as fast as the Pentium III) and its bandwidth now reaches 41.7GB/s (against 14.9GB/s for a Pentium III). The L2 cache has also been enhanced.
As on the Coppermine CPU, the 256KB level 2 cache runs at the full clock speed of the CPU (and not, as on the Pentium II or the first Pentium III, at half the nominal frequency of the CPU). As a reminder, level 2 cache memory improves a computer's raw performance by approximately 25%. The Pentium 4's on-die L2 cache bandwidth now reaches 48.1 GB per second on a 1.5 GHz model. It uses 128-byte cache lines divided into two 64-byte halves, so it reads at least 64 bytes of data in one pass, ensuring the highest performance.
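The L1 data cache figures quoted above (8KB, four-way set associative, 64-byte lines) are enough to work out how such a cache is organised. The Python sketch below derives the number of sets and shows how an address would be split into tag, set index and byte offset. The simple "low bits select the set" mapping is the textbook scheme and is assumed here; the function name `split_address` and the sample address are purely illustrative.

```python
CACHE_SIZE = 8 * 1024  # 8KB L1 data cache, as quoted in the review
LINE_SIZE = 64         # bytes per cache line
WAYS = 4               # four-way set associative

NUM_LINES = CACHE_SIZE // LINE_SIZE       # 128 lines in total
NUM_SETS = NUM_LINES // WAYS              # 32 sets of 4 lines each
OFFSET_BITS = LINE_SIZE.bit_length() - 1  # 6 bits select a byte in a line
INDEX_BITS = NUM_SETS.bit_length() - 1    # 5 bits select a set


def split_address(addr):
    """Split an address into (tag, set_index, offset).

    Assumes the classic mapping where the lowest bits are the byte offset
    and the next bits pick the set; real hardware may differ in detail.
    """
    offset = addr & (LINE_SIZE - 1)
    set_index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, set_index, offset


print(NUM_LINES, NUM_SETS)
print(split_address(0x12345))
```

With only 32 sets, two addresses that are 2KB apart (32 sets times 64 bytes) land in the same set, and each set can hold just four such lines at once. That gives a feel for how small an 8KB cache is.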



Pentium 4 Back


    To complete the picture of the Pentium 4, we shouldn’t forget to say that it includes a micro-code ROM, allowing new micro-code to be uploaded in order to solve minor problems.

A new Bus: Don't miss it!

Review Quotes
  "Pentium 4 Bus exchanges data with the rest of the system faster than ever, removing one of the major bottlenecks the Pentium III had"

The latest Pentium III processors use a front side bus of only 133MHz with a bandwidth of 1065 MB/s, which looked a bit pale compared to the AMD Athlon ‘266’ one. The front side bus has always been a real bottleneck for a high performance PC. For a 400MHz computer an FSB of 100MHz was just sufficient, but for a 1GHz-plus computer 133MHz was a bit weak. Intel has revamped it by introducing a 400MHz front side bus: a quad-pumped 64-bit bus clocked at 100MHz, for an overall bandwidth of 3051 MB/s. Intel used a technical trick so that the FSB performs four 64-bit data transfers per clock cycle, making it work like a normal “400MHz” bus. Not only does this new bus improve performance, it is also the first to let an x86 processor exchange data this fast between the CPU, the memory and the rest of the system components, leaving the recent AMD EV6 bus far behind.
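The bus figures above follow from simple arithmetic: clock rate, times transfers per clock, times bus width in bytes. The short Python sketch below reproduces it. The helper `peak_bandwidth` is a hypothetical name, and the results are theoretical peaks, not measured throughput.

```python
def peak_bandwidth(base_clock_hz, transfers_per_clock, bus_width_bits=64):
    """Theoretical peak bus bandwidth in bytes per second."""
    return base_clock_hz * transfers_per_clock * bus_width_bits // 8


MIB = 2 ** 20  # bytes in a binary megabyte

# Pentium 4: 100MHz clock, quad pumped (4 transfers per clock), 64 bits wide.
p4_bus = peak_bandwidth(100_000_000, 4)
# Pentium III: 133MHz clock, one transfer per clock, 64 bits wide.
p3_bus = peak_bandwidth(133_000_000, 1)

print(p4_bus, p4_bus // MIB)  # bytes per second, then binary megabytes per second
print(p3_bus, p3_bus // 1_000_000)  # bytes per second, then decimal megabytes per second
```

The Pentium 4 bus works out to 3.2 billion bytes per second, which is the 3051 MB/s quoted above when a megabyte is counted as 2^20 bytes. The Pentium III figure of roughly 1065 MB/s corresponds to 133MHz times 8 bytes counted in decimal megabytes, so the two quoted numbers do not quite use the same unit. Either way, the new bus carries about three times as much data.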

The truth on the Pentium 4 Glitch

Review Quotes
  "Intel Pentium 4 and its i850 chipset still work considerably better than every other chipset, like the VIA ones"

According to a recent ZDNet story, the Pentium 4 glitch that delayed the worldwide launch of the processor wasn't totally fixed by Intel. What's the bug, you ask? If a user attaches a second, PCI graphics card to their computer, then in certain rare situations the ICH2 component can cause the computer to slow down, resulting in decreased processor speed and eventually data corruption. The risk of running into this problem is relatively small. Here at ActiveWin.Com we decided to see for ourselves by adding an ATI All In Wonder 128 16MB PCI card to our test system, working in combination with the Guillemot 3D Prophet II Ultra in dual screen mode (one Sony 17" computer screen and one Sony TV). We tested the whole system with Adobe Illustrator 9 and some latest-generation games like Microsoft Crimson Skies, MechWarrior 4, Red Alert 2, etc., and we didn't encounter any freezes, slowdowns, crashes, hangs, etc. If there is an issue with PCI cards, the risk that it affects you is very low and you really don't have to be scared about it. Workarounds for this potential problem already exist, such as using a Matrox G450 DualHead card that can drive two monitors at a time, avoiding the need for two physical graphics cards. Even with this very, very, very minor bug, the Intel Pentium 4 and its i850 chipset still work considerably better than every other chipset, like the VIA ones that have so many bugs you can't count them on your hands.

