The Active Network


Product: GeForce 4 Ti 4600
Company: NVIDIA
Estimated Street Price:
Review By: Julien Jay

GeForce 4 Ti 4600 GPU

Table Of Contents
1: Introduction
2: GeForce 4 Ti 4600 Technology Explanation
3: GeForce 4 Ti 4600 Technology Explanation 2
4: GeForce 4 Ti 4600 Technology Explanation 3
5: nView
6: Direct 3D Benchmarks
7: OpenGL Benchmarks

   Etched on a 0.15µ process and still manufactured by TSMC, the brand-new GeForce 4 GPU contains 63 million transistors, against 57 million for the GeForce 3 Ti 500 and 55 million for the Pentium 4 Northwood. The GeForce 4 Ti 4600 clocks its GPU at 300 MHz and its DDR-SDRAM memory at 325 MHz (650 MHz effective). The GPU is backed by 128 MB of 2.8ns DDR-SDRAM on a 128-bit bus. The features of this new GPU are listed below:

  • 63 million transistors
  • Manufactured on TSMC's 0.15µ process
  • GPU clocked at 300 MHz
  • Memory clocked at 650 MHz effective (325 MHz DDR)
  • 128 MB frame buffer by default
  • AGP 2x/4x
  • nfiniteFX II engine
  • Accuview Anti Aliasing
  • Light Speed Memory Architecture II
  • nView

The theoretical performance figures of the chip are as follows:

  • Vertices per Second: 136 Million
  • Fill Rate: 4.8 Billion AA Samples/Sec
  • Fill Rate: 1200 Mpixels/Sec
  • Operations per Second: 1.23 Trillion
  • Memory Bandwidth: 10.4GB/Sec
  • Maximum Memory: 128MB
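The pixel fill rate and memory bandwidth figures follow from simple arithmetic on the clock speeds quoted earlier. The sketch below is only a back-of-the-envelope check: the four pixels per clock and four anti-aliasing samples per pixel are inferred from the quoted numbers rather than stated in this review, and the constant and function names are made up for illustration.

```python
# Back-of-the-envelope check of the peak figures.
# Assumptions (inferred, not stated in the review): the chip writes
# 4 pixels per clock and takes 4 anti-aliasing samples per pixel.

CORE_CLOCK_HZ = 300_000_000            # GPU clock
EFFECTIVE_MEM_CLOCK_HZ = 650_000_000   # 325 MHz DDR, two transfers per clock
BUS_WIDTH_BITS = 128                   # memory bus width
PIXELS_PER_CLOCK = 4                   # assumed
AA_SAMPLES_PER_PIXEL = 4               # assumed


def pixel_fill_rate(core_hz, pixels_per_clock):
    """Peak pixels written per second."""
    return core_hz * pixels_per_clock


def aa_sample_rate(core_hz, pixels_per_clock, samples_per_pixel):
    """Peak anti-aliasing samples per second."""
    return pixel_fill_rate(core_hz, pixels_per_clock) * samples_per_pixel


def memory_bandwidth_bytes(effective_mem_hz, bus_width_bits):
    """Peak bytes per second moved over the memory bus."""
    return effective_mem_hz * bus_width_bits // 8


fill = pixel_fill_rate(CORE_CLOCK_HZ, PIXELS_PER_CLOCK)
samples = aa_sample_rate(CORE_CLOCK_HZ, PIXELS_PER_CLOCK, AA_SAMPLES_PER_PIXEL)
bandwidth = memory_bandwidth_bytes(EFFECTIVE_MEM_CLOCK_HZ, BUS_WIDTH_BITS)

print(f"Fill rate: {fill / 1e6:.0f} Mpixels/s")
print(f"AA samples: {samples / 1e9:.1f} billion/s")
print(f"Memory bandwidth: {bandwidth / 1e9:.1f} GB/s")
```

Under these assumptions a 650 MHz effective clock on a 128-bit bus works out to 10.4 GB/s. These are peak numbers that real applications will not sustain.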

Like the GeForce 3, the GeForce 4 Ti 4600 GPU features only one pixel shader, but it adds a second vertex shader, for a total of two, along with many new features that we’ll look at in detail below. To further improve performance, NVIDIA has also optimized, fine-tuned and tweaked the GeForce 4 to shorten processing times, delivering better overall performance without requiring groundbreaking new technology. The same approach was already taken with the GeForce 3.

GeForce 4 Ti 4600 GPU Die

GeForce 4 Technology Explanation
LightSpeed Memory Architecture II

   The enhanced LightSpeed Memory Architecture II aims to optimize memory bandwidth for a better and more realistic gaming experience. The architecture combines six technologies named ‘Crossbar Memory Controller’, ‘Quad Cache’, ‘Z-Occlusion’, ‘Lossless Z Compression’, ‘Auto Pre-Charge’ and ‘Fast Z-Clear’.

CrossBar Memory Controller

   Just like the GeForce 3, the GeForce 4 uses a memory controller called CrossBar, whose main task is to optimize the chip’s effective fill rate by avoiding wasted bits and thus reducing latency. Traditionally a GPU uses a 256-bit memory controller that can transfer data only in 256-bit chunks. So if a triangle is only one pixel in size, it requires a memory access of 32 bytes when only 8 bytes are actually needed: 75% of the memory bandwidth is wasted in the process! NVIDIA solved the problem by implementing the CrossBar controller. Unlike yesterday’s GPUs, the CrossBar controller has four independent memory sub-controllers that each handle 64-bit blocks per clock, for a total of 256 bits (it can also group them to handle data in full 256-bit accesses).

This memory controller is the key to better memory management, answering the needs of today’s game developers as 3D scenes grow more complex (the number of triangles per frame has greatly increased in recent games). Compared to a traditional memory controller, the CrossBar cuts average latency down to 25%, and any 3D application can benefit from it. According to NVIDIA, the CrossBar controller can speed up memory access by up to four times. Obviously, the full fourfold gain applies only when the data being read or written is just 64 bits wide, and that situation is far from an everyday occurrence.

The difference between the GeForce 3 and GeForce 4 CrossBar memory controllers lies in the algorithms they use: on the GeForce 4, the load-balancing algorithms have been streamlined to use memory more efficiently across the four partitions.

GeForce 4 Ti 4600 CrossBar Memory Controller
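The wasted-bandwidth argument can be put into numbers with a small sketch. It assumes that every memory access is rounded up to a whole number of granules (32 bytes for a single 256-bit controller, 8 bytes for one 64-bit sub-controller). The function names are hypothetical, and the model ignores everything else a real memory controller does.

```python
def bytes_transferred(request_bytes, granule_bytes):
    """Bytes actually moved when accesses are rounded up to whole granules."""
    granules = -(-request_bytes // granule_bytes)  # ceiling division
    return granules * granule_bytes


def wasted_fraction(request_bytes, granule_bytes):
    """Share of the transferred bytes that the request did not need."""
    moved = bytes_transferred(request_bytes, granule_bytes)
    return (moved - request_bytes) / moved


MONOLITHIC_GRANULE = 32  # one 256-bit access
CROSSBAR_GRANULE = 8     # one 64-bit sub-controller access

for size in (8, 16, 24, 32):
    mono = wasted_fraction(size, MONOLITHIC_GRANULE)
    xbar = wasted_fraction(size, CROSSBAR_GRANULE)
    print(f"{size:2d}-byte request: {mono:.0%} wasted at 256-bit, "
          f"{xbar:.0%} wasted at 64-bit")
```

For the 8-byte case the 256-bit access wastes 75% of what it moves, while the 64-bit access wastes nothing. For a full 32-byte request the two are equivalent, which is why the gain depends so heavily on the mix of access sizes.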

Quad Cache

   The Quad Cache is a brand new cache memory sub-system that groups four distinct memory caches. Each of the four caches is dedicated to a specific task: one handles primitives, one vertices, one textures, and one pixels. NVIDIA doesn’t disclose the size of each cache.

Once data has been processed by the GPU, a small quantity of data or the result of a calculation is stored in each cache, with the enormous advantage of being instantly available to the GPU. If the GPU needs previously calculated information to render the next scene, it retrieves it from the Quad Cache rather than searching the whole memory for it. The role of each cache can be detailed as follows:

  • Vertex Cache: This cache stores vertices after they are sent over the AGP bus. It makes the AGP bus more efficient by making sure the same vertices are never transmitted more than once.

  • Primitive Cache: This cache stores the results of the operation that assembles vertices into fundamental primitives.

  • Dual Texture Cache: This feature was already present in the GeForce 3, but its algorithm has been slightly improved, making it more efficient with multi-texturing or high-quality filtering.

  • Pixel Cache: Located at the end of the processing pipeline, this cache is a coalescing cache. It waits for a certain quantity of pixels to be drawn before writing them to memory using burst modes.
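To illustrate the vertex cache idea, here is a minimal sketch that counts how many vertices would cross the bus for a stream of vertex indices, with and without a small cache. NVIDIA does not disclose cache sizes or replacement policies, so the cache size and the first-in, first-out eviction used here are pure assumptions, and `count_bus_transfers` is a hypothetical helper.

```python
from collections import OrderedDict


def count_bus_transfers(index_stream, cache_size):
    """Count vertices fetched over the bus for a stream of vertex indices.

    cache_size = 0 models a GPU with no vertex cache. Eviction is
    first-in, first-out, which is an assumption made for this sketch.
    """
    cache = OrderedDict()
    transfers = 0
    for index in index_stream:
        if index in cache:
            continue  # hit: the vertex is reused without touching the bus
        transfers += 1  # miss: the vertex has to be sent over the bus
        if cache_size > 0:
            cache[index] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the oldest entry
    return transfers


# Two triangles sharing an edge: vertices 1 and 2 are used twice.
quad = [0, 1, 2, 2, 1, 3]

print("No cache:     ", count_bus_transfers(quad, 0), "transfers")
print("4-entry cache:", count_bus_transfers(quad, 4), "transfers")
```

With the cache, the two shared vertices are sent only once, so the quad needs four transfers instead of six. Meshes with many shared vertices would benefit more.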

Auto Pre-Charge

   Typically, the information stored in memory, no matter what kind of memory is used, is organized in banks. The problem with this architecture appears when the GPU needs to access information held in a bank other than the one currently open. To do so, the memory must close the bank in use, then pre-charge and activate the new bank before it can give the GPU the information it needs. This results in a dramatic loss of performance, since these operations take ten clock cycles to complete while the GPU does nothing but wait.

GeForce 4’s Auto Pre-Charge feature uses a prediction algorithm to pre-charge, ahead of time, a memory bank that isn’t currently in use, in order to boost performance. The activation phase is still required, of course, but it takes only 2 or 3 clock cycles.
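The cycle counts above lend themselves to a toy cost model. The sketch below charges ten cycles for an unprepared bank switch and three cycles when the target bank was already pre-charged. NVIDIA's prediction algorithm is not described, so the `next_bank` predictor (always pre-charge the following bank) and the four-bank layout are hypothetical stand-ins.

```python
FULL_SWITCH_CYCLES = 10    # close, pre-charge and activate, per the text
ACTIVATE_ONLY_CYCLES = 3   # upper end of the "2 or 3" cycles in the text


def stall_cycles(bank_sequence, predict_next=None):
    """Total cycles lost to bank switches over a sequence of accesses.

    predict_next, if given, maps the currently open bank to the bank that
    is pre-charged in the background (a hypothetical predictor). Opening
    the very first bank is not counted.
    """
    if not bank_sequence:
        return 0
    total = 0
    current = bank_sequence[0]
    for bank in bank_sequence[1:]:
        if bank == current:
            continue  # same bank stays open: no penalty
        precharged = predict_next(current) if predict_next else None
        if bank == precharged:
            total += ACTIVATE_ONLY_CYCLES  # only activation remains
        else:
            total += FULL_SWITCH_CYCLES    # full close/pre-charge/activate
        current = bank
    return total


def next_bank(bank, n_banks=4):
    """Naive predictor: assume the next access goes to the following bank."""
    return (bank + 1) % n_banks


accesses = [0, 0, 1, 1, 2, 0]
print("Without pre-charge:", stall_cycles(accesses), "cycles")
print("With pre-charge:   ", stall_cycles(accesses, next_bank), "cycles")
```

In this example two of the three bank switches are predicted correctly, so the stall drops from 30 cycles to 16. The real benefit depends entirely on how often the prediction is right.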

Fast Z-Clear

   After rendering each image of a 3D scene, a conventional GPU must clear the Z-Buffer. Usually this is done by writing a value of 0 for each pixel in the frame buffer. Fast Z-Clear instead sets a flag corresponding to a specific area of the Z-Buffer, marking that area as cleared rather than filling the whole buffer with zeros. This technology is nothing new and is already present in ATI’s Radeon GPUs. It is supposed to speed up display times by up to 10%.
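The flag-based clear can be sketched as a Z-buffer divided into tiles, each with a "cleared" flag. This is only an illustration of the principle: the 8x8 tile size, the class name and the way stale values are reset on the first write are assumptions, not a description of what the hardware does.

```python
class TiledZBuffer:
    """Z-buffer split into square tiles, each with a 'cleared' flag."""

    CLEAR_VALUE = 0  # the text describes clearing to 0

    def __init__(self, width, height, tile=8):
        self.width = width
        self.height = height
        self.tile = tile                      # tile edge in pixels (assumed)
        self.tiles_x = -(-width // tile)      # ceiling division
        self.tiles_y = -(-height // tile)
        self.depth = [[self.CLEAR_VALUE] * width for _ in range(height)]
        self.cleared = [True] * (self.tiles_x * self.tiles_y)

    def _tile_of(self, x, y):
        return (y // self.tile) * self.tiles_x + (x // self.tile)

    def full_clear(self):
        """Conventional clear: rewrite every depth value. Returns writes."""
        for row in self.depth:
            for x in range(self.width):
                row[x] = self.CLEAR_VALUE
        self.cleared = [True] * len(self.cleared)
        return self.width * self.height

    def fast_clear(self):
        """Flag-based clear: set one flag per tile. Returns writes."""
        self.cleared = [True] * len(self.cleared)
        return len(self.cleared)

    def read(self, x, y):
        if self.cleared[self._tile_of(x, y)]:
            return self.CLEAR_VALUE  # stale contents are ignored
        return self.depth[y][x]

    def write(self, x, y, z):
        t = self._tile_of(x, y)
        if self.cleared[t]:
            self._reset_tile(t)      # drop stale values on first use
            self.cleared[t] = False
        self.depth[y][x] = z

    def _reset_tile(self, t):
        ty, tx = divmod(t, self.tiles_x)
        for y in range(ty * self.tile, min((ty + 1) * self.tile, self.height)):
            for x in range(tx * self.tile, min((tx + 1) * self.tile, self.width)):
                self.depth[y][x] = self.CLEAR_VALUE


zb = TiledZBuffer(64, 64)
zb.write(3, 3, 5)
print("Writes for a fast clear:", zb.fast_clear())
print("Writes for a full clear:", zb.full_clear())
```

For a 64x64 buffer with 8x8 tiles, the fast clear touches 64 flags where the full clear rewrites 4096 depth values. Reads of a flagged tile simply return the clear value without consulting the stored data.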
