The New Memory Architecture
As sketchy details of the R520 GPU gradually leaked to the media over the past few months, one of the more intriguing rumors concerned the R520's memory architecture, which was supposed to have an 'internal 512-bit memory interface' while the external interface to the memory modules remained at the standard 256 bits. While that statement was of course a simplification, it still captured the gist of the actual outcome. ATI recognized the shortcomings of the existing memory architecture in graphics cards and revamped it to increase its efficiency through the following means:
Finer 32-bit Memory Channels
The memory bus currently found in mid-range and high-end graphics cards is still 256 bits wide. This is usually divided into four 64-bit channels (with four memory controllers, of course), while low-end cards usually have half this width. The problem with this design is that the cost of scaling the bus quickly becomes significant: widening it requires more wires, which means greater complexity in the PCB design, more signaling issues, higher power consumption, and more connections to the GPU, and therefore a higher pin count. Hence, ATI has decided that while expanding the bus beyond the current 256-bit width might seem like a 'straightforward' solution, it is not practical for now.
Instead, ATI has chosen to optimize the structure of the 256-bit memory bus on the Radeon X1800 GPU. The 4 x 64-bit channels have been subdivided further into 8 x 32-bit channels. The total still adds up to a 256-bit wide memory interface, so how does that help? Well, a smaller data request that takes up 32 bits, for example, would now occupy only one 32-bit channel. Under the previous memory structure, the same request would have hogged an entire 64-bit channel by itself, even for a 32-bit chunk of data, and that is a potential waste of memory bandwidth. To use an everyday example, would you rather your service provider bill your mobile phone calls in blocks of minutes or seconds? Obviously, you'll end up paying less if you're charged by the second. The same applies here: the finer 32-bit channels result in less wasted bandwidth and hence increase the efficiency of the bus.
The Radeon X1800 GPUs adopt a 256-bit wide memory bus (eight 32-bit memory channels) while the Radeon X1600 and X1300 GPUs adopt a 128-bit wide bus (quad 32-bit memory channels).
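To put rough numbers on the billing analogy, here is a minimal Python sketch. It assumes a deliberately simplified model in which every request ties up whole channels, and the mix of request sizes is made up for illustration; it is not measured data and not how ATI describes the hardware in detail.

```python
import math

def bus_bits_occupied(request_bits: int, channel_bits: int) -> int:
    """Bus capacity tied up by one request, rounded up to whole channels."""
    channels_needed = math.ceil(request_bits / channel_bits)
    return channels_needed * channel_bits

def efficiency(request_sizes, channel_bits: int) -> float:
    """Fraction of the occupied bus capacity that carries useful data."""
    useful = sum(request_sizes)
    occupied = sum(bus_bits_occupied(r, channel_bits) for r in request_sizes)
    return useful / occupied

# Hypothetical mix of request sizes in bits (an assumption, not a benchmark).
requests = [32, 32, 64, 32, 128, 32]

eff_64 = efficiency(requests, 64)  # older layout: 64-bit channels
eff_32 = efficiency(requests, 32)  # X1800-style layout: 32-bit channels

print(f"64-bit channels: {eff_64:.0%} of occupied bandwidth is useful")
print(f"32-bit channels: {eff_32:.0%} of occupied bandwidth is useful")
```

With this particular mix, every 32-bit request wastes half of a 64-bit channel, while the 32-bit layout wastes nothing. Real workloads will land somewhere in between, depending on how many requests are small.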
Ring Bus Architecture
The top half of the illustration depicts the new 256-bit memory interface structure adopted by the Radeon X1800 GPU while the bottom half depicts the traditional structure.
So where, then, does the 512-bit internal memory interface fit into the picture? This is none other than ATI's new Ring Bus memory architecture, arguably one of the more radical changes to the Radeon series, and to graphics memory architecture in general, in quite a while. In a typical memory architecture, a centralized memory controller handles all read and write requests between the memory modules and a number of memory clients. These clients are all connected in some way or another to the controller. Again, scalability becomes an issue as GPUs get faster and wider, demanding an ever-increasing number of clients that must be connected to the controller. As the design gets denser and denser, complexity becomes a stumbling block both to scaling the GPU's clock speed and to physically routing the pathways between the memory controller and these clients. To overcome this issue, ATI has adopted a Ring Bus architecture, a vital design element that made the highly clocked Radeon X1800 and X1600 series a reality. Take special note that the ring bus does not extend to the Radeon X1300 series, whose specifications do not call for it; leaving it out also saves costs on ATI's new low-end GPU. The Radeon X1600 GPU, on the other hand, does use the Ring Bus architecture, but at 256 bits wide, its ring bus has half the bandwidth of the X1800's.
The new Ring Bus architecture found on the Radeon X1800 and X1600 series; representation here is of the Radeon X1800 GPU.
As per the illustration, we'll focus on the Radeon X1800 GPU variant to explain this segment. The memory controller is still at the heart of the GPU, but it is now complemented by a 512-bit ring bus configured as a pair of 256-bit wide rings that run in opposite directions. To simplify routing, the rings are neatly arranged along the outer edge of the chip. Traditional memory architectures use the same route for both reads and writes to the memory devices (memory clients --> memory controller --> memory devices and vice versa). The new ATI memory architecture keeps this path for memory writes and for read requests, but hands the requested data over to the ring bus, where it travels between the ring stops until it is picked up by the memory client that asked for it (loosely similar to how a Token Ring LAN passes data from station to station). The Radeon X1600 GPU is not much different, except that its 256-bit ring bus is configured as dual 128-bit wide rings.
A typical memory read sequence on the Radeon X1800 GPU as highlighted in red.
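The return path for read data can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the number of ring stops is a parameter (four is used here purely as an example), stops are numbered around the ring, and the data is assumed to take whichever of the two counter-rotating rings reaches the destination in fewer hops. The function name ring_route is hypothetical and ATI has not been quoted here on how the direction is actually chosen.

```python
def ring_route(src: int, dst: int, num_stops: int):
    """Pick a direction on a bidirectional ring and list the stops visited.

    src: ring stop where the read data enters the ring
    dst: ring stop nearest the requesting memory client
    Returns (direction, hops, path). Ties go to the clockwise ring.
    """
    cw_hops = (dst - src) % num_stops   # hops on the clockwise ring
    ccw_hops = (src - dst) % num_stops  # hops on the counter-clockwise ring
    if cw_hops <= ccw_hops:
        direction, hops, step = "clockwise", cw_hops, 1
    else:
        direction, hops, step = "counter-clockwise", ccw_hops, -1
    path = [(src + step * i) % num_stops for i in range(hops + 1)]
    return direction, hops, path

NUM_STOPS = 4  # assumed for illustration only

for dst in range(NUM_STOPS):
    direction, hops, path = ring_route(0, dst, NUM_STOPS)
    print(f"stop 0 -> stop {dst}: {hops} hop(s) {direction}, via {path}")

# Worst-case distance with a single one-way ring versus two opposing rings.
single_ring_worst = NUM_STOPS - 1
dual_ring_worst = max(ring_route(0, d, NUM_STOPS)[1] for d in range(NUM_STOPS))
print(f"worst case: {single_ring_worst} hops (one ring) vs {dual_ring_worst} (two rings)")
```

In this model, having two rings running in opposite directions caps the longest trip at half the ring, which is one plausible reason for pairing the rings rather than building a single wider one.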
ATI also says that the arbitration logic for the memory controller is highly programmable, meaning that it can be tuned for a particular application through the Catalyst A.I. feature. Finally, the memory controller is also designed to handle upcoming memory technologies like GDDR4. If this is true (nobody can actually verify it yet), ATI can easily adopt the new standard for high-end graphics cards based on this same memory controller architecture once such graphics memory modules are in mass production.
Overall, the new ring bus architecture, working in conjunction with the tried and tested memory controller approach, is a more extensible design with greater throughput and more frequency scaling headroom than previous memory architectures, and it should serve ATI well for the near future. Already, the Radeon X1800 XT is a testament to the design enhancements, with core and memory clock speeds that have not been attained on other cards before. This is also vital for ATI because its top Radeon X1800 XT has only 16 pixel shader pipelines versus 24 on its direct competitor, so it needs the advantage in processing frequency and memory throughput to match or exceed the performance of the GeForce 7 series.
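ATI has not published how this arbitration works, so the following Python sketch only illustrates the general idea of a tunable arbiter: each memory client gets a weight, and a per-application profile can change those weights. The client names, the scoring rule (weight multiplied by waiting time), and the profile values are all assumptions made up for this example.

```python
def pick_next(pending: dict, weights: dict) -> str:
    """Choose which waiting client to service next.

    pending: client name -> cycles the request has waited so far
    weights: client name -> programmable priority weight (default 1)
    Score = weight * (wait + 1); ties are broken by client name.
    """
    return max(pending, key=lambda c: (weights.get(c, 1) * (pending[c] + 1), c))

# Hypothetical clients with outstanding requests and their wait times.
pending = {"texture": 3, "z": 1, "color": 2}

default_profile = {}       # every client weighted equally
tuned_profile = {"z": 4}   # made-up profile favoring Z-buffer traffic

print("default profile services:", pick_next(pending, default_profile))
print("tuned profile services:  ", pick_next(pending, tuned_profile))
```

With equal weights the longest-waiting client wins, while the tuned profile lets a favored client jump the queue. That is the kind of behavior a per-application tuning mechanism would need to express, whatever form it takes in the actual hardware.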