Wednesday, August 4, 2010

PCI Express 3.0: By This Time Next Year

After an unfortunate series of untimely delays, the folks behind PCI Express 3.0 believe they've worked out the kinks that have kept next-generation connectivity from achieving backwards compatibility with PCIe 2.0. We take a look at the tech to come.

Moore's Law states that the number of transistors that can be placed on a chip will double every two years. This has often been misinterpreted as a statement that processor speed will double every two years. It’s a misinterpretation that the computer-buying general public has turned into an expectation of exponentially-scaling PC performance.

However, as you’ve undoubtedly noticed, shipping processors have been stuck between 3 GHz and 4 GHz for about six years now. So, the computer industry has had to find other ways to make data move faster. One of the most important of those ways has been maintaining balance between platform components using PCI Express, the open standard technology that enables high-speed graphics cards, expansion cards, and other onboard computer components. It’s at least arguable that PCI Express is as important to scalable performance as multi-core processors. Although dual-core, quad-core, and hexa-core CPUs can only be adequately used by applications optimized for threading, every program installed on your machine can and will touch components attached via PCI Express in some way.

Many industry observers originally expected motherboards and chipsets based on next-generation PCI Express 3.0 to appear in the first quarter of 2010. Unfortunately, problems with backward compatibility delayed the launch of PCI Express 3.0, and as we enter the second half of this year, we’ve been left waiting for official word on the new standard's release.

Finally, following a conference call with PCI-SIG (the Special Interest Group that oversees the PCI and PCI Express standards), we at last have some answers.

Al Yanes, president and chairman of PCI-SIG, and Ramin Neshati, chairman of the PCI-SIG Serial Communications Workgroup, addressed the current timeline of PCI Express 3.0 development.

On Wednesday, June 23rd, PCI Express 3.0 revision 0.71 was released. Yanes stated that the 0.71 release marks what is believed to be the resolution of the backward compatibility issues that had caused an initial delay. Neshati described the primary compatibility issue as “DC wandering,” which he explained as PCI Express 2.0 and earlier devices “not having enough zeros and ones” to satisfy the demands of the PCI Express 3.0 interface.

Now that the backward compatibility issues are resolved, the PCI-SIG says it is on track for the base release of the 0.9 revision “later this summer.” This is expected to be followed by the base release of revision 1.0 in the fourth quarter of this year.

Of course, the most pressing question is when we can expect to see PCI Express 3.0-based motherboards on store shelves. Neshati stated that he expects to see initial products in the first quarter of 2011 (the “FYI” triangle on the Timeline image).

Neshati added that there would be “no silicon-impacting changes” (the only changes would be software/firmware-related) between revision 0.9 and revision 1.0, which is what will allow some products to start to trickle into the marketplace before the final release of the 1.0 revision. During this time, products will be able to qualify for the PCI-SIG “Integrator’s List” (the “IL” triangle), which is PCI-SIG’s version of an approval logo.

Neshati jokingly referred to the third quarter of 2011 as the “Fry’s and Buy” date (an apparent reference to retailers such as Fry’s and Best Buy). This is the time in which we can expect to see a large selection of PCI Express 3.0-based merchandise for sale on the Web and in retail stores.

The primary difference for end users between PCI Express 2.0 and PCI Express 3.0 will be a marked increase in potential maximum throughput. PCI Express 2.0 employs 5 GT/s signaling, enabling a bandwidth capacity of 500 MB/s in each direction for each “lane” of data traffic. Thus, a PCI Express 2.0 primary graphics slot, which typically uses 16 lanes, offers bandwidth of up to 8 GB/s in each direction.

PCI Express 3.0 will roughly double those numbers. PCI Express 3.0 uses an 8 GT/s bit rate, enabling a bandwidth capacity of close to 1 GB/s per lane. Accordingly, a 16-lane graphics card slot will have a bandwidth capacity of nearly 16 GB/s.

On the surface, the increase from 5 GT/s to 8 GT/s doesn’t quite sound like a doubling of speed. However, PCI Express 2.0 uses an 8b/10b encoding scheme, where 8 bits of data are mapped to 10-bit symbols to achieve DC balance. The result is 20% overhead, cutting effective bit rate.
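To make the idea of DC balance a little more concrete, here is a minimal Python sketch that measures the two properties an encoding like 8b/10b is meant to control: the running difference between ones and zeros, and the longest run of identical bits. It does not implement the actual 8b/10b code tables; the function names and both sample bit strings are our own illustrative assumptions.

```python
def running_disparity(bits):
    """Return the running count of ones minus zeros after each bit."""
    total = 0
    history = []
    for bit in bits:
        total += 1 if bit == "1" else -1
        history.append(total)
    return history


def longest_run(bits):
    """Return the length of the longest stretch of identical consecutive bits."""
    longest = current = 0
    previous = None
    for bit in bits:
        current = current + 1 if bit == previous else 1
        longest = max(longest, current)
        previous = bit
    return longest


# Hypothetical sample streams, chosen only to illustrate the contrast.
unbalanced = "1111111100000001"  # long runs of identical bits
balanced = "1010011010010110"    # equal ones and zeros, short runs

for label, stream in (("unbalanced", unbalanced), ("balanced", balanced)):
    disparity = running_disparity(stream)
    print(f"{label}: peak disparity {max(abs(d) for d in disparity)}, "
          f"longest run {longest_run(stream)}")
```

The first stream drifts far from an even mix of ones and zeros, while the second never strays by more than one bit. Keeping a link closer to the second pattern is the job that the extra encoding bits pay for.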

PCI Express 3.0 moves to a much more efficient 128b/130b encoding scheme, cutting the encoding overhead from 20% to roughly 1.5%. So, the 8 GT/s won’t be a “theoretical” speed; it will be very nearly the actual bit rate, comparable in performance to 10 GT/s signaling with 8b/10b.
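The per-lane figures quoted here can be checked with a few lines of arithmetic. The Python sketch below assumes one bit per transfer per lane and counts only line-encoding overhead (packet and protocol overhead are ignored); the function name and its parameters are ours, not anything defined by PCI-SIG.

```python
from fractions import Fraction


def lane_bandwidth_mb_s(gigatransfers_per_s, payload_bits, symbol_bits):
    """Usable MB/s per lane, per direction, after line-encoding overhead."""
    raw_bits_per_s = gigatransfers_per_s * 1_000_000_000
    usable_bits_per_s = raw_bits_per_s * Fraction(payload_bits, symbol_bits)
    return float(usable_bits_per_s / 8 / 1_000_000)


pcie2_lane = lane_bandwidth_mb_s(5, 8, 10)     # 5 GT/s with 8b/10b
pcie3_lane = lane_bandwidth_mb_s(8, 128, 130)  # 8 GT/s with 128b/130b

for name, lane in (("PCIe 2.0", pcie2_lane), ("PCIe 3.0", pcie3_lane)):
    print(f"{name}: {lane:.1f} MB/s per lane, "
          f"{lane * 16 / 1000:.2f} GB/s for a x16 slot")

print(f"Speedup per lane: {pcie3_lane / pcie2_lane:.2f}x")
```

Under those assumptions, a PCI Express 2.0 lane works out to exactly 500 MB/s and a PCI Express 3.0 lane to about 985 MB/s, which is why "double" is a fair, if slightly rounded, description.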

PCI-SIG states that it chose the route of eliminating overhead instead of increasing to 10 GT/s because “8 GT/s represents the most optimal tradeoff between manufacturability, cost, power, and compatibility.” The group further states that bumping the speed to 10 GT/s creates “prohibitive penalties” including “design complexity and increased silicon die size and power.” PCI-SIG’s Al Yanes added, “The magic is in the electrical stuff. These guys have really come through for us.”

I asked Yanes what devices he anticipates will require the increase in speed. He replied that these will include “PLX switches, 40 Gb Ethernet, InfiniBand, solid state devices, which are becoming very popular, and of course, graphics.” He added, “We have not exhausted innovation, it’s not static, it’s a continuous stream,” clearing the way for even more enhancements in future versions of the PCI Express interface.


AMD is already integrating support for SATA 6Gb/s into its 8-series chipsets, and third-party motherboard vendors are adding USB 3.0 controllers. Intel is lagging behind in this area, with no chipset support yet for either USB 3.0 or SATA 6Gb/s (Ed.: Note that pre-production P67-based motherboards we've seen here in the lab do incorporate SATA 6 Gb/s support, but lack USB 3.0). However, as we’ve often seen in the AMD versus Intel saga, innovation at AMD usually inspires Intel. Given the data rates for both next-generation storage and peripheral interconnects, it's clearly not necessary to drop either technology onto PCI Express 3.0. Rather, a single lane of second-generation PCI Express is ample for both USB 3.0 (at 5 Gb/s) and SATA 6 Gb/s (which no storage device can even come close to saturating).

Of course, when it comes to storage, the interaction between drives and controllers is only part of the equation. Consider that dropping multiple SSDs on a SATA 6 Gb/s chipset and creating a RAID 0 array does actually have the potential to saturate the single lane of second-gen PCI Express that most motherboard vendors are using for their implementations. Deciding whether USB 3.0 and SATA 6 Gb/s can truly utilize PCI Express 3.0 support requires a closer look at the math.

As mentioned, USB 3.0 runs at 5 Gb/s. But as with PCI Express 2.1, USB 3.0 employs 8b/10b encoding, which lowers the actual peak speed to 4 Gb/s. Divide bits by eight to convert to bytes, and you get a peak throughput of 500 MB/s, which is the exact same speed as a modern PCI Express 2.1 lane. SATA 6Gb/s runs at 6 Gb/s of course, but its own 8b/10b encoding scheme drops the peak rate from a theoretical 6 Gb/s to an actual speed of 4.8 Gb/s. Again, convert that to bytes and you get 600 MB/s, or 20% more than the peak speed of a PCI Express 2.0 lane.
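The same conversion can be written out as a short Python sketch. It assumes 8b/10b encoding on all three links, as described above, and ignores any protocol overhead beyond the line encoding; the function and dictionary names are our own.

```python
def payload_mb_s(line_rate_gbit_s, payload_bits=8, symbol_bits=10):
    """Convert a raw line rate in Gb/s to usable MB/s after line encoding."""
    usable_gbit_s = line_rate_gbit_s * payload_bits / symbol_bits
    return usable_gbit_s * 1000 / 8


# Raw signaling rates in Gb/s, as quoted in the text.
links = {"PCIe 2.0 x1": 5, "USB 3.0": 5, "SATA 6Gb/s": 6}
rates = {name: payload_mb_s(gbit) for name, gbit in links.items()}

baseline = rates["PCIe 2.0 x1"]
for name, mb_s in rates.items():
    print(f"{name}: {mb_s:.0f} MB/s "
          f"({mb_s / baseline:.0%} of one PCIe 2.0 lane)")
```

USB 3.0 lands exactly on the capacity of one second-generation lane, while SATA 6Gb/s overshoots it by a fifth, at least on paper.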

The problem here is that even the fastest SSDs cannot fully saturate a SATA 3 Gb/s connection. Nothing comes close to saturating a USB 3.0 connection, and the same holds true for the latest iteration of SATA 6Gb/s. At least as far as we're concerned today, PCI Express 3.0 isn't really a necessity for driving the biggest buzzwords in the platform space. Hopefully, however, as Intel shifts into its third generation of NAND flash manufacturing, speeds will increase and we'll start to see devices capable of pushing beyond what a 3 Gb/s SATA port could have sustained in the past.
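The RAID 0 scenario raised earlier is where a single lane starts to look tight. The Python sketch below assumes ideal striping (throughput scales linearly with drive count) and a purely hypothetical 250 MB/s per SSD, a figure we picked for illustration rather than measured; the function names are ours as well.

```python
import math

PCIE2_LANE_MB_S = 500   # usable bandwidth of one PCIe 2.0 lane, per direction
ASSUMED_SSD_MB_S = 250  # hypothetical sequential read speed, illustration only


def drives_to_saturate(per_drive_mb_s, link_mb_s=PCIE2_LANE_MB_S):
    """Smallest number of striped drives whose ideal combined speed reaches the link limit."""
    return math.ceil(link_mb_s / per_drive_mb_s)


def raid0_throughput(drive_count, per_drive_mb_s, link_mb_s=PCIE2_LANE_MB_S):
    """Ideal RAID 0 throughput in MB/s, capped by the controller's upstream link."""
    return min(drive_count * per_drive_mb_s, link_mb_s)


print("Drives needed to fill the lane:", drives_to_saturate(ASSUMED_SSD_MB_S))
for count in (1, 2, 3):
    print(f"{count} drive(s): {raid0_throughput(count, ASSUMED_SSD_MB_S)} MB/s")
```

With that assumed drive speed, two striped SSDs already reach the lane's limit, and a third adds nothing. Real arrays scale less cleanly, so treat this as an upper bound on the problem rather than a prediction.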

Our own experience with testing graphics throughput, after the introduction of PCI Express 2.0 and as recently as earlier this year, has revealed that it’s very difficult to saturate the x16 bandwidth currently available on PCI Express 2.1 motherboards. It really takes a multi-GPU configuration or a very high-end single-GPU card to distinguish the difference between an x8 and an x16 connection.

We asked both AMD and Nvidia to comment on the need for PCI Express 3.0 as an enabler for the next generation of graphics card performance. An AMD spokesperson replied that they weren't able to comment at this time.

A spokesperson from Nvidia was a little more forthcoming: “Nvidia is a key contributor to the industry’s development of PCI Express 3.0, which is expected to have twice the data throughput of the current generation (2.0). Whenever there is a major increase in bandwidth like that, applications emerge that take advantage of it. This will benefit consumers and professionals with increased graphics and computing performance from notebooks, desktops, workstations, and servers that have a GPU.”

Perhaps the key phrase here is “applications emerge that take advantage of it.” Nothing in the world of graphics is getting smaller. Displays are getting larger, high definition is replacing standard definition, and the textures used in games are becoming even more detailed and intricate. We do not feel that the need exists today for the latest and greatest graphics cards to sport 16-lane PCI Express 3.0 interfaces. But enthusiasts have seen the same story again and again: the progression of technology paves the way for new ways to take advantage of fatter pipes. Perhaps we'll see a surge of applications that make GPU-based computing more mainstream. Or maybe the performance hit experienced when you run out of frame buffer and swap to system memory will be diminished on more mainstream boards. Either way, we have to look forward to the innovation that PCI Express 3.0 promises to AMD and Nvidia.

AMD and Intel have never been particularly chatty when it comes to detailing the interfaces they use to communicate between chipset components, or even between logic blocks within a northbridge/southbridge. We know the data rates at which those connections run, and we know that they're generally designed to be as bottleneck-free as possible. Sometimes we even know where a certain piece of logic came from, such as the Silicon Logic-based SATA controller AMD used in its SB600. But we're often kept in the dark as to the technology used in building the bridge between components. PCI Express 3.0 certainly presents itself as a very attractive solution, similar to the A-Link interface AMD employs.

The recent emergence of USB 3.0 and SATA 6Gb/s controllers on a number of third-party motherboards may provide a glimpse into this process. Because Intel's X58 chipset does not provide native support for either technology, companies like Gigabyte had to integrate discrete controllers onto their boards using available connectivity.

Gigabyte’s EX58-UD5 motherboard did not have USB 3.0 or SATA 6Gb/s. However, it did include an x4 PCI Express slot.

Gigabyte replaced the EX58-UD5 with the X58A-UD5, which has support for two USB 3.0 and two SATA 6Gb/s ports. Where did Gigabyte find the bandwidth to support the new technologies? By using one lane of PCI Express 2.0 connectivity for each controller, cutting back on available external connectivity while adding functionality to the board, overall.

Besides the addition of support for USB 3.0 and SATA 6Gb/s, the only other real difference between the two motherboards is that the newer offering had its x4 slot removed.

Will PCI Express 3.0, like the standards that preceded it, wind up serving as an enabler of future technologies and controllers that won't make it into the next generation of chipsets as integrated features? Almost certainly.

We are entering an age of the desktop supercomputer. We have access to massively parallel graphics processors, along with power supplies and motherboards that can support as many as four cards at the same time. Nvidia’s CUDA technology is transforming the graphics card into a tool for programmers working not only with games, but with science and engineering. The programming interface has already played an instrumental role in solutions for enterprises as diverse as medical imaging, mathematics, and oil and gas exploration.

Nvidia’s CUDA showcase

I asked OpenGL programmer Terry Welsh, from Really Slick Screensavers, for his thoughts on PCI Express 3.0 and GPU processing. Terry told me “PCI Express was a great boost, and I'm happy with them doubling the bandwidth anytime they want, as with 3.0. However, for the types of projects I work on, I don't expect to see any difference from it. I do a lot of flight-sim stuff at work, but that's mostly bound by memory and disk I/O; the graphics bus isn't a bottleneck at all. I can easily see [PCI Express 3.0] being a big boost, though, for GPU compute applications, and people doing scientific viz on large datasets.”

The explosion Easter Egg in Terry Welsh’s Skyrocket Screensaver

The ability to double transfer speed when working with mathematics-intensive workloads is sure to enhance both CUDA and Fusion development. This is one of the most promising areas for the upcoming PCI Express 3.0 interface.

Any gamer with an Intel P55 chipset can tell you about the advantages and disadvantages of P55 versus Intel's X58 chipset. Advantage: motherboards using the P55 chipset are more reasonably priced than those using X58, on average. Disadvantage: P55 comes equipped with minimal PCI Express connectivity, instead relying on Intel Clarkdale- and Lynnfield-based CPUs with 16 lanes of second-gen PCIe built into the processor itself. Meanwhile, X58 leverages 36 lanes of PCI Express 2.0.

For P55 customers who want to use two graphics cards, both cards are forced down to x8 signaling rates. If you want to add a third card to a P55-based platform, it'll have to occupy the chipset's connectivity, which unfortunately runs at first-gen signaling rates and is limited to a maximum of four lanes on a board with the corresponding slot.

When I asked Al Yanes of PCI-SIG how many lanes we could expect to see in PCI Express 3.0-enabled chipsets from AMD and Intel, he responded that this was “proprietary information” that he “could not discuss.” I didn’t really expect an answer. But still, given the opportunity, the question had to be asked. We feel it’s unlikely that AMD and Intel, both members of the PCI-SIG Board of Directors, would invest time and money into PCI Express 3.0 development if they planned on using PCI Express as an excuse to reduce lane counts. Thus, we feel it’s far more likely that future AMD and Intel chipsets will continue to employ segmenting similar to what we see today, with high-end platforms sporting enough connectivity to support a pair of graphics cards at native x16 signaling, and more mainstream chipsets shaving off PCIe from there.

Picture a chipset like the P55, but with 16 available PCI Express 3.0 lanes. Since these 16 lanes run at twice the speed of PCI Express 2.0, you'd actually be getting the equivalent of 32 lanes. Then, it'd just be a matter of a company like Intel making its chipset compatible with three- and four-way GPU configurations. Unfortunately, we already know that Intel's next-generation P67 and X68 chipsets will still be limited to PCIe 2.0 (and the Sandy Bridge CPUs will similarly be limited to 16 lanes of on-die connectivity).
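To put rough numbers on the "equivalent of 32 lanes" idea, the Python sketch below splits 16 third-generation lanes evenly across one, two, or four graphics cards and expresses each share in PCI Express 2.0 lanes. The even split and the function name are assumptions for illustration; how a real chipset would divide its lanes is up to the vendor.

```python
# Usable per-lane bandwidth in MB/s, counting only line-encoding overhead.
PCIE2_LANE_MB_S = 5e9 * 8 / 10 / 8 / 1e6     # 5 GT/s with 8b/10b
PCIE3_LANE_MB_S = 8e9 * 128 / 130 / 8 / 1e6  # 8 GT/s with 128b/130b


def per_card_equivalent(total_gen3_lanes, card_count):
    """Split Gen3 lanes evenly and express each card's share in PCIe 2.0 lanes."""
    lanes_each = total_gen3_lanes // card_count
    mb_s_each = lanes_each * PCIE3_LANE_MB_S
    return lanes_each, mb_s_each, mb_s_each / PCIE2_LANE_MB_S


for cards in (1, 2, 4):
    lanes, mb_s, gen2_lanes = per_card_equivalent(16, cards)
    print(f"{cards} card(s): x{lanes} Gen3 each, {mb_s / 1000:.2f} GB/s, "
          f"about {gen2_lanes:.1f} PCIe 2.0 lanes")
```

Because 128b/130b still carries a little overhead, 16 third-generation lanes come out at about 31.5 second-generation lanes rather than a full 32, and two cards at x8 each would see nearly what an x16 PCI Express 2.0 slot delivers today.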

In addition to CUDA/Fusion/parallel processing, the expansion of mainstream capabilities through higher-bandwidth interconnects like PCI Express 3.0 is where we see the technology's true potential emerging. Without question, PCI Express 3.0 will enable moderately-priced motherboards with interfaces that were limited to high-end platforms in the previous generation. Those high-end platforms, armed with PCI Express 3.0, will naturally set new performance records, thanks to innovations in graphics, storage, and networking that exploit the available throughput.
