Tuesday, August 3, 2010

CORE Or Boost? AMD And Intel Turbo Features Dissected

Intel arms its Core i5 and Core i7 CPUs with Turbo Boost. AMD's hexa-core Phenom II X6 chips sport Turbo CORE. Both technologies dynamically increase performance based on perceived workloads and available thermal headroom. Which one does the better job?
Automotive turbochargers force more air-fuel mixture into each combustion cycle, which increases torque and power output. AMD’s and Intel’s performance-improving technologies don't actually require an additional piece of hardware bolted on like a turbo would be, but they both invoke the gas compressor namesake anyway. Instead, both companies' latest six-core models dynamically increase their clock rates to deliver better performance under workload conditions that allow for faster frequencies. We wanted to see whether Intel's Turbo Boost or AMD's Turbo CORE is the better implementation.

Intel was first to offer this performance-enhancing feature. Its Nehalem architecture and the Core i7-900 family introduced Turbo Boost in late 2008. On those chips, the technology is capable of accelerating all cores by one clock speed bin (133 MHz) and one or two cores by two speed increments (depending on the particular model). In 2009, the Lynnfield Core i5/i7 quad-core processors for LGA 1156 enabled a more advanced implementation able to accelerate one or two cores by four clock speed increments. The 800-series even bumps clock speed up by five clock speed bins for a single core. One speed bin equals 133 MHz at stock speed, so we’re effectively talking about a 133 to 666 MHz dynamic increase. Turbo Boost is also an available feature on the Clarkdale-based Core i5 dual-core chips.
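The bin arithmetic above is easy to check with a few lines of Python. This is only a sketch: the function name is ours, the 133.33 MHz base clock is the usual value behind the rounded "133 MHz" figure, and results are truncated to whole MHz to match the way the numbers are quoted in the text.

```python
# Sketch of the speed-bin arithmetic described above.
# BCLK_MHZ and boost_mhz are illustrative names, not part of any Intel tool.
BCLK_MHZ = 133.33  # one speed bin equals one base-clock step


def boost_mhz(extra_bins):
    """Return the clock increase in whole MHz for a number of Turbo bins."""
    if extra_bins < 0:
        raise ValueError("extra_bins must not be negative")
    # Truncate instead of rounding, so two bins read as 266 MHz as in the text.
    return int(extra_bins * BCLK_MHZ)


if __name__ == "__main__":
    for bins in (1, 2, 4, 5):
        print(f"{bins} bin(s): +{boost_mhz(bins)} MHz")
```

Five bins come out at 666 MHz, which is the top of the range for the 800-series.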

We grabbed the latest AMD Phenom II X6 and Core i7-980X six-core processors to find out which implementation works best across our benchmark suite in terms of performance and power efficiency. Since the performance level of these two chips is rather different—Intel has more punch—we decided to compare benchmark results with and without the Turbo feature and normalize these to 100% for the non-Turbo results. This way we can compare the relative impact on the respective configurations despite the absolute performance difference. In short, which Turbo implementation gives you more bang for the buck?
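The normalization can be sketched as follows. The function name and the sample numbers are hypothetical and are not taken from our measurements; we also assume here that time-based results are inverted so that a shorter runtime shows up as a value above 100%.

```python
# Sketch of normalizing a Turbo result against its own non-Turbo baseline.
# All names and sample values are illustrative assumptions.
def normalize_to_baseline(baseline, turbo, higher_is_better=True):
    """Express the Turbo result as a percentage of the non-Turbo result (= 100%)."""
    if baseline <= 0 or turbo <= 0:
        raise ValueError("results must be positive")
    if higher_is_better:
        # Scores: a larger number is a better result.
        return turbo * 100.0 / baseline
    # Runtimes: a smaller number is a better result, so invert the ratio.
    return baseline * 100.0 / turbo


if __name__ == "__main__":
    # Hypothetical benchmark score: 1000 points without Turbo, 1050 with.
    print(normalize_to_baseline(1000, 1050))
    # Hypothetical encode time: 100 seconds without Turbo, 80 with.
    print(normalize_to_baseline(100, 80, higher_is_better=False))
```

Because each chip is compared only against itself, the absolute gap between the two platforms drops out of the result.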

Turbo CORE is available on all AMD Phenom II X4 and X6 processors based on the recent 45 nm designs, namely the Thuban six-core and seen-in-the-wild but not-yet-available-at-retail Zosma quad-core models. Should it ever see retail availability, the Phenom II X4 960T at 3.0 GHz nominal speed could accelerate two cores up to 3.4 GHz (+400 MHz), provided thermal headroom is available and the application load demands the increase. The Phenom II X6 processors increase their clock speeds by 500 MHz, with the exception of the 1090T flagship, which adds 400 MHz to go from 3.2 to 3.6 GHz.

This implementation can be considered an addition to the Cool’n’Quiet feature, which reduces clock speeds and voltages if there is little work for the processor to do. Once half of the cores are idle, the system reduces their clock speed to the Cool’n’Quiet minimum of 800 MHz. The next step is a voltage increase for the remaining active cores paired with a speed lift of up to 500 MHz, as explained above.
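The rule described above can be written down as a small decision function. This is a simplified model of the behavior as we describe it, not AMD's actual firmware logic: the function name is ours, the default clocks are those of the Phenom II X6 1090T, and thermal headroom is assumed to be available.

```python
# Simplified model of the Turbo CORE rule described in the text.
# Names are illustrative; thermal limits are deliberately ignored.
def turbo_core_active_clock_mhz(total_cores, active_cores,
                                base_mhz=3200, turbo_mhz=3600):
    """Return the clock of the active cores under the half-idle rule."""
    if not 0 < active_cores <= total_cores:
        raise ValueError("active_cores must be between 1 and total_cores")
    idle_cores = total_cores - active_cores
    # Turbo CORE engages once at least half of the cores are idle.
    if idle_cores * 2 >= total_cores:
        return turbo_mhz
    return base_mhz


if __name__ == "__main__":
    for active in range(1, 7):
        clock = turbo_core_active_clock_mhz(6, active)
        print(f"{active} active core(s): {clock} MHz")
```

In this model the idle cores sit at the Cool’n’Quiet minimum of 800 MHz whenever the boost is engaged.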

Unfortunately, few workloads would tax exactly three cores at 100%, which is the condition needed for AMD’s solution to run at 3.6 GHz. We found that a two-core load scenario is more realistic. This is why the feature works better on a CPU with an even core count, such as the Phenom II X4 960T.

AMD’s Turbo CORE control allows Black Edition processor users to adjust their number of accelerated cores. This makes analysis more complex, but also gives enthusiasts a more powerful tool for fine tuning their systems.

Intel's implementation works best on processors with a lot of scalability inherent to their design, as Turbo Boost covers much broader clock speed ranges. The new six-core "Gulftown," Core i7-980X, however, is already running close to its thermal ceiling under load. Thus, it's limited to a 266 MHz boost with a single core active, and a modest 133 MHz bump when two or more cores are active. Knowing that Intel’s overclocking headroom is sizable, this is really a pity for enthusiasts. After all, the Phenom II X6 can speed up three cores by up to 400 MHz using a 45 nm process.

Intel’s power gate transistors facilitate cutting power to individual cores. This allows the processor to actually disengage those cores from the overall power envelope, consequently "buying" the overhead needed to increase the remaining cores’ clock speed. The premise here is that fewer cores can run at higher clock speeds before they reach the same thermal output.

While AMD basically reduces clock speed and voltage for inactive cores, Intel can physically shut them down. In theory, this should result in lower power consumption and, paired with the ability to dynamically scale one or more cores up or down, a better overall performance result.

Intel has another advantage that should be mentioned. While AMD's six-core processors access 6 MB of shared L3 cache, Intel's architecture currently offers a massive 12 MB repository. If you switch off individual cores, the remaining active processing units can still access the full 12 MB L3. This should provide advantages for applications that work with limited data and use few threads.

3DMark, a synthetic benchmark, realizes a slight advantage from Intel's architecture and Turbo Boost.

PCMark Vantage clearly shows that Intel’s approach delivers performance gains while AMD’s Turbo CORE doesn’t seem to help as much.

iTunes is single-threaded, and is better-accelerated on the Phenom II X6 with Turbo CORE enabled.

The same applies to Lame.

MainConcept is optimized to take advantage of multiple cores, so it benefits more from Turbo Boost, which can kick in even if many cores are taxed.

Once again, we see the multi-threaded advantage in HandBrake, where AMD's processor easily hits its limits on all six cores, preventing Turbo CORE from kicking in.

As expected, switching the Turbo features on or off doesn’t change idle power.

However, peak power increases under Turbo Boost and Turbo CORE. The differences are small, though.

The runtime for our full efficiency suite decreases a bit more on the Intel platform, as there are more applications taking advantage of Intel’s Turbo Boost implementation than AMD’s Turbo CORE.

Average power consumption is much higher on the AMD system with Turbo CORE enabled.

The total power used is exactly the same on the Intel system with and without Turbo Boost. This is interesting because the Core i7-980X with Turbo Boost still finishes faster. AMD’s Turbo CORE-enabled Phenom II X6 also delivers more performance than it does with the feature switched off, but it requires more power to deliver it.

In the end, the Intel chip's efficiency stays constant. The average power is higher during the workload, but the run is shorter, so the total power used is exactly the same. As a result, the efficiency is identical. This is like reaching your destination faster in a car without changing your miles per gallon. AMD’s Turbo implementation sacrifices power efficiency. Runtime decreases, but average power and total power used increase by a larger proportion.
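The relationship between runtime, average power, and total power used is plain arithmetic: total energy is average power multiplied by runtime. The sketch below uses made-up numbers chosen only to show the two patterns; they are not our measured results, and the function name is ours.

```python
# Sketch of the runtime / average power / total energy relationship.
# All wattages and runtimes below are hypothetical.
def energy_wh(avg_power_w, runtime_s):
    """Total energy in watt-hours for a run at a given average power."""
    if avg_power_w < 0 or runtime_s < 0:
        raise ValueError("power and runtime must not be negative")
    return avg_power_w * runtime_s / 3600.0


if __name__ == "__main__":
    # Pattern 1: faster run, higher average power, same total energy.
    print(energy_wh(100, 3600), energy_wh(120, 3000))
    # Pattern 2: faster run, but average power rises more than runtime falls.
    print(energy_wh(100, 3600), energy_wh(130, 3300))
```

In the first pattern the same work costs the same energy, so efficiency is unchanged. In the second the same work costs more energy, so efficiency drops even though the run finishes sooner.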

We can only recommend that AMD and Intel continue implementing and developing their Turbo-oriented features. Both do their job in increasing performance. Since the two approaches are different, though, we found that their outcomes in real life are different, as well.

Let’s start with Intel. The six-core, 3.2 GHz Core i7-980X speeds up a single core by 266 MHz if a single-threaded application wants maximum performance, and it can accelerate all six cores by 133 MHz if thermal headroom allows. This is the main difference compared to AMD’s solution, because Intel's Gulftown design can accelerate single-threaded apps as well as heavily threaded ones. From a multi-core processing standpoint, Turbo Boost makes more sense than Turbo CORE, since all types of workload benefit when compared to nominal clock speed.

AMD’s Turbo CORE only knows one acceleration mode. It increases clock speed for three cores by up to 400 MHz in the case of the Phenom II X6 1090T 3.2 GHz six-core. This means that all applications that utilize no more than three cores experience immediate acceleration. In this case, we found that AMD's performance improvement is higher, as a 400 MHz increase is much more noticeable than Intel’s 133/266 MHz speed bump. The downside is nonexistent acceleration if four to six cores are being taxed.
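To put the two approaches side by side, the clock increases quoted above can be expressed as a percentage of the shared 3.2 GHz nominal speed for each number of loaded cores. This is a sketch built only from the figures in this article: the function names are ours, and it assumes thermal headroom is always available.

```python
# Sketch comparing the clock uplift of both chips per number of loaded cores.
# Boost values are the ones quoted in the text; names are illustrative.
BASE_MHZ = 3200  # nominal clock of both the Core i7-980X and the Phenom II X6 1090T


def boost_980x_mhz(active_cores):
    """Turbo Boost: two bins for a single core, one bin for two to six cores."""
    if not 1 <= active_cores <= 6:
        raise ValueError("active_cores must be between 1 and 6")
    return 266 if active_cores == 1 else 133


def boost_1090t_mhz(active_cores):
    """Turbo CORE: 400 MHz for up to three loaded cores, nothing beyond that."""
    if not 1 <= active_cores <= 6:
        raise ValueError("active_cores must be between 1 and 6")
    return 400 if active_cores <= 3 else 0


def uplift_percent(boost_mhz):
    """Clock increase relative to the nominal clock, in percent."""
    return boost_mhz * 100.0 / BASE_MHZ


if __name__ == "__main__":
    for cores in range(1, 7):
        intel = uplift_percent(boost_980x_mhz(cores))
        amd = uplift_percent(boost_1090t_mhz(cores))
        print(f"{cores} core(s): 980X +{intel:.1f}%  1090T +{amd:.1f}%")
```

Clock uplift is an upper bound on the possible gain, not a prediction of benchmark results, but it shows why AMD leads with up to three loaded cores and Intel leads from four cores upward.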
