Tuesday, August 21, 2001
Pentium 4: Where are we now
The Pentium 4 has had a difficult first year. Paired exclusively with RDRAM and positioned as a high-end consumer platform, the P4 has been plagued by a confusing performance profile, excessive platform cost premiums and a soft market for high-end consumer PCs. Initially, consumers were startled to realize that processors with lower clock speeds could handily outperform the P4 on many popular benchmarks. In addition, the P4 has suffered from the “Osborne” effect, resulting from Intel’s early disclosure of Northwood, an upcoming P4 package type change, and promises of speeds soon to exceed 2GHz. This cast much of the market into a prolonged ‘wait-and-see’ mode regarding the P4.
Despite the P4’s slow start, Intel is preparing an aggressive top-to-bottom strategy for the Pentium 4, to be unveiled in the coming months. Intel is banking on a combination of aggressive P4 price cuts and higher clock speeds to accelerate the demise of the Pentium III and move the P4 into higher volume market segments.
But perhaps the most confusing aspect of Intel’s strategy is the absence of a mainstream DDR platform solution. During the last 10 months, have potential P4 buyers been holding their breath for the availability of a PC133 platform? It does not seem so. Rather, it seems that the market window for PC133 in new high-end systems is nearly closed. Since 1999, DDR has been recognized as the memory of choice for fast graphics accelerators, and it has been available for high-end Athlon and P3 systems since the beginning of this year. Now at virtual price parity with PC133, DDR seems ready to wear the mainstream crown.
While the 850+RDRAM appears to overshoot the market, the 845+PC133 seems to undershoot it. This awkward balance has been seen before in Intel’s product mix (in 1999) when it offered the unpopular 820+RDRAM platform for the high end and offered the 810+PC100 for the low end. Lacking a mainstream PC133 chip set solution, Intel lost 40% of its chip set market share to VIA within one year. VIA then leveraged its new high volume platform infrastructure to aid AMD to gain huge market share against Intel during the year 2000.
Fast forward to 2001… Now it seems that history may repeat itself, and Intel may again fail to satisfy mainstream market requirements in this pivotal window of opportunity for the P4. But VIA is brashly stepping up to the plate once more, intending to hit another home run early in the game with a new DDR chip set for the Pentium 4.
Enter the P4X266
VIA’s new P4X266 chipset successfully leverages its popular north bridge design used for Athlon and P3 processors. With a familiar feature set and an upgraded memory controller, the P4X266 should deliver the right price/performance mix to allow the Pentium 4 to move into higher volume market segments.
Looking forward, we expect the release of the P4M266 later this year, a pin-compatible DDR north bridge with an integrated S3 graphics controller. This will further extend the reach of the P4 into low cost markets.
VIA’s chip set strategy for the P4 has the potential to make a strong impact on the growth of the P4 in the mainstream, beginning this year. It would seem wise for Intel to enthusiastically endorse VIA’s strategy and support for the P4.
Intel’s 850 is equipped with dual 16-bit RDRAM channels providing 3.2 GB per second, matching the 400 MHz, 64-bit wide FSB of the Pentium 4. VIA’s P4X266 pairs Intel’s FSB with 266MHz DDR which delivers 2.1 GB/sec of bandwidth. While the market has eagerly anticipated a P4+DDR platform, some have wondered what effect DDR’s lower memory bandwidth would have on application benchmarks. After extensive testing, we are quite pleased with the performance of the P4X266, and we believe that the market will not be disappointed either. As VIA has emerged to challenge the incumbent 850 platform, we will first focus on benchmarks contrasting these two platforms. In addition, we will offer some synthetic test results for the Intel 845 PC133 platform. The test configurations for all three systems are shown below.
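As a rough check on the peak bandwidth figures quoted above, the arithmetic can be sketched in a few lines of Python. This is only an illustration of theoretical peak rates, not a measurement. The function name is hypothetical, and two inputs are assumptions not stated in the text: each RDRAM channel is taken to run at 800 million transfers per second (which is what the quoted 3.2 GB per second implies for two 16-bit channels), and the DDR and PC133 memory buses are taken to be 64 bits wide.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s, bus_width_bits, channels=1):
    """Theoretical peak bandwidth in GB/s, using 1 GB = 10**9 bytes."""
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer * channels / 1e9


# Pentium 4 front side bus: 400 MHz effective, 64 bits wide (from the text).
p4_fsb = peak_bandwidth_gb_s(400, 64)

# Intel 850: two 16-bit RDRAM channels (from the text).
# ASSUMPTION: 800 million transfers per second per channel.
rdram_850 = peak_bandwidth_gb_s(800, 16, channels=2)

# VIA P4X266: 266 MHz DDR (from the text). ASSUMPTION: 64-bit memory bus.
ddr_p4x266 = peak_bandwidth_gb_s(266, 64)

# Intel 845: PC133 SDRAM. ASSUMPTION: 133 MHz, 64-bit memory bus.
pc133_845 = peak_bandwidth_gb_s(133, 64)

for name, gb_s in [
    ("P4 FSB", p4_fsb),
    ("850 + RDRAM", rdram_850),
    ("P4X266 + DDR", ddr_p4x266),
    ("845 + PC133", pc133_845),
]:
    print(f"{name}: {gb_s:.2f} GB/s ({gb_s / p4_fsb:.0%} of FSB peak)")
```

Under these assumptions, RDRAM on the 850 matches the FSB peak, DDR on the P4X266 supplies roughly two thirds of it, and PC133 on the 845 about one third. Peak numbers say nothing about latency, which is why the application results matter more than this arithmetic.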
P4X266 vs. 850: Application Benchmarks
Sysmark 2000 is the most comprehensive application benchmark around, and it is still the easiest to interpret. It comprises 12 application workloads and allows a detailed comparison of individual runtimes. It provides us with a foundation for evaluating the impact of DDR on the Pentium 4. The bar chart below displays run time in seconds for each application test, with a shorter bar indicating a faster completion time.
The performance variations between the two platforms are barely perceptible in this graph, as they would be in a typical application setting. The table below reveals a more detailed comparison based on the best-case run time results for each of the applications. In 7 out of 12 applications (including Bryce and Windows Media Encoder), DDR delivers superior performance (by 2-4%). Photoshop stands out as the one exception, with RDRAM demonstrating an 11% performance advantage. Overall, the average performance delta between the platforms is only 0.2%, favoring the 850.
Business / Content Creation Winstone
Business Winstone and Content Creation Winstone measure the performance of popular Windows applications using a series of scripted activities. The benchmark script requires that all applications be open simultaneously in order to allow task switching between them. In this respect, Winstone mimics the task switching behavior that is common among sophisticated users.
Under these benchmark loads the latency advantage of DDR can be observed again. Even though the Content Creation benchmark includes an Adobe Photoshop 5.5 script, the overall performance balance favors DDR. This might suggest that the Sysmark Photoshop test results are not totally conclusive, or that task switching and other loads favor DDR sufficiently to shift the balance in its favor.
Viewperf is a respected MCAD-style benchmark which measures 3D rendering performance across multiple platforms and operating systems. The benchmark is known to be very CPU and DRAM centric due to a significant amount of cache thrashing, which requires the CPU to access large amounts of floating point data from main memory. In this benchmark, we observe that the 850 + RDRAM platform edges the VIA P4X266 platform by an average of 5.6% in the suite of 6 tests.
Quake3 & VGL Mark
Although the 850 + Rambus solution performs well in professional 3D benchmarks, the Pentium 4 has had somewhat limited penetration into the enthusiast community due to its restrictive ties to expensive Rambus memory. The VIA P4X266 will provide the Pentium 4 with a platform that can finally allow the P4 to move confidently into this space without compromising performance. While Athlon has become a strong favorite in this space, many enthusiasts may find a 2GHz processor difficult to resist, especially when combined with low-cost DDR SDRAM.
The results for Quake3 and VGL Mark show the 850 and P4X266 virtually neck and neck in gaming performance.
Note: We have seen some scores from other sources reporting lower numbers for DDR on Quake3. Our configuration uses two banks of CL2 DDR to enable interleaving. We have seen lower scores when configured with a single bank of CL2.5 DDR. We believe that performance sensitive gaming enthusiasts will prefer the faster configuration that we used in our testing. The results below are completely reproducible.
The 3Dmark game scripts are also capable of demonstrating performance variations based on different DRAM types. But when comparing RDRAM vs. DDR we saw nearly identical performance levels – with DDR edging out RDRAM by a tiny margin of 0.2%.
With this, we are convinced that even gaming enthusiasts will not be able to observe a performance loss when comparing the 850 to a performance-tuned P4X266 DDR platform.
Sphinx 3 Speech Recognition Benchmark
Sphinx-III is a large vocabulary, speaker independent, fully continuous speech recognition system based on the Hidden Markov Model algorithm. It was developed at the Carnegie Mellon University Computer Science Department (www.speech.cs.cmu.edu). It is designed to be good enough to use over the phone without any user or software training, and is proven to be more accurate than humans at transcribing large volumes of random speech (statistically proven in accuracy tests for generating television closed caption text streams). In many respects this is what people dream of when they imagine speech recognition, but the problem is that its performance is completely bound by DRAM performance (as seen below).
This powerful application simulation benchmark must traverse a language database of approximately 18.5 MB while simultaneously performing roughly 95.8 million multiplications per minute of speech. The output of this test corresponds to execution time, so a lower score in this test indicates better performance. A benchmark score of 1.0 or lower indicates that the system is capable of real-time performance.
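The scoring rule described above can be sketched as a simple ratio. This is a minimal illustration that assumes the score is processing time divided by the duration of the speech being recognized, which is the usual meaning of a real-time factor; the text only says that the score corresponds to execution time and that 1.0 or lower means real-time. The function names and the sample timings are hypothetical.

```python
def real_time_factor(processing_seconds, speech_seconds):
    """Score as assumed here: time spent recognizing / length of the speech."""
    return processing_seconds / speech_seconds


def is_real_time(score):
    """A score of 1.0 or lower means recognition keeps up with the speaker."""
    return score <= 1.0


# The multiplication workload quoted in the text, converted to a per-second rate.
MULTIPLICATIONS_PER_MINUTE = 95.8e6
multiplications_per_second = MULTIPLICATIONS_PER_MINUTE / 60

# Hypothetical timings, for illustration only (not measured results).
example_score = real_time_factor(processing_seconds=45.0, speech_seconds=60.0)
print(f"score = {example_score:.2f}, real-time: {is_real_time(example_score)}")
print(f"about {multiplications_per_second / 1e6:.1f} million multiplications per second")
```

The per-second multiplication rate is small for a processor of this class, which is consistent with the article's point that the benchmark is limited by memory traffic rather than by arithmetic.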
This most challenging DRAM performance benchmark also favors the VIA DDR platform over RDRAM by about 3%.
Synthetic benchmarks are often able to facilitate a more precise CPU and DRAM performance evaluation. We have run the P4X266 and the 850 through a few of the popular synthetic benchmarks to further evaluate the performance impact of DDR on a Pentium 4 platform.
During our P4X266 evaluation, we also received an 845 motherboard for a short time and were able to run it through the same set of synthetic benchmarks. We cannot help but observe that the market’s expectations for the 845 are extremely low. After only a few public reviews of this platform, it is broadly seen as imbalanced and inadequate.
While PC133 has won the DRAM popularity contest since 1999, it has already been surpassed by DDR memory which is now on its own evolutionary path to higher clock speeds and higher performance levels. The benchmark results for the 845 are tabulated below with those for the 850 and P4X266.
CLI Bench is a synthetic benchmark which tests a combination of single and multithreaded floating point functions. For this test we performed single threaded tests and tests using 4 threads. The results for all tests were averaged for each platform configuration and tabulated below. We have created two charts, each showing the VIA platform compared to the 850 and 845.
The P4X266 demonstrates a consistent but modest performance advantage against both the 850 and 845 platforms (generally 1-10%). The Memory Throughput numbers are perhaps the most interesting on this graph. While we might have expected the 850 to show a slight performance advantage over the P4X266, it does not. The P4X266 does however show a 40% memory throughput advantage over the 845 as anticipated.
WinTune executes a series of read, write and copy functions to evaluate many aspects of DRAM performance. The reports generated by WinTune allow us to learn a bit more by breaking down the read, write and copy performance into separate figures. The results show a remarkable advantage for Rambus, particularly on Writes and Copies. VIA’s DDR platform, however, enjoys a 7.4% performance advantage in Reads.
This data is worth interpreting. These figures help us to understand why the 850 and P4X266 are in a virtual dead heat on application benchmarks. Under normal application loads, CPU performance is impacted most by Read performance. Writes often do not fall into the critical performance path. But one notable exception is Photoshop which has a higher proportion of memory write or copy traffic than most applications. This can give RDRAM a performance edge under some unique circumstances (such as Photoshop).
Latency2.exe shows the latency of accessing data blocks of various sizes. This benchmark essentially measures the latency, in ticks or clock cycles, of accessing blocks of memory ranging from 1 KB up to 4096 KB. We have eliminated the smaller array sizes, as they fit easily inside the CPU’s cache and show no appreciable difference between platforms.
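One common way to measure access latency as a function of working-set size is a pointer chase, in which each load depends on the result of the previous one. The sketch below illustrates that general technique; it is not a description of how Latency2.exe works internally, which the text does not specify. All function names are hypothetical. In Python the interpreter overhead dominates the timing, so the absolute numbers are not meaningful; a native-code version would be needed to see cache and DRAM effects clearly.

```python
import random
import time


def single_cycle_permutation(n, seed=0):
    """Build p so that following i -> p[i] visits all n slots in one cycle.

    Uses Sattolo's algorithm; the random order defeats simple prefetching.
    """
    rng = random.Random(seed)
    p = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # j < i guarantees a single cycle
        p[i], p[j] = p[j], p[i]
    return p


def chase(p, steps, start=0):
    """Perform `steps` dependent loads and return the final index."""
    idx = start
    for _ in range(steps):
        idx = p[idx]
    return idx


def ns_per_access(n, steps=200_000):
    """Average time per dependent access for a working set of n slots."""
    p = single_cycle_permutation(n)
    begin = time.perf_counter()
    chase(p, steps)
    elapsed = time.perf_counter() - begin
    return elapsed / steps * 1e9
```

Calling `ns_per_access` with increasing values of `n` mirrors the idea of sweeping the array size: once the working set no longer fits in cache, every dependent load has to wait on main memory, and platform differences begin to show.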
As in most latency-sensitive benchmarks, DDR holds an edge over the highly serialized Rambus memory. As anticipated, this chart shows DDR with a modest 0 to 3% latency advantage over Rambus. The 845, in contrast with the 850, does not fare as well against the P4X266, taking a performance hit of more than 60%.
In this case as in many others, memory benchmarks (even those that focus on latency) must test a combination of bandwidth and latency. When bandwidth performance is completely inadequate (as in the case of the 845) the apparent latency of memory also seems to be negatively impacted. However, we know that the device latency of SDR is nearly identical to DDR when evaluated at the component level.
In the ceaseless debate over DRAM types, these benchmark scores only provide part