Friday, October 17, 2008
Temperature Test on DIMM Memory: Key to Customer Confidence
At the Computex Show in Taipei this year, several memory companies were showing their memory burn-in systems. Whether the system was portable or the size of a refrigerator, the underlying message was the same: “Our modules are the most reliable for your computer and server.”
Photo of a DDR2 Burn In System on Display at the Computex Show 2008
Mission critical server operational environment
In the server space, there is no room for mistakes or downtime. Server computers are placed in data centers with constant temperature and humidity. Power is on 24/7 and operation is non-stop. The memory modules used are either “fully buffered DIMMs” or “registered (buffered) DIMMs”. Both techniques distribute the memory chip loads so that more memory can be populated in each system. As the number of modules in the system increases, the chance of error increases. Built-in ECC (Error Correction Code) automatically detects a bit error and corrects it as if nothing had happened. In the extreme case of hardware failure, some servers even have a mechanism for “hot swapping” memory modules, which allows the memories to be replaced one at a time without turning the system off.
Since these server computers reside at remote computer centers, dispatching a technician to replace faulty memory is expensive. Owners are therefore usually willing to spend a little more upfront for quality memory modules rather than pay for service later.
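To illustrate how an error correction code can detect and repair a single flipped bit, here is a minimal sketch using a toy Hamming(7,4) code. This is an assumption chosen for brevity: the article does not say which code server memory uses, and real ECC modules protect much wider words. The function names are hypothetical.

```python
def hamming74_encode(data):
    """Encode 4 data bits into a 7-bit codeword (parity at positions 1, 2, 4)."""
    d0, d1, d2, d3 = data
    p1 = d0 ^ d1 ^ d3  # covers codeword positions 3, 5, 7
    p2 = d0 ^ d2 ^ d3  # covers codeword positions 3, 6, 7
    p4 = d1 ^ d2 ^ d3  # covers codeword positions 5, 6, 7
    return [p1, p2, d0, p4, d1, d2, d3]


def hamming74_decode(codeword):
    """Correct up to one flipped bit; return (data bits, error position or 0)."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4  # 1-based position of the bad bit, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], syndrome


word = [1, 0, 1, 1]
stored = hamming74_encode(word)
stored[4] ^= 1  # simulate a single-bit memory error
recovered, position = hamming74_decode(stored)
print(recovered == word, position)
```

The decoder recomputes the parity checks; the resulting syndrome points at the faulty bit, which is flipped back before the data is returned to the system.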
Photo of the HP Proliant Servers
Burn-in and Bathtub curve
Since memory modules all look the same, there is no way to judge the reliability of a module from its appearance. Customers can only rely on brand recognition. Second-tier memory suppliers can only build on media ratings and user reviews. One way for them to demonstrate reliability is through their burn-in process.
Semiconductors age faster at elevated temperature. Most semiconductor failures happen early in the operating life, and elevated temperature accelerates these failures. Reputable memory manufacturers generally put their server modules through elevated-temperature burn-in for 72 hours to weed out possible infant mortality.
Typical Bath Tub Curve of any Semiconductor Device or Module.
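The article gives no formula for how much elevated temperature speeds up aging. One commonly used model is the Arrhenius equation, and the sketch below assumes it applies. The activation energy (0.7 eV) and the 40 deg. C use temperature are illustrative assumptions, not figures from this article; the right activation energy depends on the failure mechanism.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration(use_c, stress_c, activation_ev=0.7):
    """Acceleration factor of the stress temperature relative to the use temperature.

    activation_ev is an assumed value, not a number from the article.
    """
    use_k = use_c + 273.15
    stress_k = stress_c + 273.15
    exponent = activation_ev / BOLTZMANN_EV_PER_K * (1.0 / use_k - 1.0 / stress_k)
    return math.exp(exponent)


def equivalent_use_hours(burn_in_hours, use_c, stress_c, activation_ev=0.7):
    """Hours of normal use that the burn-in period is modeled to represent."""
    return burn_in_hours * arrhenius_acceleration(use_c, stress_c, activation_ev)


factor = arrhenius_acceleration(use_c=40.0, stress_c=80.0)
hours = equivalent_use_hours(72.0, use_c=40.0, stress_c=80.0)
print(f"acceleration factor: {factor:.1f}, equivalent use hours: {hours:.0f}")
```

Under these assumptions, a 72-hour burn-in at 80 deg. C corresponds to an acceleration factor of roughly 19, which is how a few days in the chamber can cover the early, high-failure-rate part of the bathtub curve.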
There are, however, three kinds of memory burn-in strategies. The first is “static burn-in”, in which the memories are put into a temperature chamber at maximum rated temperature (70 or 80 deg. C). The memory is connected to power only, with no input signals and no toggling activity. This is the most common practice, and it does not guarantee an effective result.
The second strategy is “low frequency operational burn-in”. The memory is exercised at reduced frequency to save the cost of a high-frequency tester and equipment. For example, an 800MHz DDR2 memory would be run at a 20MHz rate during burn-in. This method, again, cannot guarantee the final quality outcome.
The third is “dynamic burn-in”, in which the memories operate at full frequency. To save equipment cost, the signals are generated by only one tester, then buffered and distributed to each burn-in socket. The forward write signals arrive at each module with the proper timing relationship. The read signals are activated from the module but are not piped back to the tester for analysis. This is called the “dumb read back” method. Under these conditions, the module is fully exercised as it would be in a real system, but the read-back signal is not checked by the tester. This achieves dynamic burn-in conditions without paying the full cost of an expensive multi-tester system.
Photo of a DDR2 Burn In System Developed by CST
Heat inside an operating computer
To gauge the effect of burn-in (or temperature test), we have to examine the heat conditions inside an operating computer. There are three critical temperature parameters. The first is the “ambient temperature”, which refers to the air temperature surrounding the chip under test. It is usually 70 degrees Celsius in an industrial environment or 80 degrees Celsius in a military environment.
The second is the “case temperature”, which refers to the actual temperature measured on the outside of the memory device package. This temperature should be the same as the ambient temperature, provided the ambient air is sufficiently circulated. The third is the “junction temperature”, which refers to the temperature at the junction of the semiconductor. It usually ranges from 15 to 25 degrees Celsius above the case temperature, depending on the design and geometry of the semiconductor circuit.
New-generation memory modules also have a temperature sensor built into the register chip or the SPD (Serial Presence Detect) chip. Dynamic memory (DRAM) requires more frequent refresh at high temperature, which in turn increases power dissipation and generates even more heat. Monitoring the module temperature through the sensor allows the system to throttle back the clock frequency and thus reduce the operating temperature.
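The relationship between case and junction temperature described above is simple arithmetic. A minimal sketch, assuming the 15 to 25 degree rise quoted in the text holds for the device in question (the function name is hypothetical):

```python
def junction_range_c(case_c, rise_min_c=15.0, rise_max_c=25.0):
    """Estimate the junction temperature range from the case temperature.

    The default rise of 15 to 25 degrees Celsius is the range quoted in the
    text; the actual value depends on the circuit design and geometry.
    """
    return case_c + rise_min_c, case_c + rise_max_c


low, high = junction_range_c(70.0)
print(f"junction temperature: {low:.0f} to {high:.0f} deg. C")
```

For a case temperature of 70 degrees Celsius, this gives a junction temperature of 85 to 95 degrees Celsius.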
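The throttling behavior can be sketched as a small decision function with hysteresis, so that the clock does not flip back and forth around a single threshold. The thresholds and clock rates below are hypothetical values chosen for illustration; real systems take them from the platform and the module's sensor configuration.

```python
def next_clock_mhz(temp_c, current_mhz, full_mhz=800, reduced_mhz=667,
                   throttle_at_c=85.0, resume_at_c=75.0):
    """Pick the memory clock for the next interval from the sensor reading.

    All thresholds and frequencies are illustrative assumptions.
    """
    if temp_c >= throttle_at_c:
        return reduced_mhz   # too hot: throttle back
    if temp_c <= resume_at_c:
        return full_mhz      # cooled down: restore full speed
    return current_mhz       # in between: keep the current setting


clock = 800
for reading in [70.0, 80.0, 86.0, 80.0, 76.0, 74.0]:
    clock = next_clock_mhz(reading, clock)
    print(reading, clock)
```

The gap between the two thresholds keeps the module at the reduced clock until it has cooled well below the trip point.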
How to deliver reliability and value?
Once the memory module has been burned in for 72 hours, a post-test is required to ensure the module is still fully operational. To ensure the module can still operate robustly in the system, this final post-test is usually run at elevated temperature. The temperature test is done with a small “Rapid Heat” chamber. There are two kinds of rapid heat chambers available.
The first is the Thermonics forced-air system, which uses a heat cup concept to direct forced heated air onto the memory module under test. This method works well but is bulky and expensive.
The second method uses a regulated halogen lamp heater. It heats rapidly and can attain the set temperature within 3 to 5 seconds. The principle is to switch the halogen lamps on and off at a low frequency to regulate the temperature of the small chamber. The benefit of the regulated halogen heater is rapid heating and rapid cooling at low cost.
To watch video footage of the CST - Eureka2 Burn-In Test system, click this link:
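The on/off regulation principle can be sketched as a simple hysteresis (bang-bang) controller driving a first-order thermal model. Every number here (heating rate, heat loss, band width, set point) is an assumption picked to make the simulation plausible; none of them describe an actual CST or Thermonics product.

```python
def simulate_lamp_chamber(setpoint_c=85.0, ambient_c=25.0, band_c=1.0,
                          heat_rate_c_per_s=30.0, loss_per_s=0.1,
                          dt_s=0.01, duration_s=10.0):
    """Simulate a lamp-heated chamber; return a list of (time_s, temp_c, lamp_on)."""
    temp = ambient_c
    lamp_on = False
    history = []
    steps = int(round(duration_s / dt_s))
    for i in range(steps):
        # Hysteresis: switch on below the band, off above it, else hold state.
        if temp < setpoint_c - band_c:
            lamp_on = True
        elif temp > setpoint_c + band_c:
            lamp_on = False
        heating = heat_rate_c_per_s if lamp_on else 0.0
        # First-order model: lamp heating minus loss to the ambient air.
        temp += (heating - loss_per_s * (temp - ambient_c)) * dt_s
        history.append(((i + 1) * dt_s, temp, lamp_on))
    return history


history = simulate_lamp_chamber()
reach_time = next(t for t, temp, _ in history if temp >= 85.0)
print(f"set point first reached after about {reach_time:.1f} s")
```

Because the lamp has little thermal mass, switching it off lets the chamber start cooling almost immediately, which is why this kind of control can hold a small chamber near the set point with nothing more than a slow on/off cycle.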
Temperature sensor calibration/test
With the new temperature sensors on modules, a rapid heating source is a must. Every module has to go through a heating process to verify the operation and accuracy of its thermal sensor.
Temperature testing of server modules is key to success. CST, Inc. (simmtester.com), with its reputation and experience, has all the necessary memory module temperature test systems to help you gain the confidence of your customers.
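A sensor accuracy check of this kind might compare the module's reported temperature against the chamber's reference temperature at a few set points. The sketch below is only an illustration: the tolerance, the set points, and the function name are hypothetical, and a real test would take its limits from the sensor's data sheet.

```python
def check_sensor(readings, tolerance_c=1.0):
    """Compare sensor readings with reference temperatures.

    readings: list of (reference_c, sensor_c) pairs taken at each set point.
    tolerance_c: allowed absolute error (an assumed limit, not a spec value).
    Returns (passed, worst_error_c).
    """
    errors = [sensor_c - reference_c for reference_c, sensor_c in readings]
    worst = max(errors, key=abs)
    return abs(worst) <= tolerance_c, worst


passed, worst = check_sensor([(75.0, 75.4), (85.0, 84.2), (95.0, 95.9)])
print("pass" if passed else "fail", f"worst error {worst:+.1f} deg. C")
```

A module whose sensor drifts outside the tolerance at any set point would be rejected or flagged for recalibration.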
Copyright © 2008 CST, Inc. All Rights Reserved