AI Servers Transitioning Towards Liquid Cooling

Friday, May 17, 2024

Generative AI has spawned many new applications and driven up demand for data center computing, which in turn is pushing thermal architecture to evolve. Air cooling is a mature technology, but its heat dissipation ceiling is roughly 10 to 15 kW. That limit has prompted a shift towards hybrid “water-cooled + air-cooled” solutions and fully liquid-cooled systems.

Recognizing this trend, the tech industry is calling on the government to speed up regulation of Power Usage Effectiveness (PUE) so that it aligns with global standards.

Air-cooling fan systems remain the primary thermal solution in servers today because the technology is mature and cost-effective, especially outside high-speed computing. However, as chip power rises, air-cooled designs need more physical volume, which is a problem for space-constrained server rooms, particularly edge data centers. Moreover, once chip power passes certain thresholds, fan energy consumption climbs, which creates noise and often requires additional air conditioning, hurting overall energy efficiency and PUE compliance.

AI Computing Power Surging, “Side Car” as Transition

Edward Kung, president of the Taiwan Thermal Management Association (TTMA) and lead of the liquid cooling cold plate work group at Intel, notes that with the rapid rise of AI computing, liquid cooling has become increasingly necessary, particularly for CPUs and GPUs that generate 500 W or more of heat.

However, environmental constraints in existing server rooms hinder rapid adoption of liquid cooling. Kung suggests that many vendors may adopt a “water cooling + air cooling” approach (i.e., a water-to-air side car), which requires no significant changes to cabinet architecture and can dissipate up to 80 kW. For heat loads exceeding 100 kW, cabinets with two sets of side cars may suffice. Such solutions take up considerable space, however. As facilities become ready, high-speed computing will therefore gravitate towards denser cold plate liquid cooling and immersion solutions.
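The tiers Kung describes can be read as a rough lookup from heat load to cooling approach. The Python sketch below is only an illustration of that reading, not a sizing tool: the function name `suggest_cooling` is hypothetical, the figures are assumed to apply per cabinet (the article does not say), and the upper limit for dual side cars is an assumption (twice the single-unit figure), since the article only says they "may suffice" above 100 kW.

```python
# Thresholds in kW, taken from the figures quoted in the article.
# Assumption: they are treated here as per-cabinet heat loads.
AIR_COOLING_CEILING_KW = 15.0    # upper end of the 10 to 15 kW air-cooling range
SINGLE_SIDE_CAR_LIMIT_KW = 80.0  # one water-to-air side car

# Assumption (not stated in the article): two side cars handle
# roughly twice the load of one.
DUAL_SIDE_CAR_LIMIT_KW = 2 * SINGLE_SIDE_CAR_LIMIT_KW


def suggest_cooling(heat_load_kw: float) -> str:
    """Return an illustrative cooling tier for a given heat load in kW."""
    if heat_load_kw < 0:
        raise ValueError("heat load must be non-negative")
    if heat_load_kw <= AIR_COOLING_CEILING_KW:
        return "air cooling"
    if heat_load_kw <= SINGLE_SIDE_CAR_LIMIT_KW:
        return "single side car"
    if heat_load_kw <= DUAL_SIDE_CAR_LIMIT_KW:
        return "dual side cars"
    return "cold plate or immersion"


if __name__ == "__main__":
    for load in (12, 60, 120, 200):
        print(f"{load} kW -> {suggest_cooling(load)}")
```

In practice the choice also depends on floor space, facility water availability, and customer requirements, none of which this lookup models.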

Challenges Persist for Cold Plate and Immersion Liquid Cooling

Cold plate liquid cooling offers superior heat dissipation, making it well suited to high-density, high-speed computing. It reduces energy consumption, improves data center sustainability, and has a more mature ecosystem than immersion liquid cooling, with fewer material compatibility issues. PUE can range from 1.2 down to 1.08. Leakage remains a concern, however: achieving the best energy efficiency requires routing liquid piping to each hardware component, which also raises design and construction costs.

Immersion liquid cooling likewise improves energy efficiency, and heat dissipation is not limited by mechanisms inside the system. Unlike cold plate liquid cooling, it eliminates water leakage concerns. PUE can reach 1.08 or lower. The immersion ecosystem is still incomplete, however, and questions about liquid compatibility and warranties are still being worked out. Two-phase immersion cooling offers greater thermal capacity, but environmental and sustainability requirements are holding it back, so the focus has shifted to making single-phase immersion more effective.
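To put the PUE figures in context: PUE is total facility power divided by IT equipment power, so a value of 1.0 would mean no overhead at all. The short Python sketch below shows what the quoted values of 1.2 and 1.08 imply for non-IT overhead such as cooling and power distribution. The 1,000 kW IT load is a made-up example, and the helper names are hypothetical.

```python
def pue(total_facility_kw: float, it_kw: float) -> float:
    """Power Usage Effectiveness: total facility power / IT equipment power."""
    if it_kw <= 0:
        raise ValueError("IT power must be positive")
    return total_facility_kw / it_kw


def overhead_kw(it_kw: float, pue_value: float) -> float:
    """Non-IT power (cooling, power distribution, etc.) implied by a PUE value."""
    if pue_value < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return it_kw * (pue_value - 1.0)


if __name__ == "__main__":
    it_load_kw = 1000.0  # hypothetical IT load, chosen for easy arithmetic
    for value in (1.2, 1.08):
        print(f"PUE {value}: about {overhead_kw(it_load_kw, value):.0f} kW of overhead")
```

For this hypothetical load, moving from a PUE of 1.2 to 1.08 cuts overhead from about 200 kW to about 80 kW.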

Urgent Ecosystem Development Needed

In response to market trends, Intel has worked to advance and standardize the thermal industry, collaborating with supply chain partners on solutions for sustainable data centers. Since 2019, Intel has helped establish Open Compute Project (OCP) specifications and standards for cold plate and immersion liquid cooling, and has written reference design documents to support liquid cooling adoption in data centers worldwide. Together with its collaborations with cooling ecosystem partners, these efforts are accelerating the integration of liquid cooling into the global ecosystem.

In the cold plate realm, Intel concentrates on key component specifications, leakage detection, and cabinet manifold designs. For immersion, it focuses on liquid compatibility, design, and validation standards. Kung stresses that as AI server demand surges, liquid cooling will spread, particularly in HPC and AI servers. He adds, however, that although liquid cooling is the future, not every solution will rely on it exclusively; what matters is matching the customer's specifications, expectations, and goals.

By: DocMemory
Copyright © 2023 CST, Inc. All Rights Reserved
