The training and inference demands of AI large language models are growing exponentially, driving GPU power consumption from NVIDIA A100's 400 W all the way to H100's 700 W, and further to B200's 1,000 W[1]. When a rack fully loaded with GPU accelerators jumps from the traditional 5-10 kW to 40-120 kW, conventional air-cooled systems have physically reached their limits. Liquid cooling is no longer a forward-looking option but an engineering necessity for AI-era data centers. This article provides an in-depth analysis from a systems engineering perspective, covering the technical classifications of liquid and immersion cooling, design essentials, hybrid architecture strategies, and the sustainability benefits of waste heat recovery.
1. Why Is Air Cooling No Longer Sufficient?
To understand the necessity of liquid cooling technology, one must first recognize the physical bottleneck of air cooling. The volumetric heat capacity of air is approximately 1.21 kJ/(m³·K), while water reaches 4,184 kJ/(m³·K) -- a difference exceeding 3,400 times[2]. This means that at the same temperature difference and volumetric flow rate, water can remove roughly 3,400 times more heat than air.
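To make the gap concrete, here is a minimal sketch (plain Python with rounded textbook property values, not design data) comparing the volumetric flow needed to carry away a 10 kW load with air versus water at the same 10 K temperature rise:

```python
# Back-of-the-envelope comparison: flow needed to move 10 kW of heat with
# air vs. water at the same 10 K temperature rise. Rounded property values.

AIR_VHC = 1.21e3      # volumetric heat capacity of air, J/(m^3*K)
WATER_VHC = 4.184e6   # volumetric heat capacity of water, J/(m^3*K)

def required_flow(heat_w: float, delta_t_k: float, vhc: float) -> float:
    """Volumetric flow (m^3/s) needed to remove heat_w at a delta_t_k rise."""
    return heat_w / (vhc * delta_t_k)

heat, dt = 10_000.0, 10.0                      # 10 kW load, 10 K rise
air_flow = required_flow(heat, dt, AIR_VHC)
water_flow = required_flow(heat, dt, WATER_VHC)

print(f"Air:   {air_flow:.2f} m^3/s (~{air_flow * 3600:,.0f} m^3/h)")
print(f"Water: {water_flow * 60_000:.1f} L/min")
print(f"Ratio: {air_flow / water_flow:.0f}x more air volume required")
```

That roughly 3,400:1 flow ratio is exactly why air-cooled designs run into duct, fan, and acoustic limits long before water-cooled ones do.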
Generational Leap in GPU Power Consumption
From the perspective of data center thermal engineering, the evolution of GPU power consumption is the driving force behind this technological paradigm shift:
- NVIDIA A100 (2020): TDP 400 W, approximately 20-30 kW per rack, barely manageable with traditional in-row cooling
- NVIDIA H100 (2023): TDP 700 W, DGX H100 system power consumption 10.2 kW, single rack can reach 40-60 kW
- NVIDIA B200 (2024): TDP 1,000 W, GB200 NVL72 rack power consumption reaches 120 kW, and the manufacturer requires a liquid-cooled architecture[3]
Rack Density and PUE Bottlenecks
When rack power exceeds 30 kW, air cooling systems must significantly increase airflow to maintain acceptable inlet temperatures. ASHRAE TC 9.9 defines A1 class equipment with allowable inlet temperatures of 15-32°C[4], but in high-density scenarios above 60 kW, even with hot/cold aisle containment and in-row cooling, fan power consumption and duct pressure losses push PUE above 1.5 while generating noise exceeding 80 dBA. This not only increases energy costs but also makes it difficult for operations staff to work in the data hall for extended periods.
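As a rough illustration of why fan energy escalates so quickly, the sketch below estimates the airflow a 60 kW rack needs at an assumed 15 K air-side temperature rise and then applies the fan affinity law (fan power scales with the cube of flow). Both the rack load and the temperature rise are illustrative assumptions:

```python
# Illustrative only: airflow for a 60 kW air-cooled rack, and how fan power
# scales if that airflow has to grow (fan affinity laws).
AIR_VHC = 1.21e3                              # J/(m^3*K), rounded

rack_kw, delta_t = 60.0, 15.0                 # assumed rack load and air-side rise
flow = rack_kw * 1e3 / (AIR_VHC * delta_t)    # m^3/s
print(f"Required airflow: {flow:.1f} m^3/s (~{flow * 3600:,.0f} m^3/h)")

# Affinity law: power ~ flow^3. Holding inlet temperature by doubling airflow
# costs roughly eight times the fan energy.
for multiplier in (1.0, 1.5, 2.0):
    print(f"{multiplier:.1f}x flow -> {multiplier ** 3:.1f}x fan power")
```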
According to Uptime Institute's 2024 Global Data Center Survey, over 52% of operators have included liquid cooling in their construction plans for the next three years, compared to just 18% in 2020[5]. The tipping point for liquid cooling has arrived.
2. Liquid Cooling Technology Classification and Comparison
Liquid cooling is not a single solution but a spectrum of technologies that differ in how closely the coolant approaches the heat source and in engineering complexity. Based on how the coolant contacts the IT equipment, it can be divided into four categories:
Rear-Door Heat Exchanger (RDHx)
RDHx is the least invasive way to introduce liquid cooling. A water-cooled heat exchanger is mounted in the rack's rear-door position, removing heat from the hot exhaust air as it passes through. RDHx requires no modifications to the IT equipment itself, making it suitable for gradual upgrades of existing data halls. The Open Compute Project (OCP) has published standardized design specifications for RDHx[6], with each rear door capable of handling approximately 30-50 kW of heat rejection. However, RDHx is essentially a hybrid "air cooling + liquid cooling" model with limited capability for ultra-high-density scenarios exceeding 50 kW.
Direct-to-Chip Liquid Cooling (DLC)
DLC places cold plates directly on the GPU or CPU package surface, using circulating coolant to remove heat at the source. This is currently the most mainstream liquid cooling technology for AI data centers. NVIDIA GB200 NVL72 adopts a DLC architecture with coolant inlet/outlet temperatures of approximately 25-45°C[3]. DLC transfers heat very efficiently, capturing 70-80% of the heat load directly at the cold plates, while memory, VRMs, and drives still require a small amount of supplementary air cooling.
Single-Phase Immersion Cooling
This involves submerging the entire server motherboard in a non-conductive dielectric coolant, using liquid convection circulation to remove heat. The coolant remains in liquid state within the tank and transfers heat to facility cooling water through an external heat exchanger. Single-phase immersion can handle 100-200 kW of cooling per tank and virtually eliminates fan noise[7].
Two-Phase Immersion Cooling
Two-phase immersion utilizes low-boiling-point dielectric fluid that undergoes phase change (liquid boiling to gas) at the chip surface, absorbing large amounts of heat through latent heat of vaporization. The gaseous coolant rises to the condenser at the top of the tank, condenses, and flows back down, forming a self-driven cooling cycle. Two-phase immersion offers the highest cooling efficiency among all liquid cooling technologies but has the strictest requirements for coolant purity and system sealing.
Comparison of Four Liquid Cooling Technologies
| Technology Type | Per Rack/Tank Cooling | IT Equipment Modification | Liquid in IT Space | Relative Cost | Applicable Scenarios |
|---|---|---|---|---|---|
| RDHx | 30-50 kW | None | Rear door only | Low | Existing facility upgrades |
| DLC | 50-120+ kW | Cold plate installation | Yes (manifold to rack) | Medium | AI/HPC new builds or expansions |
| Single-Phase Immersion | 100-200 kW | Motherboard redesign | Full immersion | High | Ultra-high density / edge computing |
| Two-Phase Immersion | 100-250+ kW | Motherboard redesign | Full immersion | Highest | Extreme density / research |
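As a first-pass illustration of how the density bands in this table might be screened, the sketch below encodes them as a simple lookup. The function and its thresholds-as-code are ours, and a real selection would also weigh cost, fluid handling, and operational readiness:

```python
def suggest_cooling(rack_kw: float, existing_facility: bool = False) -> str:
    """Rough screening based on the per-rack/tank density bands above."""
    if rack_kw <= 30 and existing_facility:
        return "RDHx retrofit (or remain air-cooled)"
    if rack_kw <= 50:
        return "RDHx"
    if rack_kw <= 120:
        return "Direct-to-chip liquid cooling (DLC)"
    if rack_kw <= 200:
        return "Single-phase immersion"
    return "Two-phase immersion"

for kw in (25, 45, 90, 150, 240):
    print(f"{kw:>3} kW/rack -> {suggest_cooling(kw)}")
```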
3. Direct Liquid Cooling (DLC) System Design
DLC is currently the most widely deployed liquid cooling solution in AI data centers, with system design encompassing the Coolant Distribution Unit (CDU), coolant selection, manifold piping, and leak detection as key elements.
Coolant Distribution Unit (CDU)
The CDU is the heart of a DLC system. It transfers cooling capacity from the facility-side cooling water to the IT-side coolant loop through a plate heat exchanger, while keeping the IT-side loop at stable pressure, flow rate, and temperature. A typical CDU includes:
- Plate heat exchanger: Exchanges heat between facility-side cooling water (approximately 15-25°C) and IT-side coolant, with the two loops physically isolated
- Circulation pumps: Drive IT-side coolant through each rack's cold plates, typically configured with redundant (N+1) pumps
- Expansion tank and makeup system: Maintain loop pressure stability and compensate for minor coolant losses
- Filters and ion exchange resin: Maintain coolant cleanliness and electrical conductivity within safe ranges
- Flow and temperature sensors: Provide real-time monitoring data to BMS or DCIM systems
CDU capacity sizing must consider the total heat dissipation of the served racks, coolant flow requirements (typically on the order of 1-2 LPM per kW, depending on the design coolant temperature rise), and the supply temperature and flow limits of the facility-side cooling water. In Taiwan's high-temperature environment, facility-side cooling water may reach 30-35°C during summer, requiring adequate margin in the CDU's heat exchanger design[8].
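The per-kW flow figure follows directly from the heat balance (heat load = mass flow × specific heat × temperature rise). A minimal sketch, assuming a plain water coolant with rounded properties, shows how LPM-per-kW varies with the design temperature rise and totals the IT-side flow for a hypothetical CDU serving eight 80 kW racks:

```python
# IT-side coolant flow sizing from the heat balance Q = m_dot * cp * dT.
# Rounded water properties; a real design uses the coolant vendor's data.
RHO = 1000.0     # kg/m^3
CP = 4184.0      # J/(kg*K)

def flow_lpm_per_kw(delta_t_k: float) -> float:
    """Coolant flow (L/min) needed per kW of heat at a given temperature rise."""
    kg_per_s = 1000.0 / (CP * delta_t_k)      # mass flow to carry 1 kW
    return kg_per_s / RHO * 60_000.0          # convert m^3/s to L/min

for dt in (5, 10, 15, 20):
    print(f"dT = {dt:>2} K -> {flow_lpm_per_kw(dt):.2f} LPM per kW")

# Hypothetical example: a CDU serving eight 80 kW racks at a 10 K rise
racks, kw_per_rack, dt = 8, 80.0, 10.0
total_lpm = racks * kw_per_rack * flow_lpm_per_kw(dt)
print(f"Total IT-side flow: {total_lpm:,.0f} LPM for {racks * kw_per_rack:.0f} kW")
```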
Coolant Selection
The most common coolants in DLC systems are deionized water or propylene glycol-water solutions. Deionized water offers the best specific heat capacity and thermal conductivity but requires strict electrical conductivity control (typically below 1 µS/cm) to reduce short-circuit risk in case of leaks. Propylene glycol-water solution (25-40% concentration) provides additional freeze and corrosion protection but reduces specific heat capacity by approximately 10-15%, requiring correspondingly increased flow rates. The coolant pH must be maintained between 7.0-8.5 to prevent corrosion of copper, aluminum, and other metal fittings.
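A quick comparison, using approximate room-temperature property values rather than vendor datasheet figures, illustrates the flow penalty of a glycol mix:

```python
# Required flow per kW: deionized water vs. an assumed 30% propylene glycol
# solution. Property values are approximate; use vendor data for design.
fluids = {
    "DI water":        {"rho": 1000.0, "cp": 4184.0},   # kg/m^3, J/(kg*K)
    "30% PG solution": {"rho": 1023.0, "cp": 3800.0},
}

heat_w, delta_t = 1000.0, 10.0      # per-kW basis, 10 K coolant rise
baseline = None
for name, p in fluids.items():
    lpm = heat_w / (p["rho"] * p["cp"] * delta_t) * 60_000.0
    baseline = baseline or lpm
    print(f"{name:>16}: {lpm:.2f} LPM/kW ({lpm / baseline:.0%} of the water case)")
```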
Manifold Piping Design
The DLC piping system runs from the CDU through main distribution lines to each rack row, then distributes via manifolds to cold plates within each rack. Key design points include:
- Material selection: Stainless steel or copper for main lines, with flexible hoses (such as EPDM or stainless steel braided hoses) connecting cold plates inside racks for maintenance accessibility
- Pressure rating: IT-side loop pressure is typically controlled at 2-4 bar to avoid excessive pressure on pipe fittings
- Thermal expansion compensation: Long straight pipe runs require expansion joints or natural bends to absorb growth (a quick estimate follows this list)
- Seismic design: Taiwan is located in a seismic zone, so liquid cooling piping must comply with building seismic codes for support design, and critical joints should use flexible connections
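To put the thermal expansion point in numbers: the growth of a straight run is simply the expansion coefficient times length times temperature change. A minimal estimate, assuming a typical coefficient for austenitic stainless steel:

```python
# Thermal growth of a straight coolant main: delta_L = alpha * L * delta_T.
ALPHA_SS = 16e-6     # 1/K, approximate for SUS 304/316 stainless steel

def expansion_mm(length_m: float, delta_t_k: float) -> float:
    """Length change in millimetres."""
    return ALPHA_SS * length_m * delta_t_k * 1000.0

# Assumed case: a 30 m run warming 25 K from installation to operating temperature
print(f"~{expansion_mm(30, 25):.0f} mm of growth to absorb")   # ~12 mm
```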
Leak Detection and Protection
Liquid entering the IT space is the greatest risk factor for DLC. A comprehensive leak detection system should include three lines of defense:
- First line: Sensing cables at connections -- Deploy sensing cables at all quick disconnect (QD) connections and manifold joints to detect micro-leaks
- Second line: Drip trays at rack base -- Install drip trays and water level sensors at the bottom of each rack as a containment barrier for connection leaks
- Third line: Sensing cables under raised floor -- Lay continuous sensing cables along piping routes to provide zone-based leak alarms
Leak events should be linked to the CDU's automatic isolation valves -- when any zone detects a leak, the system automatically closes the supply and return solenoid valves for that zone, limiting the leak scope to the smallest possible section while sending an alert to the monitoring center.
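A minimal sketch of that interlock logic is shown below. The zone, valve, and alarm objects are placeholders for illustration; real deployments implement this in the CDU controller or the BMS/PLC layer rather than in application code:

```python
# Illustrative leak-response interlock: isolate only the affected zone and alert.
from dataclasses import dataclass, field

@dataclass
class Zone:
    name: str
    supply_valve_open: bool = True
    return_valve_open: bool = True
    alarms: list = field(default_factory=list)

def on_leak_detected(zone: Zone, sensor_id: str) -> None:
    """Close the zone's supply/return valves and raise an alarm for operators."""
    zone.supply_valve_open = False
    zone.return_valve_open = False
    zone.alarms.append(f"LEAK at {sensor_id}: zone '{zone.name}' isolated")

zone_a = Zone("Row-A manifold")
on_leak_detected(zone_a, "QD-07 sensing cable")
print(zone_a.alarms[0], "| valves open:",
      zone_a.supply_valve_open, zone_a.return_valve_open)
```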
4. Immersion Cooling System Design
Immersion cooling takes the liquid-cooling approach to its extreme -- the entire motherboard is submerged in dielectric fluid, with every heat-generating component operating in a liquid environment. This removes the heat-sink-to-air interface and shortens the conduction path from chip to coolant, significantly reducing component operating temperatures.
Dielectric Fluid Characteristics
Dielectric fluid for immersion cooling must simultaneously meet requirements for electrical insulation, chemical stability, low toxicity, and appropriate thermophysical properties. Current mainstream products include synthetic hydrocarbons (such as Shell S5 X), fluorinated fluids (such as 3M Novec 7100 series), and siloxane-based fluids[7]. Key parameters include:
- Dielectric strength: Typically greater than 40 kV/2.5mm, ensuring no arcing under any operating condition
- Kinematic viscosity: Affects natural convection efficiency; lower viscosity fluids have faster convection speeds
- Flash point: Hydrocarbon dielectric fluids typically have flash points of 160-200°C, which must be factored into fire safety assessments
- GWP (Global Warming Potential): Fluorinated fluids can have GWP values ranging from hundreds to thousands, with some products facing phase-out pressure due to F-gas regulations
- Density: roughly 0.8 kg/L for hydrocarbon fluids up to 1.8 kg/L for fluorinated fluids, directly affecting tank weight and floor load design
Tank Design
Immersion cooling tanks replace the role of traditional racks, with design considerations including:
- Structural load bearing: A fully loaded immersion tank (including dielectric fluid) can weigh 1,500-2,500 kg, far exceeding the 300-500 kg of traditional racks; the data hall floor must have a load capacity of 15-25 kN/m2 (a quick check follows this list)
- Tank material: Stainless steel (SUS 304/316) or aluminum alloy are mainstream choices, requiring chemical compatibility with the dielectric fluid
- Server carrier: Server motherboards are inserted vertically or horizontally into dedicated carriers within the tank, designed to balance cooling efficiency with maintenance accessibility
- Overflow and level control: Fluid level must be maintained above all components with sufficient expansion margin
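To see where a 15-25 kN/m2 figure comes from, the floor pressure is simply the loaded tank weight divided by its footprint. A minimal check with an assumed tank mass and footprint, following up on the structural load-bearing point above:

```python
# Floor-load sanity check for an immersion tank: pressure = weight / footprint.
G = 9.81   # m/s^2

def floor_load_kn_per_m2(mass_kg: float, footprint_m2: float) -> float:
    return mass_kg * G / footprint_m2 / 1000.0

# Assumed example: a 2,200 kg loaded tank on a 1.2 m x 0.8 m footprint
load = floor_load_kn_per_m2(2200, 1.2 * 0.8)
print(f"{load:.1f} kN/m2 (before any load-spreading frame)")   # ~22 kN/m2
```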
Cooling Loop Design
The cooling loop for single-phase immersion is relatively straightforward: heated dielectric fluid naturally rises within the tank, transfers heat to the facility cooling water loop through heat exchangers at the top or side of the tank, and cooled fluid sinks back to the bottom forming a natural circulation. Some designs add circulation pumps to enhance fluid flow for more uniform cooling.
The two-phase immersion cooling loop is more sophisticated: dielectric fluid boils at the chip surface producing bubbles, gaseous coolant rises to the space above the liquid surface, contacts the condenser (typically water-cooled coils) at the tank top, condenses back to liquid, and drips back to the surface. This process is entirely driven by phase change, requiring no pumps or fans -- truly passive cooling. The condenser design capacity must match peak cooling loads, and safety valves must be installed for abnormal pressure conditions.
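The latent-heat mechanism can be quantified with a simple mass balance: the mass of fluid boiled off per second equals the heat load divided by the latent heat of vaporization. The sketch below uses an assumed round figure of 100 kJ/kg, representative in order of magnitude for fluorinated two-phase fluids:

```python
# Order-of-magnitude evaporation rate in a two-phase immersion tank.
H_FG_FLUID = 100e3      # J/kg, assumed latent heat for a fluorinated fluid
heat_w = 100_000        # 100 kW tank load

evap_kg_s = heat_w / H_FG_FLUID
print(f"~{evap_kg_s:.1f} kg/s of fluid boils off and must be re-condensed")
# The tank-top condenser must reject the same 100 kW to facility water,
# which sets its coil sizing, plus margin for peak and abnormal conditions.
```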
Paradigm Shift in Maintenance Procedures
Immersion cooling fundamentally changes IT operations procedures. Server replacement requires extracting motherboards from the dielectric fluid, involving residual fluid recovery, component cleaning, and fluid level replenishment. Maintenance personnel need specialized tools and training, with motherboard replacement time extending from the traditional air-cooled 5-10 minutes to 20-40 minutes. Regular quality testing of the dielectric fluid (acid number, water content, dielectric strength) also becomes a new routine maintenance item.
5. Hybrid Cooling Architecture: Air + Liquid Cooling Transition Strategy
Most data centers will not transition to full liquid cooling in one step but adopt hybrid cooling architectures as a transitional strategy. This reflects both cost considerations and the reality of IT equipment generational changes -- a single facility may simultaneously house air-cooled storage servers and GPU computing racks requiring liquid cooling.
Retrofit Path Planning
A typical air-to-liquid cooling retrofit can be divided into three phases:
- Phase 1 -- RDHx Introduction: Install rear-door heat exchangers on existing high-density racks without modifying IT equipment, only requiring additional cooling water piping to the rack row end, increasing per-rack cooling capacity from 15 kW to 40 kW
- Phase 2 -- DLC Zone Deployment: Deploy CDU and DLC manifold systems in new or renovated areas specifically serving GPU computing racks. This phase requires planning CDU room space, piping routes, and leak detection systems
- Phase 3 -- Immersion Cooling Evaluation: For next-generation ultra-high-density racks (150 kW+), evaluate the feasibility of introducing immersion cooling, including comprehensive adjustments to floor load capacity, fire codes, and operations procedures
Engineering Considerations for Hybrid Mode
In hybrid architectures where air and liquid cooling coexist, several key engineering issues require proper management:
- Temperature zone management: The exhaust temperature from liquid-cooled zones is higher (40-50°C), and if released into the data hall environment, it may affect inlet conditions for air-cooled zones. This requires airflow management or independent exhaust heat loops for isolation
- Cooling water system integration: CDU facility-side cooling water must integrate with existing chilled water or cooling tower systems, with water volume, pressure, and quality management becoming cross-system coordination challenges
- Power distribution: Liquid-cooled zone rack power density is far higher than air-cooled zones, requiring differentiated design for busway and UPS capacity planning
- Unified monitoring: Monitoring parameters for air and liquid cooling differ significantly, requiring BMS or DCIM systems to integrate heterogeneous sensor data including temperature, flow, pressure, and leak detection
Importance of Future-Ready Design
Even if liquid cooling has not yet been deployed, new data centers should reserve liquid cooling expansion capability during the design phase:
- Reserve CDU room space and pipe penetration openings
- Cooling water system capacity with 30-50% expansion margin
- Reserve liquid cooling piping route space under raised floors
- Design floor load capacity based on immersion cooling requirements (15 kN/m2 or above)
- Reserve electrical system capacity for high-density rack requirements
Planning liquid cooling solutions for an AI data center? Contact our engineering team to get liquid cooling system design recommendations tailored to local conditions in Taiwan.
6. Waste Heat Recovery and Sustainability Benefits
An important added value of liquid cooling technology is the recovery and reuse of high-grade waste heat. Traditional air-cooled data centers have exhaust temperatures of only 30-35°C, limiting utilization scenarios. However, liquid cooling systems -- especially DLC -- can achieve coolant outlet temperatures of 45-60°C[9], opening up diverse waste heat recovery possibilities at this temperature level.
Waste Heat Recovery Applications
- District heating: Multiple Nordic countries have integrated data center waste heat into urban heating networks, with Finland's data center waste heat recovery ratio exceeding 40%
- Agricultural greenhouses: Using 45-55°C waste heat water to warm greenhouses, extending crop growing seasons[10]
- Industrial preheating: Serving as a preheating source for raw materials or wash water in industrial processes, replacing some boiler loads
- Adsorption/absorption cooling: Using 50-60°C waste heat to drive adsorption chillers, converting waste heat into cooling for office area air conditioning
- Seawater desalination: In coastal areas, waste heat can be used for low-temperature multi-effect distillation (MED) seawater desalination
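How much heat is actually available for these applications? Essentially all IT power ends up as heat, and a DLC loop exposes a large share of it at useful temperatures. A rough annual estimate, with the 70% capture fraction as an assumed illustrative value:

```python
# Rough annual heat-recovery potential for a 1 MW IT load.
it_load_mw = 1.0
capture_fraction = 0.7        # assumed share of heat available via the liquid loop
hours_per_year = 8760

recoverable_mwh = it_load_mw * capture_fraction * hours_per_year
print(f"~{recoverable_mwh:,.0f} MWh of 45-60 C heat per year")   # ~6,100 MWh
```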
Carbon Emission Reduction Benefits
The carbon emission reduction benefits of liquid cooling for data centers are multifaceted. First, liquid cooling itself is more efficient than air cooling, with significantly reduced fan energy consumption, lowering PUE from air cooling's 1.3-1.5 to 1.05-1.15[5]. Second, higher coolant temperatures expand the available hours for free cooling -- in the Kaohsiung area of Taiwan, calculating with 40°C coolant return water temperature, approximately 65-75% of the year can rely solely on cooling tower heat rejection without running chillers[8]. Combined with waste heat recovery replacing traditional heating energy, liquid-cooled data centers can reduce carbon emissions by 30-50% compared to traditional air-cooled designs.
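A back-of-the-envelope calculation shows the scale of the PUE effect alone; the IT load and PUE values below are illustrative, taken from the ranges cited above:

```python
# Facility-overhead energy saved when PUE improves, for a fixed IT load.
def annual_overhead_mwh(it_load_mw: float, pue: float) -> float:
    return it_load_mw * (pue - 1.0) * 8760

it_mw = 1.0
air_pue, liquid_pue = 1.4, 1.1     # representative values from the text
saving = annual_overhead_mwh(it_mw, air_pue) - annual_overhead_mwh(it_mw, liquid_pue)
print(f"~{saving:,.0f} MWh/year less overhead energy per MW of IT load")
# Multiply by the local grid emission factor (kgCO2e/kWh) to estimate carbon savings.
```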
The EU's revised Energy Efficiency Directive (2023/1791) already requires data centers with rated power above 500 kW to disclose PUE, WUE, and waste heat recovery ratios starting in 2025. This regulatory trend signals that waste heat recovery will transition from a "bonus" to a "compliance requirement," providing additional policy impetus for liquid cooling system investment.
Conclusion
From rear-door heat exchangers to direct-to-chip liquid cooling, from single-phase immersion to two-phase immersion -- liquid cooling technology provides a clear upgrade path for AI-era data center thermal management. This is not merely a change in cooling methods but a paradigm shift in data center engineering from "ducts and air outlets" to "piping and heat exchangers." For Taiwan's data center industry, the hot and humid climate conditions actually make the energy-saving advantages of liquid cooling even more pronounced -- higher coolant temperatures and abundant cooling tower heat rejection potential enable liquid cooling systems in tropical environments to potentially achieve even better PUE performance than in temperate regions. Engineering teams need solid system design capabilities, rigorous piping construction quality, and continuous learning and practice with new technologies.