Rack Cooling, Workload Management Tools Fight Hotspots

Read this story at TechTarget.com
Hotspots are a pain in the CRAC, leading some data centers to implement source-of-heat cooling and software tools that redistribute workloads across servers. MGHPCC is in the vanguard.

Some IT teams push cabinet densities to reach peak efficiency per square foot and build cooling structures to support that. Others run into hotspots even without hitting high kW densities per cabinet.
Some like it hot
The research computing pods at the Massachusetts Green High Performance Computing Center (MGHPCC) in Holyoke contain petabytes of storage, high-speed central processing units (CPUs) and graphics processing units (GPUs) in blade servers, linked together via InfiniBand networks.
"We're running systems hot, hot, hot all the time -- the goal is 100% efficiency," said James Cuff, researcher at Harvard University who spoke about the center during the AFCOM Symposium 2014 in Boston last month.
Cabinets at the MGHPCC are designed for a typical 14 kW per rack load, and average around 10 kW per rack. However, they're capable of 20 or 25 kW per rack easily, and 100 kW density is theoretically possible.
"As power grows, water -- liquid cooling -- is definitely coming back into the data center," Cuff said. "But right now, the price/performance ratio favors air cooling."
MGHPCC uses hot aisle containment and in-row cooling to maintain an inlet temperature of 81 degrees Fahrenheit. With a temperature delta of 25 degrees, that yields a 106-degree Fahrenheit hot aisle, according to James Culbert, IT technical lead at the center.
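To put the 25-degree delta in perspective, the sketch below estimates how much air a rack needs to move at that temperature rise. It assumes the common sea-level sensible-heat rule of thumb for air (BTU/hr = 1.08 x CFM x delta-T in degrees Fahrenheit, with 1 watt = 3.412 BTU/hr). The function name is hypothetical and this is not a description of how MGHPCC sized its coolers.

```python
def required_airflow_cfm(load_watts: float, delta_t_f: float) -> float:
    """Approximate airflow (cubic feet per minute) needed to remove load_watts
    of heat with an air temperature rise of delta_t_f degrees Fahrenheit.

    Assumption: sea-level air, BTU/hr = 1.08 * CFM * delta_T, 1 W = 3.412 BTU/hr.
    """
    if delta_t_f <= 0:
        raise ValueError("delta_t_f must be positive")
    return 3.412 * load_watts / (1.08 * delta_t_f)


inlet_f = 81.0       # inlet temperature cited in the article
hot_aisle_f = 106.0  # hot aisle temperature cited in the article
delta_t = hot_aisle_f - inlet_f

# Rack loads mentioned in the article: average, design point, high end.
for kw in (10, 14, 25):
    cfm = required_airflow_cfm(kw * 1000, delta_t)
    print(f"{kw} kW rack: roughly {cfm:.0f} CFM at a {delta_t:.0f} F rise")
```

A wider delta lets the same airflow carry more heat, which is one reason containment that keeps hot and cold air from mixing matters at higher densities.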
Dense cabinets create complex air flow patterns that demand attentive layout and rigorous cooling. A data center is like a game of Tetris, said Hassan Moezzi, CEO of Future Facilities Ltd., a London-based provider of computational fluid dynamics modeling software and services.
While the goal is 100% packed racks, "in reality, players create voids and gaps," he said. "Physical fragmentation destroys energy efficiency." One Future Facilities client ended up with overheating racks even at 45% capacity.
The MGHPCC arranges racks to keep the cabling out of the way of heat flow, but with high-density and mixed-use racks, they still end up with hotspots that are cooled by the in-row systems. Other data centers take cooling even closer to the source of heat.
"In-rack cooling could save us a lot of money, effectively turning off our ACs, but we haven't pulled the trigger on it yet," said Greg Tupper, IT services manager at MTS Systems Corp., a testing and sensing company headquartered in Minnesota.
MTS has pulled out a lot of old servers, so it would need to re-rack to increase density. Tupper envisions each rack at least 70% filled before putting in the coolers. Rack cooling vendors offer top-down or rear-door models, and various configuration options. Tupper advised investigating whether your existing racks are compatible with a cooling system vendor's products -- not every rack is the same size or depth.
Installing rack cooling systems is a large one-time capital expense, with associated labor and design costs. Tupper recommended comparing costs from four or five vendors, going in with solid knowledge about your density, existing problems and goals.
"For example, I like the modularity of the radiator units in a door that OptiCool [Technologies] offers, and the product is multipathing in case of a failure," Tupper said. "The top-down design from Subzero [Engineering] wouldn't work with the setup we currently have in the space."
MGHPCC found that it had overspecified the initial in-row coolers, and was able to take out several coolers from each installation without degrading cooling performance.
"That's something we worry about too, just blanketing the rack and over-investing," Tupper said.
However, the drive to higher-density cabinets hasn't picked up speed as much as expected, according to William Dougherty, SVP and CTO of RagingWire Data Centers, a colocation provider in the U.S.
The majority of colocation customers and providers fill racks to the 4 to 6 kW per cabinet range, he said. "Very few customers are pushing over 10 kW per cabinet."
Dougherty believes the increased energy efficiency of processors keeps most commercial servers and IT equipment from pulling as much power as previous generations.
"Users aren't seeing the benefit of going more dense," he said, so it's pointless to increase density and deal with the specialized cooling requirements.
Workload management alleviates hotspots
Instead, data center IT staff can ameliorate hotspots by redistributing workloads, said Dave Wagner, director of market development at TeamQuest Corp., a systems management software provider based in Iowa. There are probably underutilized servers in another rack that could take on workloads from a hot cabinet, Wagner said. The trick is to know where you have room and where you're overtaxing the silicon.
Workload management is "a whole lot cheaper than new CRACs," Wagner said.
A typical enterprise data center holds many different form factors from 17 different vendors, spanning several product generations, he added. You must build a map of both the physical layer -- heat and power draw -- and the workload distribution -- where applications are consuming resources. Data center infrastructure management tools are getting there, he said, and they require IT and facility teams to both look at this map, and look often.
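The idea of combining a physical map with a workload map can be sketched in a few lines. The example below is a minimal illustration, not how TeamQuest's or any DCIM product works: the `Rack` class, the `plan_moves` function, the rack names and all figures are hypothetical, and it assumes each workload's power contribution can be estimated and moves with it.

```python
from dataclasses import dataclass, field


@dataclass
class Rack:
    name: str
    power_kw: float   # measured power draw (physical map)
    budget_kw: float  # power/cooling budget for this rack
    workloads: dict = field(default_factory=dict)  # workload -> estimated kW

    @property
    def headroom_kw(self) -> float:
        return self.budget_kw - self.power_kw


def plan_moves(racks):
    """Greedy sketch: shift the smallest workloads off over-budget racks
    onto whichever other rack currently has the most headroom."""
    moves = []
    for hot in sorted(racks, key=lambda r: r.headroom_kw):
        for wl, kw in sorted(hot.workloads.items(), key=lambda kv: kv[1]):
            if hot.headroom_kw >= 0:
                break  # this rack is back within budget
            target = max((r for r in racks if r is not hot),
                         key=lambda r: r.headroom_kw, default=None)
            if target is None or target.headroom_kw < kw:
                continue  # no room elsewhere without creating a new hotspot
            del hot.workloads[wl]
            target.workloads[wl] = kw
            hot.power_kw -= kw
            target.power_kw += kw
            moves.append((wl, hot.name, target.name))
    return moves


# Made-up example data: one hot rack, two underutilized ones.
racks = [
    Rack("A1", 12.0, 10.0, {"render": 3.0, "db": 1.5, "web": 0.5}),
    Rack("B4", 4.0, 10.0, {"batch": 2.0}),
    Rack("C2", 6.0, 10.0, {"cache": 1.0}),
]
moves = plan_moves(racks)
for wl, src, dst in moves:
    print(f"move {wl}: {src} -> {dst}")
```

A real tool would also weigh latency, licensing and failure domains before moving anything; the point here is only that the decision needs both maps in one place.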
After all, the data center is full of IT equipment that operates on one constant: change.
"You think you've designed in flexibility and can do whatever you want power-wise as long as it is below your max number of, for example, 2 megawatts," Future Facilities' Moezzi said. "But every change erodes capacity and flexibility."
Anyone retrofitting or building a new data center should accurately gauge per-cabinet power draw to avoid overcooling, or underutilizing cooling equipment and running it at low efficiency. As Wagner said, baseline everything you can. But under heavy load, cabinets might pull five times more power than under a typical load, so how do you design for both? Leave room to spread the load.
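A back-of-the-envelope calculation shows why spreading the load matters when peaks run at several times the typical draw. The sketch below assumes "five times more" means a peak of 5x typical; the function name and the example figures (20 racks, 4 kW typical, 150 kW of cooling) are hypothetical.

```python
def max_racks_at_peak(n_racks: int, typical_kw: float,
                      peak_factor: float, cooling_budget_kw: float) -> int:
    """How many racks can run at peak at the same time, with the rest at
    typical load, without exceeding the room's cooling budget."""
    base_kw = n_racks * typical_kw
    if base_kw > cooling_budget_kw:
        raise ValueError("typical load alone exceeds the cooling budget")
    extra_per_rack = typical_kw * (peak_factor - 1)  # added draw when a rack peaks
    if extra_per_rack <= 0:
        return n_racks
    spare_kw = cooling_budget_kw - base_kw
    return min(n_racks, int(spare_kw // extra_per_rack))


n = max_racks_at_peak(n_racks=20, typical_kw=4.0,
                      peak_factor=5.0, cooling_budget_kw=150.0)
print(f"{n} of 20 racks can peak simultaneously")
```

If more racks than that are likely to peak together, the options are more cooling, lower density, or scheduling that keeps heavy jobs from landing on neighboring cabinets at the same time.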
An automation and orchestration layer like Power Assure's Software Defined Power lets data centers move workloads based on operational requirements, RagingWire's Dougherty said. When demand fluctuates, the important thing is to spin servers up here and down there reliably and consistently, which pays off in terms of power use, latency and management, he said.
