Total System Solution for DKRZ

The Deutsches Klimarechenzentrum GmbH (DKRZ), Hamburg, was founded in 1987 with the mission to provide state-of-the-art supercomputing, data handling and associated services, including high-level visualization, to allow the German scientific community to conduct large-scale earth system and climate modelling.

Updated Computer Systems at DKRZ

DKRZ recently upgraded its computer systems and became one of Europe's fastest supercomputer facilities used for climate research. It now has the latest supercomputers from NEC, the SX-6 series, and a unified data management system based on the Intel® Itanium® architecture and Linux. These elements are the core of the modernization project, which cost Euro 34 million.

With the advent of new technology, one trend in high performance computing is the fusion of computation, simulation and data analysis. With advancements in satellite technology delivering massive amounts of earth systems and climate data, the challenges and opportunities for fusing observational and/or experimental data with classical simulation have increased enormously.

To address this new reality, DKRZ developed a unified concept capable of delivering a total solution with transparent access for the climate user community. In addition to the high capacity compute servers, an integrated distributed data management system was specified as an essential part of this upgrade. To satisfy the system requirements, new hardware and software had to be put in place to support the high speed numerical calculations and high networking demands. In addition, a unified shared file system and archive facility with a scalable architecture was required to handle the massive volume of newly generated data.

Following a competitive procurement, the contract was awarded to NEC High Performance Computing Europe (HPC Europe), the vendor offering the compute power and functionality to satisfy DKRZ’s needs, as well as the system integration for a total solution. The winning proposal called for NEC to take control of the overall system integration process and deliver the service and system operation within the agreed budget. This involved using its long-established hardware and software engineering skills to select, install, maintain and operate the total system in order to deliver the services required to fulfil the DKRZ mission.

Total Systems Overview
NEC used its own excellent products: the NEC SX series high capacity compute servers for numerical calculations and the NEC TX7 for data handling. It also incorporated elements from other hardware and software vendors, namely StorageTek, the Legato hierarchical storage manager (HSM) and the ORACLE database running on top of Linux, to deliver an optimal solution.

NEC HPC Europe successfully completed the installation of the DKRZ system after a careful site planning and preparation process, an example of acting as a true total solutions provider. It supplied and set up a very complex hardware and software environment, performed system integration for a cutting edge task and delivered the final system on time, by using competent project management and deployment of skilled engineers. This achievement was only possible by listening to and closely collaborating with the DKRZ management, the customer.

To put it in perspective, this upgrade required the integration of a large compute server with a powerful data server. The functionality and performance requirements for the data service are transparent access to migrated data, a high bandwidth for data transfer and a shared file system capable of adaptation in upgrade steps, whenever the usage profile changes. To help satisfy these requirements, the data server and the HSM are both running on top of the Linux operating system. The successful implementation at DKRZ demonstrates that NEC HPC Europe has the competence to use Linux in mission critical computing centre environments.

In addition, ORACLE is running in a production environment on top of NEC Linux, using the Intel® Itanium® architecture as implemented in the Itanium®-based TX7 servers. The data server and HSM environment is one of the largest Linux-based installations in the scientific/technical area. The system architecture of the data service was designed to be scalable. Although this implementation was targeted at typical HPC applications at DKRZ, the architecture adopted for the HSM environment could also be adapted to satisfy the requirements of businesses in computing, storage and archiving environments.

As Wolfgang Sell, director of DKRZ, said: “What I admire most about NEC HPC Europe, is having listened to DKRZ ideas of how the final system should function, they looked for the most appropriate state-of-the-art technologies available in the market today, both NEC products and best system components from third party vendors, integrated them and delivered the most advanced solution for our application. This was done within the original budget, in a seamless non-disruptive fashion, while the production system was fully operational. The architecture adopted is scalable to allow expansion and open enough for the introduction of new hardware and software for at least the next decade.”

Introduction of NEC HPC Systems

The first phase of the DKRZ system went into operation with 64 CPUs in spring 2002 and was able to process climate simulations forty times faster than the Cray computer it replaced. The second phase, in September 2002, doubled this to 128 CPUs, and the final phase with 192 CPUs was installed in the summer of 2003. The DKRZ now has the second most powerful NEC system outside Japan. In 2003 this supercomputer may well be viewed as one of the world’s five fastest computer systems with respect to sustained productive performance delivered to climate application codes.

To summarise briefly, DKRZ has a long history in climate simulation research and in 2001 decided to replace its Cray systems with a large NEC SX-6 system. It was given a fixed budget, and the RFP required a compute server, a data server and infrastructure, to be delivered in three phases and completed by 2003. In addition, the system had to be balanced between compute power and data handling. The final phase required delivery of at least 0.4 TFLOPS sustained performance, data rates of at least 5 GBytes/s, transparent data access and management for HSM, around 50 TeraBytes of disk cache, and 1.4 PetaBytes of tape storage. NEC HPC Europe built a complete solution: a high performance compute server based on a 192-CPU vector parallel SX-6 system with 1.5 Terabytes of main memory and 1.5 TFLOPS peak performance, and a data-handling server based on the TX7.
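The headline figures above can be related to each other with some simple arithmetic. The following is only an illustrative sketch: it uses the rounded numbers quoted in the text (192 CPUs, 1.5 TFLOPS peak, 1.5 Terabytes of memory, 0.4 TFLOPS sustained), and it assumes that peak performance scales linearly with CPU count across the three phases, which the text does not state. All variable and function names are hypothetical.

```python
# Figures quoted in the text (rounded, so derived values are approximate).
CPUS_BY_PHASE = {"spring 2002": 64, "September 2002": 128, "summer 2003": 192}
PEAK_TFLOPS_FINAL = 1.5          # peak performance of the 192-CPU system
MEMORY_TB_FINAL = 1.5            # main memory of the 192-CPU system
SUSTAINED_TFLOPS_REQUIRED = 0.4  # minimum sustained performance in the RFP

final_cpus = CPUS_BY_PHASE["summer 2003"]

# Peak performance and memory per CPU, using decimal units (1 T = 1000 G).
per_cpu_peak_gflops = PEAK_TFLOPS_FINAL * 1000.0 / final_cpus
per_cpu_memory_gb = MEMORY_TB_FINAL * 1000.0 / final_cpus

# Fraction of peak that the sustained requirement represents.
required_fraction_of_peak = SUSTAINED_TFLOPS_REQUIRED / PEAK_TFLOPS_FINAL

# ASSUMPTION: peak scales linearly with the number of CPUs in each phase.
peak_tflops_by_phase = {
    phase: cpus * per_cpu_peak_gflops / 1000.0
    for phase, cpus in CPUS_BY_PHASE.items()
}

if __name__ == "__main__":
    print(f"Peak per CPU:   {per_cpu_peak_gflops:.2f} GFLOPS")
    print(f"Memory per CPU: {per_cpu_memory_gb:.2f} GB")
    print(f"Sustained requirement as share of peak: {required_fraction_of_peak:.1%}")
    for phase, tflops in peak_tflops_by_phase.items():
        print(f"{phase}: {CPUS_BY_PHASE[phase]} CPUs, about {tflops:.1f} TFLOPS peak")
```

Read this way, the sustained-performance requirement corresponds to a little over a quarter of the quoted peak of the final configuration.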

Conclusion
The DKRZ integrated system was delivered smoothly and on time. This enabled DKRZ to provide computing resources for climate research in Germany at the highest competitive international level. With this great success under its belt, HPC Europe is naturally and successfully expanding its solution business activities.

Intel and Itanium are registered trademarks of Intel Corporation in the United States.
Linux is a trademark or a registered trademark of Linus Torvalds in the United States and other countries.

Copyright © 2007 Itanium® Solutions Alliance. All rights reserved.