Reinventing HPC, ParTypes, Crossbar AI Accelerator and more

In this regular feature, HPCwire highlights recently published research in the high performance computing community and related fields. From parallel programming to exascale to quantum computing, the details are here.

Reinventing High Performance Computing: Challenges and Opportunities

Technical and economic forces reshaping HPC (credit: Reed, Gannon, Dongarra)

In this article, a team of researchers from the University of Utah, the University of Tennessee, and Oak Ridge National Laboratory examines the challenges and opportunities facing high-performance computing. The researchers argue "that current approaches to designing and building leading-edge high-performance computing systems must change in profound and fundamental ways, embracing end-to-end co-design; custom hardware configurations and packaging; large-scale prototyping, as was common thirty years ago; and collaborative partnerships with the leading IT ecosystem companies, smartphone and cloud vendors." To make their case, the authors provide a history of computing, discuss the economic and technological shifts in cloud computing, and examine the semiconductor issues that are enabling the use of multi-chip modules. They also summarize the technological and economic trends and the future directions of scientific computing.

The article inspired the upcoming SC22 panel session of the same name, "Reinventing High Performance Computing."

Authors: Daniel Reed, Dennis Gannon and Jack Dongarra

A type discipline for parallel message passing programs

Researchers from the University of Lisbon and the University of the Azores (Portugal), the University of Copenhagen and DCR Solutions A/S (Denmark), and Imperial College London (UK) have developed a type discipline for parallel programs called ParTypes. In this paper, published in ACM Transactions on Programming Languages and Systems, the research team focused "on a parallel programming model featuring a fixed number of processes, each with its local memory, executing its own program and communicating exclusively by point-to-point synchronous message exchanges or by synchronizing via collective operations, such as broadcast or reduction." The researchers state that "type-based approaches have clear advantages over competing solutions for type-checking functional properties that can be captured by types."

Authors: Vasco T. Vasconcelos, Francisco Martins, Hugo A. López and Nobuko Yoshida
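To make the programming model concrete, here is a minimal Python sketch of the underlying idea: a single global protocol description (a broadcast, one point-to-point message, then a reduction) that the behavior of every process must follow. ParTypes performs this kind of check statically, with dependent types, before the program runs; this toy instead compares recorded per-process action traces at runtime. The names `Step`, `expected_actions` and `conforms` are hypothetical and are not part of ParTypes.

```python
from dataclasses import dataclass
from typing import List, Tuple

Action = Tuple[str, int]  # (operation, peer or root rank)

@dataclass(frozen=True)
class Step:
    """One step of a hypothetical global protocol description."""
    op: str        # "broadcast", "reduce", or "message"
    root: int      # root rank for collectives, sender rank for messages
    dest: int = -1 # receiver rank for point-to-point messages

def expected_actions(protocol: List[Step], rank: int) -> List[Action]:
    """List the actions that one given rank must perform under the protocol."""
    actions: List[Action] = []
    for step in protocol:
        if step.op in ("broadcast", "reduce"):
            # Every process takes part in a collective operation.
            actions.append((step.op, step.root))
        elif step.op == "message":
            # Only the sender and the receiver are involved in a message.
            if rank == step.root:
                actions.append(("send", step.dest))
            elif rank == step.dest:
                actions.append(("recv", step.root))
        else:
            raise ValueError(f"unknown operation {step.op!r}")
    return actions

def conforms(protocol: List[Step], traces: List[List[Action]]) -> bool:
    """True if each rank's recorded trace matches what the protocol requires of it."""
    return all(trace == expected_actions(protocol, rank)
               for rank, trace in enumerate(traces))
```

With three processes and the protocol `[Step("broadcast", 0), Step("message", 1, 2), Step("reduce", 0)]`, a trace in which rank 2 never posts the receive for rank 1's message fails the check. Under synchronous messaging that mismatch would leave rank 1 blocked, which is the kind of error a type-based approach aims to rule out before execution.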

Measurement-based estimator scheme for continuous quantum error correction

The proposed protocol for CQEC using the Measurement-Based Estimator (MBE) scheme (Credit: Borah et al)

An international team of researchers from the Okinawa Institute of Science and Technology Graduate University (Japan), Trinity College (Ireland) and the University of Queensland (Australia) has developed a measurement-based estimator scheme for continuous quantum error correction (MBE-CQEC). In this paper, published by the American Physical Society in the journal Physical Review Research, the researchers demonstrated that by constructing a "measurement-based estimator (MBE) of the logical qubit to be protected, which is driven by the noisy continuous measurement currents of the stabilizers, it is possible to accurately track errors occurring on the physical qubits in real time." According to the researchers, by using the MBE, the newly developed scheme exceeded the performance of canonical discrete quantum error correction (DQEC) schemes, which "use projective von Neumann measurements on stabilizers to discretize error syndromes into a finite set," after which fast unitary gates are applied to recover the corrupted information. The scheme "also allows QEC to be carried out either immediately or on a delayed basis with instantaneous feedback."

Authors: Sangkha Borah, Bijita Sarma, Michael Kewming, Fernando Quijandría, Gerard J. Milburn and Jason Twamley
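The difference between a discrete syndrome measurement and a continuous, noisy measurement current can be illustrated with a classical toy. The Python sketch below is not the paper's MBE, which estimates the state of the logical qubit; it only low-pass filters a noisy record of a single stabilizer's ±1 value and reports the first step at which the filtered signal changes sign. The functions, the Gaussian noise model, and the filter constant `alpha` are hypothetical choices made for illustration.

```python
import random
from typing import List, Optional

def noisy_record(flip_step: int, length: int, sigma: float, seed: int = 0) -> List[float]:
    """Toy measurement current: the stabilizer reads +1 until `flip_step`
    (when an error occurs) and -1 afterwards, plus Gaussian noise."""
    rng = random.Random(seed)
    return [(1.0 if t < flip_step else -1.0) + rng.gauss(0.0, sigma)
            for t in range(length)]

def detect_flip(current: List[float], alpha: float) -> Optional[int]:
    """Exponentially filter the record and return the first step at which the
    filtered value drops below zero, i.e., a suspected syndrome flip."""
    estimate = 1.0  # assume the stabilizer starts in its +1 (no-error) state
    for step, value in enumerate(current):
        estimate += alpha * (value - estimate)  # first-order low-pass filter
        if estimate < 0.0:
            return step
    return None  # no flip detected in this record
```

For example, `detect_flip(noisy_record(50, 100, 0.3, seed=1), 0.2)` reports a flip a few steps after step 50. A smaller `alpha` averages away more noise (fewer false alarms) at the cost of a longer detection delay, a version of the speed-versus-accuracy trade-off that continuous error-correction schemes have to manage.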

GPU-based data-parallel rendering of large, unstructured, and non-convexly partitioned data

An international team of researchers used the Texas Advanced Computing Center's Frontera supercomputer to "interactively render the Fun3D Small Mars Lander (14 GB / 798.4 million finite elements) and Huge Mars Lander (111.57 GB / 6.4 billion finite elements) at 14 and 10 frames per second using 72 and 80 GPUs." Motivated by the Fun3D Mars Lander simulation data, researchers from Bilkent University (Turkey), NVIDIA Corp. (California, USA), Bonn-Rhein-Sieg University of Applied Sciences (Germany), INESC TEC and the University of Minho (Portugal), and the University of Utah (Utah, USA) introduced a "scalable, memory-efficient GPU-based direct volume visualization framework suitable for in-situ and post-hoc use." In this paper, the researchers describe how the approach reduces the memory usage of the unstructured volume elements by taking advantage of an exclusive-or (XOR)-based index reduction scheme, and provides fast ray-marching-based traversal without requiring large external data structures built over the elements themselves. They also detail the team's GPU-optimized deep compositing scheme, which allows order-correct compositing of the intermediate color values accumulated on different ranks and works even for non-convex clusters.

Authors: Alper Sahistan, Serkan Demirci, Ingo Wald, Stefan Zellmann, João Barbosa, Nathan Morrical, Uğur Güdükbay
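Why non-convex partitions call for "deep" compositing can be seen at a single pixel. When a rank's region is non-convex, a ray can enter and leave it several times, so that rank's fragments interleave in depth with fragments from other ranks, and the ranks cannot simply be blended in one fixed order. The Python sketch below is a CPU illustration, not the paper's GPU-optimized scheme: it gathers every rank's (depth, premultiplied RGBA) fragments for one pixel, sorts them by depth, and applies standard front-to-back "over" compositing. `composite_pixel` and the fragment layout are hypothetical.

```python
from typing import Dict, List, Tuple

RGBA = Tuple[float, float, float, float]  # premultiplied red, green, blue, alpha
Fragment = Tuple[float, RGBA]             # (depth along the ray, color)

def composite_pixel(per_rank: Dict[int, List[Fragment]]) -> RGBA:
    """Merge one pixel's fragments from all ranks and blend them front to back."""
    fragments = sorted(
        (frag for frags in per_rank.values() for frag in frags),
        key=lambda frag: frag[0],  # nearest fragment first, regardless of rank
    )
    r = g = b = a = 0.0
    for _, (fr, fg, fb, fa) in fragments:
        transmittance = 1.0 - a  # fraction of light not yet absorbed in front
        r += transmittance * fr
        g += transmittance * fg
        b += transmittance * fb
        a += transmittance * fa
        if a >= 0.999:  # early termination: the pixel is effectively opaque
            break
    return (r, g, b, a)
```

In a case where rank 0 contributes fragments at depths 1 and 3 and rank 1 contributes one at depth 2, compositing each rank separately and then ordering the two results would place rank 0's far fragment in front of rank 1's; sorting all fragments by depth keeps the blend order-correct.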

Summit Supercomputer

Not All GPUs Are Created Equal: Characterizing Variability in Large-Scale Accelerator-Rich Systems

Researchers from the University of Wisconsin-Madison conducted a study aimed at "understanding GPU variability in large-scale, accelerator-rich computing clusters." In this paper, the researchers characterize the extent of performance variation caused by GPU power management in modern HPC and supercomputing systems. Using Oak Ridge's Summit, Sandia's Vortex, TACC's Frontera and Longhorn, and Livermore's Corona, they collected over 18,800 hours of data from more than 90% of the GPUs in these clusters. The results show an average performance variation of 8% (maximum 22%), even though the GPU architecture and vendor SKU are identical within each cluster, with outliers up to 1.5x slower than the median GPU.

Authors: Prasoon Sinha, Akhil Guliani, Rutwik Jain, Brandon Tran, Matthew D. Sinclair and Shivaram Venkataraman
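The kind of summary the study reports can be computed from per-GPU runtimes of one benchmark on nominally identical GPUs. The exact variation metric used in the paper is not given here, so the Python sketch below assumes two simple ones: the spread of runtimes relative to the median, and the slowest GPU's runtime relative to the median. The 1.2x outlier threshold and the function name are arbitrary, hypothetical choices.

```python
import statistics
from typing import Dict

def variability_summary(runtimes: Dict[str, float],
                        outlier_ratio: float = 1.2) -> Dict[str, object]:
    """Summarize per-GPU runtimes (lower is better) for one benchmark."""
    values = list(runtimes.values())
    median = statistics.median(values)
    # Assumed metric: range of runtimes relative to the median GPU.
    spread = (max(values) - min(values)) / median
    # Slowdown of the worst GPU compared with the median GPU.
    slowest = max(values) / median
    # GPUs at least `outlier_ratio` times slower than the median.
    outliers = sorted(gpu for gpu, t in runtimes.items() if t / median >= outlier_ratio)
    return {"median": median, "spread": spread,
            "slowest_vs_median": slowest, "outliers": outliers}
```

For five GPUs with runtimes of 2.0, 2.0, 2.0, 2.2, and 3.0 seconds, the median is 2.0 s, the spread is 0.5 (50%), and the slowest GPU is 1.5x slower than the median, so it is the only one flagged as an outlier.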

Tools for quantum computing based on decision diagrams

In this article, published in ACM Transactions on Quantum Computing, Austrian researchers from Johannes Kepler University Linz and the Software Competence Center Hagenberg provide an introduction to decision-diagram-based tools for quantum computing, aimed at users and developers alike. First, the researchers revisit "the concepts of using decision diagrams, for example, for the simulation and verification of quantum circuits." Next, they present a "visualization tool for quantum decision diagrams, which allows users to explore the behavior of decision diagrams in the design tasks mentioned above." Finally, the researchers dive into "decision diagram-based tools for the simulation and verification of quantum circuits using the methods described above as part of the open-source Munich Quantum Toolkit." The tools and additional information are publicly available on GitHub.

Authors: Robert Wille, Stefan Hillmich and Lukas Burgholzer
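A decision diagram saves memory by splitting a state vector one qubit at a time and storing identical sub-vectors only once. The Python sketch below illustrates that idea with a unique table and the usual reduction rule of skipping a node whose two branches are identical. It is a simplification: the decision diagrams in the Munich Quantum Toolkit also attach complex weights to edges, which lets many more sub-vectors be shared, whereas this toy keeps amplitudes only in its leaves. The class and function names are hypothetical.

```python
import math
from typing import Dict, List

class DecisionDiagram:
    """Toy reduced decision diagram over a state vector (no edge weights)."""

    def __init__(self) -> None:
        self.unique: Dict[tuple, int] = {}  # unique table: node key -> node id

    def _intern(self, key: tuple) -> int:
        # Reuse an existing node when an identical one has already been built.
        if key not in self.unique:
            self.unique[key] = len(self.unique)
        return self.unique[key]

    def build(self, amplitudes: List[complex], level: int = 0) -> int:
        """Split on one qubit per level: the first half of the vector is the
        branch where that qubit is |0>, the second half where it is |1>."""
        if len(amplitudes) == 1:
            a = complex(amplitudes[0])
            # Round so that numerically equal amplitudes share one leaf.
            return self._intern(("leaf", round(a.real, 12), round(a.imag, 12)))
        half = len(amplitudes) // 2
        low = self.build(amplitudes[:half], level + 1)
        high = self.build(amplitudes[half:], level + 1)
        if low == high:
            return low  # both branches identical: no node needed for this qubit
        return self._intern(("node", level, low, high))

def node_count(amplitudes: List[complex]) -> int:
    """Number of distinct nodes (including leaves) in the toy diagram."""
    dd = DecisionDiagram()
    dd.build(amplitudes)
    return len(dd.unique)

def ghz(n: int) -> List[complex]:
    """Amplitudes of (|0...0> + |1...1>)/sqrt(2) on n qubits."""
    amps = [0j] * (2 ** n)
    amps[0] = amps[-1] = complex(1 / math.sqrt(2))
    return amps
```

In this toy, a 10-qubit GHZ state needs 21 nodes instead of 1,024 stored amplitudes, and a uniform superposition collapses to a single leaf; states without such regularity gain much less.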

Architecture of the proposed photonic crossbar AI accelerator (credit: Sturm and Moazeni)

Scalable Coherent Optical Crossbar Architecture Using PCM for AI Acceleration

Computer engineers at the University of Washington have developed an "optical AI accelerator based on a crossbar architecture." According to the researchers, the chip design addresses a lack of scalability, large footprints and high power consumption, as well as incomplete system-level architectures for integration into existing datacenter architecture for real-world applications. In this paper, the University of Washington researchers also provide system-level modeling and analysis of their chip's performance on ResNet50 v1.5, accounting for all critical parameters, including memory size, array size, photonic losses, and the power consumption of the peripheral electronics. The results showed that "a proposed 128×128 architecture can achieve inference per second (IPS) similar to Nvidia A100 GPU at 15.4× lower power and 7.24× lower area."

Authors: Dan Sturm and Sajjad Moazeni
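In a crossbar, each weight is held in one cell of the array (per the paper's title, PCM cells in this design), and applying an input vector yields a full matrix-vector product in one pass, so an idealized 128×128 array performs 128 × 128 = 16,384 multiply-accumulate operations at once. The Python sketch below is a numerical model under stated assumptions, not the authors' design: weights are restricted to [0, 1] and snapped to an assumed 16 PCM levels; optical losses, noise, and data conversion are ignored; and layers larger than the array are tiled, with partial sums assumed to be accumulated outside the array. All function names are hypothetical.

```python
from typing import List

def quantize(weight: float, levels: int) -> float:
    """Snap a weight in [0, 1] to one of `levels` evenly spaced cell states (assumed)."""
    w = min(max(weight, 0.0), 1.0)
    return round(w * (levels - 1)) / (levels - 1)

def crossbar_mvm(weights: List[List[float]], x: List[float], levels: int = 16) -> List[float]:
    """Ideal crossbar pass: each output is the sum of inputs times stored weights."""
    return [sum(quantize(w, levels) * xi for w, xi in zip(row, x)) for row in weights]

def tiled_mvm(weights: List[List[float]], x: List[float],
              tile: int = 128, levels: int = 16) -> List[float]:
    """Run a larger layer on a fixed tile x tile crossbar, one block at a time."""
    rows, cols = len(weights), len(x)
    y = [0.0] * rows
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            block = [row[c0:c0 + tile] for row in weights[r0:r0 + tile]]
            partial = crossbar_mvm(block, x[c0:c0 + tile], levels)
            for i, p in enumerate(partial):
                y[r0 + i] += p  # partial sums accumulated outside the array (assumed)
    return y
```

Tiling lets a fixed-size array serve layers of any shape, at the cost of extra passes and off-array accumulation; under the ideal assumptions here it returns the same result as a single large crossbar.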

Do you know of any research that should be included in next month’s list? If so, send us an email at [email protected]. We look forward to hearing from you.
