November 2, 2022 – For cosmologists trying to study the formation of the Universe, knowing the mass of everything is critical. But the need to estimate the mass of dark matter, which cannot be observed directly, limits their accuracy. A team of scientists led by Carnegie Mellon University (CMU) trained an artificial intelligence (AI) on data from simulated galaxy clusters, in which the composition of all components is known. This AI then predicted a mass for the real-world Coma Cluster that matches those of earlier, more human-intensive attempts. The result offers the possibility of faster and more accurate assessment of the masses of galaxy clusters.
Today, cosmologists wonder how clusters of galaxies form and persist. The hundreds, if not thousands, of galaxies contained within these vast structures seem to be moving too quickly for their collective gravity to hold them together. Even when scientists take into account the mysterious dark matter – which is impossible to detect directly despite making up 85% of matter in the Universe – the uncertainties are far greater than scientists are comfortable with.
Since the galaxies in a cluster orbit its center of mass, scientists can determine the mass of that cluster from the speed at which the galaxies are moving. Light from galaxies moving away from us is slightly red-shifted – much like the lower pitch of a train whistle as the train moves away, their light is a little redder. Light from galaxies heading towards us is similarly shifted a little toward the blue. Measuring the difference between the two shows how fast the galaxies are moving. Higher speeds mean there has to be more mass to hold the cluster together. But the need to estimate the contributions of dark (invisible) matter, hot ionized gas and visible galaxies introduces large uncertainties. Additionally, scientists have not yet worked out the three-dimensional structures of the clusters, which further limits their confidence in their understanding of what is going on.
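The chain of reasoning above, from redshift to speed to mass, can be sketched in a few lines of Python. This is a rough illustration only, not the method used in the study. It assumes a textbook virial-type scaling, M ≈ k·σ²·R/G, with an assumed order-unity prefactor k = 5. The function names, the redshift values and the 1 Mpc radius are all hypothetical and are not measurements of the Coma Cluster.

```python
import statistics

C_KM_S = 299_792.458   # speed of light in km/s
G = 4.30091e-9         # gravitational constant in Mpc (km/s)^2 per solar mass


def los_velocities(redshifts):
    """Line-of-sight velocities (km/s) relative to the cluster's mean redshift."""
    z_mean = statistics.fmean(redshifts)
    # Galaxies redder than the mean are receding within the cluster (positive),
    # bluer ones are approaching (negative).
    return [C_KM_S * (z - z_mean) / (1.0 + z_mean) for z in redshifts]


def virial_mass(redshifts, radius_mpc, k=5.0):
    """Order-of-magnitude cluster mass in solar masses.

    Assumes M = k * sigma^2 * R / G, where sigma is the spread of the
    line-of-sight velocities. The prefactor k is an assumption.
    """
    sigma = statistics.stdev(los_velocities(redshifts))
    return k * sigma ** 2 * radius_mpc / G


# Made-up redshifts for illustration; NOT real Coma Cluster data.
example_redshifts = [0.020, 0.022, 0.024, 0.026, 0.028]
print(f"Estimated mass: {virial_mass(example_redshifts, radius_mpc=1.0):.2e} solar masses")
```

Because the mass scales with the square of the velocity spread, doubling the spread of speeds quadruples the estimate. That is one reason small errors in the measured speeds translate into large uncertainties in the mass.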
Matthew Ho, a graduate student working in Hy Trac’s group at CMU’s McWilliams Center for Cosmology, wanted to know if there was a way to use AI to determine the mass of the Coma Cluster, a vast collection of galaxies 321 million light-years from Earth. An AI approach to the problem, he explained, would make it possible to estimate the mass of galaxy clusters much faster than the painstaking surveys of the past. Equally important, it offered a way around the uncertainties – as well as, potentially, other biases that humans inevitably introduce with their initial assumptions.
“Galaxy clusters are exactly what they sound like…groups of hundreds to thousands of galaxies that all seem to be in equilibrium orbits around each other,” Ho explained. “…matter in individual galaxies is not enough to…keep them all in orbit…Understanding their distribution in space and time is very important for us to constrain models of cosmology.”
To solve the Coma Cluster problem, Ho would use a powerful AI tool called deep learning. This type of AI works by first feeding the computer data in which the correct answer has been labeled by humans. Because the computer is so much faster than humans, it can learn to relate the data to the correct answer through trial and error. Initially, it creates a series of interconnected “layers” that represent different aspects of the data. It then adjusts these connections until its responses match the labels provided by the humans. Once that is done, scientists test the AI against data presented without labels. Once it has given the correct answers in this testing phase, it is ready to work on data for which humans do not yet have the answers.
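The train-then-test cycle described above can be illustrated with a deliberately tiny Python sketch. It is not Ho's model and it is not deep: a single adjustable connection (a weight and an offset) stands in for the many layers of a real network. The "labeled data" are synthetic points generated from a made-up relation, y = 3x + 1 plus noise. All function names and settings are hypothetical.

```python
import random


def make_labeled_data(n, rng):
    """Synthetic (input, label) pairs from a made-up relation y = 3x + 1 + noise."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = 3.0 * x + 1.0 + rng.gauss(0.0, 0.05)
        data.append((x, y))
    return data


def train(data, epochs=500, learning_rate=0.1):
    """Adjust weight and offset step by step until predictions match the labels."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            error = (w * x + b) - y        # how far the current guess is off
            grad_w += 2.0 * error * x / n  # gradient of the mean squared error
            grad_b += 2.0 * error / n
        w -= learning_rate * grad_w        # nudge the connection to reduce the error
        b -= learning_rate * grad_b
    return w, b


def mean_squared_error(params, data):
    """Average squared gap between predictions and the known answers."""
    w, b = params
    return sum(((w * x + b) - y) ** 2 for x, y in data) / len(data)


rng = random.Random(0)                  # fixed seed so the run is repeatable
train_set = make_labeled_data(200, rng)
test_set = make_labeled_data(50, rng)   # held back; never seen during training

params = train(train_set)
print("learned weight and offset:", params)
print("error on held-back test data:", mean_squared_error(params, test_set))
```

The important design point is the held-back test set. The model is only trusted once it does well on examples it never saw while its connections were being adjusted.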
Building an accurate training data set is therefore essential to obtaining good results. This is especially the case when the real data is known to have issues, such as those that limit cluster mass measurements. So Ho trained his AI on earlier simulations of galaxy clusters run on the National Science Foundation-funded Bridges-2 system as well as on Vera. By using artificial galaxy clusters whose composition was completely known, he could be sure that the computer was working with accurate data.
However, creating accurate artificial galaxy clusters was a difficult task, since the simulation had to include so many “particles”. In total, the simulation would start with hundreds of gigabytes of data, enough to fill dozens, if not hundreds, of laptops. Then it would have to perform calculations on that data, which would further inflate the volume of data being juggled.
The Big Data capabilities of Bridges-2 made it an ideal solution to the problem. With large-memory nodes of 512 GB and 4,000 GB, it could hold all of the data in a single node, greatly speeding up the most important simulation processing tasks by reducing the time required for communication between nodes. Along with processing on Vera, this allowed Ho to create a clean training dataset that his AI program, also running on Vera, used to learn how to judge the mass of galaxy clusters. In previous work, the team had also used Bridges-2’s advanced GPU nodes, which are well suited to the many parallel calculations that AI requires.
When unleashed on real-world data from the Coma Cluster, the AI produced results that agreed with previous, human-guided estimates of the galaxy cluster’s mass. This result gave credence to earlier attempts to remove observational bias, because the computer had started out with none of the assumptions humans had. It also gave Ho confidence that the computer was giving a correct answer, not just an answer consistent with previous studies. More importantly, it suggests that the AI will be able to produce similarly reliable results when it is given data for other real galaxy clusters. The scientists published their findings in the journal Nature Astronomy in June 2022.
Source: Ken Chiacchia, CFP