November 3, 2022 – The rise of artificial intelligence (AI) and a branch of AI called machine learning, which focuses on using data and algorithms to mimic the way humans learn, is rapidly changing the way data-intensive scientific discovery is being done.
Data-intensive science is a modern, exploration-centric style of science that relies heavily on advanced computing capabilities and software tools to manipulate and explore large data sets. New and better machine learning techniques are now being used to aid and automate scientific discovery on increasingly complex problems.
“AI research is making great strides,” said Michael E. Papka, Deputy Assistant Director of the Laboratory and Director of the Supercomputing Facility at the U.S. Department of Energy’s (DOE) Argonne National Laboratory, who is also a professor of computer science at the University of Illinois at Chicago (UIC). “We are seeing progress in many areas of AI, achieved not only by new techniques, but especially by new hardware to run compute-intensive AI models.”
Two years ago, Papka and a like-minded group of STEM (science, technology, engineering, and math) teachers and computer scientists began meeting weekly to discuss a future skills gap in the workforce needed to solve AI problems. They started thinking about a new tool, or maybe a teaching module, that would introduce the concepts of AI to the young researchers of tomorrow.
The team, which includes science and education outreach staff from Argonne and STEM educators from Northern Illinois University (NIU) and UIC, wanted to explore areas where a future workforce will undoubtedly be needed: AI software developers and data science experts. Over several months, the group met with experts in the field of AI, early-career scientists using AI systems in their research, and graduate students working on data science tools.

Some technologies are still in the experimental phase, including the growing collection of AI hardware found on the AI testbed at the Argonne Leadership Computing Facility (ALCF). Nevertheless, AI methods such as machine learning, which uses algorithms to analyze and learn from input data, and deep learning, a subfield of machine learning that uses a complex structure of algorithms modeled on the human brain for learning and making decisions, are beginning to contribute to profound scientific breakthroughs. These methods have predicted the 3D structures of proteins for medical research and performed routine but vital tasks such as identifying optimal candidate materials for harvesting sunlight. The ALCF is a user facility of the DOE Office of Science.
Such developments will make AI proficiency an essential workforce skill. “Looking for ways to embed AI methodology into scientific problems is the first step to finding ways to solve them,” said Meridith Bruozas, director of institutional partnerships at Argonne and a member of the AI collaboration team who helps develop the program. “As a leading research organization, we have a keen interest in cultivating the AI skills of the future workforce.”
Ideas began to take shape on what a data- and research-driven experience might look like. Such an experience would give students access to large datasets, model real-world scientific practices, and introduce the AI-based methods data scientists use to better understand a question of interest.
First lesson: What is AI?
Argonne is one of a growing number of research organizations integrating powerful AI resources and techniques to drive new discoveries. But not all AI systems and techniques, such as machine learning, need to be powerful to make a meaningful contribution to society. An AI-powered system is any computing device or system that performs tasks associated with human intelligence by leveraging complex data sets. Such systems can still be opaque; many people use AI models without knowing how they work. Understanding how to tune models to provide useful results is still a new and active area of research. The team saw a challenge in linking the technical aspects of AI to its potential to solve big problems in a way that students personally care about.
Raising awareness of how machines can be used to simulate human intelligence processes seemed like a good place to start before moving on to problem solving – like how to simulate data sets, how to build confidence in results and how to create training datasets that avoid the risks and shortcomings of AI-based methods.
“We assumed zero programming experience, so we quickly found ways to potentially teach AI concepts from an ‘unplugged’ data science perspective,” said John Domyancich, Learning Center manager at Argonne. “The activities get them thinking about how to use the data to answer the questions and begin to figure out what other information they might need to get an answer with a high degree of confidence.”
Another challenge for the team was that of scale: what type and how much data would be needed to be useful? And how would students begin to formulate the questions they sought to answer?
From labeling bird songs to identifying polluted rivers
In July 2021, the team held a month-long summer pilot program with high school students recruited through NIU’s Upward Bound program. After being introduced to the broader concepts of AI and machine learning, students worked in groups to analyze AI-generated datasets using some of the same tools scientists use to train different machine learning models, including Jupyter Notebooks.
They used Spotify data and models to learn how to identify a music genre, then to catalog and recognize bird songs. “We asked them to think about how a human might approach the task and then how a computer might approach the same task,” said Brenda Lopez Silva, one of the science educators who attended the summer pilot camp. Spoiler alert: the computer essentially learns what a human teaches it. This opened the door to interesting discussions on how ethics could be considered in AI today and in the future.
In the summer of 2022, the team restructured the camp to be shorter and more intensive, and activities were based on data collected by sensors about the environment in northern Illinois. Students explored how computer vision could be used to optimize street crossings. Another task was to sort and classify images of a river to try to determine its level of pollution. This time, instructors used a different approach to tie what students did with pen and paper to how scientists use machine learning to make discoveries.
For the river health activity, students attempted to test their hypotheses about the river by structuring the data available to them (pictures) and constructing a decision tree (a method for reaching a conclusion based on these inputs), only to realize that there was insufficient data to answer the question. “The task required an advanced level of thinking,” said Kristin Brynteson, director of NIU’s STEAM (science, technology, engineering, arts, and math) program, who led the AI camp sessions. “Students needed to see beyond the camera, to question the data and the way it is labeled.”
“It’s possible that an AI system could infer the health of the river from camera images, but that would require a lot of data for it to work,” said Nicola Ferrier, senior computer scientist at Argonne and an expert in the field of AI who consults with the team. “This exercise was a good first step towards introducing pattern detection concepts across features.”
The team noticed that once students performed a data sorting and weighting activity by hand, they were better able to grasp the algorithm and understand how a computer would perform the same task. “It was educational for students to reason about information processing tasks instead of just letting someone else’s algorithm do the work,” Papka said.
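To make the pen-and-paper idea concrete, a hand-written decision tree for the river activity might look something like the Python sketch below. This is only an illustration, not the camp's actual material: the feature names (`visible_debris`, `water_color`), the labels, and the example observations are all hypothetical. It also shows the situation students ran into, where a missing feature leaves the question unanswerable.

```python
def classify_river(observation: dict) -> str:
    """Walk a tiny hand-written decision tree over one labeled image.

    The features and thresholds here are hypothetical, chosen only to
    illustrate how a chain of yes/no questions leads to a conclusion.
    """
    debris = observation.get("visible_debris")
    color = observation.get("water_color")

    # Without both features the tree cannot reach a conclusion.
    if debris is None or color is None:
        return "insufficient data"

    # First question: is there visible debris in the picture?
    if debris:
        return "likely polluted"

    # Second question: does the water color suggest an algae bloom?
    if color == "green":
        return "possibly polluted"

    return "likely healthy"


# Hypothetical hand-labeled observations, one per picture.
observations = [
    {"visible_debris": True, "water_color": "brown"},
    {"visible_debris": False, "water_color": "green"},
    {"visible_debris": False, "water_color": "clear"},
    {"water_color": "clear"},  # debris was never labeled
]

labels = [classify_river(o) for o in observations]
for obs, label in zip(observations, labels):
    print(obs, "->", label)
```

A machine learning library would learn the questions and their order from many labeled examples instead of having a person write them out, which is why the amount and quality of labeled data matters so much.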
Much work remains to be done, but the team is closer to defining the framework for a “classroom-usable approach” to teaching the principles of AI – an approach that also incorporates accessible technologies to serve as instruments of data collection and analysis. Leading candidates include sensor nodes, either purpose-built to collect data for student-led surveys or accessed through a data portal tied to an already deployed sensor network.
“We’re excited to be actively researching something that could provide an entry point for students to explore science and learn about AI,” Bruozas said. “Something that serves as a portal for students to access and analyze data on demand and apply it to a problem they want to solve.”
About Argonne
The Argonne Leadership Computing Facility provides supercomputing capabilities to the scientific and engineering community to advance fundamental discovery and understanding across a wide range of disciplines. Supported by the Advanced Scientific Computing Research (ASCR) program of the U.S. Department of Energy’s (DOE) Office of Science, the ALCF is one of two DOE advanced computing facilities dedicated to open science.
Argonne National Laboratory seeks solutions to pressing national problems in science and technology. The nation’s first national laboratory, Argonne conducts cutting-edge basic and applied scientific research in virtually every scientific discipline. Argonne researchers work closely with researchers from hundreds of companies, universities, and federal, state, and municipal agencies to help them solve their specific problems, advance American scientific leadership, and prepare the nation for a better future. With employees from more than 60 countries, Argonne is managed by UChicago Argonne, LLC for the U.S. Department of Energy’s Office of Science.
The U.S. Department of Energy’s Office of Science is the largest supporter of basic physical science research in the United States and works to address some of the most pressing challenges of our time. For more information, visit https://energy.gov/science.
Source: Laura Wolf, Argonne National Laboratory