Machine Learning is a Powerful Tool When Searching for Exoplanets

Astronomy has entered the era of big data, where astronomers find themselves inundated with information thanks to cutting-edge instruments and data-sharing techniques. Facilities like the Vera Rubin Observatory (VRO) are collecting about 20 terabytes (TB) of data on a daily basis. Others, like the Thirty-Meter Telescope (TMT), are expected to gather up to 90 TB once operational. As a result, astronomers are dealing with 100 to 200 Petabytes of data every year, and astronomy is expected to reach the “exabyte era” before long.

In response, observatories have been crowdsourcing solutions and making their data open-access so citizen scientists can assist with the time-consuming analysis process. In addition, astronomers have been increasingly turning to machine learning algorithms to help them identify objects of interest (OI) in the Universe. In a recent study, a team led by the University of Georgia revealed how artificial intelligence could distinguish between false positives and exoplanet candidates simultaneously, making the job of exoplanet hunters that much easier.

The study was led by Jason Terry, a doctoral student with the Center for Simulational Physics (CSP) at the University of Georgia (UGA) and a former researcher with the Los Alamos National Laboratory (LANL). He was joined by researchers from the University of California San Francisco (UCSF), the Cardiovascular Research Institute (CRI), and the University of Alabama. The paper that describes their research, “Locating Hidden Exoplanets in ALMA Data Using Machine Learning,” recently appeared in The Astrophysical Journal.

Disk Substructures at High Angular Resolution Project. Credit: DSHARP

The first confirmed exoplanet was found in 1992, and the number has grown exponentially over the past fifteen years. To date, 5250 exoplanets have been confirmed in 3,921 systems, while another 9,208 candidates are awaiting confirmation. Nevertheless, the vast majority of those belong to one of three categories: Neptune-like (1,825), Gas Giants (1,630), and Super-Earths (1,595). These planets are more massive and generally orbit farther from their stars than smaller, rocky planets (or “Earth-like”), of which only 195 have been found.

Meanwhile, exoplanets that are in the formation stage are difficult to see for two main reasons: One, they are often hundreds of lights years from Earth (too far to see clearly), and two, the protoplanetary discs from which they form are very thick, measuring up to 1 AU in diameter (the distance between the Earth and the Sun). From what astronomers have seen, planets tend to form in the middle of these discs and convey a signature of the dust and gases kicked up in the process. But as Terry said in a recent AGU press release, research shows that artificial intelligence can help scientists overcome these difficulties:

“One of the novel things about this is analyzing environments where planets are still forming. Machine learning has rarely been applied to the type of data we’re using before, specifically for looking at systems that are still actively forming planets… To a large extent the way we analyze this data is you have dozens, hundreds of images for a specific disc and you just look through and ask ‘is that a wiggle?’ then run a dozen simulations to see if that’s a wiggle and … it’s easy to overlook them – they’re really tiny, and it depends on the cleaning, and so this method is one, really fast, and two, its accuracy gets planets that humans would miss.”

For the sake of their study, the team developed a machine learning model based on Computer Vision (CV), a field of artificial intelligence that enables computers and systems to extract data from digital images and videos. The team trained their CV model using synthetic images they generated, then applied the model to real observations of protoplanetary disks conducted by the Atacama Large Millimeter-submillimeter Array (ALMA). In the end, they demonstrated that their machine learning method (based on CV) could correctly identify the presence of one or more planets in disks.

Atacama Large Millimeter/submillimeter Array (ALMA), located atop the Chajnantor Plateau in the Chilean Andes. Credit: ESO

They further demonstrated that it could correctly constrain the location of the planets in those disks. Co-author Cassandra Hall, an assistant professor of astrophysics, and the principal investigator of the Exoplanet and Planet Formation Research Group at UGA, explained:

“This is a very exciting proof of concept. The power here is that we used exclusively synthetic telescope data generated by computer simulations to train this AI, and then applied it to real telescope data. This has never been done before in our field, and paves the way for a deluge of discoveries as James Webb Telescope data rolls in.”

In the coming years, several next-generation space and ground-based observatories will join the James Webb Space Telescope (JWST). This includes the Nany Grace Roman Space Telescope (RST), the Extremely Large Telescope (ELT), the Giant Magellan Telescope (GMT), and the Thirty Meter Telescope (TMT). The and other telescopes will gather unprecedented levels of data in multiple wavelengths, which will be used to search for exoplanets. More than that, the cutting-edge instruments they will use will be able to characterize exoplanet atmospheres like never before. Said Terry:

Beyond exoplanet research, these observatories will investigate cosmological mysteries like Dark Matter, Dark Energy and probe the earliest ages of the Universe. Next-generation analytical tools are also needed to analyze this high-quality data so astronomers can spend more time interpreting the data and coming up with new theories to explain it. According to Terry, machine learning is already capable of meeting this demand, will save time and money, and efficiently guide scientific time, investments, and new proposals:

“There remains, within science and particularly astronomy in general, skepticism about machine learning and of AI, a valid criticism of it being this black box – where you have hundreds of millions of parameters and somehow you get out an answer. But we think we’ve demonstrated pretty strongly in this work that machine learning is up to the task. You can argue about interpretation. But in this case, we have very concrete results that demonstrate the power of this method.”

Further Reading: UGA Today, The Astrophysical Journal