
CS 21 Scientific Rationale

Solar and stellar physics are entering an era of data-driven discovery, mainly because modern data mining and machine learning (ML) techniques enable powerful new ways of extracting information from observational material. Over the past decades, large ground-based sky surveys and all-sky space missions have harvested hundreds of terabytes of data corresponding to hundreds of millions of astronomical sources. In the Gaia era, we expect the data volume to move towards the petabyte domain, requiring the handling of billions of sources. Solar physics and astrophysics have indeed evolved from data-poor sciences into data-intensive disciplines.

Current solar and stellar survey datasets present novel algorithmic, computational, and statistical challenges. Astronomers are not necessarily experts in data mining and ML algorithms, but they need to master the background knowledge required to apply these methods to their scientific problems. Several of these challenges can be addressed through closer collaboration between research astronomers and ML experts.

This splinter session, “Machine Learning for Cool Stars”, aims to introduce non-experts to ML techniques applied to solar and stellar physics and to present some of their current successful applications, focusing on stellar spectroscopy. Large spectroscopic surveys at all resolutions, covering optical and infrared wavelengths, such as Gaia-ESO, GALAH, APOGEE and LAMOST (among others), have started to apply ML techniques to deliver astrophysical parameters and detailed chemical abundances, revealing the Galactic chemical enrichment of various stellar populations and clusters, or even identifying chemical species from unknown spectral lines.

These data-driven techniques also offer a very “natural” way of bringing diverse surveys, such as those mentioned above, onto the same scale through commonly observed stars. Observational data collected with different instruments, wavelength regimes and selection criteria, and from physically distinct volumes of the Galaxy, can thus be combined and interpreted through a common analysis framework. Additionally, through the ML process of “label transfer”, we can now attempt to measure physical quantities (e.g. stellar mass) pertinent to one observational domain (e.g. asteroseismology) by automatically identifying and exploiting their carriers in a different domain (e.g. stellar spectra). ML techniques have also started to be used in conjunction with non-LTE spectral models, for the characterization of spectroscopic binaries, and for studies of dynamics in stellar atmospheres.
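
As a rough illustration of the “label transfer” idea, the sketch below fits a mapping from spectra to labels on a set of stars observed in both domains, then applies it to stars that have spectra only. Everything here is an assumption made for illustration: the spectra and labels are synthetic, the two labels are generic standardized quantities (think of a scaled mass plus one other parameter), and a plain ridge-regularised linear regression stands in for whatever model a real survey pipeline would use. The function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_spectra(labels, response, noise, rng):
    """Toy spectra: flux is linear in the labels, plus Gaussian noise."""
    n_stars, n_pixels = labels.shape[0], response.shape[1]
    return 1.0 + labels @ response + noise * rng.standard_normal((n_stars, n_pixels))

def design_matrix(spectra):
    """Prepend a constant column so the fit includes an offset term."""
    return np.hstack([np.ones((spectra.shape[0], 1)), spectra])

def fit_label_transfer(spectra, labels, ridge=1e-3):
    """Ridge-regularised least-squares map from flux pixels to labels."""
    X = design_matrix(spectra)
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ labels)

def predict_labels(spectra, coeffs):
    """Apply the fitted map to new spectra."""
    return design_matrix(spectra) @ coeffs

# Hypothetical sensitivity of each of 50 pixels to the two labels.
n_pixels = 50
response = 0.1 * rng.standard_normal((2, n_pixels))

# "Common" stars: labels known from the reference domain, spectra also available.
train_labels = rng.uniform(-1.0, 1.0, size=(200, 2))
train_spectra = make_spectra(train_labels, response, 0.01, rng)
coeffs = fit_label_transfer(train_spectra, train_labels)

# Stars with spectra only: labels are inferred, then checked against the
# (normally unknown) true values of the synthetic sample.
test_labels = rng.uniform(-1.0, 1.0, size=(100, 2))
test_spectra = make_spectra(test_labels, response, 0.01, rng)
pred = predict_labels(test_spectra, coeffs)
rms = np.sqrt(np.mean((pred - test_labels) ** 2, axis=0))
print("RMS label error per label:", rms)
```

The held-out comparison at the end is only possible because the data are synthetic. With real surveys, the equivalent check would rely on an independent validation sample of stars with trusted labels.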

Computing resources keep increasing, and astronomers have become very efficient at making use of large computing facilities. However, the sheer amount of data currently available, which will only grow when new observational facilities (WEAVE, SDSS-V, 4MOST) come online in the near future, is too great for many computationally intensive, state-of-the-art ab-initio analysis routines to handle. Adopting ML techniques on modern datasets is therefore not only possible and more efficient, but is becoming a necessity, with the option of either fully employing ML in the data analysis or taking a hybrid approach that combines ML with traditional analysis tools. On the brink of this transition, it is very timely to discuss and learn about the different options that the world of ML offers to the spectroscopic domain.

ML codes, however, are not a magic solution, and several examples of misinterpretation of ML results in stellar spectroscopy have recently emerged. Indeed, ML codes contain black boxes, and their results have to be thoroughly discussed and validated by comparison with physics-based model findings before being applied to huge datasets. The time is ripe for an insightful discussion on the constitution of adequate training sets, the interplay and merging of data- and model-driven approaches, the careful use of ML techniques and the validation of their results, thereby addressing both the perspectives and limitations of ML in the field of cool star spectroscopy.
