Helge Stein1, Dan Guevarra1, Paul Newhouse1, Edwin Soedarmadji1, John Gregoire1

1, Joint Center for Artificial Photosynthesis, California Institute of Technology, Pasadena, California, United States

UV-Vis spectroscopy is the first step in assessing light absorbers for solar fuels generation, but the community lacks sufficiently large experimental datasets and predictive models for experimental optical properties. Using the largest and most diverse experimental materials science dataset, comprising 180,902 distinct materials that span 45 elements and include more than 80,000 unique quinary oxide and 67,000 unique quaternary oxide compositions, we trained several deep neural networks that predict complete UV-Vis absorption spectra from an image of a materials sample. The models learn to spectrally hyperscale from an input with low energy resolution but high spatial resolution. Extracting direct bandgaps from the predicted spectra yields a bandgap prediction accuracy of 0.2 eV RMSE, which is well within the uncertainty of traditional bandgap extraction. Building upon these models, we will present a dataset of one million experimental materials sample images with complete data lineage and UV-Vis characterization. We will discuss methods and challenges in predicting optical properties from composition featurizers and chart pathways to autonomous experiment planning using state-of-the-art visualization tools.
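As an illustration of the bandgap evaluation mentioned above, the following is a minimal sketch. The abstract does not say how direct bandgaps are extracted, so the sketch assumes a common Tauc-style approach: fit a line to the steepest segment of (absorbance x energy)^2 versus photon energy and take the energy-axis intercept. The function names `tauc_direct_bandgap` and `rmse`, the `window` parameter, the use of absorbance as a stand-in for the absorption coefficient, and the synthetic spectrum are all hypothetical and are not taken from the authors' pipeline.

```python
import numpy as np


def tauc_direct_bandgap(energy_ev, absorbance, window=15):
    """Estimate a direct bandgap (eV) from a spectrum via a Tauc-style fit.

    Assumption: absorbance is used as a proxy for the absorption coefficient,
    and the gap is the energy-axis intercept of the steepest linear segment
    of (absorbance * energy)^2 versus energy.
    """
    energy_ev = np.asarray(energy_ev, dtype=float)
    tauc = (np.asarray(absorbance, dtype=float) * energy_ev) ** 2

    best_slope, best_gap = 0.0, float("nan")
    # Slide a fixed-size window over the spectrum and keep the steepest fit.
    for start in range(len(energy_ev) - window + 1):
        segment = slice(start, start + window)
        slope, intercept = np.polyfit(energy_ev[segment], tauc[segment], 1)
        if slope > best_slope:
            best_slope = slope
            best_gap = -intercept / slope  # where the fitted line crosses zero
    return best_gap


def rmse(predicted, reference):
    """Root-mean-square error between two sets of bandgaps (eV)."""
    predicted = np.asarray(predicted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return float(np.sqrt(np.mean((predicted - reference) ** 2)))


# Synthetic direct-gap absorption edge at 2.4 eV (illustrative data only).
energy = np.linspace(1.5, 4.0, 251)
synthetic_absorbance = np.sqrt(np.clip(energy - 2.4, 0.0, None)) / energy
estimated_gap = tauc_direct_bandgap(energy, synthetic_absorbance)
```

In such a setup, `rmse` would be applied to the gaps extracted from predicted spectra and the gaps extracted from the corresponding measured spectra. Real spectra are noisy and often contain sub-gap absorption, so the fixed-window steepest-slope heuristic here is only a starting point.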