02

2026-07

The National Astronomical Observatories has released the StarCLR time-domain large model.


Recently, the National Astronomical Observatories of the Chinese Academy of Sciences unveiled StarCLR, a time-domain astrophysics foundation model designed for light curves, enabling light curves from different surveys to “speak the same language.” By leveraging contrastive learning and large-scale unsupervised pretraining, the model extracts more robust, intrinsic temporal features from data with varying sampling rates, observation durations, and wavelength bands. It achieves micro‑average F1 scores ranging from 92% to 99% on classification tasks spanning nearly 30 classes of variable stars. After pretraining on TESS data, StarCLR can be effectively transferred to Gaia and ZTF datasets, demonstrating strong cross‑survey generalization capabilities and offering a new technical pathway for unified modeling and joint analysis of multi‑source time-domain observations, as well as for processing future large‑scale survey data.

Light curves record the temporal variations in a celestial object’s brightness and serve as crucial observational evidence for identifying variable star types, studying stellar and binary evolution, mapping the structure of the Milky Way, and establishing the cosmic distance scale. However, different surveys differ markedly in observing wavelength bands, sampling cadence, time coverage, and photometric precision. Consequently, the light curves of the same variable star often exhibit distinct observational signatures across different surveys, rendering conventional classification models trained on single datasets difficult to apply directly.

To address this challenge, the research team proposed StarCLR, a model that focuses on learning a unified representation of irregularly sampled light curves. Drawing on representation-learning principles from AI foundation models, it uses contrastive learning to map light curves that share similar physical properties, but were observed under different conditions, into a shared feature space. Much as natural-language models align semantically equivalent expressions from different languages within a common embedding space, StarCLR seeks to disentangle intrinsic astrophysical variability from observational factors such as survey sampling strategies and instrument responses, yielding time-series representations that transfer more readily.
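The contrastive objective described above can be illustrated with a minimal sketch. The article does not state which loss StarCLR uses, so the InfoNCE form below, the temperature value, and the toy 2-D embeddings are all assumptions chosen for illustration. The idea is that two views of the same star (for example, the same object under two sampling patterns) should score higher than views of different stars.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(view_a, view_b, temperature=0.1):
    """Mean InfoNCE loss, where view_a[i] and view_b[i] embed the same object.

    Every other row of view_b acts as a negative for view_a[i].
    The temperature of 0.1 is an illustrative assumption.
    """
    losses = []
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of picking the true positive.
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Hypothetical 2-D embeddings of three stars under two observing conditions.
survey_a = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
aligned = [[0.9, 0.1], [0.1, 0.9], [-0.9, 0.1]]
# Same embeddings, but paired with the wrong stars.
shuffled = [aligned[1], aligned[2], aligned[0]]

loss_aligned = info_nce(survey_a, aligned)
loss_shuffled = info_nce(survey_a, shuffled)
print(loss_aligned < loss_shuffled)
```

When matching views sit close together in the embedding space the loss is near zero, and it grows when the pairs are mismatched. Minimizing it therefore pulls views of the same object together regardless of how each was observed.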

Unlike traditional models that are trained from scratch on a single classification task, StarCLR first performs pre-training on large-scale unlabeled light curves and then transfers the learned general-purpose representations to various downstream tasks. The research team completed model pre-training on TESS data and conducted cross‑survey validation on the Gaia and ZTF datasets. In a classification task spanning nearly 30 classes of variable stars, StarCLR achieved micro‑average F1 scores ranging from 92% to 99% across different datasets and experimental settings, not only enabling fine-grained differentiation among variable star types with similar light curve morphologies but also demonstrating strong cross‑survey generalization capabilities. These results indicate that StarCLR learns not merely the sampling patterns specific to a particular survey, but rather captures, to some extent, the more stable and intrinsic temporal structures inherent in stellar light curves. Leveraging this universal representation, future analyses involving new survey data could potentially reduce the costs associated with repeatedly constructing feature pipelines and retraining models, thereby enhancing the efficiency of collaborative multi‑source time-domain data analysis.
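For readers unfamiliar with the metric quoted above, the following sketch shows how a micro-average F1 score is computed: true positives, false positives, and false negatives are pooled over all classes before precision and recall are formed. The class labels and predictions are made-up illustrative values, not results from the paper.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP/FP/FN over all classes, then compute F1."""
    labels = set(y_true) | set(y_pred)
    tp = fp = fn = 0
    for label in labels:
        for truth, pred in zip(y_true, y_pred):
            if pred == label and truth == label:
                tp += 1
            elif pred == label and truth != label:
                fp += 1
            elif pred != label and truth == label:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative labels only; these are not taken from the StarCLR experiments.
y_true = ["RRab", "RRab", "EA", "EW", "DSCT", "EA"]
y_pred = ["RRab", "RRc", "EA", "EA", "DSCT", "EA"]

score = micro_f1(y_true, y_pred)
print(round(score, 4))
```

In a single-label, multi-class setting like this one, every misclassification counts once as a false positive and once as a false negative, so micro-average F1 equals overall accuracy (here 4 correct out of 6).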

Beyond variable-star classification, StarCLR’s representations can be extended to other time-domain astronomy tasks, including the discovery of anomalous objects, the retrieval of similar light curves, and the identification of rare objects from small samples. As surveys such as LSST, Mozi, and Mengfei continue to generate vast amounts of observational data, time-domain astronomy is shifting from algorithms tailored to single tasks toward transferable, scalable foundation models. StarCLR offers a new way to link observations from different survey facilities and lays the technical groundwork for automated processing of, and scientific discovery in, next-generation time-domain data.
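One of the downstream uses mentioned above, retrieving similar light curves, reduces to a nearest-neighbor search once each light curve has an embedding. The sketch below assumes cosine similarity as the ranking measure. The catalog names and 3-D vectors are hypothetical stand-ins for the model's output; none of them come from StarCLR itself.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k_similar(query, catalog, k=2):
    """Return the names of the k catalog entries most similar to the query."""
    scored = [(cosine(query, emb), name) for name, emb in catalog.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


# Hypothetical embeddings; a real encoder would produce much longer vectors.
catalog = {
    "star_a": [0.9, 0.1, 0.0],
    "star_b": [0.8, 0.3, 0.1],
    "star_c": [0.0, 0.1, 0.95],
    "star_d": [-0.7, 0.6, 0.1],
}
query = [1.0, 0.2, 0.0]

result = top_k_similar(query, catalog, k=2)
print(result)
```

The same similarity scores could also support a simple anomaly check, for instance by flagging objects whose best match in the catalog is still a poor one. That extension is a suggestion here, not something the article describes.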

The relevant findings have been published in The Astrophysical Journal (ApJ). This work was carried out through a collaboration between the National Astronomical Observatories of the Chinese Academy of Sciences and Zhijiang Laboratory, with support from the Artificial Intelligence Promotion Committee of the National Astronomical Observatories. Ding Junyao, a graduate student jointly trained by the National Astronomical Observatories of the Chinese Academy of Sciences and Tibet University, is the first author of the paper, while Researcher Chen Xiaodian serves as the corresponding author.

Source: National Astronomical Observatories