Galaxy Zoo shows how everyday curiosity can help unlock the cosmos, turning countless eyes into a powerful engine for scientific discovery. Citizen scientists can reveal rare phenomena and train smarter AI, accelerating our understanding of the universe beyond what slower, expert-only workflows could manage.
Cosmic Dawn at the Galaxy Zoo
Title: Galaxy Zoo: Cosmic Dawn – morphological classifications for over 41,000 galaxies in the Euclid Deep Field North from the Hawaii Two-0 Cosmic Dawn survey (http://arxiv.org/abs/2509.22311)
Authors: James Pearson, Hugh Dickinson, Stephen Serjeant, Mike Walmsley, Lucy Fortson, Sandor Kruk, Karen L. Masters, Brooke D. Simmons, R. J. Smethurst, Chris Lintott, Lukas Zalesky, Conor McPartland, John R. Weaver, Sune Toft, Dave Sanders, Nima Chartab, Henry Joy McCracken, Bahram Mobasher, Istvan Szapudi, Noah East, Wynne Turner, Matthew Malkan, William J. Pearson, Tomotsugu Goto, Nagisa Oi
First Author’s Institution: School of Physical Sciences, The Open University, Milton Keynes, MK7 6AA, UK
Status: Submitted to MNRAS [open access]
There are many ways to study a galaxy, but one of the most important—and perhaps the most straightforward—is simply to observe it. In scientific terms, that means analyzing a galaxy’s visual morphology. A galaxy’s shape and structure tell a story about its growth and evolution: the formation of disks and spiral patterns, the buildup of central bulges and bars, the activity of the central supermassive black hole, and the history of mergers. Rare features like gravitational lenses also reveal themselves through morphology.
With modern sky surveys collecting images of millions of galaxies (think Euclid and the Vera C. Rubin Observatory’s Legacy Survey of Space and Time), it’s impractical for astronomers to inspect every galaxy by hand. This is where Galaxy Zoo comes in: citizen scientists worldwide help classify galaxies, lightening the workload and expanding observational reach.
In Galaxy Zoo: Cosmic Dawn, the public was invited to classify galaxies in Hyper Suprime-Cam (HSC) images from the multiwavelength Cosmic Dawn survey. The online project launched in October 2022 and ran for six months. Tens of thousands of volunteers contributed over four million galaxy classifications. These classifications are valuable on their own for studying galaxy evolution and can also train machine learning models to rapidly classify galaxies in upcoming surveys.
How Galaxy Zoo works
Volunteers are shown color images of individual galaxies and asked a sequence of multiple-choice questions. The first question asks whether the galaxy is merely smooth and featureless or if a disk is present. Subsequent questions become more specific, covering aspects like the number of spiral arms, the size of the central bulge, and whether the galaxy shows signs of merging or disturbance. Figure 1 in the original study maps the complete decision tree.
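To make the question flow concrete, here is a minimal sketch of how such a decision tree can be represented and traversed. The questions and answers below are an illustrative, simplified subset; the complete tree, with its exact wording and branches, is given in Figure 1 of the paper.

```python
# Hypothetical, simplified subset of a Galaxy Zoo-style decision tree.
# Each question maps answers either to a follow-up question or to None
# (meaning the classification path ends there).
TREE = {
    "smooth_or_featured": {
        "smooth": "rounded_or_cigar",       # follow-up about overall shape
        "features_or_disk": "spiral_arms",  # follow-up about disk features
        "artifact": None,                   # terminal: not a galaxy
    },
    "spiral_arms": {
        "yes": "arm_count",                 # follow-up about number of arms
        "no": None,                         # terminal
    },
}


def next_question(question, answer):
    """Return the follow-up question for a given answer, or None if the
    path through the tree ends at this answer."""
    return TREE.get(question, {}).get(answer)
```

A volunteer answering "features_or_disk" at the first step would then be routed to the spiral-arms question, and so on until a terminal answer is reached.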
Data and phased classification
Galaxy Zoo: Cosmic Dawn started with a total of 47,347 galaxy images. These were split into two phases. In Phase 1, 16,671 images were released to volunteers. An image needed input from 40 volunteers to be considered fully classified and retired from further human classifications.
Phase 1 classifications were then used to fine-tune Zoobot, Galaxy Zoo’s deep learning model. In Phase 2, the remaining 30,676 images were classified collaboratively by volunteers and Zoobot.
Unexpected discoveries and insights
Remarkably, volunteers identified 51 gravitational lenses: rare alignments in which a foreground mass magnifies a background source, letting us study objects far more distant or compact than we otherwise could.
Even non-object classifications provided valuable information. When volunteers selected the “Non-star Artifact” option, they specified whether the image showed a Saturation Feature (Bleed Trail), Diffraction Spike, Satellite Trail, Cosmic Ray, Scattered Light, or Other/Not Sure. The study found that most faulty images came from scattered light or saturation features, with few instances of other artifact types. This helps assess which artifacts automated image processing handles well and where improvements are still needed.
Machine learning and human collaboration
Volunteer labels are used to train machine learning models to classify astronomical images faster and more accurately. Zoobot had already been trained on prior Galaxy Zoo data, but the Phase 1 classifications from Cosmic Dawn further refined its capabilities for deep imaging.
In Phase 2, Zoobot classified images alongside volunteers. For each image, Zoobot predicted the fraction of volunteers likely to choose each option in the decision tree. If Zoobot predicted that fewer than 20% of volunteers would select “Features or Disk” at the initial step, the image was retired as classified, letting volunteers focus on the more intriguing cases.
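The retirement rule itself is simple. The sketch below assumes the model's output is a mapping from first-question answers to predicted vote fractions; the function and key names are illustrative, not Zoobot's actual API.

```python
def retire_by_model(pred_fractions, threshold=0.20):
    """Decide whether to retire an image from human classification.

    pred_fractions: hypothetical mapping from first-question answers
        (e.g. "features_or_disk", "smooth") to the model's predicted
        fraction of volunteers choosing each answer.
    Returns True when the predicted "Features or Disk" fraction falls
    below the threshold, i.e. the image is likely smooth or featureless.
    """
    return pred_fractions.get("features_or_disk", 0.0) < threshold
```

An image the model confidently calls featureless (say, a predicted 5% “Features or Disk” fraction) is retired, while an ambiguous or feature-rich image stays in the volunteer queue.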
Weekly updates and efficiency gains
During Phase 2, Zoobot was retrained weekly on new volunteer classifications and continued to retire simpler galaxies. This iterative approach sped up the overall classification process by nearly a factor of three compared with Phase 1, where only volunteers did the work.
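The weekly cadence can be summarized as a loop: collect new volunteer votes, fine-tune the model on them, then retire images the updated model confidently calls featureless. The sketch below is a hypothetical outline of that loop, with `collect_votes` and `finetune` standing in for the real data-collection and training steps, which the paper does not specify at this level of detail.

```python
def phase2_loop(images, model, weeks, collect_votes, finetune):
    """Hypothetical sketch of the Phase 2 human-machine loop.

    images: iterable of image identifiers still needing classification.
    model: callable image -> predicted vote fractions for the first
        question (illustrative stand-in for Zoobot).
    collect_votes: callable returning new volunteer labels for the
        currently active images (stand-in for a week of classifications).
    finetune: callable (model, labels) -> updated model.
    """
    active = set(images)
    labels = {}
    for _week in range(weeks):
        # Gather a week's worth of volunteer classifications.
        labels.update(collect_votes(active))
        # Retrain the model on everything labeled so far.
        model = finetune(model, labels)
        # Retire images the model confidently predicts are featureless.
        retired = {img for img in active
                   if model(img).get("features_or_disk", 0.0) < 0.20}
        active -= retired
    return model, active
```

Each pass shrinks the active pool, which is how the project achieved its near-threefold speedup over the volunteer-only Phase 1.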
Limitations and future directions
Zoobot performed well in predicting common volunteer responses (for images with at least five responses, its predictions fell within about 0.12 sigma of actual results). However, it struggled with rarer outcomes because those cases lacked sufficient training examples. Subjective questions, such as judging the exact shapes of edge-on bulges or the tightness of spiral arms, also posed challenges for the model. The final question, which asked volunteers to identify any rare features, allowed multiple selections and was thus not predicted by Zoobot.
The bigger picture: citizen science meets machine learning
Projects like Galaxy Zoo: Cosmic Dawn demonstrate that citizen science can produce high-quality classifications and reveal rare objects, while machine learning models can scale those insights to datasets that would be infeasible to analyze manually. The two approaches complement each other, paving the way for handling the ever-growing volumes of data from next-generation surveys.
Astrobite edited by Niloofar Sharei
Featured image credit: Galaxy Zoo team
I’m a second-year Astronomy & Astrophysics PhD student at the University of California, Santa Cruz. I’m interested in applying machine learning to telescope surveys to explore a range of topics in extragalactic astronomy. Beyond research, I enjoy science outreach and journalism, photography, archery, and outdoor activities.