6 December, 2025
AI Revolutionizes Georeferencing in Natural History Archives

A recent study from researchers at the University of North Carolina at Chapel Hill shows that large language models (LLMs) can substantially speed up the georeferencing of plant specimens. Georeferencing converts the written locality description on a specimen label into geographic coordinates that pinpoint where the specimen was originally collected, and it has traditionally been labor-intensive and time-consuming. The findings show that LLMs can do this work with near-human accuracy while being faster and cheaper.

Yuyang Xie, the study's first author and a postdoctoral researcher in the UNC biology department, described the motivation. "Our study explores how large language models can take on one of the biggest bottlenecks in digitizing plant collections," Xie said. The research asked whether AI can automate one of the most tedious steps in digitizing natural history collections, and the results suggest that it can.

The UNC team found that LLMs could georeference specimens with errors of less than 10 kilometers, outperforming traditional methods in both accuracy and efficiency. "Recent advances in LLMs can potentially transform the georeferencing process," said Xiao Feng, the corresponding author and an assistant professor in the UNC biology department. "This gives researchers unprecedented opportunities to advance our understanding of global biodiversity distributions."

The stakes are high. Natural history collections worldwide hold an estimated 2–3 billion specimens, yet only a small fraction have been digitized. Without digital records and spatial data, researchers struggle to track biodiversity loss, follow how species ranges shift as the climate changes, and analyze changes in ecosystems. AI-powered georeferencing could help scientists make large collections that have long been inaccessible available for this kind of research.

"This technology allows us to unlock millions of records that are currently sitting in cabinets," Xie added. By speeding up georeferencing, LLMs could accelerate the digitization of plant specimen data that researchers need to address global environmental challenges.

Traditionally, georeferencing has relied on manual interpretation, specialized software, or multiple rounds of expert review. The UNC study is among the first to show that LLMs can outperform these existing methods in accuracy, efficiency, and scalability, which could allow natural history collections to be digitized far faster than before.

The full research paper is available online in Nature Plants. As the technology matures, rapidly digitizing and analyzing previously inaccessible collections is becoming feasible, opening new avenues for ecological research and conservation.