6 December, 2025
AI Streamlines Natural History Archive Digitization Process

A recent study by researchers at the University of North Carolina at Chapel Hill (UNC) has found that large language models (LLMs) can make the georeferencing of plant specimens both faster and more accurate. Georeferencing, which identifies the locations where specimens were originally collected, has traditionally been labor-intensive, costly, and time-consuming.

According to Yuyang Xie, the first author and a postdoctoral researcher in the biology department at UNC, “Our study explores how large language models can take on one of the biggest bottlenecks in digitizing plant collections.” This research signals a pivotal moment in the digitization of natural history archives, as it shows that AI can automate one of the most arduous steps in the process.

The study set out to determine whether AI could streamline georeferencing, and the results indicate that it can. LLMs achieved an error margin of less than 10 kilometers, outperforming conventional methods, and completed the task in a fraction of the time and at a fraction of the cost typically required.
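The article does not say how the error margin was computed. One common way to express georeferencing error is the great-circle distance between a predicted coordinate and a trusted reference coordinate. The sketch below illustrates that idea only and is not the study's method; the function names, the example coordinates, and the way the 10 km threshold is applied are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_error_margin(predicted, reference, threshold_km=10.0):
    """Return True if the predicted (lat, lon) lies within threshold_km of the reference.

    The 10 km default mirrors the figure quoted in the article; applying it
    per specimen like this is an assumption, not the paper's protocol.
    """
    return haversine_km(*predicted, *reference) < threshold_km


if __name__ == "__main__":
    # Hypothetical coordinates, invented for illustration only.
    predicted = (35.91, -79.05)
    reference = (35.95, -79.00)
    print(f"error: {haversine_km(*predicted, *reference):.2f} km")
    print("within margin:", within_error_margin(predicted, reference))
```

A summary over a whole collection would typically report a median or mean of these per-specimen distances. Which statistic the study used is not stated here.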

Xiao Feng, the corresponding author and an assistant professor in the biology department at UNC, stated, “Recent advances in LLMs can potentially transform the georeferencing process, making it faster and more accurate.” This transformation opens up new avenues for researchers to enhance their understanding of global biodiversity distributions.

The stakes are large. An estimated 2–3 billion herbarium specimens exist worldwide, yet only a small percentage have been digitized. Without digitized records and geospatial data, researchers struggle to track biodiversity loss, study species migration driven by climate change, and analyze shifts in ecosystems. With AI-driven georeferencing, scientists may soon be able to rapidly digitize extensive natural history collections that have been largely inaccessible.

“This technology allows us to unlock millions of records that are currently sitting in cabinets,” Xie added. The integration of LLMs can expedite the digitization of plant specimen data, which is crucial for addressing pressing global environmental challenges.

Traditional georeferencing methods rely heavily on human interpretation, specialized software, and multiple rounds of expert review. The UNC study is among the first to apply LLMs to this task, and it reports advantages in accuracy, efficiency, and scalability. The approach could substantially speed up the digitization of natural history collections.

The full research paper is published in Nature Plants and is available online. The study highlights how AI can speed the digitization of natural history archives, unlocking information that could inform future ecological research and conservation efforts.