
Apple recently concluded a two-day workshop on natural language processing (NLP), held May 15–16, 2025. The event featured presentations from leading academic and industry researchers on cutting-edge research and applications in the field. Participating institutions included the University of Oxford, MIT, Harvard University, Stanford University, and Princeton University, alongside industry giants like Microsoft and Google.
The workshop, formally titled the Workshop on Natural Language and Interactive Systems 2025, emphasized three research areas within NLP. Among the speakers was Yarin Gal, an associate professor at the University of Oxford and Director of Research at the UK AI Security Institute. His first presentation, “AI Model Collapse,” examined the limits of the internet as a sustainable data source for training large language models (LLMs): as a growing share of online content is itself model-generated, training on it risks degrading the knowledge and reasoning capabilities of future systems. Gal advocated for new tools to differentiate between human- and AI-generated content, alongside stronger regulation and further research into the societal impacts of LLMs.
In his second presentation, “Detecting LLM Hallucinations,” Gal proposed a method for measuring how confident an LLM is in its outputs. The approach samples multiple answers to the same query and clusters the responses by semantic meaning; the spread of answers across clusters gives a more reliable assessment of the model’s certainty and correctness than the raw token probabilities, and is particularly useful in longer conversational contexts.
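To make the sample-and-cluster idea concrete, here is a minimal, self-contained sketch rather than Gal’s actual implementation: the `semantically_equivalent` check below is a toy stand-in for the bidirectional entailment model a real system would use to decide whether two answers mean the same thing.

```python
import math

def semantically_equivalent(a: str, b: str) -> bool:
    # Stand-in for a bidirectional entailment check: in practice you would
    # ask an NLI model whether a entails b AND b entails a. Here we just
    # compare normalized strings so the sketch runs on its own.
    return a.strip().lower().rstrip(".") == b.strip().lower().rstrip(".")

def semantic_entropy(answers: list[str]) -> float:
    # Greedily cluster the sampled answers by meaning.
    clusters: list[list[str]] = []
    for ans in answers:
        for cluster in clusters:
            if semantically_equivalent(ans, cluster[0]):
                cluster.append(ans)
                break
        else:
            clusters.append([ans])
    # Entropy over the cluster distribution: low entropy means the model
    # keeps saying the same thing, i.e. it is confident in its answer.
    n = len(answers)
    return -sum((len(c) / n) * math.log(len(c) / n) for c in clusters)

# Ten sampled answers to the same query: eight agree, two diverge.
samples = ["Paris.", "paris", "Paris", "Lyon.", "Paris.", "Paris",
           "paris.", "Marseille", "Paris", "Paris."]
print(f"semantic entropy: {semantic_entropy(samples):.3f}")
```

Low entropy means the model keeps producing the same answer in different words, which the method reads as confidence; a flat distribution over many clusters signals a likely hallucination.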
Another notable presentation came from Kevin Chen, a researcher at Apple, who discussed an agent trained with a method called leave-one-out proximal policy optimization (LOOP) to execute multi-step tasks from user prompts. Chen showed that traditional training methods can falter when tasks involve complex dependencies between steps; LOOP instead lets the agent learn from its own previous interactions, reducing errors and improving performance. Although the agent was trained on 24 different scenarios, it still struggles with multi-turn user interactions.
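The leave-one-out idea at the core of LOOP can be sketched in a few lines. This is an illustrative reconstruction, not Apple’s training code: it assumes several rollouts are sampled per task, and baselines each rollout’s reward against the average of the others, which in this family of estimators replaces a separately learned value baseline.

```python
def leave_one_out_advantages(rewards: list[float]) -> list[float]:
    # For K rollouts of the same task, baseline each reward against the
    # mean of the other K-1 rewards (the "leave-one-out" baseline).
    k = len(rewards)
    total = sum(rewards)
    return [r - (total - r) / (k - 1) for r in rewards]

def ppo_clipped_objective(ratio: float, advantage: float, eps: float = 0.2) -> float:
    # Standard PPO clipped surrogate for one (probability ratio, advantage) pair.
    clipped = max(min(ratio, 1 + eps), 1 - eps)
    return min(ratio * advantage, clipped * advantage)

# Four rollouts of the same multi-step task: only the third one succeeded.
rewards = [0.0, 0.0, 1.0, 0.0]
advs = leave_one_out_advantages(rewards)
print(advs)  # [-0.333..., -0.333..., 1.0, -0.333...]

# Clipped surrogate for the successful rollout at a probability ratio of 1.3:
print(ppo_clipped_objective(ratio=1.3, advantage=advs[2]))  # capped at 1.2 * 1.0
```

Rollouts that beat their siblings receive positive advantages and are reinforced, while the PPO clipping keeps any single update from moving the policy too far.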
Additionally, Irina Belousova, an Engineering Manager at Apple, presented on speculative decoding, a technique that lets a smaller model generate answers of quality comparable to a larger model’s at lower cost. The smaller model drafts candidate token sequences, which the larger model then checks in a verification step; drafts the larger model accepts are kept, cutting memory usage and speeding up generation. Belousova emphasized that this framework also simplifies deployment by minimizing the complexity of managing multiple models during inference.
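The draft-and-verify loop is easy to picture in code. The sketch below is a deliberately simplified greedy variant with toy stand-in “models” (production systems accept or reject draft tokens probabilistically against the large model’s output distribution, and verify them in a single batched forward pass); all function names here are illustrative.

```python
def speculative_decode(draft_next, target_next, prompt: list[str],
                       max_len: int = 12, k: int = 4) -> list[str]:
    # draft_next / target_next are toy stand-ins for the small and large
    # models: each maps a token sequence to its next token (greedy).
    out = list(prompt)
    while len(out) < max_len:
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal = []
        for _ in range(k):
            proposal.append(draft_next(out + proposal))
        # 2. The large model verifies the proposal. In a real system this is
        #    one batched forward pass, which is where the speedup comes from.
        accepted = 0
        for i in range(k):
            if target_next(out + proposal[:i]) == proposal[i]:
                accepted += 1
            else:
                break
        out += proposal[:accepted]
        if accepted < k:
            # 3. At the first disagreement, fall back to the large model's
            #    token, so output quality matches the large model alone.
            out.append(target_next(out))
    return out[:max_len]

# Toy demo: the draft model agrees with the target except at one position.
VOCAB = ["the", "cat", "sat", "on", "mat"]
def target_next(seq): return VOCAB[len(seq) % len(VOCAB)]
def draft_next(seq): return "dog" if len(seq) % 7 == 0 else VOCAB[len(seq) % len(VOCAB)]
print(" ".join(speculative_decode(draft_next, target_next, ["the"])))
```

Because the large model only confirms tokens instead of generating each one serially, several tokens can be committed per verification step without changing the final output.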
The workshop provided an engaging platform for researchers and practitioners to exchange ideas about the future of NLP. Apple has made recordings of the presentations available online, giving broader access to the work discussed; videos and papers are linked from the company’s recent post.
As the field of NLP continues to evolve, events like Apple’s workshop play a crucial role in fostering collaboration and innovation among leading minds in technology and research.