Dear colleagues, 

We are pleased to invite you to join the NLP and vision seminars (NLPV). This 
is a talk series supported by the Institute for Data Science and Artificial 
Intelligence (IDSAI) at the University of Exeter. Talks and their recordings 
are posted on the website: 
https://sites.google.com/view/neurocognit-lang-viz-group/seminars 
The two upcoming talks are listed below:

Talk 1
12 June 2025, 3-4pm: Prof Yulan He at King's College London
Zoom link: 
https://Universityofexeter.zoom.us/j/94376040685?pwd=UvLl8gKZnVoa4PNpv40jThQSjZfgbJ.1
 (Meeting ID: 943 7604 0685 Password: 582538)

Title: LLMs Need a Bayesian Meta-Reasoning Framework for More Robust and 
Generalisable Reasoning
Abstract: While Large Language Models (LLMs) excel at many reasoning tasks, 
they still struggle with robustness, cross-task generalisation, and efficient 
scaling. Current training approaches, such as next-token prediction, 
reinforcement learning, and prompt optimisation, offer performance improvements 
but often lack adaptability across diverse tasks. In this talk, I will advocate 
for a transformative shift in how LLMs approach reasoning by introducing a 
Bayesian meta-reasoning framework that equips them with self-awareness, 
monitoring, evaluation, regulation, and meta-reflection. This framework aims to 
enhance LLMs' ability to refine reasoning strategies and generalise more 
effectively. I will discuss key challenges in current approaches and outline 
future directions for developing more adaptable and generalisable LLMs.
Speaker's short bio: Yulan He is a Professor in Natural Language Processing at 
the Department of Informatics at King's College London. She currently holds a 
prestigious 5-year UKRI Turing AI Fellowship. Her recent research focuses on 
addressing the limitations of Large Language Models (LLMs), aiming to enhance 
their reasoning capabilities, robustness, and explainability. She has published 
nearly 300 papers on topics such as self-evolution of LLMs, mechanistic 
interpretability, and LLMs for educational assessment. She has received several 
prizes and awards for her research, including an SWSA Ten-Year Award and a CIKM 
Test-of-Time Award, and was recognised as an inaugural Highly Ranked Scholar by 
ScholarGPS.

Talk 2
26 June 2025, 3-4pm: Dr Tanja Samardžić at the University of Zurich
Zoom link: 
https://Universityofexeter.zoom.us/j/97673721568?pwd=6IAO875zCe1GjYsFN2vig8U1bYGSdR.1
 (Meeting ID: 976 7372 1568 Password: 548777)

Title: Understanding text tokenisation across diverse languages
Abstract: Large language models (LLMs) are commonly said to be trained on raw 
text, but, in fact, they are trained on tokenised text, segmented into small, 
subword units (tokens). For example, a popular tokenizer splits the word 
'plausible' into 'pl aus ible'. Somewhat surprisingly, general data compression 
algorithms such as BPE turned out to be more efficient than linguistic subword 
analyses when used as tokenizers in the context of training LLMs. These 
algorithms seem to find recurrent patterns that are well exploited by neural 
networks despite being linguistically incorrect. In this talk, I will show 
that, despite the perceived linguistic inadequacy, structural features of 
languages can be extracted from text data by tracking the text compression 
steps in subword tokenisation. I will present several studies showing that 
exploiting these features more directly allows us to improve cross-lingual 
transfer and multilingual fairness of pre-trained LLMs. 
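For readers unfamiliar with BPE, the merge procedure the abstract refers to can 
be sketched as follows. This is a toy illustration under simplified assumptions 
(a tiny hand-made corpus and naive string replacement), not the tokenizer used 
in the talk:

```python
from collections import Counter

def bpe_merges(words, num_merges):
    """Learn BPE merge rules from a word-frequency dictionary.

    `words` maps a whitespace-separated symbol sequence (initially
    single characters) to its corpus frequency.
    """
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs in the corpus.
        pairs = Counter()
        for word, freq in words.items():
            symbols = word.split()
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Replace the most frequent pair with its merged symbol everywhere.
        # (Naive replace is fine for this toy; real BPE implementations use
        # boundary-aware matching to avoid merging across symbol boundaries.)
        words = {w.replace(" ".join(best), "".join(best)): f
                 for w, f in words.items()}
    return merges, words

# Toy corpus: character-split words with their frequencies.
corpus = {"l o w": 5, "l o w e r": 2, "n e w e s t": 6, "w i d e s t": 3}
merges, segmented = bpe_merges(corpus, num_merges=4)
print(merges)     # first merge is ('e', 's'), the most frequent pair
print(segmented)  # words re-segmented into learned subword units
```

The learned merges reflect frequency, not morphology: 'est' emerges because it 
is a common character sequence, which is the kind of linguistically "incorrect" 
but compression-efficient pattern the abstract describes.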
Speaker's short bio: Dr Tanja Samardžić is a researcher in multilingual text 
processing with a background in language theory and machine learning. She 
teaches Natural Language Processing (NLP) at the University of Geneva (as a 
senior deputy lecturer) and collaborates on research projects of the NLP Group 
at IDSIA. She is one of the co-founders of the ReLDI Centre Belgrade. She holds 
a PhD in Computational Linguistics from the University of Geneva, where she 
studied in the Computational Learning and Computational Linguistics (CLCL) 
group. After her PhD, she was the Head of the Text Group and a lab director 
(alternating) of the Language and Space Lab at the University of Zurich 
(2013-2024), a Visiting Scholar at the University of Cambridge (2024) and a 
Visiting Researcher at the IT University of Copenhagen (2022). She is committed 
to promoting and facilitating the use of computational approaches in the study 
of language. Currently, she serves as an ACL Rolling Review Senior Area Chair, a 
UniDive COST Action Managing Committee Member and an External Governing Board 
Member of the SMASH Postdoctoral Program.

Join our *Google group* for future seminars and research information: 
https://groups.google.com/g/neurocognition-language-and-vision-processing-group 

Best regards,
Hang Dong (on behalf of Aline Villavicencio, Rodrigo Wilkens, Yanda Meng)
Lecturer in Computer Science
University of Exeter
_______________________________________________
Corpora mailing list -- [email protected]
https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
To unsubscribe send an email to [email protected]