Dear all,

This is the CFP for the first Workshop on Creating Interoperable Corpora of 
Historical Newspapers (PressMint-LREC2026) on 16 May 2026. See details below. 
Apologies for cross-posting!


------------


First Workshop on Creating Interoperable Corpora of Historical Newspapers 
(PressMint-LREC2026)

Call for Papers

Date: 16 May 2026, half-day workshop
Location: Palma de Mallorca, Spain
Submission Deadline: 1 March 2026
Submission link: https://softconf.com/lrec2026/PressMint/
Workshop Website: https://www.clarin.eu/PressMint-LREC2026

________________________________
Workshop description

Historical newspapers are of interest to historians and historical linguists, 
as well as to social and political scientists, ethnologists, anthropologists, 
media and communication scholars, and researchers in cultural studies. All of 
these are fields where contemporary digital resources, tools and methods (e.g. 
“distant reading”) are still underutilised. At the same time, corpora of 
historical newspapers already exist for a number of languages and countries, 
largely because the material is out of copyright, and the page images, often 
with OCR, are available through the national libraries. In recent years these 
data have attracted growing interest from researchers, since they preserve the 
historical, cultural, political and societal past. However, these corpora are 
not interoperable, which precludes methods for their comparison, as well as 
any translingual and transnational research. This is an especially important 
consideration, as statehood and nationhood were highly dynamic in Europe in 
the period to be covered by the project corpora. An initial joint attempt at 
creating a corpus of historical newspapers from the beginning of the 20th 
century onwards is the CLARIN flagship project 
PressMint<https://www.clarin.eu/pressmint>. The project currently features 
data from 20 partners and aims to develop a standard for interoperable 
newspaper resources across diachronic timespans. The final goal is to provide 
structured, high-quality multilingual data in a common format, with the same 
type of linguistic annotation, covering (at least partially) the same time 
period.

The workshop is supported by 
CLARIN<https://research-and-innovation.ec.europa.eu/strategy/strategy-2020-2024/our-digital-future/european-research-infrastructures/eric_en> 
and the PressMint project.


Objective

The PressMint workshop aims to gather experts interested in creating, 
processing and analyzing interoperable corpora of historical data in general, 
with a particular focus on newspapers. Another very important objective is to 
take into account the perspective of the communities who use historical data: 
their purposes, requirements and feedback.

We encourage interested colleagues to present their work at all levels: 
national and pan-European, monolingual and multilingual, task-specific and 
multidisciplinary. We view this workshop as a venue to exchange research 
ideas and start collaborations on this topic.

The workshop will feature one invited speaker: Maud Ehrmann, EPFL, CH

We invite unpublished original work focusing on (but not limited to) the 
following topics:

  *   compilation, annotation, visualisation and utilisation of historical 
newspaper corpora of the period relevant to PressMint (ideally around the start 
of the 20th century but not constrained by this period)
  *   harmonisation of the existing multilingual historical newspaper corpora 
that contain either synchronic or diachronic data, or both
  *   linking or comparing historical newspaper corpora with other datasets, 
including sources of structured knowledge, such as formal ontologies and LOD 
datasets
  *   enrichment of historical newspaper corpora (e.g. with sentiment 
annotation)
  *   machine translation of historical newspaper corpora
  *   use of LLMs as stand-alone tools or as parts of architectures for 
historical data processing, maintenance and knowledge deployment
  *   various usage scenarios for historical data

________________________________
Submission & Publication

We accept submissions of long papers (6 to 8 pages), short papers (4 pages) 
and demo papers (4 pages), to be presented at the workshop as long or short 
oral presentations or as posters. To support double-blind reviewing, all 
submissions must be fully anonymized and should be formatted according to the 
stylesheet available on the LREC 2026 
website<https://lrec2026.info/authors-kit/>. The workshop papers will be 
published in online proceedings.

At the time of submission, authors are also offered the opportunity to share 
related language resources with the community. All repository entries are 
linked to the LRE Map [https://lremap.elra.info/], which provides metadata for 
the resources.

Please note that the LREC style guide should be followed. The formatting 
guidelines can be found here: https://lrec2026.info/authors-kit/.


Important Dates

  *   Paper submission deadline:  1 March 2026
  *   Notification of acceptance: 15 March 2026
  *   Camera-ready paper: 30 March 2026
  *   Workshop date: 16 May 2026

________________________________
Organizing Committee

  *   Maciej Ogrodniczuk, Institute of Computer Science, Polish Academy of 
Sciences, PL
  *   Tanja Wissik, Austrian Academy of Sciences, AT
  *   Petya Osenova, Sofia University "St. Kl. Ohridski" & Bulgarian Academy of 
Sciences, BG

To contact the organisers, please email 
[email protected]<mailto:[email protected]>.


Programme Committee (in alphabetical order)

  *   Tomaž Erjavec, Jožef Stefan Institute, SI
  *   Maria Gavriilidou, Institute for Language and Speech Processing, Athena 
Research Center, GR
  *   Normunds Grūzītis, University of Latvia, LV
  *   Matyáš Kopp, Faculty of Mathematics and Physics, Institute of Formal and 
Applied Linguistics, Charles University, CZ
  *   Taja Kuzman, Jožef Stefan Institute, SI
  *   Nikola Ljubešić, Jožef Stefan Institute, SI
  *   Maciej Ogrodniczuk, Institute of Computer Science, Polish Academy of 
Sciences, PL
  *   Petya Osenova, Sofia University "St. Kl. Ohridski" and IICT-BAS, BG
  *   Adam Pawłowski, University of Wrocław, PL
  *   Stelios Piperidis, Athena Research Centre, GR
  *   Claudia Resch, Austrian Academy of Sciences, AT
  *   German Rigau, HiTZ Basque Research Center for Language Technology, EHU, ES
  *   Inguna Skadiņa, Institute of Mathematics and Computer Science, University 
of Latvia, LV
  *   Steinþór Steingrímsson, The Árni Magnússon Institute for Icelandic 
Studies, IS
  *   Tanja Wissik, Austrian Academy of Sciences, AT



Dr. Tanja Wissik
ACDH- Austrian Centre for Digital Humanities
Austrian Academy of Sciences
Bäckerstraße 13, A-1010 Vienna
E-mail: [email protected]
Tel: + 43 1 51581 - 2206
http://www.oeaw.ac.at/acdh/

CLARIN National Coordinator for Austria
https://www.clarin.eu/governance/national-coordinators-forum


Editor of the Journal of the Text Encoding Initiative

https://journals.openedition.org/jtei/





