Dear colleagues,

We are writing to invite your collaboration in a community-driven initiative
to develop annotation schemas for scientific process descriptions in
research articles. The effort is inspired by the spirit of schema.org
<https://schema.org/>, but focuses specifically on capturing experimental
and simulation workflows across scientific domains. The resulting schemas
will be openly published as templates in the Open Research Knowledge Graph
(ORKG, <https://orkg.org/>) and will form the basis of a paper planned for
Nature Scientific Data <https://www.nature.com/sdata/>.

 

Motivation

Scientific papers describe complex processes (e.g., ALD and CVD in materials
science, PCR and CRISPR in molecular biology, tensile and fatigue testing in
engineering, leaching experiments in environmental science, RCTs and
cognitive tasks in psychology) using highly variable narrative text. This
variability makes it difficult to:

* design consistent, interoperable annotation guidelines,
* build cross-domain corpora of scientific methods,
* compare and align experimental setups across papers, and
* create FAIR, reusable metadata about how studies are actually carried out.

Our goal is to define annotation schemas for these processes (inputs,
conditions, outputs, roles, and relations) and to populate them from
full-text articles. These schemas and resulting corpora are intended as
shared resources for corpus linguistics, NLP, scientific text mining, and
downstream applications.

 

Why Collaborate

We are seeking contributors who can:

* provide collections of full-text articles (roughly 50 or more) describing a
  specific experimental or simulation process in their field,
* offer expert feedback on automatically mined process schemas, or
* run the schema-miner workflow themselves (with our support) and help
  refine the resulting schema.

Individual or small-team participation is welcome, and co-authorship
opportunities are available depending on involvement.

A wide variety of processes can be included: thin-film deposition, synthetic
chemistry reactions, gene editing workflows, fatigue testing, soil leaching
experiments, drug dissolution assays, fMRI tasks, cognitive experiments, and
many more.
A broader (non-exhaustive) list is here:
<https://docs.google.com/document/d/1iyL1l9vCXhnQ0To7j79vlr-pW4JvPlQC95svygqRDfg/edit>

 

How to Participate

Please register your interest using this short form:
<https://forms.gle/9WEdouw4yMyNHcn19>

We will notify selected contributors by January 31, 2026. Data collection
and schema mining will conclude by April 30, 2026, followed by manuscript
preparation.

 

We hope members of this community will consider contributing to this effort
to develop shared annotation schemas and corpora of scientific process
descriptions, a step toward more comparable, analyzable, and reusable
scientific text resources. Please also help us spread the word!

 

Best regards,
Jennifer D'Souza
TIB - Leibniz Information Centre for Science and Technology
(on behalf of the schema-miner coordination team)

 

