I think the key word that will help you here is biocuration. It is an 
established field involving people with scientific, computational, and 
linguistic backgrounds who are familiar with the problem space. I would 
suggest talking to people working in this area first to get an idea of what's 
feasible, what's already out there, etc., as they will know this better than 
the Postgres community.

You can see an example of the sort of annotation that is fully automated at the 
moment here:

https://monarchinitiative.org/tools/text-annotate

Given the potential impact on human health, some level of manual involvement in 
annotation is frequently part of the workflow.

Daniel

-----Original Message-----
From: Achilleas Mantzios <ach...@matrix.gatewaynet.com> 
Sent: 05 June 2021 10:49
To: pgsql-general@lists.postgresql.org
Subject: Ideas for building a system that parses medical research 
publications/articles [EXT]

Hello

I am imagining a system that can parse papers from various sources
(web/files/etc) and in various formats (text, pdf, etc) and can store metadata 
for each paper: some kind of global ID if applicable, authors, areas of 
research, whether the paper is "new", "highlighted", "historical", type (e.g. 
case reports, clinical trials), symptoms (e.g. tics, GI pain, psychological 
changes, anxiety), and other key attributes (I guess dynamic). It must be 
full-text searchable, etc.

I am at the very beginning of this, and it is being done on a fully volunteer 
basis.

Lots of questions: is there any scientific/scholarly analysis software already 
available? If there is, and it is really good and open source, then this will 
influence the rest of the decisions. Otherwise, I'll have to form a team that 
can write one, in which case I'll have to decide on the DB, language, etc. I 
have worked with pgsql for 20 years, so it is the natural choice for any kind 
of data; I just ask this for the sake of completeness.

All ideas welcome.

-- 
 The Wellcome Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE.
