RE: Ideas for building a system that parses medical research publications/articles [EXT]

2021-06-07 Thread Daniel Perrett
I think the key word that will help you here is biocuration. It's an established field involving people with scientific, computational, and linguistic backgrounds who are familiar with the problem space, so I would suggest talking to people working in this area first to get an idea of what's

Re: Ideas for building a system that parses medical research publications/articles

2021-06-05 Thread Achilleas Mantzios
On 5/6/21 10:12 PM, Adrian Klaver wrote: On 6/5/21 10:39 AM, Achilleas Mantzios wrote: On 5/6/21 8:03 PM, Adrian Klaver wrote: On 6/5/21 9:56 AM, Achilleas Mantzios wrote: On 5/6/21 6:34 PM, Adrian Klaver wrote: On 6/5/21 2:49 AM, Achilleas Mantzios wrote: Hell

Re: Ideas for building a system that parses medical research publications/articles

2021-06-05 Thread Adrian Klaver
On 6/5/21 10:39 AM, Achilleas Mantzios wrote: On 5/6/21 8:03 PM, Adrian Klaver wrote: On 6/5/21 9:56 AM, Achilleas Mantzios wrote: On 5/6/21 6:34 PM, Adrian Klaver wrote: On 6/5/21 2:49 AM, Achilleas Mantzios wrote: Hello I am imagining a system that can parse papers from

Re: Ideas for building a system that parses medical research publications/articles

2021-06-05 Thread Achilleas Mantzios
On 5/6/21 8:03 PM, Adrian Klaver wrote: On 6/5/21 9:56 AM, Achilleas Mantzios wrote: On 5/6/21 6:34 PM, Adrian Klaver wrote: On 6/5/21 2:49 AM, Achilleas Mantzios wrote: Hello I am imagining a system that can parse papers from various sources (web/files/etc) and in vario

Re: Ideas for building a system that parses medical research publications/articles

2021-06-05 Thread Adrian Klaver
On 6/5/21 9:56 AM, Achilleas Mantzios wrote: On 5/6/21 6:34 PM, Adrian Klaver wrote: On 6/5/21 2:49 AM, Achilleas Mantzios wrote: Hello I am imagining a system that can parse papers from various sources (web/files/etc) and in various formats (text, pdf, etc) and can store metadata

Re: Ideas for building a system that parses medical research publications/articles

2021-06-05 Thread Achilleas Mantzios
On 5/6/21 4:45 PM, Vijaykumar Jain wrote: http://tika.apache.org/ I checked: it behaves better with downloaded PDFs than with URL PDFs; in the second case the metadata are poor. It does not work with NIH articles (but this is a general problem, not Tika's). To get

Re: Ideas for building a system that parses medical research publications/articles

2021-06-05 Thread Achilleas Mantzios
On 5/6/21 6:34 PM, Adrian Klaver wrote: On 6/5/21 2:49 AM, Achilleas Mantzios wrote: Hello I am imagining a system that can parse papers from various sources (web/files/etc) and in various formats (text, pdf, etc) and can store metadata for this paper, some kind of global ID if app

Re: Ideas for building a system that parses medical research publications/articles

2021-06-05 Thread Adrian Klaver
On 6/5/21 2:49 AM, Achilleas Mantzios wrote: Hello I am imagining a system that can parse papers from various sources (web/files/etc) and in various formats (text, pdf, etc) and can store metadata for this paper, some kind of global ID if applicable, authors, areas of research, whether the pa

Re: Ideas for building a system that parses medical research publications/articles

2021-06-05 Thread Vijaykumar Jain
http://tika.apache.org/ To get started with collecting doc metadata. It looks like this tool can help you get started. Postgres does support fuzzy text search, so I think dumping the metadata/abstract into PostgreSQL and then using extensions like trigram, tsearch, etc. should work well for a POC. this bein

Re: Ideas for building a system that parses medical research publications/articles

2021-06-05 Thread Laura Smith
‐‐‐ Original Message ‐‐‐ On Saturday, 5 June 2021 12:14, Achilleas Mantzios wrote: > > I know it's a huge work, but you are missing a point. Nobody wishes to > compete with anyone. This is about a project, a parent-advocacy > non-profit that ONLY

Re: Ideas for building a system that parses medical research publications/articles

2021-06-05 Thread Achilleas Mantzios
On 5/6/21 1:52 PM, Laura Smith wrote: ‐‐‐ Original Message ‐‐‐ On Saturday, 5 June 2021 10:49, Achilleas Mantzios wrote: Hello I am imagining a system that can parse papers from various sources (web/files/etc) and in various formats (text, pdf, etc) and can store metadata f

Re: Ideas for building a system that parses medical research publications/articles

2021-06-05 Thread Laura Smith
‐‐‐ Original Message ‐‐‐ On Saturday, 5 June 2021 10:49, Achilleas Mantzios wrote: > Hello > > I am imagining a system that can parse papers from various sources > (web/files/etc) and in various formats (text, pdf, etc) and can store > metadata for this paper, some kind of global ID if