Kaggle has a challenge based on this data that is "asking for your help to 
develop text and data mining tools that can help the medical community develop 
answers to high priority scientific questions" [1]. Some medical 
professionals, including at least one epidemiologist, have weighed in on 
the discussion boards. I think submissions have to be in one of Kaggle's 
notebook formats [2], but ideas and approaches are also being posted elsewhere.

art
---
1. https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge
2. https://www.kaggle.com/docs/notebooks


________________________________
From: Code for Libraries <[email protected]> on behalf of Eric Lease 
Morgan <[email protected]>
Sent: Friday, March 20, 2020 11:40 AM
To: [email protected] <[email protected]>
Subject: [CODE4LIB] public health or medical research

Do you know of any researcher or scholar in public health or medicine who may 
need or want to read the flood of scholarship being generated by COVID-19?

As you may or may not know, the Distant Reader is designed to read large 
amounts of narrative text, such as scholarly journal articles. The Gates 
Foundation, the Allen Institute for AI, and their friends have made freely 
available a data set of 13,000 full-text scholarly articles on the topic of 
COVID-19. [1]

I have downloaded the data set and fed it to the Reader, and the initial 
results are here:

 https://carrels.distantreader.org/library/covid-19/

The results are okay, but they can be improved in a number of ways. For 
example, I can easily create a full text (Solr) index to the data set. I can 
create a network diagram illustrating the relationship of a given word to other 
nearby words. I could apply various types of machine learning to the Reader's 
output, such as topic modeling and classification, to look for patterns and 
anomalies.
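
As a rough illustration of the network diagram idea, here is a minimal 
sketch in Python. It is not the Reader's actual code: the function name, the 
window size, and the sample text are all assumptions made up for the example. 
It counts which words appear near a given word, and those counts could serve 
as edge weights in a diagram.

```python
import re
from collections import Counter


def cooccurrences(text, target, window=2):
    """Count the words found within `window` tokens of each `target` hit.

    Hypothetical helper for illustration only; a real pipeline would
    likely also remove stop words and normalize word forms.
    """
    # Lowercase the text and split it into simple word tokens.
    tokens = re.findall(r"[a-z0-9-]+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != target:
            continue
        # Look at the tokens on either side of this occurrence.
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:
                counts[tokens[j]] += 1
    return counts


# Made-up sample text, not taken from the data set.
sample = "The virus spreads quickly. The virus mutates slowly."
edges = cooccurrences(sample, "virus", window=1)

# Each (word, weight) pair is one edge radiating from the target word.
for word, weight in edges.most_common():
    print(f"virus -- {word} ({weight})")
```

The resulting pairs could then be handed to whatever graphing tool is at 
hand; the choice of tool is left open here.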

To do some of these things, additional resources may be needed, such as data 
processing power, data visualization skills, and some cyber 
infrastructure. I have been in touch with my XSEDE colleagues at IU, and they 
seem more than willing to help, but the whole thing would be GREATLY improved 
and MUCH MORE relevant if we were working with somebody who has specific 
questions to answer -- somebody from the fields of public health, medicine, etc.

Do you know the names of anybody in public health, medicine, or some other 
discipline who might want to read -- use & understand -- the literature being 
generated?

Be safe.

[1] data set - https://pages.semanticscholar.org/coronavirus-research

--
Eric Morgan
University of Notre Dame
