And for going beyond the bibliographic citations to include abstracts as
well, https://grobid.readthedocs.io/en/latest/ might be useful. --Kevin
On 5/12/22 1:49 PM, Julia Bauder wrote:
Hi, Danielle,
Have you taken a look at https://text2bib.economics.utoronto.ca/ ? If it
works for you, that's likely to be one of the easiest methods to convert
the list into structured data.
Best,
Julia
_____________________________________________________
Julia Bauder
Social Studies and Data Services Librarian
Director, Data Analysis and Social Inquiry Lab
Grinnell College Libraries
1111 6th Ave.
Grinnell, IA 50112
On Thu, May 12, 2022 at 1:40 PM Danielle Reay <dr...@drew.edu> wrote:
Hello,
We have a faculty member looking to create a dataset from an annotated
bibliography she compiled. Right now it exists as a Word file and as a PDF.
The entries are relatively structured with a citation and an abstract, but
the document is about 150 pages long with multiple entries per page. Rather
than manually copy and paste everything to create the spreadsheet/CSV, I
wanted to ask for suggestions or approaches to doing this by either
scraping or extracting structured data from the PDF. Thanks very much in
advance!
Danielle Reay
Digital Scholarship Technology Manager
Drew University
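
If the Word file can be saved as plain text, a short script may be enough for a first pass at the citation/abstract split described above. The following is only a minimal sketch under stated assumptions, not a tested solution for this particular document: it assumes entries are separated by blank lines and that the first line of each entry is the citation, with the rest being the abstract. The function names and the sample entries are invented for illustration; the real bibliography will likely need a different splitting rule.

```python
import csv
import io
import re


def split_entries(text):
    """Split plain text into (citation, abstract) pairs.

    Assumption (not confirmed by the source document): entries are
    separated by one or more blank lines, and the first line of each
    entry is the citation.
    """
    entries = []
    # Blank lines (possibly containing stray spaces) mark entry boundaries.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if not lines:
            continue
        citation = lines[0]
        # Rejoin a hard-wrapped abstract into a single line.
        abstract = " ".join(lines[1:])
        entries.append((citation, abstract))
    return entries


def entries_to_csv(entries):
    """Return the entries as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["citation", "abstract"])
    writer.writerows(entries)
    return buffer.getvalue()


# Invented placeholder entries, standing in for the exported text.
SAMPLE = """Doe, J. (2001). A made-up title. Invented Journal, 1(2), 3-4.
This placeholder abstract spans
two lines.

Roe, R. (2010). Another made-up title. Example Press.
A second placeholder abstract.
"""

if __name__ == "__main__":
    print(entries_to_csv(split_entries(SAMPLE)))
```

The citation column produced this way is still a single unparsed string. It could then be pasted into text2bib or sent to GROBID, as suggested earlier in the thread, to break it into author, title, year, and so on.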