I have begun working on a suite of software designed to enable a person to
“read” the full text of hundreds (if not a thousand) of JSTOR articles
simultaneously. I call this software the JSTOR Workset Browser. [1]

Using JSTOR’s Data For Research (DFR) service, anybody can search and browse
the totality of JSTOR. [2] The reader can then create and download a
“dataset” describing the items of interest they found. This dataset
includes a citations.xml file. The Browser takes this citations.xml file as
input and then: 1) harvests the content, 2) indexes it, 3) does some analysis
against the content, 4) creates a few graphs illustrating characteristics of
the dataset, and finally 5) generates a browsable “catalog” in the form of an
HTML table. The table includes columns for things like authors, titles, and
dates, as well as page lengths, number of words, and coefficients denoting the
use of color words, “big” names, and “great” ideas. In the near future the
Browser will support search, as well as the generation of a report describing
each reader-generated (curated) collection. You can see a number of the
collections created to date, including writings about Thoreau, Emerson,
Dickinson, Longfellow, and Poe. [3]
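To make the catalog-building steps above a bit more concrete, here is a
minimal Python sketch. It is not the Browser's own code (that lives in the
GitHub repository cited as [1]). It assumes that citations.xml contains
<article> elements with an id attribute and <title>, <author>, and <pubdate>
children, that the full text has already been harvested into a dictionary
keyed by article id, and that a "color" coefficient is simply color words per
1,000 words. The sample record, the color-word list, and every function name
below are hypothetical.

```python
import html
import re
import xml.etree.ElementTree as ET

# Hypothetical sample in the shape ASSUMED for a DFR citations.xml file;
# the real schema may differ.
SAMPLE_CITATIONS = """<citations>
  <article id="10.2307/0000001">
    <title>Walking with Thoreau</title>
    <author>Jane Doe</author>
    <pubdate>1950-01-01</pubdate>
  </article>
</citations>"""

# Hypothetical color-word list; the Browser's actual list is not given.
COLOR_WORDS = {"red", "green", "blue", "white", "black", "yellow"}


def parse_citations(xml_text):
    """Return one dict of bibliographic fields per <article> element."""
    root = ET.fromstring(xml_text)
    records = []
    for article in root.iter("article"):
        records.append({
            "id": article.get("id", ""),
            "title": (article.findtext("title") or "").strip(),
            "authors": [a.text.strip() for a in article.findall("author") if a.text],
            "date": (article.findtext("pubdate") or "").strip(),
        })
    return records


def measures(text):
    """Return (word count, color words per 1,000 words) for one text."""
    words = re.findall(r"[a-z]+", text.lower())
    count = len(words)
    colors = sum(1 for w in words if w in COLOR_WORDS)
    coefficient = 1000 * colors / count if count else 0.0
    return count, coefficient


def html_table(rows):
    """Render the records as a simple HTML table, escaping every cell."""
    header = ("<tr><th>Author(s)</th><th>Title</th><th>Date</th>"
              "<th>Words</th><th>Color</th></tr>")
    body = []
    for r in rows:
        cells = ["; ".join(r["authors"]), r["title"], r["date"],
                 str(r["words"]), f"{r['color']:.2f}"]
        body.append("<tr>" + "".join(f"<td>{html.escape(c)}</td>" for c in cells) + "</tr>")
    return "<table>\n" + header + "\n" + "\n".join(body) + "\n</table>"


if __name__ == "__main__":
    records = parse_citations(SAMPLE_CITATIONS)
    # Stand-in for harvested full text, keyed by article id (hypothetical).
    texts = {"10.2307/0000001": "The red leaves and the white snow of Walden."}
    for r in records:
        r["words"], r["color"] = measures(texts.get(r["id"], ""))
    print(html_table(records))
```

Normalizing per 1,000 words is one simple way to keep long and short
articles comparable in the same column; the Browser may well compute its
coefficients differently.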

When this tool is combined with similar tools designed to work against the
HathiTrust and/or EEBO-TCP, the ultimate goal is to enable students and
scholars to do research against massive amounts of content quickly and
easily. [4, 5]

I’m looking for additional sample content. If you create a dataset from DFR
and send me its citations.xml file, I will use it as input for the Browser.
“Wanna play?”


[1] Browser on GitHub - http://bit.ly/jstor-workset-browser
[2] Data For Research - http://dfr.jstor.org
[3] sample collections - http://dh.crc.nd.edu/sandbox/jstor-workset-browser/
[4] HathiTrust Workset Browser - https://github.com/ericleasemorgan/HTRC-Workset-Browser
[5] EEBO-TCP Workset Browser - https://github.com/ericleasemorgan/EEBO-TCP-Workset-Browser


—
Eric Lease Morgan, Librarian
