Following on from what others have said, I would suggest stepping back and
asking what you're going to get from the screenscraped records. Are you
interested in content or design? Screenscrapes won't help you with content the
way MARC records would, since they capture only what's displayed and may not
include all MARC fields (even when they should), and any comparison between
libraries may run into each library's different MARC-to-display field mappings.
Are you studying holdings overlaps between libraries? How often a field is used
within a single library? Or is it something for which the whole website is
needed? We have some measures that attempt to shut down bots scraping our
catalog too quickly or too deeply (we're worried about search engine bots
trying to index every search result page).

As the person responsible for how our catalog displays, and as someone who has
occasionally worked with screenscraped data, I can't imagine getting a ton of
use out of a mass screenscrape compared to what I could do with the files.
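
As a rough illustration of the kind of question files answer easily (how often
each field is used, or which fields one library uses that another doesn't),
here is a minimal, hypothetical sketch. It assumes the extract arrives as a
single file of binary MARC (ISO 2709) records; the function names are made up
for this example, it reads only each record's leader and directory, and it
does not handle MARCXML or malformed records. A maintained library such as
pymarc would be the more robust choice in practice.

```python
from collections import Counter
from typing import Iterator

LEADER_LENGTH = 24
DIRECTORY_ENTRY_LENGTH = 12  # tag (3) + field length (4) + starting position (5)


def iter_tags(data: bytes) -> Iterator[str]:
    """Yield the tag of every field in every binary MARC (ISO 2709) record.

    Hypothetical helper: reads only the leader and directory of each record,
    ignores field contents, and assumes well-formed records with no bytes
    between them.
    """
    pos = 0
    while pos + LEADER_LENGTH <= len(data):
        record_length = int(data[pos:pos + 5].decode("ascii"))  # leader/00-04
        if record_length < LEADER_LENGTH:
            raise ValueError(f"bad record length {record_length} at byte {pos}")
        record = data[pos:pos + record_length]
        base_address = int(record[12:17].decode("ascii"))  # leader/12-16
        # The directory runs from the end of the leader up to the field
        # terminator (0x1E) just before the base address of the data.
        directory = record[LEADER_LENGTH:base_address - 1]
        for i in range(0, len(directory), DIRECTORY_ENTRY_LENGTH):
            yield directory[i:i + 3].decode("ascii")
        pos += record_length


def count_tags(data: bytes) -> Counter:
    """Count how many times each field tag occurs across all records in data."""
    return Counter(iter_tags(data))


def tags_only_in(a: Counter, b: Counter) -> set:
    """Tags that appear in extract a but never in extract b."""
    return set(a) - set(b)
```

With an extract on disk, something like
`count_tags(open("library_a.mrc", "rb").read()).most_common(10)` (file name
hypothetical) would list the ten most-used fields, and `tags_only_in` compares
two libraries' counts. None of this is possible from display pages that drop
or remap fields.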

Asking for extracts might be the way to go, though libraries can't necessarily
share vended MARC records. (The standard itself is open, and much of the
content created by other libraries is open, but vendors sometimes put
restrictions on theirs. On a side note, I sometimes wonder if that's less
because they feel proprietary than because the records are often godawful in
quality.) So if a library subscribes to 3 million ebooks and gets records for
them, it may not be able to share most or all of those. But it may be happy to
share the rest, give you a large sample, or point you to a way to download
them. Most of our records are accessible via Z39.50, but I don't know whether
that really lends itself to this kind of mass search.

It sounds like your project is looking into something interesting! I hope the
suggestions here help make it happen.

Ruth

My working day may not be your working day. Please don't feel obliged to reply 
to this e-mail outside of your normal working hours.

Ruth Kitchin Tillman
Sally W. Kalin Librarian for Technological Innovations
Assistant Librarian
Penn State University Libraries
Paterno Library 006
r...@psu.edu

she/her/hers
