Re: [CODE4LIB] ethics of screenscraping library opacs?

Kyle Banerjee Mon, 29 Nov 2021 09:33:58 -0800

Regarding Z39.50, they may have SRU enabled, which would make this really
easy. Even if searches have to be passed through the UI, Primo interfaces
let you retrieve a structured PNX record via a GET query parameter. Given
the small number of searches, the harvest plan should work well.


kyle
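A minimal Python sketch of the SRU route, assuming the target exposes an SRU 1.2 endpoint that can return MARCXML. The base URL and the alma.isbn index name are assumptions (Alma endpoints often look like https://<domain>/view/sru/<institution code>, but the server's explain response is the authority on its indexes). It reads LC call numbers from the bibliographic 050 field, with 090 as a fallback for locally assigned LC-style numbers; libraries that keep call numbers only in holdings data are not covered.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

MARC = "{http://www.loc.gov/MARC21/slim}"

def build_sru_url(base_url, isbn, index="alma.isbn"):
    """Build an SRU 1.2 searchRetrieve URL asking for MARCXML for one ISBN.

    base_url and index are assumptions; confirm them with operation=explain.
    """
    params = {
        "version": "1.2",
        "operation": "searchRetrieve",
        "recordSchema": "marcxml",
        "maximumRecords": "5",
        "query": f"{index}={isbn}",
    }
    return base_url + "?" + urllib.parse.urlencode(params)

def lc_call_numbers(marcxml_text, tags=("050", "090")):
    """Return call numbers ($a and $b joined) from the given MARC tags, in document order."""
    root = ET.fromstring(marcxml_text)
    found = []
    for field in root.iter(MARC + "datafield"):
        if field.get("tag") not in tags:
            continue
        parts = [(sf.text or "").strip() for sf in field.findall(MARC + "subfield")
                 if sf.get("code") in ("a", "b")]
        if any(parts):
            found.append(" ".join(p for p in parts if p))
    return found

def fetch(url, timeout=30):
    """Plain GET; the response body is the SRU XML."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")

# Usage (not run here; the endpoint is a hypothetical placeholder):
# url = build_sru_url("https://<domain>/view/sru/<institution code>", "9780262033848")
# print(lc_call_numbers(fetch(url)))
```

Parsing is kept separate from fetching so the same lc_call_numbers function can be reused on MARCXML from any SRU server.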

On Mon, Nov 29, 2021 at 8:58 AM Tod Olson <t...@uchicago.edu> wrote:

> In that case, you might also try ISBN search via Z39.50, and ask for an
> OPAC record to be returned. The call number comes back in a very
> predictable place, so it might be less trouble than screen scraping, and
> easier to adapt to different target libraries.
>
> -Tod
>
> > On Nov 29, 2021, at 10:50 AM, M Belvadi <mbelv...@gmail.com> wrote:
> >
> > Thank you all. Perhaps I should have clarified that I am NOT trying to get
> > a dump of a library's entire holdings. I just want to look up LC call
> > numbers for a few specific ISBNs, if the library owns that book with an LC
> > call number.
> > So I don't need the entire MARC record, and a dump would be incredibly
> > inefficient and almost always out of date, as the books I'm looking for
> > are often fairly new.
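Because sources index ISBNs differently (ISBN-10 vs ISBN-13, with or without hyphens), it may help to normalize every ISBN before looking it up. This sketch applies the standard ISBN-10 and ISBN-13 check-digit rules; normalize_isbn and isbn13_check_digit are hypothetical helper names, not part of any library.

```python
def isbn13_check_digit(first12):
    """ISBN-13 check digit: weights alternate 1, 3 across the first 12 digits."""
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)

def normalize_isbn(raw):
    """Hypothetical helper: return the 13-digit form of an ISBN-10/13, or None if invalid."""
    s = raw.replace("-", "").replace(" ", "").strip().upper()
    if len(s) == 10 and s[:9].isdecimal() and (s[9].isdecimal() or s[9] == "X"):
        # ISBN-10: weights 10..1, "X" stands for 10, valid when the sum is divisible by 11.
        total = sum((10 - i) * (10 if d == "X" else int(d)) for i, d in enumerate(s))
        if total % 11:
            return None
        core = "978" + s[:9]  # ISBN-10s map into the 978 prefix
        return core + isbn13_check_digit(core)
    if len(s) == 13 and s.isdecimal():
        return s if isbn13_check_digit(s[:12]) == s[12] else None
    return None
```

For example, normalize_isbn("0-262-03384-4") gives "9780262033848", so a lookup keyed on the 13-digit form matches either spelling.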
> >
> > I am already using OCLC's Classify service, and I always check it first
> > before trying any other site, but I find that it only has about 19 out of
> > 20 that I look for. When I look for the rest manually on various OPACs,
> > including UC's, I can often find about 50% of the missing ones, so I'm
> > just trying to do that programmatically with BeautifulSoup. I am also
> > using Harvard's API as a second choice for the ones that OCLC misses, but
> > it almost never has the ones that OCLC didn't have, so I need more places
> > to look. Those are the only public APIs I've been able to find that have
> > any chance of providing LC call numbers (e.g., Open Library and Google
> > APIs have other metadata but not call numbers).
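The "check OCLC first, then Harvard, then OPACs" order is a simple priority cascade. In this sketch each source is a (name, lookup) pair, where lookup is a hypothetical function that returns a call number or None; the real OCLC, Harvard, and OPAC lookups are not shown.

```python
from typing import Callable, Iterable, Optional, Tuple

# lookup(isbn) -> LC call number, or None if the source has nothing.
Lookup = Callable[[str], Optional[str]]

def first_call_number(
    isbn: str, sources: Iterable[Tuple[str, Lookup]]
) -> Tuple[Optional[str], Optional[str]]:
    """Try sources in priority order; return (call_number, source_name) from the first hit."""
    for name, lookup in sources:
        try:
            call_number = lookup(isbn)
        except Exception:  # one failing service should not abort the whole report
            call_number = None
        if call_number:
            return call_number, name
    return None, None

# Usage with hypothetical lookup functions:
# sources = [("oclc_classify", lookup_oclc), ("harvard", lookup_harvard), ("opac_a", lookup_opac_a)]
# call_number, source = first_call_number("9780262033848", sources)
```

Returning the source name alongside the call number makes it easy to record in the output which service supplied each value.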
> >
> > FYI, it seems to me that the new UC Library Search, when limited to the
> > catalog (which is what I want), is Alma underneath.
> >
> > And further FYI, the reason I'm doing this is that I'm attempting to write
> > a Python program that can take a COUNTER R5 book report and add LC call
> > numbers to it, so that librarians looking especially at the B1 (use) and
> > B2 (turnaways) data can quickly group the usage by "subject", since the
> > COUNTER standard does not include any kind of subject classification.
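One way the grouping step might look, assuming the report has been saved as CSV and the call numbers are already in a dict keyed by the report's ISBN strings. The column names (Title, ISBN, Metric_Type, Reporting_Period_Total) and the metric names in the usage line are assumptions based on the COUNTER R5 title-report layout and should be checked against an actual file. It groups by LC class letters (QA, PS, ...), which a librarian can then map to subject names.

```python
import csv
import io
import re
from collections import defaultdict

# LC classes are the 1-3 capital letters before the first digit, e.g. "QA76.6" -> "QA".
LC_CLASS = re.compile(r"\s*([A-Z]{1,3})\s*\d")

def lc_class(call_number):
    match = LC_CLASS.match(call_number or "")
    return match.group(1) if match else None

def usage_by_lc_class(report_text, call_numbers, metric):
    """Sum Reporting_Period_Total per LC class for one Metric_Type.

    report_text: the report as CSV text; metadata rows above the column headings are skipped.
    call_numbers: dict mapping the report's ISBN strings to LC call numbers.
    """
    rows = list(csv.reader(io.StringIO(report_text)))
    # Assumed layout: the column-heading row is the first row whose first cell is "Title".
    start = next((i for i, row in enumerate(rows) if row and row[0] == "Title"), None)
    if start is None:
        raise ValueError("no column-heading row starting with 'Title' found")
    header = rows[start]
    totals = defaultdict(int)
    for values in rows[start + 1:]:
        row = dict(zip(header, values))
        if row.get("Metric_Type") != metric:
            continue
        cls = lc_class(call_numbers.get(row.get("ISBN", ""))) or "Unclassified"
        totals[cls] += int(row.get("Reporting_Period_Total") or 0)
    return dict(totals)

# Usage (hypothetical file name; e.g. "Total_Item_Requests" for B1, "No_License" for B2):
# with open("tr_b1.csv", newline="", encoding="utf-8") as fh:
#     print(usage_by_lc_class(fh.read(), call_numbers, "Total_Item_Requests"))
```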
> >
> > When I have it completed, I will share it freely on GitHub, so I want to
> > make sure I'm doing nothing furtive and am only touching servers whose
> > owners wouldn't be upset to find themselves included in my code.
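For the "nothing furtive" goal, a common baseline is to honour each server's robots.txt, identify the script with a descriptive User-Agent, and pause between requests. This sketch uses Python's standard urllib.robotparser; the user-agent string, the 5-second default delay, the example host, and the fetch callable are hypothetical placeholders.

```python
import time
import urllib.robotparser

# Hypothetical identifier; a real one should name the project and a contact address.
USER_AGENT = "counter-lc-lookup/0.1 (contact: you@example.org)"

def polite_fetch(urls, fetch, parser, user_agent=USER_AGENT,
                 default_delay=5.0, sleep=time.sleep):
    """Fetch only the URLs robots.txt allows, pausing between requests.

    fetch(url) does the actual GET (hypothetical, not shown). A Crawl-delay in
    robots.txt takes precedence over default_delay. Disallowed URLs map to None.
    """
    delay = parser.crawl_delay(user_agent) or default_delay
    results = {}
    first = True
    for url in urls:
        if not parser.can_fetch(user_agent, url):
            results[url] = None  # skip rather than work around the owner's rules
            continue
        if not first:
            sleep(delay)
        first = False
        results[url] = fetch(url)
    return results

# Usage (hypothetical host):
# parser = urllib.robotparser.RobotFileParser("https://opac.example.edu/robots.txt")
# parser.read()
# pages = polite_fetch(urls, fetch=my_get, parser=parser)
```

The sleep function is a parameter only so the pacing can be checked without actually waiting.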
> >
> > Melissa Belvadi
> > mbelv...@gmail.com
> >
> >
> > On Mon, Nov 29, 2021 at 9:23 AM Eric Lease Morgan <emor...@nd.edu> wrote:
> >
> >> On Nov 28, 2021, at 6:01 AM, Peter Velikonja <pe...@koios.co> wrote:
> >>
> >>> As Kyle mentioned, a screenscraping method is inefficient and will
> >>> get you incomplete results. As a vendor to public libraries, I
> >>> routinely request (and receive) MARC dumps. Some libraries are better
> >>> than others at pulling these from their ILS, but records based on MARC
> >>> come from the Library of Congress and are therefore public information
> >>> -- to which you are entitled if you reside in the US. A number of
> >>> libraries make dumps available through various Open Data initiatives
> >>> -- spotty, but they can be useful. Screenscraping can be good for
> >>> spot-checking, but if you want a complete catalog, working with an ILS
> >>> administrator is, in my view, a better path.
> >>
> >>
> >> I concur. See if you can get a MARC dump. If you are seeking the
> >> bibliographic information, then this is probably the most complete,
> >> accurate, and efficient option. --Eric Morgan
> >>
>
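If a library does provide a dump, the ISBN-to-call-number mapping can be pulled out of it directly. pymarc is the usual Python library for this; the sketch below sticks to the standard library and reads binary MARC21 (ISO 2709) by hand, assuming UTF-8 records, ISBNs in 020 $a, and LC call numbers in 050 $a/$b. The function names and the dump file name are hypothetical.

```python
FIELD_END = b"\x1e"  # field terminator
SUBFIELD = b"\x1f"   # subfield delimiter

def iter_records(data):
    """Split a MARC21 dump into records using each leader's 5-digit record length."""
    pos = 0
    while pos + 24 <= len(data):
        length = int(data[pos:pos + 5])
        if length <= 0:
            break  # malformed leader; stop rather than loop forever
        yield data[pos:pos + length]
        pos += length

def iter_fields(record, wanted_tag):
    """Yield {subfield code: [values]} for each occurrence of wanted_tag in one record."""
    base = int(record[12:17])        # leader/12-16: base address of data
    directory = record[24:base - 1]  # 12-byte entries, followed by a field terminator
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        if entry[:3].decode("ascii") != wanted_tag:
            continue
        length, start = int(entry[3:7]), int(entry[7:12])
        body = record[base + start:base + start + length].rstrip(FIELD_END)
        subfields = {}
        for chunk in body.split(SUBFIELD)[1:]:  # chunk 0 holds the two indicators
            code = chunk[:1].decode("ascii")
            subfields.setdefault(code, []).append(chunk[1:].decode("utf-8", "replace"))
        yield subfields

def isbn_to_lc(data):
    """Map each 020 $a ISBN to the first 050 call number in the same record."""
    mapping = {}
    for record in iter_records(data):
        calls = [" ".join(f.get("a", [])[:1] + f.get("b", [])[:1])
                 for f in iter_fields(record, "050")]
        if not calls:
            continue
        for field in iter_fields(record, "020"):
            for value in field.get("a", []):
                if value.strip():
                    # 020 $a may carry a qualifier, e.g. "9780262033848 (hardcover)".
                    mapping[value.split()[0]] = calls[0]
    return mapping

# Usage (hypothetical file name):
# with open("dump.mrc", "rb") as fh:
#     lookup = isbn_to_lc(fh.read())
```

Records without an 050 are skipped, so ISBNs missing from the result still need one of the live lookups.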