Hi Christina,
You might consider the Harvard Open Metadata APIs, if they contain the works
you are interested in. The API is free and the content is unlicensed (the Amazon
APIs do come with a license agreement and restrictions).
Doc here:
https://wiki.harvard.edu/confluence/display/LibraryStaffDoc/LibraryCloud
...machine learning experiments on it, but alas, no time).
https://emeritus.library.harvard.edu/open-metadata
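In case it helps, here is a minimal sketch of what a keyword query against LibraryCloud might look like in Python. The base URL and the `q` and `limit` parameters are my assumptions from memory, not taken from this thread, and the helper names are made up for illustration; please check them against the doc linked above before relying on them.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint for the LibraryCloud item API; verify against the docs.
BASE_URL = "https://api.lib.harvard.edu/v2/items.json"


def build_query_url(keyword, limit=10):
    """Build a keyword-search URL (hypothetical helper).

    The `q` and `limit` parameter names are assumptions.
    """
    return BASE_URL + "?" + urlencode({"q": keyword, "limit": limit})


def fetch_items(keyword, limit=10):
    """Fetch the search results and return the parsed JSON response."""
    with urlopen(build_query_url(keyword, limit)) as resp:
        return json.load(resp)
```

Calling `fetch_items("moby dick", limit=3)` should then return the parsed response, which you can inspect to see whether the works you care about are covered.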
--
Mark Watkins
Bookship (https://www.bookshipapp.com)
It does depend a bit on what kinds of "key terms" or "important words" you have
in mind, but I have had good luck with Google's NLP APIs. They are free for small
numbers of queries (in the thousands per day, if memory serves, but don't quote
me on it). It does a good job of identifying people, places
Adding to what others have said, an API will likely give you better (and
faster!) results than scraping. Not sure if the Harvard Library Data covers
what you need, but their data is available via API (or download), for free, in
a legal and rights-respecting manner. Amazing resource.
https://lib
We're hosting a CODEX Hackathon (our 3rd) at the MIT Media Lab on February
10-12, 2017. CODEX is a community of folks who want to imagine the future of
books and reading: programmers, designers, writers, librarians, publishers,
readers. All are welcome. It's the best intersection of books and t
I have recently released a book-club-related app called Bookship
(www.bookshipapp.com), which features the ability to scan a page of text from a
book so users can post quotes. So my use case is people taking pictures of
pages with their phone and OCR-ing them.
I extensively tested Tesseract (