Re: [CODE4LIB] Comparing OCR output to dictionary

2021-09-03 Thread Sarah Swanz
This Jupyter notebook from the National Library of Scotland has a section on evaluating OCR accuracy in its Data Cleaning chapter. You might also check out the 'fastwer' package described in this article. I have not used it myself, so I cannot attest to it. Sarah Swanz University o
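For readers new to these metrics, here is a minimal sketch of the two usual OCR accuracy measures, word error rate (WER) and character error rate (CER), plus a simple dictionary hit rate in the spirit of the thread's subject. It is a plain-Python illustration built on Levenshtein edit distance. It is not the National Library of Scotland notebook's code or the 'fastwer' API. The function names (`edit_distance`, `wer`, `cer`, `dictionary_hit_rate`) and the sample text are hypothetical.

```python
import re


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (word lists or strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    # max(..., 1) only guards against an empty reference.
    return edit_distance(ref_words, hyp_words) / max(len(ref_words), 1)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    return edit_distance(reference, hypothesis) / max(len(reference), 1)


def dictionary_hit_rate(ocr_text, vocabulary):
    """Share of OCR tokens found in a word list (needs no ground truth)."""
    tokens = re.findall(r"\w+", ocr_text.lower())
    if not tokens:
        return 0.0
    return sum(token in vocabulary for token in tokens) / len(tokens)


if __name__ == "__main__":
    truth = "the quick brown fox"
    ocr = "tne quick brown f0x"
    print(wer(truth, ocr))  # 0.5 (2 of 4 words wrong)
    print(round(cer(truth, ocr), 3))  # 0.105 (2 of 19 characters wrong)
    print(dictionary_hit_rate(ocr, {"the", "quick", "brown", "fox"}))  # 0.5
```

WER and CER need a hand-checked ground-truth transcription. The dictionary hit rate does not, which makes it a cheap proxy for large collections. It will undercount proper nouns, archaic spellings, and other words missing from the word list.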

Re: [CODE4LIB] Comparing OCR output to dictionary

2021-09-03 Thread Sarah Swanz
to GitHub repo with code) Sarah On Sep 3, 2021, 10:20 AM -0500, Sarah Swanz wrote:

Re: [CODE4LIB] Any libraries/academic institutions using CKAN/Socrata?

2023-04-27 Thread Sarah Swanz
organization, not individual author(s), is the authorial unit. TL;DR: it means a lot less metadata about individual authors, and complicated user authorization levels. There were some other minor flaws, but that did it for us. Sarah Swanz On Apr 27, 2023, 9:56 AM -0500, Jenna Jordan wrote

Re: [CODE4LIB] click/save, click/save, click/save, etc

2024-02-15 Thread Sarah Swanz
Similarly, CrossRef requires that DOIs link to landing pages rather than directly to the content file. This is their rationale: https://www.crossref.org/documentation/member-setup/creating-a-landing-page/ Sarah Swanz School of Information, MSc ’18 University of Michigan, Ann Arbor On Feb 15, 2024 at 2

Re: [CODE4LIB] data sets in multiple repositories

2024-03-11 Thread Sarah Swanz
's a field for Alternative Identifiers. Publishing in 2 repos is not the same as publishing the same article in two journals. A better analogy is having a book in more than one library. Sarah Swanz Digital Humanities Librarian & Data Curator Washington University in St Louis On Mar 11, 2