The UCSF Industry Documents Library is seeking applicants for a summer data 
science fellowship, to assess the impact of transcription accuracy on text 
analysis of digital archives. The Senior Data Science Fellow will take a 
leading role in a collaborative project under the supervision of staff from the 
Industry Documents Library and the UCSF Library Data Science Initiative.

Senior Data Science Fellow

Library overview

The UCSF Industry Documents Library (IDL) is a digital archive of more than 15 
million documents created by industries which impact public health. It contains 
previously internal records from the tobacco, opioid, drug, chemical, food, and 
fossil fuel industries. Since the IDL was established in 2002, the documents 
have been used by researchers, journalists, lawyers, policymakers, community 
advocates, and others in more than 1,000 publications. These publications have 
supported significant scientific and investigative research that has 
facilitated efforts to reduce smoking and related diseases, saving millions of 
lives worldwide.

The IDL also contains thousands of audiovisual materials, including recordings 
of internal focus groups and corporate meetings, depositions of tobacco 
industry employees, Congressional hearings, and radio and TV cigarette 
advertisements.

The UCSF Library Data Science Initiative (DSI) serves as a campus hub for 
education and support in data science. Its mission is to build computational 
and data skills in the UCSF community by providing education and resources to 
trainees, faculty, and staff.

Fellowship overview

This fellowship will support a project to compare human-evaluated transcripts 
with computer-generated transcripts for text and audiovisual materials in the 
IDL collections. Through tagging, human transcription, and computer-generated 
transcription, the team will assess how accuracy differs between media or 
document types, and whether the difference is more pronounced in certain 
categories.

By measuring transcript accuracy across the different media types in the 
collections, we aim to provide guidelines to researchers and technical staff 
for proper analysis, measurement, and reporting of transcript accuracy when 
working with digital media.

Position overview

The Senior Data Science Fellow will:

Assist with designing the project, including gathering project requirements and 
needs
Provide guidance to two Junior Data Science Fellows
Tag videos with a pre-defined list of categories
Review text extracted from video with Google AutoML
Run Uberi/speech_recognition programs on videos in the archive to extract text
Run sentiment analysis and/or topic extraction on the text extracted from videos
Study the sentiment/topics produced by Google AutoML and Uberi in each 
category of video and gather statistics

What you will be learning

Natural Language Processing (NLP) tools in the areas of speech to text, 
sentiment analysis, and topic modeling
Design and carry out a case study and present findings
Digital archival methods and practices
Participate in staff meetings
Attend data science workshops and classes (in-person and virtual options 
available)
Receive mentorship and training from data scientists, programmers, and 
librarians from the Data Science Initiative and Industry Documents Library

Who we are looking for

Must be enrolled in a degree/license program in a 2- or 4-year institution, 
graduate school, vocational school, etc.
Interest in digital curation and collection building for libraries and archives
Two years or more of programming knowledge/experience preferred
Proficiency in one of the following programming languages preferred: Python, R, 
Java
Familiarity with Natural Language Processing (NLP) tools preferred
Excellent analytical and writing skills
High level of accuracy and attention to detail
Ability to work independently

Compensation and work environment

This position is fully remote and includes a total of up to 160 hours 
(approximately 20 hours/week for 8 weeks). These work hours are flexible and 
can be arranged to suit student schedules and course requirements. The Fellow 
should ideally be available to start by June 1, 2022.

Fellows are paid at least the San Francisco minimum wage, currently $16.32 an 
hour.

How to apply

Please email a cover letter, contact information for two references, and resume 
to Kate Tasker, Industry Documents Library Managing Archivist, at 
kate.tas...@ucsf.edu. The position is considered open until filled.


----
Brought to you by code4lib jobs: 
https://jobs.code4lib.org/jobs/52633-senior-data-science-fellow-for-summer-2022
