The training is typically an apprenticeship under the senior cataloging librarians.
David Goodman, Ph.D, M.L.S.
http://en.wikipedia.org/wiki/User_talk:DGG

On Thu, Aug 13, 2009 at 1:48 AM, Samuel Klein<meta...@gmail.com> wrote:
> DGG, I appreciate your points. Would we be so motivated by this
> thread if it weren't a complex problem?
>
> The fact that all of this is quite new, and that there are so many
> unknowns and gray areas, actually makes me consider it more likely
> that a body of wikimedians, experienced with their own form of
> large-scale authority file coordination, is in a position to say
> something meaningful about how to achieve something similar for
> tens of millions of metadata records.
>
>> OL rather than Wikimedia has the advantage that more of the people
>> there understand the problems.
>
> In some areas that is certainly so. In others, Wikimedia communities
> have useful recent experience. I hope that those who understand
> these problems on both sides recognize the importance of sharing
> what they know openly -- and showing others how to understand them
> as well. We will not succeed as a global community if we say that
> this class of problems can only be solved by the limited group of
> people with an MLS and a few years of focused training. (How would
> you name the sort of training you mean here, btw?)
>
> SJ
>
> On Thu, Aug 13, 2009 at 12:57 AM, David Goodman<dgoodma...@gmail.com> wrote:
>> Yann & Sam,
>>
>> The problem is extraordinarily complex. A database of all "books"
>> (and other media) ever published is beyond the joint capabilities
>> of everyone interested. There are intermediate entities between
>> "books" and "works", and important subordinate entities, such as
>> "article" and "chapter", and those like "poem" which could be at
>> any of several levels. This is not a job for amateurs, unless they
>> are prepared to first learn the actual standards of bibliographic
>> description for different types of material, and to at least
>> recognize the inter-relationships and the many undefined areas. At
>> research libraries, one allows a few years of training for a
>> newcomer with just an MLS degree to work with a small subset of
>> this. I have thirty years of experience in related areas of
>> librarianship, and I know only enough to be aware of the problems.
>> For an introduction to the current state of this, see
>> http://www.rdaonline.org/constituencyreview/Phase1Chp17_11_2_08.pdf
>>
>> The difficulty of merging the many thousands of partially correct
>> and incorrect sources of available data typically requires the
>> manual resolution of each of the tens of millions of instances.
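To make that last point concrete: below is a rough Python sketch of a
candidate-matcher over catalog records. The field names, normalization
rules, and threshold are all invented for illustration; the point is
that even a reasonable matcher can only *propose* pairs, and deciding
whether two partially correct records describe the same edition is the
step that stays manual.

    # Illustrative only: propose candidate duplicates for manual review.
    from difflib import SequenceMatcher

    def normalize(s):
        # Crude bag-of-words normalization: lowercase, strip
        # punctuation, drop English articles, sort the words (so
        # "Twain, Mark" and "Mark Twain" compare equal). Real
        # cataloging rules are far subtler than this.
        s = "".join(ch for ch in s.lower() if ch.isalnum() or ch.isspace())
        return " ".join(sorted(w for w in s.split()
                               if w not in ("the", "a", "an")))

    def similarity(a, b):
        return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

    def candidate_merges(records, threshold=0.90):
        # Pairwise comparison is quadratic: at tens of millions of
        # records even *finding* candidates needs blocking or indexing,
        # and resolving each pair is still a human decision.
        for i, r1 in enumerate(records):
            for r2 in records[i + 1:]:
                score = min(similarity(r1["title"], r2["title"]),
                            similarity(r1["author"], r2["author"]))
                if score >= threshold:
                    yield r1, r2, score

    recs = [{"title": "The Adventures of Tom Sawyer", "author": "Twain, Mark"},
            {"title": "Adventures of Tom Sawyer", "author": "Mark Twain"}]
    for r1, r2, score in candidate_merges(recs):
        print("possible duplicate (%.2f): %r / %r" % (score, r1, r2))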
>> OL rather than Wikimedia has the advantage that more of the people
>> there understand the problems.
>>
>> David Goodman, Ph.D, M.L.S.
>> http://en.wikipedia.org/wiki/User_talk:DGG
>>
>> On Wed, Aug 12, 2009 at 1:15 PM, c<y...@forget-me.net> wrote:
>>> Hello,
>>>
>>> This discussion is very interesting. I would like to make a
>>> summary, so that we can go further.
>>>
>>> 1. A database of all books ever published is one of the things
>>> still missing.
>>> 2. This needs massive collaboration by thousands of volunteers, so
>>> a wiki might be appropriate, however...
>>> 3. The data needs a structured web site, not a plain wiki like
>>> MediaWiki.
>>> 4. A big part of this data is already available, but scattered
>>> across various databases, in various languages, with various
>>> protocols, etc. So a big part of the work needs as much
>>> database-management knowledge as librarian knowledge.
>>> 5. What is most missing in these existing databases (IMO) is
>>> information about translations: nowhere is there a general database
>>> of translated works, at least not in English or French. It is very
>>> difficult to find out whether a translation exists for a given
>>> work. Wikisource has some of this information, with interwiki links
>>> between work and author pages, but for a (very) small number of
>>> works and authors.
>>> 6. It would be best not to duplicate work in several places.
>>>
>>> Personally I don't find OL very practical. Maybe I am too used to
>>> MediaWiki. ;oD
>>>
>>> We still need to create something attractive to contributors and
>>> readers alike.
>>>
>>> Yann
>>>
>>> Samuel Klein wrote:
>>>>> This thread started out with a discussion of why it is so hard to
>>>>> start new projects within the Wikimedia Foundation. My stance is
>>>>> that projects like OpenStreetMap.org and OpenLibrary.org are doing
>>>>> fine as they are, and there is no need to duplicate their effort
>>>>> within the WMF. The example you gave was this:
>>>>
>>>> I agree that there's no point in duplicating existing
>>>> functionality. The best solution is probably for OL to include
>>>> this explicitly in their scope and add the necessary
>>>> functionality. I suggested this on the OL mailing list in March:
>>>> http://mail.archive.org/pipermail/ol-discuss/2009-March/000391.html
>>>>
>>>>>>>>>> *A wiki for book metadata, with an entry for every published
>>>>>>>>>> work, statistics about its use and siblings, and discussion
>>>>>>>>>> about its usefulness as a citation (a collaboration with
>>>>>>>>>> OpenLibrary, merging WikiCite ideas)
>>>>>
>>>>> To me, that sounds exactly like what OpenLibrary already does (or
>>>>> could be doing in the near future), so why even set up a new
>>>>> project that would collaborate with it? Later you added:
>>>>
>>>> However, this is not what OL or its wiki do now. And OL is not run
>>>> by its community; the community helps support the work of a
>>>> centrally directed group. So there is only so much I feel I can
>>>> contribute to the project by making suggestions. The wiki built
>>>> into the fiber of OL is intentionally not used for general
>>>> discussion.
>>>>
>>>>> I was talking about the metadata for all books ever published,
>>>>> including the Swedish translations of Mark Twain's works, which
>>>>> are part of Mark Twain's bibliography, of the translator's
>>>>> bibliography, of American literature, and of Swedish language
>>>>> literature. In OpenLibrary all of these are contained in one
>>>>> project. In Wikisource, they are split into one section for
>>>>> English and another section for Swedish. That division makes
>>>>> sense for the contents of the book, but not for the book
>>>>> metadata.
>>>>
>>>> This is a problem that Wikisource needs to address, regardless of
>>>> where the OpenLibrary metadata goes. It is similar to the
>>>> Wiktionary problem of wanting some content - the array of
>>>> translations of a single definition - to exist in one place and be
>>>> transcluded in each language.
>>>>
>>>>> Now you write:
>>>>>
>>>>>> However, the project I have in mind for OCR cleaning and
>>>>>> translation needs to
>>>>>
>>>>> That is a change of subject. That sounds just like what
>>>>> Wikisource (or PGDP.net) is about. OCR cleaning is one thing, but
>>>>> it is an entirely different thing to set up "a wiki for book
>>>>> metadata, with an entry for every published work". So which of
>>>>> these two project ideas are we talking about?
>>>>
>>>> They are closely related.
>>>>
>>>> There needs to be a global authority file for works -- a [set of]
>>>> universal identifier[s] for a given work -- in order for
>>>> Wikisource (as it currently stands) to link the German translation
>>>> of the English transcription of OCR of the 1998 photos of the 1572
>>>> Rotterdam Codex... to its metadata entry [or entries].
>>>>
>>>> I would prefer for this authority file to be wiki-like, as the
>>>> Wikipedia authority file is, so that it supports renames, merges,
>>>> and splits with version history and minimal overhead; hence I wish
>>>> to see a wiki for this sort of metadata.
>>>>
>>>> Currently OL does not quite provide this authority file, but it
>>>> could. I do not know how easily.
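As a sketch of what "wiki-like" would mean operationally for such an
authority file -- identifiers that survive renames, merges, and splits
because every change leaves a redirect and a history entry -- here is
an illustrative Python model. The schema and method names are
invented; nothing here is OL's or Wikipedia's actual implementation.

    import itertools

    class AuthorityFile:
        # Toy model of a wiki-like authority file for works.
        def __init__(self):
            self.records = {}    # work id -> metadata dict
            self.redirects = {}  # retired id -> surviving id
            self.history = []    # append-only change log
            self._counter = itertools.count(1)

        def create(self, metadata):
            work_id = "W%d" % next(self._counter)
            self.records[work_id] = metadata
            self.history.append(("create", work_id))
            return work_id

        def resolve(self, work_id):
            # Follow redirects left behind by merges and splits.
            while work_id in self.redirects:
                work_id = self.redirects[work_id]
            return work_id

        def merge(self, loser, winner):
            # The retired identifier keeps resolving, so links to it
            # -- from Wikisource, say -- do not break.
            loser, winner = self.resolve(loser), self.resolve(winner)
            if loser == winner:
                return
            self.records.pop(loser, None)
            self.redirects[loser] = winner
            self.history.append(("merge", loser, winner))

        def split(self, work_id, meta_a, meta_b):
            # One record becomes two; the old id points at the first,
            # and the log keeps the relationship for later review.
            work_id = self.resolve(work_id)
            a, b = self.create(meta_a), self.create(meta_b)
            self.records.pop(work_id, None)
            self.redirects[work_id] = a
            self.history.append(("split", work_id, a, b))
            return a, b

    af = AuthorityFile()
    w1 = af.create({"title": "Rotterdam Codex (1572)"})
    w2 = af.create({"title": "Codex of Rotterdam"})  # duplicate entry
    af.merge(w2, w1)
    assert af.resolve(w2) == w1  # old identifier still resolves

The redirect table is the load-bearing part: exactly as with Wikipedia
page moves, anything that linked to a merged-away identifier keeps
working, which is what allows merges with "minimal overhead".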
>>>>> Every book ever published means more than 10 million records.
>>>>> (It probably means more than 100 million records.) OCR cleaning
>>>>> attracts hundreds or a few thousand volunteers, which is
>>>>> sufficient to take on thousands of books, but not millions.
>>>>
>>>> Focusing efforts on notable works with verifiable OCR, and using
>>>> the sorts of helper tools that Greg's paper describes, I do not
>>>> doubt that we could effectively clean and publish OCR for all
>>>> primary sources that are actively used and referenced in
>>>> scholarship today (and more besides). Though 'we' here is the
>>>> world - certainly more than a few thousand volunteers have at
>>>> least one book they would like to polish. Most of them are not
>>>> currently Wikimedia contributors, that much is certain -- we
>>>> don't provide any tools to make this work convenient or
>>>> rewarding.
>>>>
>>>>> Google scanned millions of books already, but I haven't heard of
>>>>> any plans for cleaning all that OCR text.
>>>>
>>>> Well, Google does not believe in distributed human effort. (This
>>>> came up in a recent Knol thread as well.) I'm not sure that is
>>>> the best comparison.
>>>>
>>>> SJ
>>>
>>> --
>>> http://www.non-violence.org/ | Collaborative site on non-violence
>>> http://www.forget-me.net/ | Alternatives on the Net
>>> http://fr.wikisource.org/ | Free library
>>> http://wikilivres.info | Free documents

_______________________________________________
foundation-l mailing list
foundation-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
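As a postscript on the scale gap quoted above ("sufficient to take on
thousands of books, but not millions"): the record totals come from
the thread, but the rates in this back-of-envelope are assumed purely
for illustration.

    # Back-of-envelope: can volunteer OCR cleaning reach "every book"?
    volunteers = 3_000            # "hundreds or a few thousand"
    books_each_per_year = 5       # assumed sustained proofreading rate
    for records in (10_000_000, 100_000_000):
        years = records / (volunteers * books_each_per_year)
        print("%11d records -> about %6.0f years" % (records, years))
    # prints roughly 667 years for 10 million records,
    # and roughly 6667 years for 100 million records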