On Fri, Jun 19, 2015 at 07:52:40AM -0700, Mark Diggory wrote:
> Putting all this "tail wagging the dog" aside. I think it would be very
> good to get the appropriate "metadata" added to the PDF.
>
> I wanted to contribute that we recently had a "non-coverpage" case where
> the title of a paper was correct in the first page of the pdf and in the
> DSpace metadata, but the PDF had the incorrect title in its internal
> metadata. This caused Google Scholar to show the incorrect title in its
> search results, which caused much confusion for the owner of that document.
> Changing the metadata resulted in the GS record changing. From this point,
> it is clear the GS is leaning heavily on PDF internal metadata as its
> primary source for its records.
>
> I think that if the appropriate metadata were populated in the pdf process,
> that it would take precedence over the cover page in GS.

Hear, hear. Having correct, complete machine-readable metadata in the
document itself is a Good Thing.

Researcher: if you do this yourself, it's in your interest to ensure
that you do it well. If you have an assistant to take care of such
things, it's in your interest to ensure that your assistant knows how
to do it well. If you depend on Google Scholar or something like it,
you (all) get out of it what you (all) put into it.

The notion of a repository doing this automatically, whether
machine-readably or by generated cover pages, leads to some
interesting corner cases. If the title page, repo. metadata, and
document metadata disagree, which one is correct? If the document
contains poor-quality metadata, but it does contain them, then should
the repo. *replace* them with corrected values? On the other end of
the ingestion process, what if we *extract* metadata from the document
and then have to correct them? Do we fix the document? And
regardless of how much we trust our own process, will search engines
trust our repo., the document metadata, or their own heuristic fishing
in the first page?

To gather some ideas, we might want to see what commercial publishers
do about these issues. (Oh, boy: what if an academic repo. and a
publisher make *different* adjustments to document metadata? Can we
get repo.s, publishers, and researchers to agree on priorities and a
process for polishing and harmonizing document metadata?)

I think that, in the end, all parties want "the best we can reasonably
do." But how do we get there?

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu
_______________________________________________
Dspace-general mailing list
Dspace-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-general