[CODE4LIB] introduction, and a fun date visualization

2016-02-09 Thread Greg Lindahl
Hi! I'm a new employee of the Internet Archive, formerly a search engine guy, mostly working on search for the Wayback Machine. In my spare time I've been working on a visualization of dates and entities in scanned book contents. There's a blog post about it here: https://blog.archive.org/2016/02/

Re: [CODE4LIB] Best way to handle non-US keyboard chars in URLs?

2016-02-21 Thread Greg Lindahl
On Sun, Feb 21, 2016 at 05:08:59PM -0500, Chris Moschini wrote:
> > 2) Google and friends are more than capable of handling redirects, even
> > when done badly.
>
> Google punishes redirects actually. #38 here:
>
> https://blog.kissmetrics.com/penalized-by-google/
>
> But you can find plenty mo

[CODE4LIB] User testing examples

2016-03-11 Thread Greg Lindahl
Carolyn at the Internet Archive recently started doing some user testing on our UI, and you can find our testing scripts and notes in https://archive.org/details/usertestingandresearchcollection
She'll be adding more stuff to this collection over time.
-- greg

Re: [CODE4LIB] ISO: State of the art in video annotation

2016-03-19 Thread Greg Lindahl
This may or may not be relevant to the "annotation" that the original poster had in mind, but the Internet Archive embedded video player takes subtitles in the common SubRip .srt format, which is apparently supported by many video players & subtitling programs. Instead of using this for closed cap

Re: [CODE4LIB] Google can give you answers, but librarians give you the right answers

2016-04-06 Thread Greg Lindahl
On Wed, Apr 06, 2016 at 07:42:11AM -0700, Karen Coyle wrote:
> Also, without the links that fuel pagerank, the ranking is very
> unsatisfactory - cf. Google Book searches, which are often very
> unsatisfying -- and face it, if Google can't make it work, what are
> the odds that we can?
Karen, I

[CODE4LIB] Help me build a QA dataset for a Wayback search engine

2016-04-21 Thread Greg Lindahl
I'm working on a search engine for the Internet Archive's Wayback Machine web archive, and we're at the stage where we could use a diverse set of web search queries for quality assessment. If you have a few spare minutes, please fill out the form at: http://goo.gl/forms/HThG6R9Pp0 Thanks in advan

[CODE4LIB] Language codes

2016-06-01 Thread Greg Lindahl
Some of the Internet Archive's library partners are asking us about language metadata for regional languages that don't have standard codes. Is there a standard way of dealing with this situation? Overall we use MARC language codes (https://www.loc.gov/marc/languages/), which were last updated in 2007. LOC a