Re: ApacheCon next week

2005-12-12 Thread Ian Soboroff
Grant Ingersoll <[EMAIL PROTECTED]> writes:
> You stole my thunder! :-) Was going to post the URL after doing the
> actual talk, but that's all right. I will post a few changes I have
> made on the plane tonight or tomorrow to the website below.
>
> Let me know if you have any questions... I h

Re: Lucene + LSI

2005-12-12 Thread Ian Soboroff
Paul Libbrecht <[EMAIL PROTECTED]> writes:
> We're also thinking about implementing something similar to LSI within
> ActiveMath, which is Lucene-powered, where both formulae and text
> searching would benefit from latent-semantic similarity. I've
> refrained from doing "exactly this" at least

Re: Question about scoring normalisation

2005-11-07 Thread Ian Soboroff
"Karl Koch" <[EMAIL PROTECTED]> writes:
> I am not sure if I know exactly what pivoted normalisation is. I can tell
> you what I do; in the meantime I will have a look at your paper, and I hope
> that we can discuss this issue further.
Short answer on pivoted document length normalization. You'll

Re: Vector Model and Relevance Feedback

2005-11-02 Thread Ian Soboroff
Stefan Gusenbauer <[EMAIL PROTECTED]> writes:
> Is there an add-on for Lucene to get a real vector representation?
> Does anyone have experience with this issue?
No code, but some small thinking. You can do hacks with boosts and whatnot, but I think in the end you really want a new Query subclas

Re: Implementing Lucene Search on DB2

2005-09-30 Thread Ian Soboroff
You should look to the various papers by Ophir Frieder, David Grossman, Abdur Chowdhury and others from IIT. They developed a whole IR system in SQL.
http://ir.iit.edu/irwebsiteserv/IRViewer?docid=159
http://ir.iit.edu/irwebsiteserv/IRViewer?docid=132
Grossman and Frieder have a recent textbook

Re: Indexing .txt file containing english, german or french alphabet

2005-09-26 Thread Ian Soboroff
Otis Gospodnetic <[EMAIL PROTECTED]> writes:
> For indexing text that has multiple languages I don't know what to
> recommend. Well, I do - try the StandardAnalyzer and see if that
> produces satisfactory results, but you'd really need a smart analyzer
> that knows how to properly tokenize an

Re: Search Theory Book

2005-05-24 Thread Ian Soboroff
<[EMAIL PROTECTED]> writes:
> I would go with "Information Retrieval: Algorithms and Heuristics" by
> Grossman (a bit expensive, but worth the money
> http://www.fetchbook.info/compare.do?search=0134638379).
The second edition is about $38 in paperback.
http://www.springeronline.com/sgw/cda/fron

Re: Search Theory Book

2005-05-23 Thread Ian Soboroff
"Monsur Hossain" <[EMAIL PROTECTED]> writes:
> Much along the same lines, I'm curious if "Information Retrieval: Data
> Structures and Algorithms" (by William B. Frakes, Ricardo Baeza-Yates) is a
> good resource? It's referenced a lot in "Modern Information Retrieval", but
> I imagine because its

Re: Search Theory Book

2005-05-13 Thread Ian Soboroff
Gary Moore <[EMAIL PROTECTED]> writes:
> Salton, Gerald and McGill, Michael J. /Introduction to Modern
> Information Retrieval/. McGraw-Hill, 1983.
Not only hard to get ahold of these days, but really really really out of date. This book should be of historical interest only. Frakes and Baez

Re: indexing TREC

2005-05-08 Thread Ian Soboroff
Quoting Erik Hatcher <[EMAIL PROTECTED]>:
>
> On May 7, 2005, at 10:24 PM, [EMAIL PROTECTED] wrote:
>
> > Hi -
> >
> > Is there any TREC parser (indexing many documents that are in the
> > same file) for Lucene available?
>
> Not to my knowledge. I've been working on indexing some TREC data