Re: indexing unsupported mime types using Lucene

2008-06-20 Thread Otis Gospodnetic
Gaurav, If you go to http://lucene.apache.org/ you will see a Tika tab there. It's OSS. LIUS is either a part of Tika or is about to become a part of it. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: Gaurav Sharma <[EMAIL PROTECTED]> >

Re: creating Array of IndexReaders

2008-06-20 Thread Otis Gospodnetic
Hi, I think you mentioned 225GB of data somewhere. You can open IndexReaders "on demand", but that's not a cheap operation, esp. not with so much data. You want to keep your IndexReaders opened for a while. Multiple requests/threads can share them. Otis -- Sematext -- http://sematext.com/ --
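The "open once, keep open, share across threads" advice above could look roughly like the sketch below. It is an illustration only: ExpensiveReader is a hypothetical stand-in for Lucene's IndexReader so that the example runs without Lucene on the classpath, and ReaderCache, get and openCount are made-up names, not part of any Lucene API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: open each index at most once and hand the same reader to every caller.
class ReaderCache {

    // Hypothetical stand-in for an expensive-to-open reader (assumption, not a Lucene class).
    static final class ExpensiveReader {
        final String path;

        ExpensiveReader(String path) {
            this.path = path;
        }
    }

    // One cached reader per index path; safe to use from multiple threads.
    private final Map<String, ExpensiveReader> readers = new ConcurrentHashMap<>();

    // Counts how many times a reader was actually opened.
    private final AtomicInteger opens = new AtomicInteger();

    // Returns the cached reader for this path, opening it only on first request.
    ExpensiveReader get(String path) {
        return readers.computeIfAbsent(path, p -> {
            opens.incrementAndGet();
            // In real code, the costly open call for the index at p would go here.
            return new ExpensiveReader(p);
        });
    }

    int openCount() {
        return opens.get();
    }
}
```

Calling get with the same path from several request threads returns the same instance, so the cost of opening is paid once per index rather than once per request. Closing and reopening readers after the index changes is deliberately left out of this sketch.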

Re: Arbitrary String to String Similarity Score

2008-06-20 Thread Otis Gospodnetic
You should look into SecondString perhaps then, like Grant said. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: Sangrish <[EMAIL PROTECTED]> > To: java-user@lucene.apache.org > Sent: Friday, June 20, 2008 1:45:52 PM > Subject: Re: Arbitrary

Re: Match "best one" from list

2008-06-20 Thread Chris Hostetter
Correct, adding new syntax to the parser currently requires editing the grammar. Something else you might consider: if you expect "BESTOF"-type queries to be the default behavior people want, you could just override the getBooleanQuery method of the QueryParser and *always* generate a

Re: Copying a part of index and index structure

2008-06-20 Thread Andrzej Bialecki
Anshum wrote: Hey Andrzej, Could you tell me as to what research suggests this and why is it this way? My calculation says the average load on each server would go down as I would know what server to query for an index term as opposed to querying all servers for terms. I'm looking for a solution

Example using NGramSpeller.java

2008-06-20 Thread sumittyagi
Is there any way I can find an example of a program using NGramSpeller.java? -- View this message in context: http://www.nabble.com/Example-using-NGramSpeller.java-tp18034945p18034945.html Sent from the Lucene - Java Users mailing list archive at Nabble.com. ---

Re: Arbitrary String to String Similarity Score

2008-06-20 Thread Sangrish
Yes, "MoreLikeThis" is more like what I want. But there's one problem: even here one has to run the query against an indexed set of documents, whereas I would like to create two Queries through "MoreLikeThis" and get a score of how similar they are to each other. Siddharth Otis Gospodnet
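For scoring two arbitrary strings against each other without any index, one possible approach is a plain term-frequency cosine similarity. The sketch below is not MoreLikeThis, SecondString, or Lucene's own scoring; it is a swapped-in, index-free technique, and the class and method names (StringSimilarity, termFrequencies, cosine) are made up for illustration. It assumes simple lower-cased, whitespace-separated tokens.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Sketch: cosine similarity between the term-frequency vectors of two strings.
class StringSimilarity {

    // Counts lower-cased, whitespace-separated tokens (an assumed, very simple analysis).
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Euclidean length of a term-frequency vector.
    static double norm(Map<String, Integer> tf) {
        double sum = 0.0;
        for (int count : tf.values()) {
            sum += (double) count * count;
        }
        return Math.sqrt(sum);
    }

    // Returns a score in [0, 1]; 0 when either string has no tokens.
    static double cosine(String a, String b) {
        Map<String, Integer> ta = termFrequencies(a);
        Map<String, Integer> tb = termFrequencies(b);
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : ta.entrySet()) {
            Integer other = tb.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double na = norm(ta);
        double nb = norm(tb);
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (na * nb);
    }
}
```

Because there is no document collection behind it, this score has no IDF component: every term weighs the same, so common words count as much as rare ones. Whether that is acceptable depends on the strings being compared.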

Re: Termdocs question

2008-06-20 Thread Erick Erickson
A couple of questions: 1> I assume by "not returning any docs" you mean that you never get into your while loop. Is that true? 2> I'm a little suspicious of the field labeled "id" and whether it's at all possible that this is getting confused with the internal Lucene doc ID. This is a

Termdocs question

2008-06-20 Thread Vinicius Carvalho
Hello there! I'm trying to query for a specific document in an efficient way. My index is structured so that an id field serves as a unique key for the whole index. When I'm updating/removing a document I was searching for my id using a Searcher and a TermQuery. But reading the list it se

Re: Indexing and searching txt files

2008-06-20 Thread Erick Erickson
U, have you tried reading any of the info on the home page? See: http://lucene.apache.org/java/2_3_2/gettingstarted.html I'd also recommend "Lucene in Action" Best Erick On Fri, Jun 20, 2008 at 10:58 AM, jnance <[EMAIL PROTECTED]> wrote: > > Hi, > > I am new to Lucene. I have several text

Re: looking for efficient way to dump index info

2008-06-20 Thread Gerardo Segura
I'm using this to build a static index of documents and terms. A snapshot requested for further client (third party) analysis. regards, Gerardo Erick Erickson wrote: What's the high-level goal here? The reason I ask is that I'm not sure what *use* these scores are to you. Perhaps someone will

Indexing and searching txt files

2008-06-20 Thread jnance
Hi, I am new to Lucene. I have several text files I would like to index and search. How do I do this? Thanks, jnance -- View this message in context: http://www.nabble.com/Indexing-and-searching-txt-files-tp18031330p18031330.html Sent from the Lucene - Java Users mailing list archive at Nabbl

Re: Copying a part of index and index structure

2008-06-20 Thread Eric Bowman
Anshum wrote: Hey Andrzej, Could you tell me as to what research suggests this and why is it this way? My calculation says the average load on each server would go down as I would know what server to query for an index term as opposed to querying all servers for terms. I'm looking for a solution

Re: Copying a part of index and index structure

2008-06-20 Thread j . L
I think you can use Solr to solve it. You just merge your search results from 2 Solr instances (2 indexes). It is very simple and you can distribute it. On Wed, Jun 18, 2008 at 9:12 PM, Anshum <[EMAIL PROTECTED]> wrote: > I have 2 indexes and I would like to move index for a few 'selected' and > 'specifie

Re: Copying a part of index and index structure

2008-06-20 Thread Anshum
Hey Andrzej, Could you tell me as to what research suggests this and why is it this way? My calculation says the average load on each server would go down as I would know what server to query for an index term as opposed to querying all servers for terms. I'm looking for a solution wherein I could

Re: Copying a part of index and index structure

2008-06-20 Thread Anshum
Hey Otis, Could you suggest a few good distributed (lucene) search solutions? (Open Source) Yes, I do want to split by terms as the math tells a story. :) TF IDF would be handled separately. I'd just use a different cluster of machines to store the index instead of having the search run on the sam

Re: Copying a part of index and index structure

2008-06-20 Thread Andrzej Bialecki
Otis Gospodnetic wrote: Hi, Not doable with Lucene as far as I know. I'm not even certain you would want to split by term. What would that do to TF IDF in your distributed search? What's wrong with splitting at the doc level? There are about half a dozen distributed (Lucene) search solutions floa