RE: Lucene indexes

2009-02-24 Thread Steven A Rowe
On 2/24/2009 at 5:36 PM, Chris Hostetter wrote: > Shingling is (Lucene-specific?) vernacular for word-based n-grams "Shingle" is not a Lucene-specific term - here's an entry, e.g., from an IBM "Glossary of terms for enterprise search" at
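
To make "word-based n-grams" concrete, here is a minimal sketch in plain Java. It does not use any Lucene API; the class name `Shingles` and the method `shingles` are hypothetical names chosen for illustration, and splitting on whitespace is an assumption (a real analyzer would tokenize more carefully).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of Lucene.
final class Shingles {
    // Returns every run of `size` consecutive words in `text`, in order.
    static List<String> shingles(String text, int size) {
        if (size < 1) {
            throw new IllegalArgumentException("size must be >= 1");
        }
        List<String> out = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return out; // no words, so no shingles
        }
        // Assumption: words are separated by whitespace.
        String[] words = trimmed.split("\\s+");
        for (int i = 0; i + size <= words.length; i++) {
            out.add(String.join(" ", Arrays.copyOfRange(words, i, i + size)));
        }
        return out;
    }
}
```

For example, `Shingles.shingles("please divide this sentence", 2)` yields the three shingles "please divide", "divide this" and "this sentence". Indexing such shingles as single terms is one way to make short phrases directly searchable.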

RE: Lucene indexes

2009-02-24 Thread Chris Hostetter
: The problem that I am trying to solve is : How to index phrases (rather : than phrase querying)? I have a Questions/Answers corpus, the : architecture I am using for IR creates one index for questions and : another one for answers (based on single terms) and then matches between : them. I wa

Re: Faceted Search using Lucene

2009-02-24 Thread Amin Mohammed-Coleman
The reason for the IndexReader.reopen call is that I have a webapp which enables users to upload files and then search for the documents. If I don't reopen, I'm concerned that the facet hit counter won't be updated. On Tue, Feb 24, 2009 at 8:32 PM, Amin Mohammed-Coleman wrote: > Hi > I have been ab

Re: Faceted Search using Lucene

2009-02-24 Thread Amin Mohammed-Coleman
Hi I have been able to get the code working for my scenario; however, I have a question and was wondering if I could get some help. I have a list of IndexSearchers which are used in a MultiSearcher class. I use the IndexSearchers to get each IndexReader and put them into a MultiIndexReader. Ind

Re: How to compute the similarity of a web page?

2009-02-24 Thread Linhon
Wow, it sounds very nice, thank you :) On Monday, 2009-02-16 at 22:08 -0500, Grant Ingersoll wrote: > Hmmm, you might be able to do the following: > > Create a document in a memory index containing the web page > Create a query from the keywords > Do a search with the query against the memory index and see the sco

Re: Lucene indexes

2009-02-24 Thread Shashi Kant
Nada, You might want to consider writing a custom tokenizer which will allow you to generate tokens based on your needs (other than whitespace). Another option would be to look at SpanQuery or SpanNearQuery which would help with the kind of problem you are trying to solve (assuming I understand

RE: Lucene indexes

2009-02-24 Thread Nada Mimouni
Thank you Erick. I am totally aware that Lucene uses an inverted index (class: IndexWriter). I have read in the literature about new efficient indexes that are created to handle phrase indexing, so I wondered if there are some updates or new classes added to Lucene for that reason. The problem th

Re: Lucene indexes

2009-02-24 Thread Erick Erickson
I have to ask why do you care? Which is another way of asking what problem you're trying to solve that you think this information would help with. As far as I know Lucene is an inverted index, period. You use IndexWriter to create it. Really the best way to get a sense for which classes to use is

Re: Pylucene

2009-02-24 Thread Michael McCandless
Can you re-ask this on pylucene-...@lucene.apache.org? Mike Seid Mohammed wrote: I like Python's text scripting capability. For that I have read up on PyLucene, and some of the samples taken from the Lucene in Action book work fine. My question is: I have language-specific tools such as an analyzer, tokni

Pylucene

2009-02-24 Thread Seid Mohammed
I like Python's text scripting capability. For that I have read up on PyLucene, and some of the samples taken from the Lucene in Action book work fine. My question is: I have language-specific tools such as an analyzer and tokenizer done for the Amharic language in Java Lucene. How can I use these tools just in pylucen

Re: Why is the constructor of TopFieldDocs not public?

2009-02-24 Thread Michael McCandless
Good question. Are you hitting any other package-private issues in creating your own searcher? (Seems likely you may). TopDocs, in contrast, has a public ctor. If there are no objections I'll switch it to public... Mike Cheolgoo Kang wrote: I'm subclassing MultiSearcher and writing a cu

Re: IndexWriter 2-phase commit usage

2009-02-24 Thread Michael McCandless
prepareCommit does almost all the work, right up until writing a new segments_N file with all of its contents *except* the final checksum at the end of the file. Readers that try to open the index at this point will see an invalid checksum and will fall back to the previous segments_(N-1)
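
The two-phase pattern discussed in this thread can be sketched in plain Java as follows. This is only an illustration under stated assumptions: `TwoPhaseResource` and `TwoPhaseCoordinator` are hypothetical names, the interface is a stand-in that mirrors the `prepareCommit()`, `commit()` and `rollback()` method names of IndexWriter, and the sketch does not attempt to undo a commit that has already succeeded when a later one fails.

```java
import java.util.List;

// Hypothetical stand-in for anything that can prepare, commit and roll back.
interface TwoPhaseResource {
    void prepareCommit() throws Exception;
    void commit() throws Exception;
    void rollback() throws Exception;
}

// Hypothetical coordinator for illustration only.
final class TwoPhaseCoordinator {
    // Returns true if every resource committed, false if a prepare failed
    // and all resources were asked to roll back.
    static boolean commitAll(List<? extends TwoPhaseResource> resources) {
        // Phase 1: do the expensive, failure-prone work on every resource.
        try {
            for (TwoPhaseResource r : resources) {
                r.prepareCommit();
            }
        } catch (Exception e) {
            // Nothing has been made visible yet, so rolling back is safe.
            for (TwoPhaseResource r : resources) {
                try {
                    r.rollback();
                } catch (Exception ignored) {
                    // Best effort: keep rolling back the remaining resources.
                }
            }
            return false;
        }
        // Phase 2: the cheap final step. A failure here leaves earlier
        // resources committed, which this sketch cannot compensate for.
        for (TwoPhaseResource r : resources) {
            try {
                r.commit();
            } catch (Exception e) {
                throw new IllegalStateException("commit failed after prepare", e);
            }
        }
        return true;
    }
}
```

The design point is that all risky work happens in phase 1, so the window in which one resource is committed and another is not is limited to the short second loop.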

Lucene indexes

2009-02-24 Thread Nada Mimouni
Hello everybody, 1) What is the difference between : - inverted index - nextword index - common index 2) Which one(s) is(are) supported by Lucene? 3) Which class(es) create this(those) index(es)? Thank you in advance for your help. Nada Mimouni --

Re: IndexWriter 2-phase commit usage

2009-02-24 Thread mark harwood
As suggested, the window for failure here is very small. The commit is effectively an atomic single file rename operation to make the new segments file visible. However, should there be a failure between 2 commits the new deletion policy logic should help you recover to prior commit points. See

RE: IndexWriter 2-phase commit usage

2009-02-24 Thread Fang_Li
The prepareCommit should do most of the real work, so the chance of index2.commit() failing should be slim. I think it's very hard to compensate for the changes already committed. One solution is that you create separate indexes for each transaction and merge them later. Merging can fail, but the transaction