Re: Suggest search terms

2011-02-21 Thread Fernando Wasylyszyn
I think the idea Uwe mentions is completely valid, although it has a few disadvantages. For example, what if you want to offer "multiword suggestions" but your index contains only "single word" tokens? Query: Ferrari. Ideal suggestions: Ferrari 354 BT, Ferrari 355 C, Ferrari 356 I

Lucene TermVector

2011-02-21 Thread Ajay Anandan
Hi, I am trying to implement an Expectation Maximization algorithm for document clustering. I am planning to use Lucene term vectors to find the similarity between two documents. There are two kinds of EM algorithms using naive Bayes: the multivariate model and the multinomial model. In simple terms, the
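The two naive Bayes document models differ mainly in how a term vector is turned into features, which can be sketched in plain Java. This is a minimal sketch, assuming each document's term vector has already been read out of the index into a `Map` from term to frequency; `TermVectorFeatures` and its methods are hypothetical names, and cosine similarity is just one common choice of document similarity, not necessarily the one intended in the thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: turns term-frequency maps (e.g. copied out of Lucene term vectors)
// into the feature vectors used by the two naive Bayes document models.
class TermVectorFeatures {

    // Multinomial model: each term keeps its raw occurrence count.
    static Map<String, Double> multinomial(Map<String, Integer> tf) {
        Map<String, Double> v = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            v.put(e.getKey(), e.getValue().doubleValue());
        }
        return v;
    }

    // Multivariate (Bernoulli) model: only whether a term occurs matters, not how often.
    static Map<String, Double> bernoulli(Map<String, Integer> tf) {
        Map<String, Double> v = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            if (e.getValue() > 0) {
                v.put(e.getKey(), 1.0);
            }
        }
        return v;
    }

    // Cosine similarity of two sparse vectors; returns 0 if either vector is all zeros.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            double x = e.getValue();
            normA += x * x;
            Double y = b.get(e.getKey());
            if (y != null) {
                dot += x * y;
            }
        }
        for (double y : b.values()) {
            normB += y * y;
        }
        if (normA == 0 || normB == 0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        Map<String, Integer> doc1 = Map.of("lucene", 3, "index", 1);
        Map<String, Integer> doc2 = Map.of("lucene", 1, "search", 2);
        System.out.println(cosine(multinomial(doc1), multinomial(doc2)));
        System.out.println(cosine(bernoulli(doc1), bernoulli(doc2)));
    }
}
```

With these two toy documents the count-based vectors give a cosine of 3/√50 (about 0.42), while the binary vectors give 0.5, because the Bernoulli view ignores that "lucene" occurs three times in the first document.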

Re: Last/max term in Lucene 4.x

2011-02-21 Thread Jason Rutherglen
> Maybe we need a seekFloor in the TermsEnum? (What we have now is really seekCeil.) But, what's the larger use case here..? I opened an issue, LUCENE-2930, to simply store the last/max term; however, a seekFloor would work just as well. The use case is finding the last of the ordered IDs stor
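The difference between a ceiling seek (what exists now) and a floor seek can be illustrated with `java.util.TreeSet`, which offers both operations on a sorted set. This is only an analogy under that assumption: `SortedTermsDemo`, `seekCeilLike` and `seekFloorLike` are hypothetical names, and none of this is Lucene's `TermsEnum` API.

```java
import java.util.List;
import java.util.TreeSet;

// Stand-in for a field's sorted term dictionary; NOT Lucene's TermsEnum API.
class SortedTermsDemo {

    // Like seekCeil: the smallest term >= target, or null if there is none.
    static String seekCeilLike(TreeSet<String> terms, String target) {
        return terms.ceiling(target);
    }

    // Like the proposed seekFloor: the largest term <= target, or null if there is none.
    static String seekFloorLike(TreeSet<String> terms, String target) {
        return terms.floor(target);
    }

    public static void main(String[] args) {
        // Zero-padded IDs so that string order matches numeric order.
        TreeSet<String> ids = new TreeSet<>(List.of("id0001", "id0007", "id0042"));

        System.out.println(seekCeilLike(ids, "id0008"));    // id0042
        System.out.println(seekFloorLike(ids, "id0008"));   // id0007
        // Flooring a target that sorts after every ID lands on the last/max term.
        System.out.println(seekFloorLike(ids, "id\uffff")); // id0042
        System.out.println(ids.last());                     // id0042
    }
}
```

Because flooring a target that sorts after every ID returns the maximum term, a floor seek alone is enough to find the last ID, which is why storing the last term and adding a floor seek are interchangeable for this use case.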

RE: Suggest search terms

2011-02-21 Thread Uwe Schindler
Hi, I just have a suggestion regarding your first idea of enumerating terms, which is very fast if done right: > I'd like to suggest search terms to my users. My naïve approach would have been: After at least n characters have been typed (asynchronously) find terms in IndexReader.terms() which "m
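One reading of "done right" is: seek directly to the prefix instead of scanning the whole term list, then stop at the first term that no longer matches, since terms are sorted. A minimal sketch of that logic, assuming a `java.util.NavigableSet` as a stand-in for the sorted term dictionary (in Lucene 3.x, `IndexReader.terms(Term)` similarly positions an enumeration at the first term greater than or equal to the given one); `PrefixSuggest` and `suggest` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical helper; the NavigableSet stands in for a sorted term dictionary.
class PrefixSuggest {

    // Returns up to max terms that start with prefix.
    static List<String> suggest(NavigableSet<String> terms, String prefix, int max) {
        List<String> out = new ArrayList<>();
        // Seek: start at the first term >= prefix instead of scanning from the beginning.
        for (String t : terms.tailSet(prefix, true)) {
            // Terms are sorted, so the first non-matching term ends the run of matches.
            if (!t.startsWith(prefix) || out.size() >= max) {
                break;
            }
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<>(List.of("fer", "ferrari", "ferry", "fetch", "field"));
        System.out.println(suggest(terms, "ferr", 10)); // [ferrari, ferry]
    }
}
```

The early `break` is the important design choice: the cost is proportional to the number of matching terms (capped by `max`), not to the size of the dictionary.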

Re: Suggest search terms

2011-02-21 Thread Fernando Wasylyszyn
Hello Clemens: a short time ago, I faced the exact same problem. Using Apache Solr, I built a "suggest" index as a completely separate index, which indexes all the possible terms for suggestions (terms that come from the documents to be indexed, using n-grams from a minimum to a maximum number of
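The message is cut off before it says what the n-grams are built from. Assuming word-level n-grams (often called shingles), the sketch below shows how one title expands into multiword entries for a separate suggest index, which is one way to get suggestions like "Ferrari 355 C" out of single-word tokens. `ShingleSketch` is a hypothetical helper, not Solr code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: expands text into word n-grams ("shingles") of minWords..maxWords words,
// each of which could be stored as its own entry in a separate suggest index.
class ShingleSketch {

    static List<String> shingles(String text, int minWords, int maxWords) {
        List<String> out = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return out;
        }
        String[] words = trimmed.split("\\s+");
        for (int start = 0; start < words.length; start++) {
            // Grow the n-gram from minWords up to maxWords, stopping at the end of the text.
            for (int n = minWords; n <= maxWords && start + n <= words.length; n++) {
                out.add(String.join(" ", Arrays.copyOfRange(words, start, start + n)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(shingles("Ferrari 355 C", 1, 3));
        // [Ferrari, Ferrari 355, Ferrari 355 C, 355, 355 C, C]
    }
}
```

Prefix-matching a query such as "Ferrari" against these entries would then return "Ferrari 355" and "Ferrari 355 C" as whole-phrase suggestions.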

Suggest search terms

2011-02-21 Thread Clemens Wyss
I'd like to suggest search terms to my users. My naïve approach would have been: after at least n characters have been typed, (asynchronously) find terms in IndexReader.terms() which "match". Is there an (even) more straightforward (and possibly faster) approach to get "search term suggestions"?

Re: encodeNormValue

2011-02-21 Thread Kim Kokkonen
Thanks for the good news, Simon. That kind of timing will work for my project. Kim On Mon, Feb 21, 2011 at 1:15 AM, Simon Willnauer <simon.willna...@googlemail.com> wrote: > Hi Kim, > > Bad news, branch_3x has not been released yet! Good news, we are about > to release 3_x in the very near futu

Re: Last/max term in Lucene 4.x

2011-02-21 Thread Michael McCandless
On Sun, Feb 20, 2011 at 8:47 PM, Jason Rutherglen wrote: >> Though, if you just want to get to the last term... VarGap's terms index can quickly tell you the last indexed term, and from there you can scan to the last term? (It'd be at most 32 (by default) scans.) > In VariableGapTermsInde
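The suggestion relies on the terms index holding only a subset of the terms, so reaching the true last term means jumping to the last indexed term and stepping forward. A simplified sketch of that bound, assuming a fixed indexing interval (VariableGapTermsIndex selects its indexed terms differently, so this models the idea, not its implementation); `SparseTermsIndexSketch` and its fields are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of a sparse terms index: only every interval-th term is "indexed".
// Hypothetical class; it does not reproduce Lucene's VariableGapTermsIndex.
class SparseTermsIndexSketch {
    private final List<String> terms;        // all terms in sorted order (the terms dictionary)
    private final List<Integer> indexedOrds; // ordinals of the indexed terms (the in-memory index)
    int scans;                               // forward steps taken by the last lastTerm() call

    SparseTermsIndexSketch(List<String> sortedTerms, int interval) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        this.terms = sortedTerms;
        this.indexedOrds = new ArrayList<>();
        for (int ord = 0; ord < sortedTerms.size(); ord += interval) {
            indexedOrds.add(ord);
        }
    }

    // Jumps to the last indexed term, then steps forward (like next() on an enum) to the end.
    String lastTerm() {
        scans = 0;
        if (terms.isEmpty()) {
            return null;
        }
        int ord = indexedOrds.get(indexedOrds.size() - 1);
        while (ord + 1 < terms.size()) {
            ord++;
            scans++;
        }
        return terms.get(ord);
    }

    public static void main(String[] args) {
        List<String> terms = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            terms.add(String.format("t%03d", i));
        }
        SparseTermsIndexSketch index = new SparseTermsIndexSketch(terms, 32);
        String last = index.lastTerm();
        System.out.println(last + " after " + index.scans + " scans"); // t099 after 3 scans
    }
}
```

With 100 terms and an interval of 32, the indexed ordinals are 0, 32, 64 and 96, so only three forward steps are needed; in this model the number of steps never exceeds the interval minus one.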

Re: lucene3.0.3 | get correct document in case of multiple Boolean query in search criteria

2011-02-21 Thread Ranjit Kumar
Hi Ian, As you said, we can explicitly specify AND and OR operators, or set the parser's operator. Otherwise the default parser, which uses OR, is used to get hits (documents), and it does not give the correct hits (documents)! My question is: is there any parser we can use in the case of multiple Boolean clauses in a search stri

RE: ParallelReader

2011-02-21 Thread Uwe Schindler
Hi David, With current Lucene versions, keeping the indexes used by a ParallelReader in sync is very complicated. The problem is how merges occur: for ParallelReader to work, all internal document IDs (the integers) must be parallel. As the new MergePolicies now work on the size of documents and also may work

ParallelReader

2011-02-21 Thread David Saile
Hello everybody, I was wondering if someone could point me to what I need to be aware of when using a ParallelReader. My intention is to modify Nutch (http://nutch.apache.org/) in a way that, in the Lucene index Nutch uses, only the documents for changed websites are updated. However, due to the exi

Re: encodeNormValue

2011-02-21 Thread Simon Willnauer
Hi Kim, Bad news: branch_3x has not been released yet! Good news: we are about to release 3_x in the very near future. The branch is already in code freeze, so the release might happen very soon (in the next 2 weeks, I hope/guess). You can follow the d...@lucene.apache.org if you are interest