I think the idea Uwe mentions is completely valid, although it has a few
disadvantages. For example, what if you want to offer multiword suggestions
but your index contains only single-word tokens?
Query: Ferrari
Ideal suggestions: Ferrari 354 BT, Ferrari 355 C, Ferrari 356
I
Hi,
I am trying to implement an Expectation Maximization (EM) algorithm for
document clustering. I am planning to use Lucene term vectors to compute the
similarity between two documents. There are two kinds of EM algorithms using
naive Bayes: the multivariate model and the multinomial model. In simple terms, the
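The practical difference between the two models is what each one reads from a term vector: the multinomial model uses term frequencies, while the multivariate (Bernoulli) model uses only presence or absence of every vocabulary word. Below is a minimal Python sketch of the per-class document log-likelihood used in the E-step, assuming a plain dict of term to frequency stands in for a Lucene term vector (in Lucene 3.x it could be filled from `TermFreqVector.getTerms()` and `getTermFrequencies()`). `VOCAB` and the probability tables are hypothetical toy values, not something from the original message.

```python
import math

# Hypothetical toy vocabulary; a real model would use the index's term set.
VOCAB = ["lucene", "index", "ferrari"]

def multinomial_log_likelihood(term_freqs, word_probs):
    """log P(d | c) under the multinomial model, up to the multinomial
    coefficient (it depends only on the document, so it cancels when
    comparing classes in the E-step).

    term_freqs: dict term -> count (stand-in for a term vector).
    word_probs: dict term -> P(w | c), summing to 1 over VOCAB.
    """
    return sum(tf * math.log(word_probs[t]) for t, tf in term_freqs.items())

def bernoulli_log_likelihood(term_freqs, presence_probs):
    """log P(d | c) under the multivariate Bernoulli model: every vocabulary
    word contributes (present or absent) and counts are ignored.

    presence_probs: dict term -> P(word appears in a doc | c).
    """
    total = 0.0
    for t in VOCAB:
        p = presence_probs[t]
        total += math.log(p) if term_freqs.get(t, 0) > 0 else math.log(1.0 - p)
    return total
```

In an EM loop, the posterior for each class would then be proportional to the class prior times `exp` of one of these log-likelihoods; changing a term's count from 1 to 7 changes only the multinomial score.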
> Maybe we need a seekFloor in the TermsEnum? (What we have now is
> really seekCeil). But, what's the larger use case here..?
I opened issue LUCENE-2930 to simply store the last/max term; however,
seekFloor would work just as well. The use case is
finding the last of the ordered IDs stor
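The difference between the existing seek (effectively seekCeil) and the proposed seekFloor can be sketched over a sorted Python list standing in for the terms dictionary. `seek_ceil` and `seek_floor` are hypothetical helper names for illustration, not the TermsEnum API.

```python
import bisect

def seek_ceil(sorted_terms, target):
    """Smallest term >= target, or None (what the current seek does)."""
    i = bisect.bisect_left(sorted_terms, target)
    return sorted_terms[i] if i < len(sorted_terms) else None

def seek_floor(sorted_terms, target):
    """Largest term <= target, or None (the proposed seekFloor)."""
    i = bisect.bisect_right(sorted_terms, target)
    return sorted_terms[i - 1] if i > 0 else None

terms = ["id0001", "id0042", "id0100"]
# Assumption: no real term sorts after this sentinel, so seek_floor on it
# lands on the last (maximum) term, i.e. the "last ordered ID" use case.
SENTINEL = "\uffff"
```

With these helpers, `seek_floor(terms, SENTINEL)` returns the maximum term directly, which is the same information storing the last/max term would provide.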
Hi,
I just have a suggestion regarding your first idea of enumerating terms, which is
very fast if done right:
> I'd like to suggest search terms to my users. My naïve approach would have
> been:
> After at least n characters have been typed (asynchronously) find terms in
> IndexReader.terms() which "m
Hello Clemens: a short time ago I faced exactly the same problem. Using
Apache Solr, I built a "suggest" index as a completely separate index, which
indexes all the possible terms for suggestions (terms that come from the documents
to be indexed, using n-grams from a minimum to a maximum number of
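A separate suggest index built from word n-grams (shingles) stores multiword entries such as "ferrari 355 c" as whole terms, so a prefix lookup can return them directly. The Python sketch below uses a sorted list in place of the Solr index; `min_n`/`max_n`, the helper names and the lowercase-plus-whitespace tokenization are assumptions standing in for a real analyzer chain.

```python
import bisect

def shingles(text, min_n=1, max_n=3):
    """All word n-grams of length min_n..max_n (naive lowercase split)."""
    words = text.lower().split()
    out = set()
    for n in range(min_n, max_n + 1):
        for i in range(len(words) - n + 1):
            out.add(" ".join(words[i:i + n]))
    return out

def build_suggest_index(docs, min_n=1, max_n=3):
    """Sorted list of every shingle: a stand-in for a separate suggest index."""
    terms = set()
    for doc in docs:
        terms |= shingles(doc, min_n, max_n)
    return sorted(terms)

def suggest(index, prefix, limit=10):
    """Return up to `limit` entries starting with `prefix`, in sorted order."""
    prefix = prefix.lower()
    i = bisect.bisect_left(index, prefix)  # first entry >= prefix
    out = []
    while i < len(index) and index[i].startswith(prefix) and len(out) < limit:
        out.append(index[i])
        i += 1
    return out
```

For example, indexing "Ferrari 355 C" and "Ferrari 356" lets the query "Ferrari" return "ferrari 355 c" and "ferrari 356" as suggestions, which a single-word term dictionary cannot provide.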
I'd like to suggest search terms to my users. My naïve approach would have been:
After at least n characters have been typed (asynchronously) find terms in
IndexReader.terms() which "match"
Is there an (even) more straightforward (and possibly faster) approach to get
"search term suggestions"?
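Enumerating terms stays fast as long as the enumeration starts at the prefix (terms sharing a prefix are contiguous in the sorted term dictionary) and stops at the first term that no longer matches. Below is a Python sketch, assuming a dict of term to document frequency stands in for the index's term dictionary; in Lucene 3.x the seek would be `IndexReader.terms(Term)` and the frequency `TermEnum.docFreq()`. Ranking the matches by document frequency is an added assumption, not part of the original question, and `TermDictionary` is a hypothetical class.

```python
import bisect
import heapq

class TermDictionary:
    """Toy sorted term dictionary (hypothetical stand-in for the index)."""

    def __init__(self, doc_freqs):
        # doc_freqs: dict term -> number of documents containing the term
        self.terms = sorted(doc_freqs)
        self.doc_freqs = doc_freqs

    def suggest(self, prefix, k=5):
        """Up to k terms starting with prefix, most frequent first."""
        i = bisect.bisect_left(self.terms, prefix)  # seek: first term >= prefix
        matches = []
        # Matching terms are contiguous, so stop at the first non-match
        # instead of scanning the whole dictionary.
        while i < len(self.terms) and self.terms[i].startswith(prefix):
            matches.append(self.terms[i])
            i += 1
        # Highest doc frequency first; ties broken alphabetically.
        return heapq.nsmallest(k, matches, key=lambda t: (-self.doc_freqs[t], t))
```

The cost is one seek plus one step per matching term, independent of how many terms follow the prefix in the dictionary.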
Thanks for the good news, Simon. That kind of timing will work for my
project.
Kim
On Mon, Feb 21, 2011 at 1:15 AM, Simon Willnauer <
simon.willna...@googlemail.com> wrote:
> Hi Kim,
>
> Bad news, branch_3x has not been released yet! Good news, we are about
> to release 3_x in the very near futu
On Sun, Feb 20, 2011 at 8:47 PM, Jason Rutherglen
wrote:
>> Though, if you just want to get to the last term... VarGap's terms
>> index can quickly tell you the last indexed term, and from there you
>> can scan to the last term? (It'd be at most 32 (by default) scans).
>
> In VariableGapTermsInde
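The "jump to the last indexed term, then scan" idea can be sketched with a sparse in-memory index that keeps every `interval`-th term of the sorted dictionary. This is a fixed-gap simplification: VariableGapTermsIndex chooses its index terms by a policy this sketch does not model, and `SparseTermsIndex` is a hypothetical name. The default of 32 comes from the message; the scan after the jump takes at most `interval - 1` next() steps.

```python
class SparseTermsIndex:
    """Keeps every `interval`-th term in memory (fixed-gap simplification)."""

    def __init__(self, sorted_terms, interval=32):
        self.terms = sorted_terms                # stand-in for the on-disk dictionary
        self.indexed = sorted_terms[::interval]  # in-memory index terms
        self.interval = interval

    def last_term(self):
        """Jump to the last indexed term, then scan forward to the end.

        Returns (last_term, number_of_next_steps)."""
        if not self.terms:
            return None, 0
        pos = (len(self.indexed) - 1) * self.interval  # position of last indexed term
        steps = 0
        while pos + 1 < len(self.terms):  # sequential next() calls
            pos += 1
            steps += 1
        return self.terms[pos], steps
```

With 100 terms and an interval of 32, the last indexed term is at position 96, so reaching the last term takes 3 steps; 64 terms is the worst case for two index entries, at 31 steps.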
Hi Ian,
As you said, we can explicitly specify AND and OR operators, or set the default
operator on the parser. Otherwise the default parser combines the clauses with OR,
which does not give the correct hits (documents)!
My question is: is there any parser we can use in the case of multiple Boolean
clauses in search stri
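What the default operator changes can be modelled as plain set logic: a query of bare terms matches any document containing at least one term under OR (QueryParser's default) and only documents containing all terms under AND. In Lucene 3.x the AND behaviour is selected with `QueryParser.setDefaultOperator(QueryParser.AND_OPERATOR)`; the Python below only models which documents match (not scoring), and `evaluate` plus the toy documents are hypothetical.

```python
def evaluate(query, docs, default_op="OR"):
    """Match a query of bare terms (no explicit operators) against docs.

    docs: dict doc_id -> set of terms in that document.
    default_op: "OR" (any term must match) or "AND" (all terms must match).
    Returns the matching doc ids in ascending order.
    """
    terms = query.lower().split()
    hits = []
    for doc_id, doc_terms in docs.items():
        present = [t in doc_terms for t in terms]
        matched = all(present) if default_op == "AND" else any(present)
        if matched:
            hits.append(doc_id)
    return sorted(hits)
```

With documents {1: apache lucene, 2: apache solr, 3: lucene nutch}, the query "apache lucene" matches all three under OR but only document 1 under AND.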
Hi David,
With current Lucene versions, keeping the indexes behind a ParallelReader in sync
is very complicated. The problem is how merges occur. For ParallelReader to work,
all internal document ids (the integers) must be parallel. As the new
MergePolicies now work on the size of documents and also
may work
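Why merges break the alignment can be sketched by modelling each index as a Python list whose position is the document id: deletions alone keep ids stable, but a merge that expunges deletions renumbers every later document, so if only one of the two indexes gets merged, the ids stop lining up. `delete`, `merge` and `parallel_view` are hypothetical helpers; real merging happens per segment as chosen by the MergePolicy, which this toy model ignores.

```python
def delete(index, doc_id):
    """Mark a document deleted; the ids of other documents are unchanged."""
    out = list(index)
    out[doc_id] = None
    return out

def merge(index):
    """Expunge deleted documents as a merge does: later ids shift down."""
    return [doc for doc in index if doc is not None]

def parallel_view(index_a, index_b):
    """Join two indexes field-wise by doc id. This is only meaningful if doc N
    in one index describes the same document as doc N in the other."""
    if len(index_a) != len(index_b):
        raise ValueError("doc ids are no longer parallel: %d vs %d docs"
                         % (len(index_a), len(index_b)))
    return [None if a is None or b is None else {**a, **b}
            for a, b in zip(index_a, index_b)]
```

Deleting the same document in both indexes keeps the view consistent; merging only one of them makes the view fail, and merging both at the same point restores it.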
Hello everybody,
I was wondering if someone could point me to what I need to be aware of when
using a ParallelReader.
My intention is to modify Nutch (http://nutch.apache.org/) so that, in the
Lucene index Nutch uses, only the documents for changed websites are updated.
However, due to the exi
Hi Kim,
Bad news: branch_3x has not been released yet! Good news: we are about
to release 3_x in the very near future. The branch is already in code
freeze, so the release might happen very soon (in the next 2
weeks, I hope / guess). You can follow d...@lucene.apache.org if you
are interest