On Mon, 2008-04-14 at 21:26 -0700, Otis Gospodnetic wrote:
> Toke, this is *super* juicy information, very useful and educational.
> Please do put this on the Wiki. There doesn't seem to be a benchmarking
> page on the Wiki yet, so I suggest you go to
> http://wiki.apache.org/lucene-java/LuceneBe
The phrases in my index are not tokenized.
What query should I use?
A simple TermQuery does not work.
If I try to use QueryParser, what analyzer should I use?
Daniel Naber-10 wrote:
>
> On Montag, 14. April 2008, palexv wrote:
>
>> For example I need to search for "java de*" and receive "java
>> d
Chris Hostetter wrote:
you can't ... that's why i said you'd need to rebuild the smaller index
completely on a periodic basis (going in the same order as the docs in the
Mmm, the annotations would only be stored in the index. It would be possible to
store them elsewhere, so I can investigate
: would then have to make a join using mailId against the core. However, if I
: want to use PR, I could have a single Document with multiple fields, and using
: stored fields can 'modify' that Document. However, what happens to the DocId
: when the delete+add occurs and how do I ensure it stays t
Toke, this is *super* juicy information, very useful and educational. Please
do put this on the Wiki.
There doesn't seem to be a benchmarking page on the Wiki yet, so I suggest you
go to http://wiki.apache.org/lucene-java/LuceneBenchmarks, create that page,
and put everything you want and can s
Thanks all for the suggestions - there was also another thread "Lucene index on
relational data" which had crossover here.
That's an interesting idea about using ParallelReader for the changeable index.
I had thought to just have a triplet indexed 'owner:mailId:label' in each Doc
and have multi
: The archive is read only apart from bulk deletes, but one of the requirements
: is for users to be able to label their own mail. Given that a Lucene Document
: cannot be updated, I have thought about having a separate Lucene index that
: has just the 3 terms (or some combination of) userId + ma
: How does this work internally? It seems as if all data for this field found in
: the entire index is read into memory (?).
You can think of it as an "inverted-inverted index". Lucene needs a data
structure it can use for fast lookups where the key is the docId and the
value is something "com
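The "inverted-inverted" idea above can be illustrated in plain Java. This is only a sketch under assumptions, not Lucene's implementation: UninvertSketch, uninvert, and the postings map are hypothetical names, and the sketch assumes each document carries at most one term in the field. It walks a term -> docIds map once and fills an array keyed by docId, which is why every value of the field ends up in memory.

```java
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names are Lucene classes.
class UninvertSketch {

    // Turns term -> docIds (the inverted form) into docId -> term (an array).
    // Assumes at most one term per document for this field.
    static String[] uninvert(Map<String, List<Integer>> postings, int maxDoc) {
        String[] valueByDoc = new String[maxDoc];
        for (Map.Entry<String, List<Integer>> entry : postings.entrySet()) {
            for (int docId : entry.getValue()) {
                // Lookup by docId is now a single array access.
                valueByDoc[docId] = entry.getKey();
            }
        }
        return valueByDoc;
    }
}
```

Documents that have no term in the field keep a null slot in the array.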
: - check maxDoc()
: - iterate from 0 to maxDoc() and process doc if it is not deleted
For the record: that is exactly what MatchAllDocsQuery does ... except
that you have an off-by-one error (maxDoc returns 1 more than the
largest possible document number).
Even if you don't want the Query AP
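The loop described above, with the bound corrected, might look like the following plain-Java sketch. ToyReader is a hypothetical stand-in for an index reader (it is not a Lucene class); it only models maxDoc() and a deleted-document check so that the loop bounds can be shown.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical stand-in for an index reader; not a Lucene class.
class ToyReader {
    private final int maxDoc;      // one greater than the largest possible doc number
    private final BitSet deleted;  // bit i set means document i is deleted

    ToyReader(int maxDoc, BitSet deleted) {
        this.maxDoc = maxDoc;
        this.deleted = deleted;
    }

    int maxDoc() { return maxDoc; }

    boolean isDeleted(int docId) { return deleted.get(docId); }

    // Visits every document number that is not deleted.
    static List<Integer> liveDocs(ToyReader reader) {
        List<Integer> live = new ArrayList<>();
        // Strictly less than maxDoc(): the largest valid number is maxDoc() - 1.
        for (int docId = 0; docId < reader.maxDoc(); docId++) {
            if (!reader.isDeleted(docId)) {
                live.add(docId);
            }
        }
        return live;
    }
}
```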
OK, if you're going after simple terms without any logic (or with
very simple logic), why search at all? Why not just use TermDocs and/or
TermEnum to flip through the index noticing documents that match?
I'd only recommend this if you are NOT trying to parse complex
queries. That is, say, you are
On Montag, 14. April 2008, palexv wrote:
> For example I need to search for "java de*" and receive "java
> developers", "java development", "developed by java" etc.
If your text is tokenized, this is not supported by QueryParser but you can
create such queries using MultiPhraseQuery. If you don'
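As a rough illustration of the idea behind a query like "java de*" on tokenized text, here is a plain-Java sketch. It does not use the Lucene API: PrefixPhraseSketch, expand, and matches are hypothetical names. The assumption is that the prefix is first expanded into the concrete terms found in a sorted term dictionary, and a document then matches if the fixed word is immediately followed by any of those terms. MultiPhraseQuery accepts several alternative terms at one phrase position, which is what the expanded list stands for here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;

// Hypothetical illustration only; none of these names are Lucene classes.
class PrefixPhraseSketch {

    // Collects all dictionary terms that start with the prefix.
    // In a sorted set these terms form one contiguous run starting at the prefix.
    static List<String> expand(SortedSet<String> terms, String prefix) {
        List<String> expanded = new ArrayList<>();
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // past the end of the run
            }
            expanded.add(term);
        }
        return expanded;
    }

    // True if `first` is immediately followed by any of the alternatives.
    static boolean matches(List<String> tokens, String first, List<String> alternatives) {
        for (int i = 0; i + 1 < tokens.size(); i++) {
            if (tokens.get(i).equals(first) && alternatives.contains(tokens.get(i + 1))) {
                return true;
            }
        }
        return false;
    }
}
```

Under this sketch the word order matters: a document tokenized as "developed", "by", "java" does not match, because no "de*" term follows "java".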
You can use your approach w/ or w/o the filter.
>td = indexSearcher.search(query, filter, maxnumhits);
You need to use a filter for the wildcards which is built in to the
query.
1) Extend QueryParser to override the getWildcardQuery method.
(Or even if you don't use QueryParser, j
Hi Everyone,
Any help around this topic will be very useful. Is
anyone partitioning the data into 2 or more indexes
and using parallelReader to search these indexes? If
yes, how do you handle updates to the indexes and make
sure the doc ids for all indexes are in same order?
Regards,
Rajesh
---
Hi Erick,
Here is a quick overview of what I hope to accomplish with lucene. I am
using a lucene database to store condensed information about a collection
of data that I have. The data has to be constantly updated for correctness
so that when one part changes certain other parts can be changed
As I stated in my original reply, a Hits object re-executes the
search every 100 or so objects you examine. So some loop like
Hits hits = search
for (int idx = 0; idx < hits.length(); ++idx) {
    Document doc = hits.doc(idx);
}
really does something like
for (int idx = 0; idx < hits.length; +
Hi all.
I have an index with a set of phrases (one or several words each).
I need to search for these phrases.
I am confused, as I cannot find a good way to search for phrases.
For example, I need to search for "java de*" and receive "java developers",
"java development", "developed by java" etc.
Hi Erick,
Thanks for the information. I tried using a HitCollector and a
FieldSelector. I'm getting some dramatic improvements gathering large
result sets using the FieldSelector. As it turned out I was able to assume
in many cases that I could break out after a specific field in each
document