OK, here's what I've come up with after reading your suggestions:
- bit set from DB stays untouched
- only one field shall be used to store interest field bits in the document:
"interest". Saves disk space.
- The bits shall not be converted to a readable string but added as values
separated by spa
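
A minimal sketch of the plan above, for illustration only. It assumes the bit set arrives from the DB as a `long` and that "values" means the numeric value of each set bit (1, 2, 4, ...); the class and method names are hypothetical and not part of Lucene. The returned string is what would be stored in the single "interest" field and split on whitespace at index time.

```java
import java.util.StringJoiner;

// Hypothetical helper, not a Lucene class.
final class InterestBits {
    private InterestBits() {}

    // Turns a bit set into the values of its set bits, separated by spaces.
    // Assumption: each flag is identified by its bit value, not its bit position.
    static String toFieldValue(long bits) {
        StringJoiner joiner = new StringJoiner(" ");
        for (int i = 0; i < Long.SIZE; i++) {
            long mask = 1L << i;
            if ((bits & mask) != 0) {
                // Unsigned formatting keeps the highest bit from printing as a negative number.
                joiner.add(Long.toUnsignedString(mask));
            }
        }
        return joiner.toString();
    }
}
```

With this sketch, a bit set of 5 (bits 1 and 4 set) yields the field value "1 4", and an empty bit set yields an empty string.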
Paul,
we are using a slightly modified version of Lucene,
so in order to run the performance tests on a nightly build, I need
Lucene's sources, not the compiled classes.
Is there a nice and easy way to get them?
Stanislav
Stanislav Jordanov wrote:
Paul,
We are working on delivering the next
Lucene will automatically separate tokens during index and search if you use
the right analyzer. See the various classes that implement Analyzer. I don't
know if you really wanted to use the numeric literals, but I wouldn't. The
analyzers that do the most for you (automatically break up on spaces,
The background of this is also separating content according to domains
Example:
- pictureA (marked as a "joke" #flag: 1)
- pictureB (marked as an "adult picture" #flag: 2)
Site1: Users allowed to view everything (pictureA, pictureB )
Site2: Users allowed to view everything except pictureB (no adu
You could store a value for each flag then be careful about what analyzers
you use. For instance, using WhitespaceAnalyzer (index AND search) and doing
your own casing. That is, make sure you lowercase as necessary (NOTE:
operators AND, OR, NOT must not be lowercased if you send them through
queryp
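
The "do your own casing" advice above could be sketched roughly as follows. This is an assumption-laden illustration, not Lucene code: the class and method names are hypothetical, and it only handles simple whitespace-separated queries (no quoted phrases, field prefixes, or range syntax). It lowercases every token except the uppercase operators AND, OR and NOT, so the result can still be handed to a query parser.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not a Lucene class.
final class QueryCasing {
    // Operators that must keep their uppercase form.
    private static final Set<String> OPERATORS =
            new HashSet<>(Arrays.asList("AND", "OR", "NOT"));

    private QueryCasing() {}

    // Lowercases plain terms and leaves uppercase boolean operators untouched.
    static String lowercaseTerms(String query) {
        StringBuilder out = new StringBuilder();
        for (String token : query.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // an empty query produces a single empty token
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(OPERATORS.contains(token) ? token : token.toLowerCase(Locale.ROOT));
        }
        return out.toString();
    }
}
```

The same lowercasing would have to be applied to the values at index time, since WhitespaceAnalyzer does not change case.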
On Tuesday 28 November 2006 12:12, Stanislav Jordanov wrote:
> Paul,
> we are using a slightly modified version of Lucene,
> so in order to run the performance tests on a nightly build, I need
> Lucene's sources, not the compiled classes.
> Is there a nice and easy way to get them?
The sources ar
Mike,
Below is the pseudo code of the application. A few implementation
points to understand the pseudo-code:
- We have a home grown threadpool class that allows us to index
multiple documents in parallel. We usually submit 200 jobs to the
pool (2-3 worker threads usually for the pool). O
On 11/27/06, Michael McCandless <[EMAIL PROTECTED]> wrote:
Suman Ghosh wrote:
> On 11/27/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
>> On 11/27/06, Suman Ghosh <[EMAIL PROTECTED]> wrote:
>> > Here are the values:
>> >
>> > mergeFactor=10
>> > maxMergeDocs=10
>> > minMergeDocs=100
>> >
>> >
The code works very well,
Thanks,
Laurie
-----Original Message-----
From: Paul Elschot [mailto:[EMAIL PROTECTED]
Sent: 27 November 2006 18:52
To: java-user@lucene.apache.org
Subject: Re: Hits length with no sorting or scoring
On Monday 27 November 2006 14:30, Hirsch Laurence wrote:
> Hello,
>
Yonik Seeley wrote:
Actually, in previous versions of Lucene, it *was* possible to get way
too many first level segments because of the wonky logic when the
IndexWriter was closed. That has been fixed in the trunk with the new
merge policy, and you will never see more than mergeFactor first lev
This looks correct to me. It's good you are doing the deletes
"in bulk" up front for each batch of documents. So I guess you
hit the error (& 5000 segments files) while processing batches
of 200 docs (because you then optimize in the end)?
Do you search this index while it's building, or, only
1) I don't really know anything about Syns2Index - but the errors you
cited don't seem to have anything to do with Lucene ... your compiler
appears to be complaining about assert statements within the core java
system classes ... which is a little strange. You said you are past the
HelloWorld
The search functionality must be available during the index build. Since a
relatively small number of documents are being affected (and also we plan to
perform the build during a period of time we know to be relatively quiet
from last 2 years site access data) during the build process, we hope tha
I have documents that can be referred to by multiple identifiers (and I want
to store the identifiers separate from the main indexed content). I'm
wondering if I should put each identifier in its own keyword field, or have
one tokenized field with all of the identifiers in it. What I'm talking
a
On Nov 28, 2006, at 4:31 PM, Michael Rusch wrote:
I have documents that can be referred to by multiple identifiers
(and I want
to store the identifiers separate from the main indexed content). I'm
wondering if I should put each identifier in its own keyword
field, or have
one tokenized fi
Hello,
we have a problem with sorting. We use MultiSearcher
over several indexes.
The results should be sorted by book number, but the resulting list isn't
sorted correctly. There are 300 hits from book a, then 150 from book b, 95
hits book 3, but then there are 1,2,3 hits of
Suman Ghosh wrote:
The search functionality must be available during the index build. Since a
relatively small number of documents are being affected (and also we
plan to
perform the build during a period of time we know to be relatively quiet
from last 2 years site access data) during the buil