[EMAIL PROTECTED] wrote:
Thanks, Anthony, for your response; I did not know about that field.
You make your own fields in Lucene; it is not something Lucene gives you.
But I still have a problem, and it is about privacy. The users are concerned
about privacy, and so we thought we could have all
Cyndy wrote:
I want to keep user text files indexed separately. I will have about 10,000
users, and each user may have about 20,000 short files, and I need to keep
privacy. So the idea is to have one folder with the text files and an index
for each user, so that when a search is done, it will be poin
Hi,
I am currently working on the score-calculation part of Lucene, and I have
encountered a piece of code that I do not understand:
return raw * Similarity.decodeNorm(norms[doc]); // normalize for field
As can be seen from the code above, the Similarity method decodeNorm() will
be called to decode the
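For context on what that decode step does: in Lucene 2.x, the per-document norm is stored as a single byte and expanded back to a float at search time. The sketch below mirrors the "SmallFloat" decoding (3-bit mantissa, 5-bit exponent, zero-exponent point 15) that `Similarity.decodeNorm()` delegates to; it is an illustration of the technique, not the shipped class.

```java
// Standalone sketch of how Lucene 2.x decodes a stored norm byte back into a
// float. Mirrors SmallFloat.byte315ToFloat: 3-bit mantissa, 5-bit exponent,
// zero-exponent point of 15. Illustrative, not the actual Lucene source.
public class NormDecode {
    public static float byte315ToFloat(byte b) {
        if (b == 0) return 0.0f;            // zero is a special case: decodes to 0.0
        int bits = (b & 0xff) << (24 - 3);  // shift mantissa+exponent into float position
        bits += (63 - 15) << 24;            // re-bias the 5-bit exponent
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        // 124 is the byte encoding of 1.0f (the norm of an unboosted one-term field)
        System.out.println(byte315ToFloat((byte) 124)); // prints 1.0
    }
}
```

Note the precision loss: many distinct floats collapse to the same byte, which is why norms are only coarse length/boost multipliers.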
Hello, I am new to Lucene and I want to make sure that what I am trying to do
will not hurt performance. My scenario is the following:
I want to keep user text files indexed separately. I will have about 10,000
users, and each user may have about 20,000 short files, and I need to keep
privacy. So the
For some reason I am thinking I read somewhere that if you queried something
like:
"Eiffel Tower"
Lucene would execute the query "Eiffel AND Tower"
Basically I am trying to ask: does Lucene automatically replace spaces with
the AND operator?
Thanks
Dana
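For what it's worth: Lucene's classic QueryParser combines bare terms with its *default operator*, which is OR unless you change it with `setDefaultOperator`, and a quoted string like "Eiffel Tower" is parsed as a phrase query, not as `Eiffel AND Tower`. The toy helper below is not Lucene's parser, just an illustration of how the default operator determines how bare terms are joined.

```java
// Illustration only, not Lucene's QueryParser: bare terms are combined with a
// configurable default operator. Lucene's default is OR; AND applies only if
// you explicitly configure it (setDefaultOperator).
public class DefaultOp {
    // hypothetical helper mimicking how bare terms are combined
    public static String combine(String[] terms, String defaultOp) {
        return String.join(" " + defaultOp + " ", terms);
    }

    public static void main(String[] args) {
        String[] terms = {"Eiffel", "Tower"};
        System.out.println(combine(terms, "OR"));  // Eiffel OR Tower  (Lucene's default)
        System.out.println(combine(terms, "AND")); // Eiffel AND Tower (only if configured)
    }
}
```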
Karsten F. wrote:
Hi Bill,
you should not use a prefix query (*), because in a first step Lucene would
generate a list of all terms in this field and then search for all these
terms, which is senseless.
That's not quite an accurate description of what it does, as it's nowhere
near as slow as doi
Doron Cohen wrote:
The API definitely doesn't promise this.
AFAIK, implementation-wise it happens to be like this, but I could be wrong,
and it might change in the future. It would make me nervous to rely on
this.
I ran some tests and it 'seems' to work, but I agree, it also makes me nervous.
Mmmkay. I think I'll wait, then.
Thank you so much for your help. I really appreciate it.
Also, I really dig Lucene, so thanks for your hard work!
-Matt
Michael McCandless-2 wrote:
>
>
> mattspitz wrote:
>
>> Is there no way to ensure consistency on the disk with 2.3.2?
>
> Unfortunately
mattspitz wrote:
Is there no way to ensure consistency on the disk with 2.3.2?
Unfortunately no.
This is a little off-topic, but is it worth upgrading to 2.4 right
now if
I've got a very stable system already implemented with 2.3.2? I don't
really want to introduce oddities because I'm u
Thanks for your replies!
Is there no way to ensure consistency on the disk with 2.3.2?
This is a little off-topic, but is it worth upgrading to 2.4 right now if
I've got a very stable system already implemented with 2.3.2? I don't
really want to introduce oddities because I'm using an "unfinish
mattspitz wrote:
Are the index files synced on writer.close()?
No, they aren't. Not until 2.4 (trunk).
Thank you so much for your help. I think the seek time is the issue,
especially considering the high merge factor and the fact that the
segments
are scattered all over the disk.
You
Mike-
Are the index files synced on writer.close()?
Thank you so much for your help. I think the seek time is the issue,
especially considering the high merge factor and the fact that the segments
are scattered all over the disk.
Will a faster disk cache access affect the optimization and merg
mattspitz wrote:
So, my indexing is done in "rounds", where I pull a bunch of documents
from the database, index them, and flush them to disk. I manually call
"flush()" because I need to ensure that what's on disk is accurate with
what I've pulled from the database.
On each round, then,
So, my indexing is done in "rounds", where I pull a bunch of documents from
the database, index them, and flush them to disk. I manually call "flush()"
because I need to ensure that what's on disk is accurate with what I've
pulled from the database.
On each round, then, I flush to disk. I set t
Matt,
One important bit that you didn't mention is what your maxBufferedSize setting
is. If it's too low you will see lots of IO. Increasing it means less IO but
more JVM heap needed. Is your disk IO caused by searches or by indexing only?
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Sol
Is that really 1 byte for each document? Not 1 byte for each field of each
document?
Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
> From: Doron Cohen <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Monday, August 18, 2008
Hi Bill,
A simpler suggestion, assuming you need to test for the existence of just one
particular field: rather than adding a field containing a list of all indexed
fields for a particular document, as Karsten suggested, you could just add a
field with a constant value when the field you want t
Karsten's saying that prefix and wildcard queries require a bunch of work.
Specifically, Lucene assembles a list of all terms that match the query and
then, conceptually at least, forms a huge OR query with one clause for
each term. Say, for instance, you have the following values indexed in a
fiel
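The expansion step being described can be sketched as follows. This is a conceptual model only, with plain Java collections standing in for Lucene's term dictionary; the class and method names are illustrative, not Lucene APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual sketch of what a prefix query does: walk the field's full term
// dictionary, collect every term matching the prefix, and treat the matches
// as one big OR. The dictionary scan is the expensive part being discussed.
public class PrefixExpansion {
    public static List<String> expand(List<String> termDictionary, String prefix) {
        List<String> matches = new ArrayList<>();
        for (String term : termDictionary) {     // scans every term in the field
            if (term.startsWith(prefix)) matches.add(term);
        }
        return matches;                          // conceptually: term1 OR term2 OR ...
    }

    public static void main(String[] args) {
        List<String> terms = List.of("apple", "apply", "banana", "appoint");
        System.out.println(expand(terms, "app")); // [apple, apply, appoint]
    }
}
```

With many unique terms in the field, both the scan and the resulting huge OR get costly, which is why a dedicated marker field is being recommended instead.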
Karsten,
Thanks for the feedback. I'm not sure I understand the reasoning behind not
using the "*" prefix (do you have a link, possibly?). But I see what
you are getting at with the additional field. I'll give it a try.
Thanks for the help.
regards,
Bill
-Original Message-
From: Kars
Hi Bill,
you should not use a prefix query (*), because in a first step Lucene would
generate a list of all terms in this field and then search for all these
terms, which is senseless.
I would suggest inserting a new field "myFields" which contains as its value
the names of all fields for this docum
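The suggestion can be sketched like this: at index time, add a "myFields" field whose value lists the names of the document's other fields, so a field-existence check becomes an ordinary term query instead of a prefix query. Plain maps stand in for the Lucene index here; the names are illustrative.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Sketch of the "myFields" idea: index the names of a document's fields in a
// dedicated field, then test for existence with a term query on that field.
// HashMap stands in for the Lucene document/index; not a Lucene API.
public class FieldExistence {
    public static Map<String, String> withMyFields(Map<String, String> doc) {
        Map<String, String> indexed = new HashMap<>(doc);
        indexed.put("myFields", String.join(" ", doc.keySet())); // names, space-separated
        return indexed;
    }

    public static boolean hasField(Map<String, String> indexedDoc, String field) {
        // equivalent of the term query  myFields:<field>
        return Arrays.asList(indexedDoc.get("myFields").split(" ")).contains(field);
    }

    public static void main(String[] args) {
        Map<String, String> doc = new HashMap<>();
        doc.put("title", "hello");
        Map<String, String> indexed = withMyFields(doc);
        System.out.println(hasField(indexed, "title")); // true
        System.out.println(hasField(indexed, "body"));  // false
    }
}
```

The cost is one extra small field per document, paid once at index time, versus expanding every term of the field at query time.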
Hi
Lucene range queries and filters work on string comparison, not
numeric. You'll need to pad out any numeric fields you want to use in
a range to a consistent length. There may be a class floating around
that does this - NumberUtils or NumberTools or something like that.
--
Ian.
On Mon, A
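Ian's point about string comparison can be shown concretely: unpadded, "9" sorts after "10", so a range query over raw numbers misbehaves; zero-padding to a fixed width restores numeric order (this is what helpers like NumberTools automate). A minimal sketch:

```java
// Sketch of the padding Ian describes: Lucene range queries compare terms as
// strings, so numbers must be left-padded to a fixed width to sort numerically.
public class NumberPad {
    public static String pad(long n, int width) {
        return String.format("%0" + width + "d", n); // zero-pad to fixed width
    }

    public static void main(String[] args) {
        System.out.println("9".compareTo("10") > 0);             // true: raw string order is wrong
        System.out.println(pad(9, 4).compareTo(pad(10, 4)) < 0); // true: padded order is right
        System.out.println(pad(42, 4));                          // 0042
    }
}
```

Note this simple form only handles non-negative numbers; negatives need an extra offset or encoding, which is part of what the library helper takes care of.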
Hi! I've made a sample program for testing lucene :
package indexer;
import com.sun.xml.internal.bind.v2.schemagen.xmlschema.Occurs;
import com.sun.xml.internal.ws.util.StringUtils;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.util.Random;
import
Hello,
I am creating fields for documents like this:
String name = ...
String value = ...
doc.add(new Field(name, value, Field.Store.NO,
Field.Index.UN_TOKENIZED));
On the query side, sometimes I want to search for documents for
which a given field, say 'foo', is equal to a giv
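The usual answer for fields indexed UN_TOKENIZED is that the whole value was stored as a single term, so an equality search is a lookup of exactly that term (what a TermQuery does). The sketch below models this with a map from field:value to doc ids standing in for the inverted index; it is illustrative, not Lucene code.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Sketch of exact match on an UN_TOKENIZED field: the value is indexed as one
// term with no analysis, so equality is a single-term lookup (a TermQuery).
// A map stands in for the inverted index; not a Lucene API.
public class ExactMatch {
    private final Map<String, Set<Integer>> index = new HashMap<>();

    public void add(int docId, String field, String value) {
        // UN_TOKENIZED: the whole value is one term, stored verbatim
        index.computeIfAbsent(field + ":" + value, k -> new TreeSet<>()).add(docId);
    }

    public Set<Integer> termQuery(String field, String value) {
        return index.getOrDefault(field + ":" + value, Collections.emptySet());
    }

    public static void main(String[] args) {
        ExactMatch idx = new ExactMatch();
        idx.add(1, "foo", "bar");
        idx.add(2, "foo", "bar baz");
        System.out.println(idx.termQuery("foo", "bar")); // [1]
    }
}
```

The caveat, as with real Lucene, is that the query value must match the indexed value exactly, including case, since no analyzer ever touched it.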
Mark Miller wrote:
Mark Miller wrote:
Robert Stewart wrote:
Anyone else run on Windows? We have an index around 26 GB in size.
It seems the file system cache ends up taking nearly all available RAM
(26 GB out of 32 GB on a 64-bit box). The Lucene process is around 5 GB,
so very little is left over for que
Toke Eskildsen wrote:
Lucene process is around 5 GB, so very little left over for queries,
etc, and box starts swapping during searches.
Not so fine and also unexpected. Are you sure that what you're seeing is
swapping and not just flushing of the write-cache? Are you observing the
disk-a
On Sat, 2008-08-16 at 07:40 -0400, Robert Stewart wrote:
> Anyone else run on Windows? We have index around 26 GB in size.
> Seems file system cache ends up taking up nearly all available RAM
> (26 GB out of 32 GB on 64-bit box).
Sounds fine so far. If the RAM isn't used for anything else, the
On Mon, Aug 18, 2008 at 7:28 AM, blazingwolf7 <[EMAIL PROTECTED]>wrote:
>
> Thanks for the info. But do you know where this is actually performed in
> Lucene? I mean the method involved that will calculate the value before
> storing it into the index. I traced it to one method known as lengthNorm()
>
> payload and the other part for storing, i.e. something like this:
>>
>>Token token = new Token(...);
>>token.setPayload(...);
>>SingleTokenTokenStream ts = new SingleTokenTokenStream(token);
>>
>>Field f1 = new Field("f","some-stored-content",Store.YES,Index.NO);
>>Field f2