Unfortunately yes. It doesn't really have anything to do with the way you
access the index (I don't think). The fact is that the data is simply not
in the document. When you add the document again it is effectively
"re-indexed", so if the raw data of the field is empty, then it won't be
indexed
I can share the data.. but it would be quicker for you to just pull out some
random text from anywhere you like.
The issue is that the text was in an email, which was one of about 2,000 and
I don't know which one. I got the 4.5MB figure from the number of bytes in
the byte array reported in the
On Fri, 2006-08-11 at 01:58 +1000, Jason Polites wrote:
> Are you storing the contents of the fields in the index? That is,
> specifying Field.Store.YES when creating the field?
>
> In my experience fields which are not stored are not recoverable from the
> index (well.. they can be reconstructe
Hello Adrian,
>> I am indexing some text in a java object that is "%772B" with the
>> standard analyser and Lucene 2.
>>
>> Should I be able to search for this with the same text as the query, or
>> do I need to do any escaping of characters?
Besides Luke there are the AnalyzerUtils from the LIA
4. Search for records with filter.
If the filter returns a lot of ids, it won't be fast.
Recently I ran a test. I wrote a custom filter which gets a list of ids from a
MySQL database table of size 5000. Then I invoked search(query, filter,
hitcollector); it took me more than 40s to retrieve th
I have a sample document which has about 4.5MB of text to be stored as
compressed data within the field, and the indexing of this document seems to
take an inordinate amount of time (over 10 minutes!). When debugging I can
see that it's stuck on the deflate() calls of the Deflater used by Luc
Hi,
I'm facing a similar problem. I found a possible way to copy
part of an index (without copying the whole index, deleting, and optimizing),
but I don't know how to change/add/remove a field (or add a term vector, in my
case) in an existing index.
To copy part of an index, override methods in IndexReader
/** Returns
Another thought is to index each paragraph as a separate document,
though you'd of course have to see how that fits with your other
searching needs.
Erik
On Aug 8, 2006, at 12:25 PM, Laurent Hoss wrote:
Hi
Suppose we have an Index containing Lucene documents, having
multiple fiel
> On 8/10/06, Doron Cohen <[EMAIL PROTECTED]> wrote:
> Sorting was introduced to Lucene before my time, so I don't know the
> reasons behind it. Maybe it was seen as non-optimal or non-core and
> so was kept out of the IndexReader.
>
> I admit, it does feel like the level of abstraction that Fie
Hey,
You don't actually need to store it. If you store the content of a
field you can later retrieve it as it originally was and display it, perhaps
in a result list. If you have large content you can also store it
compressed (Field.Store.COMPRESS). If you don't need the content in
any way just use Fi
Hi Russell, my apologies for the delayed response. I would rather have all
correspondence on the mailing list, but to keep this mail thread readable I
put the files at http://cdoronc.awardspace.com/TfTermQuery . I hope it
helps you and would be interested in your comments.
Regards,
Doron
"Russell M. All
See below...
On 8/10/06, Pillinger, Adrian <[EMAIL PROTECTED]> wrote:
I am indexing some text in a java object that is "%772B" with the
standard analyser and Lucene 2.
Should I be able to search for this with the same text as the query, or
do I need to do any escaping of characters?
probabl
I have "assumed" I can't have two threads writing to the index concurrently,
so have implemented my own read/write locking system. Are you saying I
don't need to bother with this? My reading of the doco suggests that you
shouldn't have two IndexWriters open on the same index.
I know that if I t
Hi.
I'm investigating a possibility to make a "join" in Lucene/Compass.
Here's the thread:
http://forums.opensymphony.com/thread.jspa?threadID=39685&tstart=0
I have records and entities in a many-to-many (m:m) relationship. Entities hold
indexed information. Records consist of entities. One entity may belong to many
records.
I w
Are you storing the contents of the fields in the index? That is,
specifying Field.Store.YES when creating the field?
In my experience fields which are not stored are not recoverable from the
index (well.. they can be reconstructed but it's a lossy process). So when
you retrieve the document,
I am indexing some text in a java object that is "%772B" with the
standard analyser and Lucene 2.
Should I be able to search for this with the same text as the query, or
do I need to do any escaping of characters?
Thanks
Adrian
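A minimal sketch of what escaping would involve, assuming the usual list of query-syntax characters (+ - && || ! ( ) { } [ ] ^ " ~ * ? : \). The class and method names below are hypothetical and not part of Lucene; where the QueryParser in your version provides its own escape method, that is the one to prefer. Note that % is not on the assumed list, so under this assumption "%772B" passes through unchanged, and a failed match is more likely down to how the analyzer tokenizes the text than to escaping.

```java
class QuerySyntaxEscaper {
    // Assumed set of characters that carry meaning in the query syntax.
    // Check this against the query parser documentation for your version.
    private static final String SPECIAL = "+-!(){}[]^\"~*?:\\&|";

    /** Prefixes every assumed special character with a backslash. */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length() + 8);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("%772B"));
        System.out.println(escape("title:2006"));
    }
}
```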
Thanks for the Jira issue...
one question on your synchronization comment...
I have "assumed" I can't have two threads writing to the index concurrently,
so have implemented my own read/write locking system. Are you saying I
don't need to bother with this? My reading of the doco suggests that y
I'm not sure if it would help my particular situation, but is there any way
to provide the option of specifying the compression level? The level used
by Lucene (level 9) is the maximum possible compression level. Ideally I
would like to be able to alter the compression level on the basis of
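To get a feel for how much the level matters, independent of Lucene, here is a rough sketch that times java.util.zip.Deflater at BEST_SPEED (level 1) and BEST_COMPRESSION (level 9) on generated text. The class name, the word list and the sample-text generator are made up for illustration; real email text will compress differently, and timings depend on the machine.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.Deflater;

class DeflateLevelDemo {
    /** Compresses the input at the given level and returns the compressed size in bytes. */
    static int compressedSize(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        try {
            deflater.setInput(input);
            deflater.finish();
            byte[] buffer = new byte[8192];
            int total = 0;
            while (!deflater.finished()) {
                total += deflater.deflate(buffer);
            }
            return total;
        } finally {
            deflater.end(); // release native resources
        }
    }

    /** Builds pseudo-random ASCII text of at least approxBytes bytes (illustrative only). */
    static byte[] sampleText(int approxBytes) {
        String[] words = {"index", "field", "document", "search", "query", "store", "email", "text"};
        Random random = new Random(42);
        StringBuilder sb = new StringBuilder(approxBytes + 16);
        while (sb.length() < approxBytes) {
            sb.append(words[random.nextInt(words.length)]).append(' ');
        }
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] data = sampleText(4_500_000);
        for (int level : new int[] {Deflater.BEST_SPEED, Deflater.BEST_COMPRESSION}) {
            long start = System.nanoTime();
            int size = compressedSize(data, level);
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.out.printf("level %d: %d -> %d bytes in %d ms%n", level, data.length, size, millis);
        }
    }
}
```

If the time at level 9 is nowhere near minutes for a comparable amount of your own data, the slowdown is more likely in how the compressor is being driven than in the level itself.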
I'm hoping I'm doing something wrong, because I've been impressed with
Lucene so far. The basic problem I'm seeing is that when I run the same
search several times against box A (with 1 RemoteSearchable), I see X
for an average search response time. When I run the same search several
times agains
On 8/10/06, Doron Cohen <[EMAIL PROTECTED]> wrote:
I have one more comment on the cache implementation. It feels to me
somewhat not right that a static system wide object (FieldCache.DEFAULT) is
managing the field caching for all the indexReaders in the JVM (possibly of
different indexes), when i
Hello Simon,
I have resolved my problem: I added Store.YES and Index.TOKENIZED, and
it works.
Thank you once again.
Simon Willnauer wrote:
I just tried it out and it worked as expected:
RAMDirectory d = new RAMDirectory();
IndexWriter w = new IndexWriter(d,new WhitespaceA
On Thu, 2006-08-10 at 09:16 -0400, Erick Erickson wrote:
> You say "Those documents that we updated are not searchable now". I've got
> to ask the obvious question, did you close and re-open the *searcher*
> (really, the indexreader you use in your searcher)? I suspect you have, but
> thought I'd a
I'm not sure if it would help my particular situation, but is there any way
to provide the option of specifying the compression level? The level used
by Lucene (level 9) is the maximum possible compression level. Ideally I
would like to be able to alter the compression level on the basis of th
You say "Those documents that we updated are not searchable now". I've got
to ask the obvious question, did you close and re-open the *searcher*
(really, the indexreader you use in your searcher)? I suspect you have, but
thought I'd ask explicitly.
I'd also get a copy of Luke (http://www.getopt.o
You're right, this is strange. I'm afraid that I'm now beyond my competence
so I'll just have to appeal to wiser heads than mine to help...
Best
Erick
On 8/10/06, Marcus Falck <[EMAIL PROTECTED]> wrote:
Hi again Erick.
Yes, I know the hits exist in the index at all times.
I will illustrate ex
I just tried it out and it worked as expected:
RAMDirectory d = new RAMDirectory();
IndexWriter w = new IndexWriter(d,new WhitespaceAnalyzer(),true);
Document doc = new Document();
doc.add(new Field("field","title",Field.Store.YES,Field.Index.TOKENIZED ));
doc.add(new Field("fie
Hi Simon,
I see. I was just curious about several techniques that came to mind. Thanks
for your insight, Simon.
Regards,
Feris
On 8/10/06, Simon Willnauer <[EMAIL PROTECTED]> wrote:
You can just put your documents in a queue and access the index within
one single thread?! All your analysis can
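The queue idea above can be sketched with a single-threaded executor: any number of producer threads hand documents over, and only one thread ever touches the writer. DocumentSink is a hypothetical stand-in for whatever wraps the IndexWriter; none of these names come from Lucene, and documents are plain strings here purely for illustration.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class SingleWriterQueue {
    /** Hypothetical stand-in for the object that owns the index writer. */
    interface DocumentSink {
        void add(String document);
    }

    // The executor's internal queue is the document queue; its one thread is the writer.
    private final ExecutorService writerThread = Executors.newSingleThreadExecutor();
    private final DocumentSink sink;

    SingleWriterQueue(DocumentSink sink) {
        this.sink = sink;
    }

    /** Safe to call from any thread; the sink itself only runs on the writer thread. */
    void submit(String document) {
        writerThread.execute(() -> sink.add(document));
    }

    /** Stops accepting documents and waits for queued ones to be written. */
    boolean close(long timeoutSeconds) {
        writerThread.shutdown();
        try {
            return writerThread.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Demo: returns {documents written, number of distinct threads that wrote}. */
    static int[] demo(int producers, int docsPerProducer) {
        AtomicInteger written = new AtomicInteger();
        Set<String> writerThreads = ConcurrentHashMap.newKeySet();
        SingleWriterQueue queue = new SingleWriterQueue(doc -> {
            written.incrementAndGet();
            writerThreads.add(Thread.currentThread().getName());
        });
        Thread[] threads = new Thread[producers];
        for (int p = 0; p < producers; p++) {
            final int id = p;
            threads[p] = new Thread(() -> {
                for (int i = 0; i < docsPerProducer; i++) {
                    queue.submit("doc-" + id + "-" + i);
                }
            });
            threads[p].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        queue.close(30);
        return new int[] {written.get(), writerThreads.size()};
    }
}
```

With this shape no separate read/write lock is needed around the writer, since the producers never call it directly.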
Hello all,
I am experiencing some performance problems indexing large(ish) amounts of
text using the Field.Store.COMPRESS option when creating a Field in
Lucene.
I have a sample document which has about 4.5MB of text to be stored as
compressed data within the field, and the indexing of this
The probl add(new Field( fieldName(), fieldValue, Field.Store,
Field.Index));
and I use the WhitespaceAnalyzer. My problem is that when I index a field
with the value "title" it works, but when I index one with the value "2006" it
doesn't.
I don't know why.
thanks
Simon Willnauer wrote:
could y
could you provide a bit more info on your index process?
(analyzer,Field, Store, Index)
regards simon
On 8/10/06, ould sid'ahmed <[EMAIL PROTECTED]> wrote:
Hello,
I don't know why it doesn't index numeric values. I looked with Luke
and found that the numeric values were not indexed.
can you
Hello,
I don't know why it doesn't index numeric values. I looked with Luke
and found that the numeric values were not indexed.
Do you know what the problem is?
thanks
Simon Willnauer wrote:
Well, your digits might be lost during analysis, like Erik said. Check
with Luke what's in your in
Hi Chris,
I investigated that approach too, but I don't know how to do it.
I have a query that searches for two words. This query finds both words in two
documents, with the difference that one of the words appears twice in the
first document whereas in the second document the two words appear only
on
Hi Deepan, The steps below seem correct, given that all the fields of the
original document are also stored - the javadoc for
indexReader.document(int n) (which I assume is what you are using) says: "
Returns the stored fields of the nth Document in this index." - so, only
stored fields would exis
Hi all,
we have recently noticed that doing a locale-sensitive sort on a field that
is missing from some docs causes an NPE inside the call to Collator#compare
at FieldSortedHitQueue line 320 (Lucene 2.0 src):
static ScoreDocComparator comparatorStringLocale (final IndexReader reader,
final Stri
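One way to avoid the NPE described above is to check for missing values before handing them to the Collator. The sketch below is not the FieldSortedHitQueue code and not a patch for it; the class name is hypothetical, and sorting missing values first is an assumption about the desired ordering.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

class NullSafeCollation {
    /**
     * Compares two possibly-missing field values with the given collator.
     * Missing (null) values sort before present ones; this ordering is an assumption.
     */
    static int compare(Collator collator, String a, String b) {
        if (a == null) {
            return b == null ? 0 : -1;
        }
        if (b == null) {
            return 1;
        }
        return collator.compare(a, b);
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        String[] values = {"pear", null, "apple"};
        Arrays.sort(values, (x, y) -> compare(collator, x, y));
        System.out.println(Arrays.toString(values));
    }
}
```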
Hi again Erick.
Yes, I know the hits exist in the index at all times.
I will illustrate exactly, with approximate values for hits.length():
Mergefactor 10.
MinMergeDocs 5000.
Searching for a very common Swedish word ("han", which means "he" in
English).
Indexing 10 docs.
After 100
> [EMAIL PROTECTED] wrote on 09/08/2006 20:32:20:
> > Heh... interfaces strike again.
> >
> > Well then since we *know* that no one has their own implementation
> > (because they would not have been able to register it), we should be
> > able to safely upgrade the interface to a class (anyone
36 matches