Re: updating document

2006-08-10 Thread Jason Polites
Unfortunately yes. It doesn't really have anything to do with the way you access the index (I don't think). The fact is that the data is simply not in the document. When you add the document again it is effectively "re-indexed", so if the raw data of the field is empty, then it won't be indexed

Re: Field compression too slow

2006-08-10 Thread Jason Polites
I can share the data.. but it would be quicker for you to just pull out some random text from anywhere you like. The issue is that the text was in an email, which was one of about 2,000 and I don't know which one. I got the 4.5MB figure from the number of bytes in the byte array reported in the

Re: updating document

2006-08-10 Thread Deepan Chakravarthy
On Fri, 2006-08-11 at 01:58 +1000, Jason Polites wrote: > Are you storing the contents of the fields in the index? That is, > specifying Field.Store.YES when creating the field? > > In my experience fields which are not stored are not recoverable from the > index (well.. they can be reconstructe

Re: Special characters

2006-08-10 Thread Martin Braun
Hello Adrian, >> I am indexing some text in a java object that is "%772B" with the >> standard analyser and Lucene 2. >> >> Should I be able to search for this with the same text as the query, or >> do I need to do any escaping of characters? Besides Luke there are the AnalyzerUtils from the LIA

Re: SQL-Like Join in Lucene

2006-08-10 Thread hu andy
4. Search for records with filter. If the filter returns a lot of ids, it won't be fast. Recently I ran a test: I customized a filter which gets a list of ids from a MySQL database table of size 5000. Then I invoked search(query, filter, hitcollector); it took me more than 40s to retrieve th

Re: Field compression too slow

2006-08-10 Thread Michael McCandless
I have a sample document which has about 4.5MB of text to be stored as compressed data within the field, and the indexing of this document seems to take an inordinate amount of time (over 10 minutes!). When debugging I can see that it's stuck on the deflate() calls of the Deflater used by Luc

Re: updating document

2006-08-10 Thread Karel Tejnora
Hi, I'm facing a similar problem. I found a possible way to copy part of an index (without copying the whole index, deleting, and optimizing), but I don't know how to change/add/remove a field (or add a term vector in my case) in an existing index. To copy part of an index, override methods in IndexReader /** Returns

Re: "Field Grouping" query restrained to same field on a 'multi'-field'

2006-08-10 Thread Erik Hatcher
Another thought is to index each paragraph as a separate document, though you'd of course have to see how that fits with your other searching needs. Erik On Aug 8, 2006, at 12:25 PM, Laurent Hoss wrote: Hi Suppose having an Index containing Lucene documents, having multiple fiel

Re: Poor performance "race condition" in FieldSortedHitQueue

2006-08-10 Thread Doron Cohen
> On 8/10/06, Doron Cohen <[EMAIL PROTECTED]> wrote: > Sorting was introduced to Lucene before my time, so I don't know the > reasons behind it. Maybe it was seen as non-optimal or non-core and > so was kept out of the IndexReader. > > I admit, it does feel like the level of abstraction that Fie

Re: research lucene

2006-08-10 Thread Simon Willnauer
Hey, you don't actually need to store it. If you store the content of a field you can later retrieve it as it was and display it, maybe in a result list. If you have large content you can also store it compressed (Field.Store.COMPRESS). If you don't need the content in any way just use Fi

RE: Scoring a document (count?)

2006-08-10 Thread Doron Cohen
Hi Russell, my apologies for the delayed response. I'd rather have all correspondence on the mailing list, but to keep this mail thread readable I put the files at http://cdoronc.awardspace.com/TfTermQuery . I hope it helps you and would be interested in your comments. Regards, Doron "Russell M. All

Re: Special characters

2006-08-10 Thread Erick Erickson
See below... On 8/10/06, Pillinger, Adrian <[EMAIL PROTECTED]> wrote: I am indexing some text in a java object that is "%772B" with the standard analyser and Lucene 2. Should I be able to search for this with the same text as the query, or do I need to do any escaping of characters? probabl

Re: Field compression too slow

2006-08-10 Thread Michael McCandless
I have "assumed" I can't have two threads writing to the index concurrently, so have implemented my own read/write locking system. Are you saying I don't need to bother with this? My reading of the doco suggests that you shouldn't have two IndexWriters open on the same index. I know that if I t

SQL-Like Join in Lucene

2006-08-10 Thread Aleksei Valikov
Hi. I'm investigating a possibility to make a "join" in Lucene/Compass. Here's the thread: http://forums.opensymphony.com/thread.jspa?threadID=39685&tstart=0 I have records m:m entities. Entities hold indexed information. Records consist of entities. One entity may belong to many records. I w

Re: updating document

2006-08-10 Thread Jason Polites
Are you storing the contents of the fields in the index? That is, specifying Field.Store.YES when creating the field? In my experience fields which are not stored are not recoverable from the index (well.. they can be reconstructed but it's a lossy process). So when you retrieve the document,

Special characters

2006-08-10 Thread Pillinger, Adrian
I am indexing some text in a java object that is "%772B" with the standard analyser and Lucene 2. Should I be able to search for this with the same text as the query, or do I need to do any escaping of characters? Thanks Adrian

Re: Field compression too slow

2006-08-10 Thread Jason Polites
Thanks for the Jira issue... one question on your synchronization comment... I have "assumed" I can't have two threads writing to the index concurrently, so have implemented my own read/write locking system. Are you saying I don't need to bother with this? My reading of the doco suggests that y

Re: Field compression too slow

2006-08-10 Thread Michael McCandless
I'm not sure if it would help my particular situation, but is there any way to provide the option of specifying the compression level? The level used by Lucene (level 9) is the maximum possible compression level. Ideally I would like to be able to alter the compression level on the basis of
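The trade-off behind the compression-level question can be tried outside Lucene. What follows is only a minimal sketch using the JDK's java.util.zip.Deflater directly; the class name DeflateLevelSketch, the helper compress, and the sample text are hypothetical, and nothing here reflects how Lucene wires its own Deflater internally. It compresses the same bytes at several levels so the size and time difference between level 1 and level 9 can be observed on your own data.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Hypothetical helper, not part of Lucene: compares Deflater levels on one input.
final class DeflateLevelSketch {

    // Compresses the whole input at the given level (1..9, or Deflater.DEFAULT_COMPRESSION).
    static byte[] compress(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream(Math.max(64, input.length / 4));
            byte[] buffer = new byte[8192];
            // Drain the deflater in fixed-size chunks until all input is consumed.
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    public static void main(String[] args) {
        // Made-up sample text; substitute real document content to get meaningful numbers.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 20000; i++) {
            sb.append("some sample email text ").append(i % 97).append('\n');
        }
        byte[] raw = sb.toString().getBytes(StandardCharsets.UTF_8);

        int[] levels = {Deflater.BEST_SPEED, Deflater.DEFAULT_COMPRESSION, Deflater.BEST_COMPRESSION};
        for (int level : levels) {
            long start = System.nanoTime();
            byte[] packed = compress(raw, level);
            long micros = (System.nanoTime() - start) / 1000;
            System.out.println("level " + level + ": " + raw.length + " -> "
                    + packed.length + " bytes in " + micros + " us");
        }
    }
}
```

Timings from a single run like this are rough; they are meant only to show the direction of the trade-off, not to serve as a benchmark.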

remote multiSearching not scaling well

2006-08-10 Thread Haines, Ronald C. (LNG-DAY)
I'm hoping I'm doing something wrong, because I've been impressed with Lucene so far. The basic problem I'm seeing is that when I run the same search several times against box A (with 1 RemoteSearchable), I see X for an average search response time. When I run the same search several times agains

Re: Poor performance "race condition" in FieldSortedHitQueue

2006-08-10 Thread Yonik Seeley
On 8/10/06, Doron Cohen <[EMAIL PROTECTED]> wrote: I have one more comment on the cache implementation. It feels to me somewhat not right that a static system wide object (FieldCache.DEFAULT) is managing the field caching for all the indexReaders in the JVM (possibly of different indexes), when i

Re: research lucene

2006-08-10 Thread ould sid'ahmed
Hello Simon, I have resolved my problem: I added Store.YES and Index.TOKENIZED, and it works. Thank you again. Simon Willnauer wrote: I just tried it out and it worked like expected: RAMDirectory d = new RAMDirectory(); IndexWriter w = new IndexWriter(d,new WhitespaceA

Re: updating document

2006-08-10 Thread Deepan Chakravarthy
On Thu, 2006-08-10 at 09:16 -0400, Erick Erickson wrote: > You say "Those documents that we updated are not searchable now". I've got > to ask the obvious question, did you close and re-open the *searcher* > (really, the indexreader you use in your searcher)? I suspect you have, but > thought I'd a

Re: Field compression too slow

2006-08-10 Thread Michael McCandless
I'm not sure if it would help my particular situation, but is there any way to provide the option of specifying the compression level? The level used by Lucene (level 9) is the maximum possible compression level. Ideally I would like to be able to alter the compression level on the basis of th

Re: updating document

2006-08-10 Thread Erick Erickson
You say "Those documents that we updated are not searchable now". I've got to ask the obvious question, did you close and re-open the *searcher* (really, the indexreader you use in your searcher)? I suspect you have, but thought I'd ask explicitly. I'd also get a copy of Luke (http://www.getopt.o

Re: Lucene hits.length()

2006-08-10 Thread Erick Erickson
You're right, this is strange. I'm afraid that I'm now beyond my competence so I'll just have to appeal to wiser heads than mine to help... Best Erick On 8/10/06, Marcus Falck <[EMAIL PROTECTED]> wrote: Hi again Erick. Yes I know the hits exist in the index at all times. I will illustrate ex

Re: research lucene

2006-08-10 Thread Simon Willnauer
I just tried it out and it worked like expected: RAMDirectory d = new RAMDirectory(); IndexWriter w = new IndexWriter(d,new WhitespaceAnalyzer(),true); Document doc = new Document(); doc.add(new Field("field","title",Field.Store.YES,Field.Index.TOKENIZED )); doc.add(new Field("fie

Re: Tomcat : Indexing Sample ?

2006-08-10 Thread Feris Thia
Hi Simon, I see.. just curious about several techniques that come in my mind. Thanks for your insight Simon. Regards, Feris On 8/10/06, Simon Willnauer <[EMAIL PROTECTED]> wrote: You can just put your documents in a queue and access the index within one single thread?! All your analysis can
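The "queue plus one single thread" suggestion quoted above can be sketched with plain JDK concurrency classes. This is a minimal illustration under stated assumptions: SingleWriterQueueSketch is a hypothetical name, and the Consumer passed in stands for whatever actually writes to the index (for example a call to an IndexWriter), which is not shown here. Any number of threads may call submit, but every document is handed to the sink on the same worker thread, in submission order.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch: funnels documents from many producers to one writer thread.
final class SingleWriterQueueSketch<D> implements AutoCloseable {

    // A single-thread executor owns an internal queue and runs tasks one at a time.
    private final ExecutorService writerThread = Executors.newSingleThreadExecutor();
    private final Consumer<D> sink; // assumed to wrap the real index-writing call

    SingleWriterQueueSketch(Consumer<D> sink) {
        this.sink = sink;
    }

    // Safe to call from any thread; the sink itself only ever runs on the writer thread.
    void submit(D doc) {
        writerThread.execute(() -> sink.accept(doc));
    }

    // Stops accepting documents and waits (up to a minute) for the queue to drain.
    @Override
    public void close() {
        writerThread.shutdown();
        try {
            writerThread.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
    }

    public static void main(String[] args) {
        List<String> indexed = new CopyOnWriteArrayList<>();
        try (SingleWriterQueueSketch<String> queue = new SingleWriterQueueSketch<>(indexed::add)) {
            for (int i = 0; i < 5; i++) {
                queue.submit("doc-" + i);
            }
        }
        System.out.println(indexed);
    }
}
```

With this shape, analysis or parsing can still happen on the producer threads; only the final write is serialized.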

Field compression too slow

2006-08-10 Thread Jason Polites
Hello all, I am experiencing some performance problems indexing large(ish) amounts of text using the Field.Store.COMPRESS option when creating a Field in Lucene. I have a sample document which has about 4.5MB of text to be stored as compressed data within the field, and the indexing of this

Re: research lucene

2006-08-10 Thread ould sid'ahmed
The probl add(new Field( fieldName(), fieldValue, Field.Store, Field.Index)); and I use the WhitespaceAnalyzer, but my problem is this: when I index a field with a value such as "title" it works, but when I index a value such as "2006" it doesn't. Why, I don't know. Thanks. Simon Willnauer wrote: could y

Re: research lucene

2006-08-10 Thread Simon Willnauer
could you provide a bit more info on your index process? (analyzer,Field, Store, Index) regards simon On 8/10/06, ould sid'ahmed <[EMAIL PROTECTED]> wrote: Hello, I don't know why it doesn't index the numeric values. I looked with Luke and found that numeric values were not indexed. can you

Re: research lucene

2006-08-10 Thread ould sid'ahmed
Hello, I don't know why it doesn't index the numeric values. I looked with Luke and found that numeric values were not indexed. Do you know what the problem is? Thanks. Simon Willnauer wrote: Well your digits might be lost during analysis like Erik said. Check out with luke whats in your in

Re: custom sort

2006-08-10 Thread Enrique Lamas
Hi Chris, I investigated that way too, but I don't know how to do it. I have a query that searches for two words. This query finds both words in two documents, with the difference that one of the words appears twice in the first document whereas in the second document the two words appear only on

Re: updating document

2006-08-10 Thread Doron Cohen
Hi Deepan, The steps below seem correct, given that all the fields of the original document are also stored - the javadoc for indexReader.document(int n) (which I assume is what you are using) says: " Returns the stored fields of the nth Document in this index." - so, only stored fields would exis

NPE when sorting on a field that is missing from a doc

2006-08-10 Thread Oliver Hutchison
Hi all, we have recently noticed that doing a locale sensitive sort on a field that is missing from some docs causes an NPE inside the call to Collator#compare at FieldSortedHitQueue line 320 (Lucene 2.0 src): static ScoreDocComparator comparatorStringLocale (final IndexReader reader, final Stri
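The failure mode described above, a Collator being asked to compare a missing (null) value, can be reproduced and guarded against with the JDK alone. The following is a minimal sketch, not the Lucene fix: NullSafeCollatorSketch and nullsFirst are hypothetical names, and ordering missing values before all others is an assumption made here for illustration, not something the original report specifies.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Locale;

// Hypothetical sketch: wraps a Collator so that null values never reach Collator#compare.
final class NullSafeCollatorSketch {

    // Assumption: missing values sort before any present value.
    static Comparator<String> nullsFirst(Collator collator) {
        return (a, b) -> {
            if (a == null) {
                return b == null ? 0 : -1;
            }
            if (b == null) {
                return 1;
            }
            // Both values are present, so the locale-sensitive comparison is safe.
            return collator.compare(a, b);
        };
    }

    public static void main(String[] args) {
        // Made-up field values; the null stands for a document without the sort field.
        String[] values = {"zebra", null, "Äpfel", "apple"};
        Arrays.sort(values, nullsFirst(Collator.getInstance(Locale.GERMAN)));
        System.out.println(Arrays.toString(values));
    }
}
```

The same guard could instead place nulls last by swapping the two return values; which choice is right depends on how documents without the field should rank.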

SV: Lucene hits.length()

2006-08-10 Thread Marcus Falck
Hi again Erick. Yes I know the hits exist in the index at all times. I will illustrate exactly with approximate values for the hits.length(): Mergefactor 10. MinMergeDocs 5000. Searching for a very common Swedish word ("han" which means "he" in English). Indexing 10 docs. After 100

RE: Poor performance "race condition" in FieldSortedHitQueue

2006-08-10 Thread Oliver Hutchison
> [EMAIL PROTECTED] wrote on 09/08/2006 20:32:20: > > Heh... interfaces strike again. > > > > Well then since we *know* that no one has their own implementation > > (because they would not have been able to register it), we > should be > > able to safely upgrade the interface to a class (anyone