updating document

2006-08-09 Thread Deepan Chakravarthy
Hi, We have to update a few documents in our index: we have to add an additional field to them. We did it as follows: 1) read the documents of interest using IndexReader, 2) copy them to a temporary Document object (temp_doc), 3) delete the document in the index, 4) close the IndexReader, 5) open the IndexWriter, 6)

Re: Poor performance "race condition" in FieldSortedHitQueue

2006-08-09 Thread Doron Cohen
[EMAIL PROTECTED] wrote on 09/08/2006 20:32:20: > Heh... interfaces strike again. > > Well then since we *know* that no one has their own implementation > (because they would not have been able to register it), we should be > able to safely upgrade the interface to a class (anyone want to supply >

Re: Poor performance "race condition" in FieldSortedHitQueue

2006-08-09 Thread Doron Cohen
[EMAIL PROTECTED] wrote on 09/08/2006 11:22:12: > Assuming "field" wasn't being used to synchronize on something else, > this would still block *all* IndexReaders/Searchers trying to sort on > that field. > > In Solr, it would make the situation worse. If I had my warmed-up > IndexSearcher serving

Re: Poor performance "race condition" in FieldSortedHitQueue

2006-08-09 Thread Yonik Seeley
On 8/9/06, Oliver Hutchison <[EMAIL PROTECTED]> wrote: > Well, there's FieldCache.DEFAULT I thought the exact same thing but what I'd forgotten was that all fields on an interface are implicitly final. Heh... interfaces strike again. Well then since we *know* that no one has their own impleme
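Yonik's point about interfaces can be checked directly: a field declared in an interface is implicitly public, static and final, so a constant like FieldCache.DEFAULT cannot be reassigned by users. The sketch below uses a hypothetical `Cache` interface in place of Lucene's FieldCache and confirms the modifiers via reflection. `PluggableCache` is an equally hypothetical illustration of the "upgrade the interface to a class" idea, where a class can keep a replaceable default behind a setter. Neither type is Lucene API.

```java
import java.lang.reflect.Modifier;

class InterfaceConstantDemo {
    // Hypothetical stand-in for an interface such as FieldCache.
    // DEFAULT is implicitly public static final because it is declared in an interface.
    interface Cache {
        Cache DEFAULT = new Cache() {
            public String name() { return "default"; }
        };
        String name();
    }

    // Hypothetical class-based alternative: the default lives in a private,
    // non-final static field that callers may replace through a setter.
    static abstract class PluggableCache {
        private static volatile PluggableCache current = new PluggableCache() {
            String name() { return "default"; }
        };
        static PluggableCache get() { return current; }
        static void set(PluggableCache c) { current = c; }
        abstract String name();
    }

    // Reports whether Cache.DEFAULT carries the static and final modifiers.
    static boolean defaultIsStaticFinal() {
        try {
            int m = Cache.class.getField("DEFAULT").getModifiers();
            return Modifier.isStatic(m) && Modifier.isFinal(m);
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints "Cache.DEFAULT static final? true"
        System.out.println("Cache.DEFAULT static final? " + defaultIsStaticFinal());
        PluggableCache.set(new PluggableCache() {
            String name() { return "custom"; }
        });
        // prints "PluggableCache now: custom"
        System.out.println("PluggableCache now: " + PluggableCache.get().name());
    }
}
```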

RE: Poor performance "race condition" in FieldSortedHitQueue

2006-08-09 Thread Oliver Hutchison
> Ah, right... I browsed your code a bit too fast. It looks fine. Great. > > On a related note it would be great if there was a way to plug a > > custom FieldCache implementation into Lucene, given there is a > > FieldCache interface it's a shame there's no way to > actually provide > > an

Re: Poor performance "race condition" in FieldSortedHitQueue

2006-08-09 Thread Yonik Seeley
On 8/9/06, Oliver Hutchison <[EMAIL PROTECTED]> wrote: Yonik, > most easily implemented in Java5 via Future. I didn't use Java5 as I had a feeling that code in Lucene needs to compile on Java 1.3, right? Lucene 2 currently requires Java 1.4. It was really just a side comment - people have imple

Re: Lucene hits.length()

2006-08-09 Thread Chris Hostetter
: I think, but am not certain (chime in here guys) that this is expected : behavior. As I remember from various threads, internally indexing uses a : RAMdir to accumulate data until it merges it with the FSDir. Since the : searcher and indexer are separate, I assume that the searcher is looking at

Re: custom sort

2006-08-09 Thread Chris Hostetter
What you want is not a customized sort as much as a customized score. Scores can be customized by modifying your Similarity class -- LIA (Lucene in Action) has some good info on this, but the best way to figure out what you want may be to start by creating your own Similarity class and then look at the search.exp
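For Enrique's goal (documents matching more of the query terms rank higher), the relevant part of scoring is the coordination factor: Lucene's DefaultSimilarity computes coord(overlap, maxOverlap) as overlap / maxOverlap, so a custom Similarity could make that factor dominate the score. The stand-alone sketch below does not use Lucene at all; it only ranks hypothetical documents by that fraction, assuming lower-casing plus whitespace splitting as a stand-in for a real analyzer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MatchCountRanking {
    // Same formula as DefaultSimilarity.coord: fraction of query terms matched.
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // Number of distinct query terms found in a lower-cased, whitespace-split document.
    static int overlap(Set<String> queryTerms, String doc) {
        Set<String> docTerms = new HashSet<>(Arrays.asList(doc.toLowerCase().split("\\s+")));
        int n = 0;
        for (String t : queryTerms) {
            if (docTerms.contains(t)) n++;
        }
        return n;
    }

    // Sorts documents by descending coord; List.sort is stable, so ties keep input order.
    static List<String> rank(List<String> docs, Set<String> queryTerms) {
        List<String> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparingDouble(
                (String d) -> coord(overlap(queryTerms, d), queryTerms.size())).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        Set<String> query = new HashSet<>(Arrays.asList("rocio", "ortega"));
        List<String> docs = Arrays.asList("ortega concert", "Rocio Ortega live", "unrelated title");
        // prints [Rocio Ortega live, ortega concert, unrelated title]
        System.out.println(rank(docs, query));
    }
}
```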

Re: "Field Grouping" query restrained to same field on a 'multi'-field'

2006-08-09 Thread Chris Hostetter
: > You could do this with the current query parser by putting large : > position increment gaps between paragraphs that are guaranteed to be : > larger than the largest paragraph. Then you could use a sloppy phrase : > query : > "word1 word2"~1 for instance. : Unfortunately this only makes s
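The position-gap trick comes down to plain arithmetic. The sketch below is a simulation, not Lucene code: it assigns positions to the values of a multi-valued `paragraph` field, skipping a hypothetical number of extra positions between values the way an analyzer's position increment gap would, and then applies a simplified two-term sloppy-phrase test (the second term may sit at most `slop` positions away from the slot directly after the first). With a gap larger than any paragraph, `"word1 word2"~1` can only match inside a single paragraph.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PositionGapDemo {
    // Assigns token positions to the values of a multi-valued field, skipping
    // `gap` extra positions between consecutive values (simulated gap).
    static Map<String, List<Integer>> positions(List<String> paragraphs, int gap) {
        Map<String, List<Integer>> pos = new HashMap<>();
        int p = 0;
        for (int i = 0; i < paragraphs.size(); i++) {
            if (i > 0) p += gap;
            for (String tok : paragraphs.get(i).toLowerCase().split("\\s+")) {
                pos.computeIfAbsent(tok, k -> new ArrayList<>()).add(p);
                p++;
            }
        }
        return pos;
    }

    // Simplified two-term sloppy phrase: "w1 w2"~slop matches when some w2
    // occurrence is within `slop` positions of the slot right after a w1.
    static boolean sloppyMatch(Map<String, List<Integer>> pos, String w1, String w2, int slop) {
        List<Integer> a = pos.getOrDefault(w1, Collections.emptyList());
        List<Integer> b = pos.getOrDefault(w2, Collections.emptyList());
        for (int p1 : a) {
            for (int p2 : b) {
                if (Math.abs(p2 - (p1 + 1)) <= slop) return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> paras = Arrays.asList("alpha word1", "word2 beta");
        // word1 at 1, word2 at 2: adjacent across the paragraph boundary -> true
        System.out.println("gap 0:   " + sloppyMatch(positions(paras, 0), "word1", "word2", 1));
        // word1 at 1, word2 at 102: too far apart -> false
        System.out.println("gap 100: " + sloppyMatch(positions(paras, 100), "word1", "word2", 1));
    }
}
```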

RE: Poor performance "race condition" in FieldSortedHitQueue

2006-08-09 Thread Oliver Hutchison
Yonik, > most easily implemented in Java5 via Future. I didn't use Java5 as I had a feeling that code in Lucene needs to compile on Java 1.3, right? > I don't think you need two maps though, right? just stick a > placeholder in the outer map. I'm using 2 maps mainly because it simplifies the

Re: Tomcat : Indexing Sample ?

2006-08-09 Thread Simon Willnauer
You can just put your documents in a queue and access the index from one single thread. All your analysis can take place in other threads: when one has finished, it dumps the Document in the queue and keeps your index writer busy. That's a good idea anyway. I guess you don't need an example for that don
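Simon's suggestion is a producer/consumer setup: many threads prepare documents, and exactly one thread owns the index writer. A minimal sketch using only java.util.concurrent, in which plain strings stand in for Lucene Documents, trim plus lower-casing stands in for analysis, and a List stands in for the IndexWriter (all assumptions for illustration; in a web application the producers would typically be the upload-handling threads):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class SingleWriterQueue {
    // Sentinel that tells the writer thread no more documents are coming.
    private static final String POISON = "__END__";

    static List<String> indexAll(List<String> rawDocs, int producers) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> index = new ArrayList<>(); // only the writer thread touches this

        // The single writer: drains the queue until it sees the sentinel.
        Thread writer = new Thread(() -> {
            try {
                for (String doc = queue.take(); !doc.equals(POISON); doc = queue.take()) {
                    index.add(doc); // stand-in for indexWriter.addDocument(doc)
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();

        // Producers: the "analysis" step runs in parallel on a thread pool.
        ExecutorService pool = Executors.newFixedThreadPool(producers);
        for (String raw : rawDocs) {
            pool.execute(() -> queue.add(raw.trim().toLowerCase()));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);

        queue.add(POISON); // all producers are done; stop the writer
        writer.join();     // join() also makes the writer's additions visible here
        return index;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> index = indexAll(Arrays.asList(" Report.DOC ", "Memo.doc", "Budget.xls"), 3);
        System.out.println(index.size() + " documents added by one writer thread");
    }
}
```

Keeping a single writer avoids contention on the index entirely; the cost is that indexing throughput is bounded by that one thread, which is usually fine because analysis tends to dominate.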

Re: IndexReader.getTermFreqVector penalty

2006-08-09 Thread Amit Kumar
Yes, thanks Grant. I realize that if I need the term freq in all the documents I could use TermEnum, but I have a use case where I may need term frequencies of only selected documents, and the worst-case scenario might be term freq for n-1 documents, where n is the total number of documents in

Re: IndexReader.getTermFreqVector penalty

2006-08-09 Thread Grant Ingersoll
Hi Amit, If you want all the freqs of all the terms (or even just some of the terms) in all documents, you don't need to use Term Vectors, take a look at TermEnum and TermDocs. If you want for specific documents, then you do need Term Vectors. You may get some CPU ticks by only keeping P

Tomcat : Indexing Sample ?

2006-08-09 Thread Feris Thia
Hi All, I'm a newbie to Lucene and would like to use a thread instance to index the office documents I've uploaded within a web application. Can Lucene handle concurrent indexing? Is there any JSP/servlet sample that can show me how to do that? Regards, Feris

IndexReader.getTermFreqVector penalty

2006-08-09 Thread Amit Kumar
Hi Lucene Users, I am using the lucene indices to get term frequencies. I just wanted to check with you about the time it is taking to retrieve these term freq. Please suggest if I can improve the code/index or if this is expected. It takes 8 to 9 seconds to retrieve the term freq values of

Re: Poor performance "race condition" in FieldSortedHitQueue

2006-08-09 Thread Yonik Seeley
Definitely the right track Oliver... it's called a blocking map (most easily implemented in Java5 via Future). I don't think you need two maps though, right? just stick a placeholder in the outer map. -Yonik http://incubator.apache.org/solr Solr, the open-source Lucene search server On 8/9/06,
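Yonik's blocking map can be sketched with FutureTask (introduced in Java 5) and a single ConcurrentHashMap, written here with Java 8 syntax for brevity. The first thread to ask for a key puts a FutureTask into the map as the placeholder and runs it; every other thread asking for the same key blocks on that same Future instead of loading the field again, while threads asking for other keys proceed. The class name, the String key (think "reader + field") and the loader function are hypothetical; this is not Lucene's FieldCacheImpl, which at the time had to stay Java 1.4 compatible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class BlockingFieldCache<K, V> {
    // One map: each key maps to the Future of its (possibly still loading) value.
    private final ConcurrentMap<K, Future<V>> map = new ConcurrentHashMap<>();

    V get(K key, Function<K, V> loader) throws InterruptedException, ExecutionException {
        Future<V> f = map.get(key);
        if (f == null) {
            FutureTask<V> task = new FutureTask<V>(() -> loader.apply(key));
            f = map.putIfAbsent(key, task); // install the placeholder atomically
            if (f == null) {                // this thread won the race: load now
                f = task;
                task.run();
            }
        }
        // Blocks until the loading thread finishes. A failed load stays cached
        // as a failed Future; a real cache might remove it and retry.
        return f.get();
    }

    // Demo: `threads` concurrent requests for the same key; returns how many
    // times the loader actually ran.
    static int concurrentLoads(int threads) throws Exception {
        BlockingFieldCache<String, Integer> cache = new BlockingFieldCache<>();
        AtomicInteger loads = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<Integer>> results = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            results.add(pool.submit(() -> cache.get("reader1/title", k -> {
                loads.incrementAndGet(); // the expensive load would happen here
                return k.length();
            })));
        }
        for (Future<Integer> r : results) r.get();
        pool.shutdown();
        return loads.get();
    }

    public static void main(String[] args) throws Exception {
        // prints "loader ran 1 time(s)"
        System.out.println("loader ran " + concurrentLoads(8) + " time(s)");
    }
}
```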

Re: Poor performance "race condition" in FieldSortedHitQueue

2006-08-09 Thread Yonik Seeley
On 8/9/06, Doron Cohen <[EMAIL PROTECTED]> wrote: public StringIndex getStringIndex (IndexReader reader, String field) throws IOException { field = field.intern(); synchronized (field) { // <--- line added Object ret = lookup (reader, field, STRING_INDEX, null); if
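Doron's patch follows a lookup-then-load pattern under a per-field lock: intern the field name so that equal names share one String instance, synchronize on it, and only load when the lookup misses. A stand-alone sketch of that pattern, with a HashMap standing in for the real cache and a hypothetical loader in place of building a StringIndex. As the thread points out, this lock is per field name only, so it also serializes different readers loading the same field, and any unrelated code that synchronizes on the same interned String shares the lock.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class InternedFieldLockCache {
    private final Map<String, Object> cache = new HashMap<>();

    private Object lookup(String field) {
        synchronized (cache) { return cache.get(field); }
    }

    private void store(String field, Object value) {
        synchronized (cache) { cache.put(field, value); }
    }

    Object get(String field, Function<String, Object> loader) {
        field = field.intern();      // equal names -> the same String instance
        synchronized (field) {       // only callers for this field name wait here
            Object ret = lookup(field);
            if (ret == null) {
                ret = loader.apply(field); // expensive load, done once per field
                store(field, ret);
            }
            return ret;
        }
    }

    // Demo: several threads request the same field through distinct String
    // objects; returns how many times the loader ran.
    static int concurrentLoads(int threads) throws InterruptedException {
        InternedFieldLockCache c = new InternedFieldLockCache();
        AtomicInteger loads = new AtomicInteger();
        Function<String, Object> loader = f -> { loads.incrementAndGet(); return new Object(); };
        List<Thread> ts = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> c.get(new String("title"), loader));
            ts.add(t);
            t.start();
        }
        for (Thread t : ts) t.join();
        return loads.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // prints "loads = 1"
        System.out.println("loads = " + concurrentLoads(4));
    }
}
```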

Re: Poor performance "race condition" in FieldSortedHitQueue

2006-08-09 Thread Yonik Seeley
On 8/8/06, Oliver Hutchison <[EMAIL PROTECTED]> wrote: > The nature of the field cache itself means that the first > sort on a particular field can take a long, long time. > Synchronization won't really help that much. I think you may be misunderstanding my description (probably because it was n

Re: Lucene hits.length()

2006-08-09 Thread Erick Erickson
I think, but am not certain (chime in here guys) that this is expected behavior. As I remember from various threads, internally indexing uses a RAMdir to accumulate data until it merges it with the FSDir. Since the searcher and indexer are separate, I assume that the searcher is looking at the sna

Re: research lucene

2006-08-09 Thread Simon Willnauer
Well, your digits might be lost during analysis, as Erick said. Check with Luke what's in your index (Field.Store.YES) and see whether your analyzer removes the digits. SimpleAnalyzer removes them, but StandardAnalyzer keeps the digits. regards simon On 8/9/06, ould sid'ahmed <[EMAIL PROTECTED]> wro
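To see why the digits disappear, the sketch below imitates two tokenization styles in plain Java. `letterTokens` approximates what SimpleAnalyzer does (split on anything that is not a letter, then lower-case), and `whitespaceTokens` approximates WhitespaceAnalyzer; both are simplified re-implementations for illustration, not the Lucene classes. Luke remains the way to check what actually ended up in the index.

```java
import java.util.ArrayList;
import java.util.List;

class DigitStrippingDemo {
    // Letter-only tokenizing: any non-letter (digits included) ends a token
    // and is itself discarded; tokens are lower-cased.
    static List<String> letterTokens(String text) {
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                cur.append(Character.toLowerCase(c));
            } else if (cur.length() > 0) {
                out.add(cur.toString());
                cur.setLength(0);
            }
        }
        if (cur.length() > 0) out.add(cur.toString());
        return out;
    }

    // Whitespace tokenizing: tokens are kept as-is, so digits survive.
    static List<String> whitespaceTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(letterTokens("title 900 date 2005"));     // [title, date]
        System.out.println(whitespaceTokens("title 900 date 2005")); // [title, 900, date, 2005]
    }
}
```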

Re: research lucene

2006-08-09 Thread ould sid'ahmed
Simon Willnauer wrote: You should rather explain what you expect from indexing your number not as string values. best regards simon On 8/9/06, ould sid'ahmed <[EMAIL PROTECTED]> wrote: Erick Erickson wrote: > What analyzers are you using for both indexing and searching? Some > analyzers

Re: research lucene

2006-08-09 Thread Simon Willnauer
You should rather explain what you expect from indexing your number not as string values. best regards simon On 8/9/06, ould sid'ahmed <[EMAIL PROTECTED]> wrote: Erick Erickson wrote: > What analyzers are you using for both indexing and searching? Some > analyzers > strip out numbers and som

Re: research lucene

2006-08-09 Thread ould sid'ahmed
Erick Erickson wrote: What analyzers are you using for both indexing and searching? Some analyzers strip out numbers and some don't. I'd start with WhitespaceAnalyzer, and index your fields UN_TOKENIZED and work up to the other analyzers and/or tokenizations from there. Under any circumstanc

RE: Lucene hits.length()

2006-08-09 Thread Marcus Falck
Still worried =) You see, hits.length() isn't updated correctly when I create a new searcher. The correct update only occurs at the merges. =/ -Original message- From: Erick Erickson [mailto:[EMAIL PROTECTED] Sent: 9 August 2006 15:34 To: java-user@luce

Re: Lucene hits.length()

2006-08-09 Thread Erick Erickson
Then you won't see anything added to your index between times. Does this identify your problem or are you still worried? Erick On 8/9/06, Marcus Falck <[EMAIL PROTECTED]> wrote: I'm opening a new searcher every third minute. -Original message- From: Erick Erickson [mailto:[EMAIL

Re: research lucene

2006-08-09 Thread Erick Erickson
What analyzers are you using for both indexing and searching? Some analyzers strip out numbers and some don't. I'd start with WhitespaceAnalyzer, and index your fields UN_TOKENIZED and work up to the other analyzers and/or tokenizations from there. Under any circumstances, you really, really, rea

RE: Lucene hits.length()

2006-08-09 Thread Marcus Falck
I'm opening a new searcher every third minute. -Original message- From: Erick Erickson [mailto:[EMAIL PROTECTED] Sent: 8 August 2006 18:58 To: java-user@lucene.apache.org Subject: Re: Lucene hits.length() I'll take a stab at it. When are you opening/closing your searcher

custom sort

2006-08-09 Thread Enrique Lamas
Hi, I want to execute a query searching for a few terms QueryParser queryParser = new MultiFieldQueryParser(new String[] {"tags", "title"}, ProcessConstants.analyzer); Query query = queryParser.parse("rocio ortega"); and I want to obtain the results sorted by the number of matched terms, but not c

research lucene

2006-08-09 Thread ould sid'ahmed
Hello, I can't get any results from fields that have a numeric value, for example "date=2005" or "title=900". I have indexed the field "date" with a String value. I want to know why. Can you help me? Thanks.

RE: Poor performance "race condition" in FieldSortedHitQueue

2006-08-09 Thread Oliver Hutchison
Otis, Doron, thanks for the feedback. First up I'd just like to say that I totally agree with Doron on this - any attempt to fix this issue needs to use synchronization that is as fine-grained as possible, or you'd just be introducing a new bottleneck. In terms of the level of granularity, t

Re: "Field Grouping" query restrained to same field on a 'multi'-field'

2006-08-09 Thread Laurent Hoss
Yonik Seeley wrote: On 8/8/06, Laurent Hoss wrote: Suppose we have an index containing Lucene documents, each with multiple fields that share the same name, 'paragraph'. Now I want to make a "Field Grouping" query (described in: http://lucene.apache.org/java/docs/queryparsersyntax.html ) "paragraph:( word1 A

Re: Poor performance "race condition" in FieldSortedHitQueue

2006-08-09 Thread Doron Cohen
Hi Otis, I think that synchronizing the entire method would be overkill - instead it would be sufficient to synchronize on a "by field" object so that only if two requests for the same "cold/missing" field are racing, one of them would wait for the other to complete loading that field. I think