Benchmarking my indexer

2008-10-31 Thread Rafael Cunha de Almeida
Hello, I wrote an indexer that parses some files and indexes them using Lucene. I want to benchmark the whole thing, so I'd like to count the tokens being indexed in order to calculate the average number of indexed tokens per second. Is there a way to count the number of tokens in a document? While I'm
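A minimal sketch of the tokens-per-second arithmetic follows. In Lucene itself the count would come from draining the TokenStream that the indexing analyzer produces for each field. To keep the example runnable without Lucene, it assumes plain whitespace tokenization. That is a hypothetical stand-in roughly like WhitespaceAnalyzer, and the class and method names are illustrative only.

```java
import java.util.List;

// Hypothetical benchmark helper. Tokenization is a whitespace split, standing in
// for the TokenStream of whatever analyzer the indexer really uses.
class TokenBenchmark {

    // Counts whitespace-separated tokens in one document's text.
    static int countTokens(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return 0;
        }
        return trimmed.split("\\s+").length;
    }

    // Sums token counts over all documents.
    static long countTokens(List<String> docs) {
        long total = 0;
        for (String doc : docs) {
            total += countTokens(doc);
        }
        return total;
    }

    // Average throughput: tokens divided by elapsed wall-clock seconds.
    static double tokensPerSecond(long tokens, long elapsedNanos) {
        if (elapsedNanos <= 0) {
            throw new IllegalArgumentException("elapsed time must be positive");
        }
        return tokens / (elapsedNanos / 1_000_000_000.0);
    }
}
```

In a real run you would record `System.nanoTime()` before and after the indexing loop. You would then pass the total token count and the difference to `tokensPerSecond`.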

Re: Lucene Payload

2008-10-31 Thread Grant Ingersoll
On Oct 30, 2008, at 7:28 PM, Anshul jain wrote: I want to give more weight to some terms in the document. For example, the title of the book should be given more weight than the contents. And we are testing over a wide variety of Lucene queries: with quotes, without quotes, phrase, span, etc. If the w
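Lucene 2.x has two ways to express "the title matters more than the contents." One is an index-time boost via `Field.setBoost(float)`. The other is a query-time boost such as `title:lucene^4`. The toy model below only illustrates the idea of a per-field weight. It is not Lucene's scoring formula, and the names and weights are hypothetical.

```java
import java.util.Map;

// Toy per-field weighting (NOT Lucene's similarity): the score for a term is
// the sum over fields of boost(field) * termFrequency(field).
class FieldBoostSketch {

    // Counts exact, case-sensitive occurrences of term among whitespace tokens.
    static int termFrequency(String fieldText, String term) {
        int count = 0;
        for (String token : fieldText.trim().split("\\s+")) {
            if (token.equals(term)) {
                count++;
            }
        }
        return count;
    }

    // fields: field name -> text; boosts: field name -> weight (default 1.0).
    static double score(Map<String, String> fields, Map<String, Double> boosts, String term) {
        double total = 0.0;
        for (Map.Entry<String, String> field : fields.entrySet()) {
            double boost = boosts.getOrDefault(field.getKey(), 1.0);
            total += boost * termFrequency(field.getValue(), term);
        }
        return total;
    }
}
```

With a title boost of 4, one title hit outweighs three body hits. That is the effect a title boost aims for, although Lucene's real scores also include length normalization and idf.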

Re: Exact Phrase Query

2008-10-31 Thread semelak ss
For indexing, I use the following: === writer = new IndexWriter(INDEX_DIR, new WhitespaceAnalyzer(), true, IndexWriter.MaxFieldLength.UNLIMITED); Document doc = new Document(); String tmpword = this.getProperForm(word1, word2); doc.add(new Field("WORDS", tmpword, Field.Store.YES,

Re: Document thread safe?

2008-10-31 Thread Glen Newton
Yes, the problem goes away when I do the following: synchronized(doc) { doc.add(field); } Thanks. [I'll use a Lock to do this properly] -glen 2008/10/31 Yonik Seeley <[EMAIL PROTECTED]>: > On Fri, Oct 31, 2008 at 11:53 AM, Glen Newton <[EMAIL PROTECTED]> wrote: >> I have concurrent threads
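Glen mentions moving from `synchronized(doc)` to a Lock. A minimal sketch of that pattern follows. Lucene isn't used here: `SharedDoc` is a hypothetical stand-in that models a non-thread-safe Document with an unsynchronized `ArrayList`. Every writer takes the same `ReentrantLock`, which is equivalent in effect to synchronizing on the shared object.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a shared Lucene Document. The backing list is not
// thread-safe, so all access goes through one lock.
class SharedDoc {
    private final List<String> fields = new ArrayList<>();
    private final Lock lock = new ReentrantLock();

    void add(String field) {
        lock.lock();
        try {
            fields.add(field);
        } finally {
            lock.unlock(); // always release, even if add() throws
        }
    }

    int size() {
        lock.lock();
        try {
            return fields.size();
        } finally {
            lock.unlock();
        }
    }

    // Starts nThreads writers that each add perThread fields, then waits for all.
    static SharedDoc fillConcurrently(int nThreads, int perThread) {
        SharedDoc doc = new SharedDoc();
        List<Thread> threads = new ArrayList<>();
        for (int t = 0; t < nThreads; t++) {
            final int id = t;
            Thread th = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    doc.add("field-" + id + "-" + i);
                }
            });
            threads.add(th);
            th.start();
        }
        for (Thread th : threads) {
            try {
                th.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for writers", e);
            }
        }
        return doc;
    }
}
```

If the lock is removed, concurrent `add` calls on the `ArrayList` can lose entries. That is the kind of "odd behaviour" described in the thread. The lock's `finally` block guarantees release even when a write fails.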

Re: Document thread safe?

2008-10-31 Thread Yonik Seeley
On Fri, Oct 31, 2008 at 11:53 AM, Glen Newton <[EMAIL PROTECTED]> wrote: > I have concurrent threads adding Fields to the same Document, but > getting some odd behaviour. > Before going into too much depth, is Document thread-safe? No, it's not. Synchronizing on Document when adding a new field wo

Document thread safe?

2008-10-31 Thread Glen Newton
Hello, I am using Lucene 2.3.1. I have concurrent threads adding Fields to the same Document, but getting some odd behaviour. Before going into too much depth, is Document thread-safe? thanks, Glen http://zzzoot.blogspot.com/

RE: Any Spanish analyzer available?

2008-10-31 Thread Zhang, Lisheng
Thanks very much for your help; I will let you know if we can improve it later in any way. Best regards, Lisheng -----Original Message----- From: Albert Juhe [mailto:[EMAIL PROTECTED] Sent: Friday, October 31, 2008 5:49 AM To: java-user@lucene.apache.org Subject: Re: Any Spanish analyzer available? Hi

Re: wizard for search in Lucene

2008-10-31 Thread Albert Juhe
Hi, this is my first version. It isn't fast, because I want to get this information without modifying the index. Now I'm working to improve it (including freeling). public String docsTerme(IndexReader reader, String terme) { String resultat = ""; TermPositions tP; ArrayList a
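Albert's method walks Lucene's `TermPositions`, which enumerates for one term every document that contains it and the positions within each document. The sketch below is a toy positional inverted index that stores exactly that information. It is meant to show what is being iterated. It is not Lucene's API: `PositionalIndex` and `positionsOf` are hypothetical names, and tokenization is whitespace-only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy positional inverted index: term -> (docId -> positions).
class PositionalIndex {
    private final Map<String, Map<Integer, List<Integer>>> postings = new TreeMap<>();
    private int nextDocId = 0;

    // Tokenizes on whitespace, records each token's position, returns the doc id.
    int addDocument(String text) {
        int docId = nextDocId++;
        String[] tokens = text.trim().split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            if (tokens[pos].isEmpty()) {
                continue; // only happens for blank input
            }
            postings.computeIfAbsent(tokens[pos], k -> new TreeMap<>())
                    .computeIfAbsent(docId, k -> new ArrayList<>())
                    .add(pos);
        }
        return docId;
    }

    // Doc ids (ascending) mapped to the positions where term occurs; empty if absent.
    Map<Integer, List<Integer>> positionsOf(String term) {
        return postings.getOrDefault(term, new TreeMap<>());
    }
}
```

Reading these postings does not modify the index, which matches Albert's constraint. The cost is that the code has to visit every position of the term.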

Re: Exact Phrase Query

2008-10-31 Thread Erick Erickson
You need to give us more information for meaningful replies, such as the analyzers you use when indexing and searching, the exact query you use, perhaps snippets of the code, etc. That said, things to check: get a copy of Luke and examine your index. You can even run queries through that tool and
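Erick's first question, which analyzers are used at index and at search time, matters because terms must match exactly. The sketch below uses two hypothetical toy analyzers. One splits on whitespace and preserves case, roughly like WhitespaceAnalyzer. The other also lowercases, as StandardAnalyzer does among other things. Indexing with one and querying with the other can silently produce no hits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Two toy "analyzers" and an exact term check. These are not Lucene classes.
class AnalyzerMismatch {

    // Whitespace split, case preserved.
    static List<String> whitespace(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Whitespace split, then lowercased.
    static List<String> whitespaceLowercase(String text) {
        List<String> out = new ArrayList<>();
        for (String t : whitespace(text)) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // True if every query term is present among the indexed terms (order ignored).
    static boolean allTermsIndexed(List<String> indexed, List<String> query) {
        return indexed.containsAll(query);
    }
}
```

Luke shows the terms that were actually indexed. Comparing them with the parsed query is the practical version of this check.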

Re: Exact Phrase Query

2008-10-31 Thread semelak ss
Was my message sent successfully? I received this automated response from [EMAIL PROTECTED] right after sending the message! === Dear sender, Delivery of your message has failed. This is an automatic reply. The domain magentanews.com has been changed and is no longer in use. Please rese

Re: Any Spanish analyzer available?

2008-10-31 Thread Albert Juhe
Hi, I'm actually using a Spanish analyzer for my search engine. I don't know if it's the best, but it's useful for my purposes. http://www.nabble.com/file/p20265229/SpanishAnalyzer.java SpanishAnalyzer.java http://www.nabble.com/file/p20265229/SpanishStemFilter.java SpanishStemFilter.java http:/

Exact Phrase Query

2008-10-31 Thread semelak ss
I have documents containing multiple words in the field "word". For example, one of the documents contains the following in the field "word": homeowners work. When searching for single words (e.g. homeowners) I get hits. However, searching for the exact phrase "homeowners work" gives me no hits
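An exact phrase query with slop 0 matches only when the terms occur at consecutive positions in the same field. Single-term hits therefore do not guarantee phrase hits. The toy matcher below illustrates that rule. It assumes whitespace-only, case-sensitive tokenization as a stand-in for whatever analyzer the index actually uses, and `PhraseMatch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy exact-phrase matcher: phrase[k] must occur at position start + k.
class PhraseMatch {

    // term -> positions at which it occurs in the text
    static Map<String, List<Integer>> positions(String text) {
        Map<String, List<Integer>> map = new HashMap<>();
        String[] tokens = text.trim().split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].isEmpty()) {
                continue;
            }
            map.computeIfAbsent(tokens[i], k -> new ArrayList<>()).add(i);
        }
        return map;
    }

    // True if the phrase terms appear consecutively, in order, somewhere in text.
    static boolean matchesPhrase(String text, String... phrase) {
        if (phrase.length == 0) {
            return false;
        }
        Map<String, List<Integer>> pos = positions(text);
        List<Integer> starts = pos.get(phrase[0]);
        if (starts == null) {
            return false;
        }
        for (int start : starts) {
            boolean ok = true;
            for (int k = 1; k < phrase.length && ok; k++) {
                List<Integer> p = pos.get(phrase[k]);
                ok = p != null && p.contains(start + k);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }
}
```

If the indexed tokens or their positions differ from what the query analyzer produces, the phrase fails even though each word "exists." That can happen, for example, when the field is tokenized differently at index time.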

Re: OutOfMemory Problems Lucene 2.4 / Tomcat

2008-10-31 Thread PabloS
Thanks for the quick reply :). For now, I'd settle for just storing cache values in soft references so at least the GC would be able to free up some space when it needs to. I think I'll just try to override the default sorting mechanism by subclassing FieldSortedHitQueue. I'll let you know how i
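PabloS's idea of holding cache values through soft references can be sketched generically. The class below is a hypothetical, minimal cache, not Lucene's FieldCache, whose internals the thread indicates are not pluggable in these versions. Values are reachable only via `SoftReference`, so the GC may reclaim them under memory pressure. A cleared entry is rebuilt by the loader on the next access.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Minimal soft-valued cache. Not thread-safe: synchronize externally if shared.
class SoftValueCache<K, V> {
    private final Map<K, SoftReference<V>> map = new HashMap<>();
    private final Function<K, V> loader;

    SoftValueCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        SoftReference<V> ref = map.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) { // never loaded, or already collected by the GC
            value = loader.apply(key);
            map.put(key, new SoftReference<>(value));
        }
        return value;
    }
}
```

The trade-off is that a collected sort array must be rebuilt from the index, which is slow for large indexes. Also, keys of cleared entries stay in the map until they are overwritten.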

Re: OutOfMemory Problems Lucene 2.4 / Tomcat

2008-10-31 Thread Mark Miller
20 fields on a huge index? Wow, not sure there is a ton you can do with that... Anyone have any suggestions for that one? Going distributed should help, I suppose, but that's a lot of sort fields for a large index. If LUCENE-831 ever gets off the ground, you will be able to change the cache used, and p
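Mark's concern can be made concrete with a rough estimate. Sorting on an int field makes the field cache hold an `int[]` with one entry per document (maxDoc), so each such field costs about 4 bytes per document. String sort fields cost more: an array of ords plus the unique values. The 10,000,000-document figure used below is hypothetical and not from the thread.

```java
// Back-of-envelope memory estimate for field-cache sort arrays: one entry per
// document (maxDoc) per sorted field, bytesPerEntry bytes each.
class SortCacheEstimate {

    static long bytes(long maxDoc, int fields, int bytesPerEntry) {
        return maxDoc * fields * bytesPerEntry; // long arithmetic, no overflow here
    }

    static double mebibytes(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }
}
```

For 20 int sort fields over 10,000,000 documents, this gives 20 × 10,000,000 × 4 = 800,000,000 bytes, about 763 MiB, before any String fields.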

Re: OutOfMemory Problems Lucene 2.4 / Tomcat

2008-10-31 Thread PabloS
Hi, I'm having a similar problem with my application, although we are using Lucene 2.3.2. The problem we have is that we are required to sort on most of the fields (20 at least). Is there any way of changing the cache being used? I can't seem to find a way, since the cache is being accessed using

Re: Read all the data from an index

2008-10-31 Thread Andrzej Bialecki
Erick Erickson wrote: I'm not sure what *could* be easier than calling IndexSearcher.doc() in a loop over document IDs 0 to maxDoc() - 1. Of course you'll have to pay some attention to whether you get a document back or not, and I'm not quite sure whether you'd have to worry about getting deleted documents. But
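The loop Erick describes has to start at document ID 0, because Lucene doc IDs are zero-based. It also has to skip deleted slots, since maxDoc() still counts deleted documents until segments are merged. The sketch below assumes a hypothetical `DocReader` interface whose method names mirror Lucene 2.x `IndexReader` (`maxDoc()`, `isDeleted(int)`, `document(int)`). It is a plain interface so that the example runs without Lucene.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal reader; names mirror IndexReader but this is not Lucene.
interface DocReader<D> {
    int maxDoc();
    boolean isDeleted(int docId);
    D document(int docId);
}

class ReadAll {
    // Visits ids 0 .. maxDoc() - 1 and collects every non-deleted document.
    static <D> List<D> readAll(DocReader<D> reader) {
        List<D> docs = new ArrayList<>();
        for (int i = 0; i < reader.maxDoc(); i++) {
            if (reader.isDeleted(i)) {
                continue; // slot still counted by maxDoc() but holds a deleted doc
            }
            docs.add(reader.document(i));
        }
        return docs;
    }
}
```

With a real index, the same loop would call the reader's `isDeleted(i)` before `document(i)`.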