What are the best document edit options?

2008-12-16 Thread Thomas J. Buhr
Hello Lucene, Looking at the document object it seems like each time I want to edit its contents I need to do the following: 1 - fetch the document 2 - dump its contents into a temp container 3 - update field values in the temp container 4 - create a new document 5 - transfer my updated field

Re: Unique results in BooleanQuery

2008-12-16 Thread Jay Joel Malaluan
Let me expound more on the question. Will the q1 be run on the BooleanQuery q2 and append the results that are not equal to the result of the first query of q2? From: Jay Joel Malaluan To: java-user@lucene.apache.org Sent: Wednesday, December 17, 2008 2:42:06

Re: Unique results in BooleanQuery

2008-12-16 Thread Jay Joel Malaluan
Hi Paul, But will q1 be run on the BooleanQuery q2, or is q1 just used for filtering? Regards, Jay Malaluan From: Paul Cowan To: java-user@lucene.apache.org Sent: Wednesday, December 17, 2008 1:37:15 PM Subject: Re: Unique results in BooleanQuery Hi

Combining results of multiple indexes

2008-12-16 Thread Preetham Kajekar
Hi, I am new to Lucene. I am not using it as a pure text indexer. I am trying to index a Java object which has about 10 fields (like id, time, srcIp, dstIp) - most of them being numerical values. In order to speed up indexing, I figured that having two separate indexers, each of them indexing d

Re: Unique results in BooleanQuery

2008-12-16 Thread Paul Cowan
Hi Jay, Anyone knowledgeable on how to get unique hits using the BooleanQuery? If I have 2 queries so that when the 1st query is processed then the 2nd query will no longer return the same results from the 1st query. Do you mean you want to run two separate queries -- get all the results fr
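
One reading of the question above is: run the two queries separately, keep every hit from the first, then append only those hits from the second that the first did not return. A minimal sketch of that merge step in plain Java, assuming the hits have already been collected as lists of document ids (the class and method names are hypothetical and not part of Lucene):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class UniqueHits {
    /**
     * Returns the ids of the first result list, followed by the ids of the
     * second list that were not already seen. Encounter order is preserved.
     */
    static List<Integer> mergeUnique(List<Integer> first, List<Integer> second) {
        // LinkedHashSet drops duplicates but remembers insertion order.
        Set<Integer> seen = new LinkedHashSet<>(first);
        seen.addAll(second);
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        // Hypothetical doc ids standing in for the hits of q1 and q2.
        List<Integer> q1Hits = Arrays.asList(3, 7, 9);
        List<Integer> q2Hits = Arrays.asList(7, 2, 3, 5);
        System.out.println(mergeUnique(q1Hits, q2Hits)); // [3, 7, 9, 2, 5]
    }
}
```

This only covers the "two separate queries" interpretation; whether the same effect should be pushed into a single query is a separate question.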

Unique results in BooleanQuery

2008-12-16 Thread Jay Malaluan
Hi, Anyone knowledgeable on how to get unique hits using the BooleanQuery? If I have 2 queries so that when the 1st query is processed then the 2nd query will no longer return the same results from the 1st query. Regards, Jay Malaluan -- View this message in context: http://www.nabble.com/

Re: IDF scoring issue

2008-12-16 Thread Anshum
Hi Rajiv, If I'm interpreting your problem correctly, I'd suggest you try using a PhraseQuery with an appropriate slop value. Though again it depends on what exactly you are trying to fetch. -- Anshum Gupta Naukri Labs! http://ai-cafe.blogspot.com The facts expressed here belong to e

Re: IDF scoring issue

2008-12-16 Thread Rajiv2
To answer your questions, 1. there are only two words in the document I'm searching -- city and state abbrev. lowercased and analyzed by WhitespaceAnalyzer 2. the only field and default field is text, so the query becomes text:fleming text:roofing text:inc. ...etc. Using query operator AND inst

Re: IDF scoring issue

2008-12-16 Thread Erick Erickson
Note a couple of things: 1> how a doc scores also takes into account how many other words are in the field you're querying on. 2> Is "text" your default field? Because what you posted is really searching text:fleming :roofing :inc.. Note also the implicit OR between each of them.

IDF scoring issue

2008-12-16 Thread Rajiv2
Hello, I'm using the default lucene Queryparser on the search text : fleming roofing inc., marietta ga These items are in my index. doc 1: fleming ga doc 2: marietta ga doc 3: marietta il doc 4: marietta ok doc 5: marietta ok doc 6: fleming pa The first match is always "fleming ga" even thoug
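
For the six documents listed above, the document frequency of each term can be counted directly, which gives a rough idea of the idf component of the score. The sketch below assumes the formula 1 + ln(numDocs / (docFreq + 1)), which is how the default Similarity in Lucene 2.x computes idf as far as I recall. The class and helper names are hypothetical, and the real score also depends on tf, field length norms, and coord, none of which are modeled here.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

final class IdfSketch {
    // The six documents from the message above.
    static final List<String> DOCS = Arrays.asList(
        "fleming ga", "marietta ga", "marietta il",
        "marietta ok", "marietta ok", "fleming pa");

    /** Counts, for each term, the number of documents that contain it. */
    static Map<String, Integer> docFreqs(List<String> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (String doc : docs) {
            // A set, so a term is counted once per document.
            for (String term : new HashSet<>(Arrays.asList(doc.split("\\s+")))) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    /** Assumed idf formula: 1 + ln(numDocs / (docFreq + 1)). */
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    public static void main(String[] args) {
        Map<String, Integer> df = docFreqs(DOCS);
        for (String term : new String[] {"fleming", "marietta", "ga"}) {
            System.out.printf("%s df=%d idf=%.4f%n",
                term, df.get(term), idf(df.get(term), DOCS.size()));
        }
    }
}
```

Under that assumption, fleming and ga (2 documents each) receive a higher idf than marietta (4 documents), so a document matching the rarer terms is weighted more heavily.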

Re: Inquiry on Lucene Stemming

2008-12-16 Thread Erick Erickson
I'd ask the client why stemming wouldn't work. I've spent far too much time in my life doing useless things "because the client asked". Really, ask for the use cases where that is really required and that stemming wouldn't cover. But you're right that Lucene doesn't have such a facility or API

Cache Used by IndexReader/IndexSearcher

2008-12-16 Thread Sangrish
Hi All, I have a 50 GB index of about 40 million documents. I need to query it around 40,000 times (different queries) one by one. I saw that the query times are negligible for the first, say 25,000 queries, but it degrades later on. For example, the time for 200 sequential queries chang

Re: Inquiry on Lucene Stemming

2008-12-16 Thread Jay Joel Malaluan
Hi Erick, Well, some clients ask whether it's possible to expand such simple words, and whether Lucene has an API for this logic. All I have read is that the stemming logic in Lucene works the other way around: for example, "flashing" will be trimmed to the root word "flash" when searched.

Re: Document.getBinaryValue returning null after upgrading to 2.4 for the data which was indexed using 2.3.1

2008-12-16 Thread Andrzej Bialecki
rahul_k123 wrote: Thanks for the response. I guess this is the problem, but not sure whether it happens on optimize. This is exactly what is happening: the field is still present (not null) and is marked as binary, but the data is not there - Field.getBinaryLength() returns 0. It may or it may no

Re: Document.getBinaryValue returning null after upgrading to 2.4 for the data which was indexed using 2.3.1

2008-12-16 Thread rahul_k123
Thanks for the response. I guess this is the problem, but not sure whether it happens on optimize. This is exactly what is happening: the field is still present (not null) and is marked as binary, but the data is not there - Field.getBinaryLength() returns 0. Andrzej Bialecki wrote: > > rahul_k

Re: help: java.lang.ArrayIndexOutOfBoundsException ScorerDocQueue.downHeap

2008-12-16 Thread 1world1love
OK, a little more information: I run this query via a java stored procedure within Oracle. However, I just ran the same query using the same code compiled in a separate class from a CL on a different server that has the same filesystem mounted. The queries ran fine from there. So I am wondering

Re: replication question

2008-12-16 Thread Michael McCandless
It's better to use SnapshotDeletionPolicy to grab a consistent image of the index. You don't need to close the IndexWriter, nor stop making changes through IndexWriter, and it lets you capture a given segments_N (and all index files it needs) and then take your time making a copy/backup/

Re: Document.getBinaryValue returning null after upgrading to 2.4 for the data which was indexed using 2.3.1

2008-12-16 Thread Andrzej Bialecki
rahul_k123 wrote: The data was indexed using 2.3.1 as follows doc.add(new Field(Fields.DETAILS, byte[] bytes, Field.Store.YES)); When I reindex this particular item using 2.4 and when I try to retrieve it, it works fine. Then for the items which were indexed using 2.3.1 and not rei

Re: replication question

2008-12-16 Thread Michael Stoppelman
Hi Yonik, Thanks for the response. reply inline. On Tue, Dec 16, 2008 at 6:44 AM, Yonik Seeley wrote: > On Tue, Dec 16, 2008 at 1:04 AM, Michael Stoppelman > wrote: > > I've got a question from Doug's original email about replication ( > > http://www.mail-archive.com/lucene-u...@jakarta.apach

Re: Lucene in Action book. Problems with first example

2008-12-16 Thread Ian Lea
A follow up message to the one you mention suggests that something be added to the Lucene FAQ and it was, with a link to the 1.9 apidocs which shows the deprecated methods and alternatives. http://lucene.apache.org/java/1_9_1/api/ -- Ian. On Tue, Dec 16, 2008 at 7:09 PM, Oleg Oltar wrote: > I

Document.getBinaryValue returning null after upgrading to 2.4 for the data which was indexed using 2.3.1

2008-12-16 Thread rahul_k123
The data was indexed using 2.3.1 as follows doc.add(new Field(Fields.DETAILS, byte[] bytes, Field.Store.YES)); When I reindex this particular item using 2.4 and when I try to retrieve it, it works fine. Then for the items which were indexed using 2.3.1 and not reindexed using 2.4 t

Re: Lucene in Action book. Problems with first example

2008-12-16 Thread Oleg Oltar
I am trying to fix my code now. I am using the http://markmail.org/message/4jupw4wnjn3gv7wh Replace all Field.Keyword/UnStored/Text/UnIndexed with the enumerated types, e.g.: - doc.add(Field.Keyword("animal", animal)); + doc.add(new Field("animal", animal, Field.Store.YES, Field.Index.UN_TOKENIZE

help: java.lang.ArrayIndexOutOfBoundsException ScorerDocQueue.downHeap

2008-12-16 Thread 1world1love
Greetings all. I am having an issue that is driving me mad. I have many indexes ranging in size from 500K docs to 40mil docs. When I do a simple query containing multiple terms on any of the indexes, I get this: java.lang.ArrayIndexOutOfBoundsException at org.apache.lucene.util.ScorerDoc

Default and optimal use of RAMDirectory

2008-12-16 Thread Joseph.Syjuco
Hi all, First off, I'd like to say I'm quite pleased to be a part of this mailing list - it's even more exciting to know that we have Otis G. and Erik H., authors of (at least in my opinion) the Lucene Bible - Lucene in Action, actively answering all these inquiries =) We're currently in the initia

Re: newbie question on querying on multiple attributes

2008-12-16 Thread Hardy Ferentschik
Hi, instead of the ClassBridge you can just annotate all the properties you want to index with @Field and build a BooleanQuery out of the input field. Indexing the properties into separate document fields is probably more extendable in the future when you for example only want to search on

Re: newbie question on querying on multiple attributes

2008-12-16 Thread Stephane Nicoll
Consider the use of the ClassBridge in Hibernate Search. Very useful. It basically allows you to merge multiple fields of your hibernate entity into a single lucene field. Once this is done, you can query this single field from lucene without the need for BooleanQuery. HTH, Stéphane On Tue, Dec

newbie question on querying on multiple attributes

2008-12-16 Thread Doug Leeper
I am using Hibernate as my persistence layer and have recently found Hibernate Search and Lucene as a possible solution to my full text search. However, I am a little fuzzy on what exactly needs to be done in my situation. In a nutshell, I have a Business object that has multiple attributes t

Order of fields returned by Document.getFields()

2008-12-16 Thread Patrick Johnstone
I'm using Lucene via Solr and recently upgraded from an early Summer nightly build to the released version of Solr 1.3 (which seems to use something in the neighborhood of Lucene 2.3). I'm posting this here because I believe that my issue is with Lucene, not Solr. After the upgrade, I noticed th

Re: replication question

2008-12-16 Thread Yonik Seeley
On Tue, Dec 16, 2008 at 1:04 AM, Michael Stoppelman wrote: > I've got a question from Doug's original email about replication ( > http://www.mail-archive.com/lucene-u...@jakarta.apache.org/msg12709.html): > > "1. On the index master, periodically checkpoint the index. Every minute or > so the Inde

Re: Inquiry on Lucene Stemming

2008-12-16 Thread Erick Erickson
Why do you want to do this? The reason I ask is that you're making each clause very complex. For a single term, it's not very complex, but for something like ((A AND B) OR (C AND D)) NOT X expanding A, B, C, D and X to, possibly many terms is...er...ugly. You could think about ngrams, althou

Re: Lucene Data Structures

2008-12-16 Thread Erick Erickson
I question whether you *can* make this decision based upon the data structure being used. I can code such that *any* data structure you care to name will not perform well under some conditions. Not to mention the other characteristics of a search engine that get in the way of even the very most e

Re: Lucene in Action book. Problems with first example

2008-12-16 Thread Erik Hatcher
On Dec 16, 2008, at 6:57 AM, Oleg Oltar wrote: Also maybe there are some free manuals/articles that you can recommend for starters? There's a bunch of stuff listed here: Lucene has been changing so rapidly lately that I'm not aware of any

Re: searched terms frequency question

2008-12-16 Thread Michael McCandless
There was a similar question recently on java-user (I haven't tried to find it). I think to do this efficiently it'd be best to make your own Query impl that tracks this information as it's scoring. Mike john smith wrote: Hi Each document found in a Lucene index contains scoring inform

searched terms frequency question

2008-12-16 Thread john smith
Hi Each document found in a Lucene index contains scoring information however it doesn't provide (in the same easy way as scoring) information about the number of occurrences of searched terms in its contents. Using Lucene API I can check each searched term against its frequencies in each found do

Re: Inquiry on Lucene Stemming

2008-12-16 Thread mathieu
You stem both the search query and the text while indexing, so only "flash" is indexed when "flashing" is read. If you don't want to hurt your index with half words, you can use a second index, just like for spelling: http://blog.garambrogne.net/index.php?post/2008/03/07/A-lexicon-approach-for-Lucene-index M.
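
The point above, that the same stemming has to be applied both when indexing and when searching, can be illustrated without Lucene. This is a toy sketch only: `stem` merely strips a trailing "ing" and is in no way the Snowball algorithm, and every name in it is hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class StemmedIndexSketch {
    // Maps a stemmed term to the ids of the documents containing it.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Toy stemmer: lowercases and strips a trailing "ing" from longer words. */
    static String stem(String word) {
        String w = word.toLowerCase();
        return (w.length() > 5 && w.endsWith("ing")) ? w.substring(0, w.length() - 3) : w;
    }

    /** Index time: only the stemmed form of each token is stored. */
    void add(int docId, String text) {
        for (String token : text.split("\\s+")) {
            if (token.isEmpty()) continue;
            postings.computeIfAbsent(stem(token), k -> new TreeSet<>()).add(docId);
        }
    }

    /** Query time: the query word goes through the same stemmer before lookup. */
    Set<Integer> search(String word) {
        return postings.getOrDefault(stem(word), Collections.<Integer>emptySet());
    }

    public static void main(String[] args) {
        StemmedIndexSketch index = new StemmedIndexSketch();
        index.add(1, "flashing lights");
        index.add(2, "camera flash");
        System.out.println(index.search("flash"));    // [1, 2]
        System.out.println(index.search("flashing")); // [1, 2]
    }
}
```

Because both sides are reduced to "flash", a search for either form finds both documents; the original word "flashing" is never stored, which is why expanding a root back into its variants needs a separate lexicon or index.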

searched terms frequency question

2008-12-16 Thread john smith
Hi Each document found in a Lucene index contains scoring information however it doesn't provide (in the same easy way as scoring) information about the number of occurrences of searched terms in its contents. Using Lucene API I can check each searched term against its frequencies in each foun

Inquiry on Lucene Stemming

2008-12-16 Thread Jay Joel Malaluan
Hi, Can anyone comment if my understanding of the stemming process in Lucene is correct. From my testing using the SnowballAnalyzer, if I passed this word "flashing" it will be trimmed to a root word "flash" and this root word ("flash") will be the one searched not the original word "flashing"

Re: Lucene in Action book. Problems with first example

2008-12-16 Thread Oleg Oltar
Also maybe there are some free manuals/articles that you can recommend for starters? On Tue, Dec 16, 2008 at 1:08 PM, Oleg Oltar wrote: > Thanks!!! > I didn't expect to get such quick answers. Just let me try to fix it :) > > > On Tue, Dec 16, 2008 at 12:56 PM, Erik Hatcher > wrote: > >> >> On

Re: Need Opinion!!

2008-12-16 Thread Grant Ingersoll
Hi Shardul, I just was w/ a client who pretty much had the exact same scenario and Lucene (actually Solr) was just fine. In fact, I think you could have a prototype of this up and running in Solr (Lucene-based Search Server) in a day or two. As for performance, given each record is like

Re: Need Opinion!!

2008-12-16 Thread Ian Lea
Hi Would using lucene help improve performance? Probably. Lucene is blindingly fast and 5,000,000 docs is not huge by lucene standards. But we don't know how fast the existing implementation is. Should you move to lucene? Your call, to balance the expected performance gain over the work invol

Re: process dies with OOM after processing 10k docs

2008-12-16 Thread jm
Right... I was forgetting that the 30MB flush by RAM is PER writer. I'll make some tests to verify this and fix accordingly... Thanks!! On Tue, Dec 16, 2008 at 12:06 PM, Michael McCandless wrote: > > That class is what's used to buffer the added docs in IndexWriter. The heap > dump seems to indicat

Re: Lucene in Action book. Problems with first example

2008-12-16 Thread Oleg Oltar
Thanks!!! I didn't expect to get such quick answers. Just let me try to fix it :) On Tue, Dec 16, 2008 at 12:56 PM, Erik Hatcher wrote: > > On Dec 16, 2008, at 5:53 AM, Oleg Oltar wrote: > >> So is there another manual which I can use to start? (Seems that examples >> in >> the book, are carefull

Re: process dies with OOM after processing 10k docs

2008-12-16 Thread Michael McCandless
That class is what's used to buffer the added docs in IndexWriter. The heap dump seems to indicate you've got ~55 MB worth of buffered docs pending. Since you allow a 30MB RAM buffer for each writer, and it seems like you allow up to 60 writers to be opened at once, it seems like in the
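
The arithmetic behind this is simple: the RAM buffer limit applies per IndexWriter, so the worst case grows with the number of writers open at once. A small sketch using the figures quoted above (30 MB per writer, up to 60 writers). The class and method names are hypothetical, and the 64 MB budget in the second call is an arbitrary illustrative figure, not something taken from the thread.

```java
final class WriterBufferBudget {
    /** Worst-case buffered RAM, in MB, if every open writer fills its buffer before flushing. */
    static long worstCaseBufferMb(int openWriters, int ramBufferMbPerWriter) {
        return (long) openWriters * ramBufferMbPerWriter;
    }

    /** Largest whole-MB buffer per writer that keeps the worst case within the given budget. */
    static long maxBufferPerWriterMb(long budgetMb, int openWriters) {
        return budgetMb / openWriters; // integer division rounds down
    }

    public static void main(String[] args) {
        // Figures from the message above: 60 writers, 30 MB each.
        System.out.println(worstCaseBufferMb(60, 30));    // 1800
        // Hypothetical budget of 64 MB shared by 60 writers.
        System.out.println(maxBufferPerWriterMb(64, 60)); // 1
    }
}
```

Either fewer writers open at once or a much smaller per-writer buffer would be needed to keep the total bounded.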

Re: process dies with OOM after processing 10k docs

2008-12-16 Thread Ian Lea
I'm no expert on lucene internals, but maybe the posting lists just happen to be what is around when your program hits the OOM error. It seems more likely that you are getting OOM because of all the caches and other stuff you are doing. I suggest giving it more memory or cache fewer indexes, or t

Re: Lucene in Action book. Problems with first example

2008-12-16 Thread Erik Hatcher
On Dec 16, 2008, at 5:53 AM, Oleg Oltar wrote: So is there another manual which I can use to start? (Seems that examples in the book, are carefully chosen for starters, and quite easy to understand) The API differences are all quite minor to adjust to the latest - hopefully the post I poi

Re: Lucene in Action book. Problems with first example

2008-12-16 Thread Oleg Oltar
I posted errors as comments in the provided code.. Yes seems that the version in the book is a little bit old (as the book was created in 2005) So is there another manual which I can use to start? (Seems that examples in the book, are carefully chosen for starters, and quite easy to understand)

Re: Lucene in Action book. Problems with first example

2008-12-16 Thread Erik Hatcher
The first edition of Lucene in Action was written for Lucene 1.4. Lots has changed since then in the API, but the fundamentals are still sound. The code can be easily updated to the newer API following the details I posted here: Do note t

Re: Lucene in Action book. Problems with first example

2008-12-16 Thread Michael McCandless
Lucene in Action is based on the 1.4.x release of Lucene, which is quite old by now and unfortunately some of the APIs have since been removed. We are working on the 2nd edition to fix this, but in the mean-time you need to migrate to the new APIs when you see the errors. Eg, if you loo

Re: Lucene in Action book. Problems with first example

2008-12-16 Thread Joseph.Syjuco
Hi, What were the errors? Just a guess ... it may be possible that you are using the wrong Lucene version - the one in the book is not the most updated one available today "XP is making a bet. It is betting that it is better to do a simple thing today and pay a little more tomorrow to change it if

Lucene in Action book. Problems with first example

2008-12-16 Thread Oleg Oltar
Hi! I am starting to learn Lucene. I am using the Lucene in Action book to get started (it was recommended to me). I tried to compile the first example from that book, but my IDE (I use Eclipse) shows there are some errors in my code. I am just a beginner here, and I really need to compile at least few pro

Re: process dies with OOM after processing 10k docs

2008-12-16 Thread jm
Yes, I have tested with up to 512MB; although I don't have the hprof dump file of those tests, they also got the OOM. I was just wondering whether having so many instances of FreqProxTermsWriter$PostingList around is a clear indicator of something I am not releasing or something. javier On Tue, De

Re: process dies with OOM after processing 10k docs

2008-12-16 Thread Ian Lea
Can you not just give the process some more memory? 128MB seems very low for what you are doing. -- Ian. On Mon, Dec 15, 2008 at 6:28 PM, jm wrote: > Hi, > > I am having a memory issue with Lucene 2.4. I am starting a process > with 128MB of ram, this process handles incoming request from othe

Re: Singleton and Lucene: org.apache.lucene.store.AlreadyClosed

2008-12-16 Thread Ian Lea
If the reopen suggestion doesn't fix it, I suggest that you cut down your singleton class to the absolute minimum, wrap it in a junit test case that demonstrates the problem and post the code here. -- Ian. On Mon, Dec 15, 2008 at 10:06 PM, Zender00 wrote: > > Hi Paul, > thanks for your reply.

Re: Singleton and Lucene: org.apache.lucene.store.AlreadyClosed

2008-12-16 Thread Baozhen Jia
Some docs from Lucene 2.4.0:

    IndexReader newReader = reader.reopen();
    if (newReader != reader) {
        ...  // reader was reopened
        reader.close();
    }
    reader = newReader;

Did you follow this instruction? In my opinion it is unsafe under concurrent access. Do not close the reader, and try