Re: share some numbers for range queries

2009-11-15 Thread Jake Mannix
On Sun, Nov 15, 2009 at 11:02 PM, Uwe Schindler wrote: > the second approach is slower, when deleted docs are involved and 0 is inside the range (need to consult TermDocs). This is a good point (and should be mentioned in your blog, John) - for while custom FieldCache-like implementations (

Re: Can Lucene unite multiple instances run as one ?

2009-11-15 Thread Wenbo Zhao
Yes, exactly 'distributed'... From a maintenance point of view, 'horizontal' expandability is very important. In my case, the data file is a kind of 'history' file, categorized by date. Once the data file is indexed, it will not change unless the search fields change. Say I make whole ten ye

RE: share some numbers for range queries

2009-11-15 Thread Uwe Schindler
I wanted to say the same, like Yonik... One addition, the FieldCache only supports one value/doc and the second approach is slower, when deleted docs are involved and 0 is inside the range (need to consult TermDocs). By the way, the numbers are similar to mine from the FCRF issue and the explainat

Re: share some numbers for range queries

2009-11-15 Thread Yonik Seeley
On Mon, Nov 16, 2009 at 1:02 AM, John Wang wrote: > I did some performance analysis for different ways of doing numeric ranging with lucene. Thought I'd share: FYI, the second approach is already implemented in both Lucene and Solr. http://lucene.apache.org/java/2_9_1/api/core/org/apache/luce

share some numbers for range queries

2009-11-15 Thread John Wang
Hi: I did some performance analysis for different ways of doing numeric ranging with Lucene. Thought I'd share: http://invertedindex.blogspot.com/2009/11/numeric-range-queries-comparison.html -John

Re: Max number of open IndexWriters

2009-11-15 Thread Ganesh
You can keep multiple writers open and it will do no harm. I am doing this. It's good to reopen (close and open) the writer at a certain interval. This will release the memory it holds. Whenever I create a new database, I reopen the writers belonging to all databases. Regards Ganesh - Original

Re: Can Lucene unite multiple instances run as one ?

2009-11-15 Thread Jacob Rhoden
Sounds like you may need to have some sort of distributed system. I just wanted to make sure you were aware of the costs/benefits of just buying a big 64-bit/8 GB RAM machine, vs having to not only maintain and power several 32-bit machines, but also maintain and support your now more complica

Re: Can Lucene unite multiple instances run as one ?

2009-11-15 Thread Wenbo Zhao
My data is categorized by date. About 14M+ docs per month, 37M+ terms. When I use a 1 GB heap size to search a 10-month index, I get an OOM. The problem is I can't increase the heap size in an easy way. I have several machines, all 32-bit Windows, 4 GB RAM. And my goal is to index 10 years' data, plus more

Re: Polishing up my Lucene integration, customizing analyzer

2009-11-15 Thread Robert Muir
Hi Scott, I think only the first two are related to Lucene analysis. You can easily create an analyzer that does what you want: just make it look like StandardAnalyzer, but also add the CommonGramsFilter (this is in Solr) to your tokenstream chain. On Sun, Nov 15, 2009 at 4:58 PM, Scott R

RE: Max number of open IndexWriters

2009-11-15 Thread Hrishikesh Agashe
Simon and Eric, thanks for the reply. I want to create multiple indexes depending on the data present. For example, if the month of a record is Nov, I want to add it to the index for November. If it's October, add it to the index for October. I don't want to open and close indexes so many times so just maintaining I

Re: Can Lucene unite multiple instances run as one ?

2009-11-15 Thread Jacob Rhoden
Not sure how large your index is, but it might be easier to increase your memory (if possible) than to develop a fairly complicated alternative strategy. On 16/11/2009, at 2:12 PM, Wenbo Zhao wrote: Hi, all I'm facing a large index, on a x86 win platform which may not have big enough jvm h

Re: How to limit fields being loaded into the FieldCache ?

2009-11-15 Thread Wenbo Zhao
Sorry, folks, please ignore this thread. I found the section in the doc and have just started to read it. I just used the wrong term to search before :-) I searched for 'FieldCache' but in the book it's 'Field cache'. Anyway, just ignore this. 2009/11/15 Wenbo Zhao : > Hi all, > In 'Lucene in Action', I read "L

Can Lucene unite multiple instances run as one ?

2009-11-15 Thread Wenbo Zhao
Hi, all. I'm facing a large index on an x86 Windows platform, which may not have a big enough JVM heap space to hold the entire index. So, I think it's possible to split the index into several smaller indexes and run them in different JVM instances on different machines. Then for each query, I can concurrently

[OT] Webinar on spatial search using Lucene and Solr

2009-11-15 Thread Grant Ingersoll
From Here to There, You Can Find it Anywhere: Building Local/Geo-Search with Apache Lucene and Solr. Join us for a free webinar hosted by TechTarget / TheServerSide.com. Wednesday, November 18th 2009, 10:00 AM PST / 1:00 PM EST. Click here to sign up: http://theserversidecom.bitpipe.com/de

Re: Polishing up my Lucene integration, customizing analyzer

2009-11-15 Thread Erick Erickson
I'm missing something here. The first two points seem incompatible. A single analyzer that works like StandardAnalyzer in any way won't produce bigrams of any sort. It seems to me like you'd have to copy the input into two fields analyzed two different ways. Or do you ONLY want bigrams on the sto

Re: OutofMemory in large index

2009-11-15 Thread vsevel
Hi Otis, this is 3 GB of heap (-Xmx). I am running on a multicore 32-bit machine and I am concerned about the 4 GB limit. CPU is not a problem; however, I am wondering about memory requirements as I will be scaling up. I mostly use term queries on multiple fields (about 30 fields); so no fuzzy or s

Polishing up my Lucene integration, customizing analyzer

2009-11-15 Thread Scott Ribe
I bought the original Lucene in Action, read it, set up integration with my system--a small Java daemon that monitors db for changes & updates the index, and listens for queries and processes them... Now I'd like to customize query parsing to better fit the particular application and users. I'm th

How to limit fields being loaded into the FieldCache ?

2009-11-15 Thread Wenbo Zhao
Hi all, In 'Lucene in Action', I read "Limit how many fields you directly load into the FieldCache" to reduce RAM usage during searching. But I can't find out how. In the javadoc, I can't find any note about how FieldCache is used in IndexReader or IndexSearcher. Can somebody explain that? Thanks. --