On Sun, Nov 15, 2009 at 11:02 PM, Uwe Schindler wrote:
> the second approach is slower, when deleted docs
> are involved and 0 is inside the range (need to consult TermDocs).
>
This is a good point (and should be mentioned in your blog, John) - for
while custom FieldCache-like implementations (…
Yes, exactly 'distributed'...
From a maintenance point of view, 'horizontal' expandability is very important.
In my case, the data files are a kind of 'history', categorized by date.
Once a data file is indexed, it will not change unless the search fields change.
Say I make the whole ten years…
I wanted to say the same as Yonik... One addition: the FieldCache only
supports one value per doc, and the second approach is slower when deleted
docs are involved and 0 is inside the range (it needs to consult TermDocs).
By the way, the numbers are similar to mine from the FCRF issue, and the
explanation…
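For anyone reading along, a minimal sketch of that FieldCache-backed range
against the Lucene 2.9 API; the "price" field and the bounds are made-up
examples, not anything from the benchmark:

    import org.apache.lucene.search.ConstantScoreQuery;
    import org.apache.lucene.search.FieldCacheRangeFilter;
    import org.apache.lucene.search.Filter;
    import org.apache.lucene.search.Query;

    public class FcrfExample {
      // Range over a single-valued int field backed by the FieldCache.
      // With deleted docs and 0 inside the range, the filter must consult
      // TermDocs to tell a real 0 from "no value" - the slowdown Uwe means.
      public static Query priceRange() {
        Filter f = FieldCacheRangeFilter.newIntRange("price", -5, 100, true, true);
        return new ConstantScoreQuery(f);
      }
    }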
On Mon, Nov 16, 2009 at 1:02 AM, John Wang wrote:
> I did some performance analysis for different ways of doing numeric
> ranging with lucene. Thought I'd share:
FYI, the second approach is already implemented in both Lucene and Solr.
http://lucene.apache.org/java/2_9_1/api/core/org/apache/luce
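A short sketch of that built-in support (Lucene 2.9 API; the field name and
precisionStep=4 are arbitrary example choices):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.NumericField;
    import org.apache.lucene.search.NumericRangeQuery;
    import org.apache.lucene.search.Query;

    public class TrieExample {
      // Index the value trie-encoded, then query it with NumericRangeQuery.
      public static Query example() {
        Document doc = new Document();
        doc.add(new NumericField("price", 4, Field.Store.YES, true).setIntValue(42));
        // ... add doc via an IndexWriter, then search with:
        return NumericRangeQuery.newIntRange("price", 4, 0, 100, true, true);
      }
    }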
Hi:
I did some performance analysis for different ways of doing numeric
ranging with lucene. Thought I'd share:
http://invertedindex.blogspot.com/2009/11/numeric-range-queries-comparison.html
-John
You could keep multiple writers open and it will do no harm. I am doing this.
It's good to reopen (close and open) the writer at a certain interval. This
will release the memory it holds. Whenever I create a new database, I reopen
the writers belonging to all databases.
Regards
Ganesh
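A rough sketch of that close-and-reopen pattern (Lucene 2.9 APIs; the index
path and when you call reopen() are assumptions):

    import java.io.File;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class ReopeningWriter {
      private final Directory dir;
      private final StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_29);
      private IndexWriter writer;

      public ReopeningWriter(String path) throws Exception {
        dir = FSDirectory.open(new File(path));
        writer = new IndexWriter(dir, analyzer, IndexWriter.MaxFieldLength.LIMITED);
      }

      // Close and reopen to release the RAM the writer holds between batches.
      public synchronized void reopen() throws Exception {
        writer.close();
        writer = new IndexWriter(dir, analyzer, IndexWriter.MaxFieldLength.LIMITED);
      }
    }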
Sounds like you may need to have some sort of distributed system. I
just wanted to make sure you were aware of the costs/benefits of just
buying a big 64-bit/8GB RAM machine, vs. having to not only maintain and
power several 32-bit machines, but also maintain and support your now
more complicated…
My data is categorized by date: about 14M+ docs per month, 37M+ terms.
When I use a 1GB heap to search a 10-month index, I get OOM.
The problem is that I can't increase the heap size in an easy way.
I have several machines, all 32-bit Windows with 4GB RAM.
And my goal is to index 10 years' data, plus more…
Hi Scott, I think only the first two are related to Lucene analysis.
You can easily create an analyzer that does what you want: make it look
like StandardAnalyzer, but also add the CommonGramsFilter (this is in
Solr) to your tokenstream chain.
On Sun, Nov 15, 2009 at 4:58 PM, Scott R
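A sketch of what that chain could look like (Lucene 2.9 / Solr 1.4 era; the
common-words list is a stand-in, and the CommonGramsFilter constructor shown
here - token stream plus a String[] of common words - is my assumption, so
check your Solr version's signatures):

    import java.io.Reader;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.LowerCaseFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.standard.StandardFilter;
    import org.apache.lucene.analysis.standard.StandardTokenizer;
    import org.apache.lucene.util.Version;
    import org.apache.solr.analysis.CommonGramsFilter;

    public class CommonGramsAnalyzer extends Analyzer {
      private static final String[] COMMON = { "the", "of", "and", "a" };

      @Override
      public TokenStream tokenStream(String fieldName, Reader reader) {
        // StandardAnalyzer's usual chain...
        TokenStream ts = new StandardTokenizer(Version.LUCENE_29, reader);
        ts = new StandardFilter(ts);
        ts = new LowerCaseFilter(ts);
        // ...plus bigrams of common words alongside the original tokens.
        ts = new CommonGramsFilter(ts, COMMON);
        return ts;
      }
    }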
Simon and Eric, thanks for the reply.
I want to create multiple indexes depending on the data present. If the
month of a record is November, I want to add it to the index for November;
if it's October, add it to the index for October. I don't want to open and
close indexes so many times, so I am just maintaining IndexWriters…
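One hypothetical shape for that bookkeeping (Lucene 2.9 APIs; the class,
month keys, and path scheme are all made up for illustration):

    import java.io.File;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class MonthlyWriters {
      // One writer per month key ("2009-11", ...), opened lazily and
      // left open across adds.
      private final Map<String, IndexWriter> writers = new HashMap<String, IndexWriter>();

      public synchronized IndexWriter forMonth(String month) throws Exception {
        IndexWriter w = writers.get(month);
        if (w == null) {
          w = new IndexWriter(FSDirectory.open(new File("index-" + month)),
              new StandardAnalyzer(Version.LUCENE_29),
              IndexWriter.MaxFieldLength.LIMITED);
          writers.put(month, w);
        }
        return w;
      }
    }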
Not sure how large your index is, but it might be easier to increase
your memory (if possible) than to develop a fairly complicated
alternative strategy.
On 16/11/2009, at 2:12 PM, Wenbo Zhao wrote:
Hi, all
I'm facing a large index, on an x86 Windows platform which may not have
big enough JVM heap space to hold the entire index.
Sorry, all folks, please ignore this thread.
I found the section in the doc and just started to read it.
I just used the wrong term to search before :-)
I searched for 'FieldCache', but in the book it's 'Field cache'.
Anyway just ignore this.
2009/11/15 Wenbo Zhao :
> Hi all,
> In 'Lucene in Action', I read "Limit how many fields you directly load…
Hi, all
I'm facing a large index, on an x86 Windows platform which may not have
big enough JVM heap space to hold the entire index.
So, I think it's possible to split the index into several smaller indexes
and run them in different JVM instances on different machines.
Then, for each query, I can concurrently search them and merge the results…
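Within a single JVM, something like this sketch would search the smaller
indexes concurrently (Lucene 2.9 APIs; paths are placeholders, and for
separate JVMs/machines you would need something like the contrib
RemoteSearchable or Solr's distributed search instead):

    import java.io.File;

    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.ParallelMultiSearcher;
    import org.apache.lucene.search.Searchable;
    import org.apache.lucene.store.FSDirectory;

    public class ShardedSearch {
      // Open each smaller index read-only and let ParallelMultiSearcher
      // query them in parallel and merge the hits.
      public static ParallelMultiSearcher open(String[] paths) throws Exception {
        Searchable[] shards = new Searchable[paths.length];
        for (int i = 0; i < paths.length; i++) {
          shards[i] = new IndexSearcher(FSDirectory.open(new File(paths[i])), true);
        }
        return new ParallelMultiSearcher(shards);
      }
    }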
From Here to There, You Can Find it Anywhere:
Building Local/Geo-Search with Apache Lucene and Solr

Join us for a free webinar hosted by TechTarget / TheServerSide.com
Wednesday, November 18th 2009
10:00 AM PST / 1:00 PM EST

Click here to sign up:
http://theserversidecom.bitpipe.com/de
I'm missing something here. The first two points seem incompatible.
A single analyzer that works like StandardAnalyzer won't produce bigrams
of any sort. It seems to me like you'd have to copy the input into two
fields analyzed in two different ways.
Or do you ONLY want bigrams on the stopwords…
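If it is both, a sketch of the two-field approach (Lucene 2.9; the field
names are made up, and CommonGramsAnalyzer is the hypothetical analyzer
sketched earlier in this digest):

    import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.util.Version;

    public class TwoFieldDoc {
      // Same text in two fields; the wrapper picks the chain per field.
      public static Document make(String text) {
        Document doc = new Document();
        doc.add(new Field("body", text, Field.Store.YES, Field.Index.ANALYZED));
        doc.add(new Field("body_grams", text, Field.Store.NO, Field.Index.ANALYZED));
        return doc;
      }

      public static PerFieldAnalyzerWrapper analyzer() {
        PerFieldAnalyzerWrapper w =
            new PerFieldAnalyzerWrapper(new StandardAnalyzer(Version.LUCENE_29));
        w.addAnalyzer("body_grams", new CommonGramsAnalyzer());
        return w;
      }
    }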
Hi Otis,
this is 3GB of heap (-Xmx). I am running on a multicore 32-bit machine and
I am concerned about the 4GB limit. CPU is not a problem; however, I am
wondering about memory requirements as I will be scaling up. I mostly use
term queries on multiple fields (about 30 fields), so no fuzzy or s…
I bought the original Lucene in Action, read it, and set up integration
with my system: a small Java daemon that monitors the db for changes and
updates the index, and listens for queries and processes them...
Now I'd like to customize query parsing to better fit the particular
application and users. I'm thinking…
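One common route is to subclass QueryParser and override one of its
protected hooks, sketched here against the 2.9 API (the override body is
just a placeholder):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryParser.ParseException;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.util.Version;

    public class MyQueryParser extends QueryParser {
      public MyQueryParser(String field) {
        super(Version.LUCENE_29, field, new StandardAnalyzer(Version.LUCENE_29));
      }

      @Override
      protected Query getFieldQuery(String field, String queryText) throws ParseException {
        // e.g. route certain terms to other fields, expand synonyms, etc.
        return super.getFieldQuery(field, queryText);
      }
    }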
Hi all,
In 'Lucene in Action', I read "Limit how many fields you directly load
into the FieldCache" to reduce RAM usage during searching.
But I can't find out how. In the javadocs, I can't find any note about how
the FieldCache is used in IndexReader or IndexSearcher.
Can somebody show me how?
Thanks.
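(A hedged illustration of what populates the cache, against the Lucene 2.9
API; the "year" field is made up. The cache fills implicitly the first time
you sort on a field - one array per field per reader - which is why limiting
such fields limits RAM.)

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.FieldCache;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.SortField;

    public class FieldCacheTouch {
      public static void touch(IndexReader reader) throws Exception {
        // Explicit load: one int per document for this field stays cached.
        int[] years = FieldCache.DEFAULT.getInts(reader, "year");
        // Implicit load: sorting by the field populates the same cache.
        Sort byYear = new Sort(new SortField("year", SortField.INT));
      }
    }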