Modifying Length Normalization calculation

2011-06-12 Thread Lahiru Samarakoon
Hi All, I want to change the length normalization calculation for my application by changing the "*number of terms*" according to my requirement. The "*StandardTokenizer*" works perfectly for my application; however, the *number of terms* calculated by the tokenizer is not the effective n
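
As a rough illustration of the question above, here is a plain-Java sketch that does not touch Lucene's classes. Lucene's DefaultSimilarity in the 3.x line computes the length norm as 1/sqrt(numTerms), so supplying a different term count changes the norm. Everything named here (`EffectiveLengthNorm`, `effectiveTerms`, `lengthNorm`, the `IGNORED` set) is hypothetical, and the rule for which terms count is an assumption made only for the example; the poster's actual rule is not given in the snippet.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class EffectiveLengthNorm {
    // Hypothetical rule: terms in this set do not count toward field length.
    private static final Set<String> IGNORED =
            new HashSet<>(Arrays.asList("the", "a", "of"));

    // Counts only the tokens that the application considers "effective".
    static int effectiveTerms(List<String> tokens) {
        int n = 0;
        for (String t : tokens) {
            if (!IGNORED.contains(t.toLowerCase())) {
                n++;
            }
        }
        return n;
    }

    // Same shape as the default formula: 1 / sqrt(numTerms).
    static float lengthNorm(int numTerms) {
        if (numTerms <= 0) {
            return 1.0f; // guard against division by zero for empty fields
        }
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("the", "size", "of", "index");
        int raw = tokens.size();
        int effective = effectiveTerms(tokens);
        System.out.println("raw norm       = " + lengthNorm(raw));
        System.out.println("effective norm = " + lengthNorm(effective));
    }
}
```

In a real index the effective count would have to be fed into a custom Similarity at indexing time; that wiring is omitted here.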

Re: Index size and performance degradation

2011-06-12 Thread Shai Erera
> I'm not sure I understood the filters approach you described. Can you give an example?
A Language filter is one -- different users search in different languages and want to view pages in those languages only. If you have a field attached to your documents that identifies the language of the

Re: Index size and performance degradation

2011-06-12 Thread Itamar Syn-Hershko
Thanks for your detailed answer. We'll have to tackle this and see what's more important to us then. I'd definitely love to hear Zoie has overcome all that... Any pointers to Michael Busch's approach? I take it this has something to do with the core itself or index format, probably using the Flex

Re: Index size and performance degradation

2011-06-12 Thread Michael McCandless
From what I understand of Zoie (and it's been some time since I last looked... so this could be wrong now), the biggest difference vs NRT is that Zoie aims for "immediate consistency", i.e. index changes are always made visible to the very next query, vs NRT which is "controlled consistency", a blen

Re: Index size and performance degradation

2011-06-12 Thread Itamar Syn-Hershko
Our problem is a bit different. There aren't always common searches so if we cache blindly we could end up having too much RAM allocated for virtually nothing. And we need to allow for real-time search so caching will hardly help. We enforce some client-side caching, but again - the real-time r

Re: Index size and performance degradation

2011-06-12 Thread Itamar Syn-Hershko
Mike, Speaking of NRT, and completely off-topic, I know: Lucene's NRT apparently isn't fast enough if Zoie was needed, and now that Zoie is around, are there any plans to make it Lucene's default? Or: why would one still use NRT when Zoie seems to work much better? Itamar. On 12/06/2011 13

Re: Index size and performance degradation

2011-06-12 Thread Shai Erera
> Shai, what would you call a smart app-level cache? Remembering frequent searches and storing them handy?
Remembering frequent searches is good. If you do this, you can warm up the cache whenever a new IndexSearcher is opened (e.g., if you use SearcherManager from LIA2) and besides keeping t
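
The "remember frequent searches and re-warm on a new searcher" idea can be sketched in plain Java as below. This is only an illustration under stated assumptions: `FrequentQueryCache` and `warm` are hypothetical names, queries are keyed by their string form, and the `Function` passed to `warm` is a stand-in for running a query against the newly opened IndexSearcher. No Lucene API is used.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Small LRU cache of query string -> result, built on LinkedHashMap.
class FrequentQueryCache<V> extends LinkedHashMap<String, V> {
    private final int maxEntries;

    FrequentQueryCache(int maxEntries) {
        super(16, 0.75f, true); // true = access order, so recently used entries survive
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
        // Evict the least recently used entry once the cache is over capacity.
        return size() > maxEntries;
    }

    // Re-runs every cached query through the supplied search function
    // (hypothetical stand-in for searching a freshly opened searcher).
    void warm(Function<String, V> search) {
        // Copy the keys first: put() reorders an access-ordered map,
        // which would break iteration over the live key set.
        List<String> queries = new ArrayList<>(keySet());
        for (String q : queries) {
            put(q, search.apply(q));
        }
    }
}
```

A caller would invoke `warm` once after each reopen, before the new searcher starts serving user queries, so that the remembered searches are answered from fresh results.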

Re: Index size and performance degradation

2011-06-12 Thread Itamar Syn-Hershko
Andrew, no particular hardware setup I'm afraid. This is a general product, and we can't assume anything about the hardware it would run on. Thanks for the tip on multi-core though. On 12/06/2011 11:45, Andrew Kane wrote: In the literature there is some evidence that sharding of in-memory index

Re: Index size and performance degradation

2011-06-12 Thread Itamar Syn-Hershko
Shai, what would you call a smart app-level cache? Remembering frequent searches and storing them handy? Or are there more advanced techniques for that? Any pointers appreciated... Thanks for all the advice! On 12/06/2011 11:42, Shai Erera wrote: isn't there anything that we can do to avoi

Re: Index size and performance degradation

2011-06-12 Thread Michael McCandless
Remember that memory-mapping is not a panacea: at the end of the day, if there just isn't enough RAM on the machine to keep your full "working set" hot, then the OS will have to hit the disk, regardless of whether the access is through MMap or a "traditional" IO request. That said, on Fedora Linux

Re: Index size and performance degradation

2011-06-12 Thread Andrew Kane
In the literature there is some evidence that sharding of in-memory indexes on multi-core machines might be better. Has anyone tried this lately? http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4228359 Single disk machines (HDD or SSD) would be slower. Multi-disk or RAID type setups

Re: Index size and performance degradation

2011-06-12 Thread Shai Erera
> isn't there anything that we can do to avoid that?
That was my point :) --> you can optimize your search application, use mmap files, smart caches etc., until it reaches a point where you need to shard. But it's still application dependent, not much of an OS thing. You can count on the OS to

Re: Index size and performance degradation

2011-06-12 Thread Itamar Syn-Hershko
Thanks. The whole point of my question was to find out if and how to do balancing on the SAME machine. Apparently that's not going to help, and at a certain point we will just have to prompt the user to buy more hardware... Out of curiosity, isn't there anything that we can do to avoid that