Re: RAM or SSD...

2012-07-18 Thread Vitaly Funstein
I was referring to *RAMDirectory*.

On Wed, Jul 18, 2012 at 11:04 PM, Lance Norskog wrote:
>> You do not want to store 30 G of data in the JVM heap, no matter what library does this.
> MMapDirectory does not store data in the JVM heap. It lets the
> operating system manage the disk buffer space. E

RE: In memory Lucene configuration

2012-07-18 Thread Doron Yaacoby
Thanks for the input. I am not using Solr. Also, my index has a fixed size; I am not going to update it.

-----Original Message-----
From: googoo [mailto:liu...@gmail.com]
Sent: 18 July 2012 15:21
To: java-user@lucene.apache.org
Subject: Re: In memory Lucene configuration

Doron, To verify actual

Re: RAM or SSD...

2012-07-18 Thread Dawid Weiss
> Why anyone buys computers without SSD's is a mystery to me. Use SSDs for

On topic and highly recommended: http://www.youtube.com/watch?v=H7PJ1oeEyGg

Dawid

- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For a

RE: In memory Lucene configuration

2012-07-18 Thread Doron Yaacoby
I had a threading issue in the client code calling Lucene, really nothing that has anything to do with this list :)

-----Original Message-----
From: Simon Willnauer [mailto:simon.willna...@gmail.com]
Sent: 18 July 2012 21:48
To: java-user@lucene.apache.org
Subject: Re: In memory Lucene configura

Re: RAM or SSD...

2012-07-18 Thread Lance Norskog
> You do not want to store 30 G of data in the JVM heap, no matter what library
> does this.

MMapDirectory does not store data in the JVM heap. It lets the operating system manage the disk buffer space. Even if the JVM says "I have 30G of memory space", it really does not. It only has address spac
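The point about memory mapping can be sketched with plain `java.nio`, without any Lucene classes. This is only an illustration of the general mechanism, not MMapDirectory's actual code: the class name `MmapSketch`, its helper methods, and the temporary file standing in for an index file are all assumptions made for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration class; not part of Lucene.
final class MmapSketch {

    /** Maps the whole file read-only and returns the byte at the given offset. */
    static byte readByteViaMmap(Path file, int offset) {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // map() reserves virtual address space for the file. The operating
            // system pages the contents in on demand through its own cache;
            // the bytes are not copied into a Java heap array.
            MappedByteBuffer buffer =
                channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            return buffer.get(offset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Creates a small temporary file that stands in for an index file. */
    static Path writeSampleFile() {
        try {
            Path file = Files.createTempFile("mmap-sketch", ".bin");
            file.toFile().deleteOnExit();
            Files.write(file, "lucene".getBytes(StandardCharsets.US_ASCII));
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = writeSampleFile();
        System.out.println((char) readByteViaMmap(file, 2)); // prints c
    }
}
```

The mapped buffer is a direct buffer, so a large mapped file consumes address space and OS page cache rather than the heap configured with `-Xmx`.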

Re: change of API Javadoc interface functionality in 4.0.x

2012-07-18 Thread Robert Muir
On Thu, Jul 19, 2012 at 1:53 AM, Bernd Fehling wrote:
> ...
> Robert Muir added a comment - 12/Apr/12 16:24
>
> We can save 10MB with this patch, which nukes the 'index'.
> I guarantee you nobody will miss it. Just click this thing and see how
> useless it is (since its every method etc in all of

Re: change of API Javadoc interface functionality in 4.0.x

2012-07-18 Thread Bernd Fehling
...
Robert Muir added a comment - 12/Apr/12 16:24
We can save 10MB with this patch, which nukes the 'index'. I guarantee you nobody will miss it. Just click this thing and see how useless it is (since its every method etc in all of lucene).
...

Yeah, "nobody will miss it" and "see how useless it i

Re: RAM or SSD...

2012-07-18 Thread Toke Eskildsen
On Wed, 2012-07-18 at 17:50 +0200, Dragon Fly wrote:
> If I want to improve performance, which of the following is better and why?
>
> 1. Buy a machine with a lot of RAM and use a RAMDirectory for the index.

As others have pointed out, MMapDirectory should work better than RAMDirectory. I am sure

Re: TermEnum.docFreq() includes deleted docs

2012-07-18 Thread Michael McCandless
On Tue, Jul 17, 2012 at 12:44 PM, Roman Chyla wrote:
> Hi,
>
> Tests show that TermEnum.docFreq() returns sum of all docs, including
> the deleted ones. Which seems to (indirectly) contradict the javadoc

That's right; fixing it to reflect deleted documents would be prohibitively costly. Hmm whic

RE: In memory Lucene configuration

2012-07-18 Thread Uwe Schindler
Hi, just to clarify:

> In additional, i don't think load whole index to memory is good idea. Since the
> index size will always increase.
> For me, i change lucene code to disable MMapDirectory, since the index size is
> bigger and bigger.
> And MMapDirectory will call something like c++ share me

Re: RAM or SSD...

2012-07-18 Thread Dawid Weiss
> Rum is an essential ingredient in all software systems :-)

You probably meant "social systems".

D.

Re: RAM or SSD...

2012-07-18 Thread Simon Willnauer
On Wed, Jul 18, 2012 at 9:05 PM, Tim Eck wrote:
> Rum is an essential ingredient in all software systems :-)

Absolutely! :)

simon

>
> -----Original Message-----
> From: Simon Willnauer [mailto:simon.willna...@gmail.com]
> Sent: Wednesday, July 18, 2012 11:49 AM
> To: java-user@lucene.apache.org
>

RE: RAM or SSD...

2012-07-18 Thread Tim Eck
Rum is an essential ingredient in all software systems :-)

-----Original Message-----
From: Simon Willnauer [mailto:simon.willna...@gmail.com]
Sent: Wednesday, July 18, 2012 11:49 AM
To: java-user@lucene.apache.org
Subject: Re: RAM or SSD...

1. use mmap directory
2. buy rum
3. get an SSD

simon

Re: RAM or SSD...

2012-07-18 Thread Simon Willnauer
1. use mmap directory
2. buy rum
3. get an SSD

simon :)

On Wed, Jul 18, 2012 at 8:36 PM, Vitaly Funstein wrote:
> You do not want to store 30 G of data in the JVM heap, no matter what
> library does this.
>
> On Wed, Jul 18, 2012 at 10:44 AM, Paul Jakubik wrote:
>> If only 30GB, go with RAM and

Re: In memory Lucene configuration

2012-07-18 Thread Simon Willnauer
doron, enlighten me please!

On Wed, Jul 18, 2012 at 1:32 PM, Doron Yaacoby wrote:
> Glad to announce the problem was on my side, and had nothing to do with
> Lucene. Indeed, looks like that MMapDirectory is the best choice for me.
>
> Thanks again.
>
> -----Original Message-----
> From: Doron Ya

Re: RAM or SSD...

2012-07-18 Thread Vitaly Funstein
You do not want to store 30 G of data in the JVM heap, no matter what library does this.

On Wed, Jul 18, 2012 at 10:44 AM, Paul Jakubik wrote:
> If only 30GB, go with RAM and MMAPDirectory (as long as you have the budget
> for that hardware).
>
> My understanding is that RAMDirectory is intended

Re: change of API Javadoc interface functionality in 4.0.x

2012-07-18 Thread Chris Hostetter
: What is the sense of removing the "Index" from the API Javadoc for Lucene and Solr?

It was heavily bloating the size of the releases...

https://issues.apache.org/jira/browse/LUCENE-3977

It's pretty easy to turn this back on and rebuild the docs locally. Feel free to open a jira and submit

Re: RAM or SSD...

2012-07-18 Thread Paul Jakubik
If only 30GB, go with RAM and MMapDirectory (as long as you have the budget for that hardware). My understanding is that RAMDirectory is intended for unit tests, not for production indexes.

On Wed, Jul 18, 2012 at 10:50 AM, Dragon Fly wrote:
>
> Hi,
>
> If I want to improve performance, which of

Re: Multiple sort field

2012-07-18 Thread Erick Erickson
Lucene certainly supports multiple sort criteria, see IndexSearcher.search, any one that takes a Sort object. The Sort object can contain a list of fields where any ties in the first N field(s) are decided by looking at field N+1. But, Ganesh, be a little careful about resolving by internal Lucene
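The tie-breaking rule described here (field N+1 only matters when the first N fields compare equal) can be illustrated with a plain Java comparator. This is a sketch of the ordering semantics only, not of Lucene's collector; the `Hit` class and its `time` and `id` fields are hypothetical stand-ins for two sort fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration class; not part of Lucene.
final class MultiSortSketch {

    /** A stand-in for a search hit carrying two sort keys. */
    static final class Hit {
        final long time;
        final int id;

        Hit(long time, int id) {
            this.time = time;
            this.id = id;
        }

        @Override
        public String toString() {
            return time + "/" + id;
        }
    }

    /** Orders by time first; id is consulted only when two times are equal. */
    static final Comparator<Hit> BY_TIME_THEN_ID =
        Comparator.<Hit>comparingLong(h -> h.time).thenComparingInt(h -> h.id);

    /** Returns the hits as "time/id" labels in sorted order, leaving the input untouched. */
    static List<String> sortedLabels(List<Hit> hits) {
        List<Hit> copy = new ArrayList<>(hits);
        copy.sort(BY_TIME_THEN_ID);
        List<String> labels = new ArrayList<>();
        for (Hit h : copy) {
            labels.add(h.toString());
        }
        return labels;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(new Hit(20, 1), new Hit(10, 7), new Hit(10, 3));
        System.out.println(sortedLabels(hits)); // prints [10/3, 10/7, 20/1]
    }
}
```

In Lucene itself the equivalent is expressed declaratively, by passing a Sort that lists several fields to IndexSearcher.search, rather than by sorting the hits afterwards.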

Re: Lucene reorganizing indexes

2012-07-18 Thread googoo
Optimize will release disk space if there are lots of deletes. (A merge will do the same thing.) I also think optimize will speed up search a little bit.

Which JRE are you using? On Windows, if you are using a 64-bit JRE, Lucene tries to map the index into memory. That will use lots of memory and also involve lot

Re: how to implement a search engine like gmail?

2012-07-18 Thread googoo
It always adds one more search condition. For example, if you search by subject:hello, the back end will search subject:hello AND account:齐保元

-- View this message in context: http://lucene.472066.n3.nabble.com/how-to-implement-a-search-engine-like-gmail-tp3995675p3995700.html Sent from the Lucene - Java Us
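A minimal sketch of that idea, assuming the query is assembled as a string in Lucene's classic query syntax before parsing. The class `AccountScopedQuery`, the method `scope`, and the field name `account` are hypothetical; a real system might instead combine query objects or apply a filter.

```java
// Hypothetical illustration class; not part of Lucene.
final class AccountScopedQuery {

    /** Wraps the user's query so it can only match documents of the given account. */
    static String scope(String userQuery, String accountId) {
        // Escape backslashes and quotes so the account id stays inside one quoted term.
        String quoted = accountId.replace("\\", "\\\\").replace("\"", "\\\"");
        // Parentheses keep an OR inside the user's query from bypassing the restriction.
        return "(" + userQuery + ") AND account:\"" + quoted + "\"";
    }

    public static void main(String[] args) {
        System.out.println(scope("subject:hello", "alice"));
        // prints (subject:hello) AND account:"alice"
    }
}
```

The restriction is added on the server side, so a user never gets to supply the account clause.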

Re: Multiple sort field

2012-07-18 Thread googoo
I don't think Lucene supports multi-field sorting. If you look into org.apache.lucene.search.TopScoreDocCollector you may get a feel for it. It uses a max heap to sort the documents, and the score is calculated only once; it does not first sort by time and then sort again by id. When Lucene sorts the documents below:

Re: In memory Lucene configuration

2012-07-18 Thread googoo
Doron,

To verify actual query speed, I think you may need to:
1) not run the index job
2) in solrconfig.xml, set the filterCache and queryResultCache values to 0
3) restart Solr
4) run the query and check the QTime result

That may give you some idea of the actual query time. To break down query time, yo

Re: RAM or SSD...

2012-07-18 Thread Stephen Howe
What metrics are you measuring performance by? Also, what is your current setup? You might be able to speed up your current setup by tweaking configuration settings without needing more hardware.

On Wed, Jul 18, 2012 at 11:50 AM, Dragon Fly wrote:
>
> Hi,
>
> If I want to improve performance, whi

RE: Indexed BytesRef

2012-07-18 Thread Simon McDuff
Thank you Robert, Thank you! It solves my problems! > From: rcm...@gmail.com > Date: Wed, 18 Jul 2012 10:40:08 -0400 > Subject: Re: Indexed BytesRef > To: java-user@lucene.apache.org > > Here's a test indexing some binary terms > > http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/c

RAM or SSD...

2012-07-18 Thread Dragon Fly
Hi,

If I want to improve performance, which of the following is better and why?

1. Buy a machine with a lot of RAM and use a RAMDirectory for the index.
2. Put the index on a solid state drive.

By the way, my index is about 30 GB. Thank you.

Re: how to implement a search engine like gmail?

2012-07-18 Thread Ian Lea
That is one option. See recent thread (yesterday?) about possible problems with that approach, and an alternative or two. I've no idea how Google do it. And I've no idea what you mean by problem with different subjects. -- Ian. On Wed, Jul 18, 2012 at 4:27 PM, 许超前 wrote: > Maybe everyone ha

Re: how to implement a search engine like gmail?

2012-07-18 Thread 许超前
Maybe everyone has his/her own index.

2012/7/18 齐保元
> HI buddy,
> In gmail,there are many accounts,how google manage to
> search individual email without the risk of search other accounts email?If
> there are *huge* account,small index may knock down the server,any good
> idea?and

Re: Indexed BytesRef

2012-07-18 Thread Robert Muir
Here's a test indexing some binary terms http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/core/src/test/org/apache/lucene/index/TestBinaryTerms.java It uses BinaryTokenStream (http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/core/src/test/org/apache/lucene/index/BinaryTokenStream.ja

Indexed BytesRef

2012-07-18 Thread Simon McDuff
Hi,

I'm using Lucene 4.0. I would like to index strings, but since my system requires high volume I need to always reuse the same memory, so using String is out of the question. My process receives bytes, and I can transform them into a BytesRef (representing a String). At the moment, it seems that when I use fiel

Re: Boolean Query: Knowing Which Clauses Matched

2012-07-18 Thread Ashish Jaen
It would be great if someone could show how to do it. For my application, I do not care about any score (just a vanilla boolean search is sufficient).

In the meanwhile, I experimented with some workarounds and would like to share the findings.

Problem details: On a collection of 10 million documents, I wa

Re: Boolean Query: Knowing Which Clauses Matched

2012-07-18 Thread Michael McCandless
This is possible, using the ScorerVisitor (3.6) / getChildren (4.0). You need a custom collector that when it collects a competitive hit, visits the sub-scorers of your BooleanQuery and saves away which ones matched the current doc. But this is very expert and there are real challenges (eg not all

change of API Javadoc interface functionality in 4.0.x

2012-07-18 Thread Bernd Fehling
Dear developers, while upgrading from 3.6.x to 4.x I have to rewrite some of my code and search for the new methods and/or classes. In 3.6.x and older versions the API Javadoc interface had an "Index" which made it easy to find the appropriate methods. The button to call the "Index" was located in

RE: In memory Lucene configuration

2012-07-18 Thread Doron Yaacoby
Glad to announce the problem was on my side, and had nothing to do with Lucene. Indeed, it looks like MMapDirectory is the best choice for me.

Thanks again.

-----Original Message-----
From: Doron Yaacoby [mailto:dor...@gingersoftware.com]
Sent: 16 July 2012 09:43
To: java-user@lucene.apache.

Re: Lucene 2.x to 4.x upgrade possible?

2012-07-18 Thread Ian Lea
I'd forgotten about IndexUpgrader, but I'd still go for 3.6. I wouldn't want the complexity of shipping two versions of lucene and having to get customers to run an upgrade script. And probably wouldn't want to ship the first stable version of 4.0, even though lucene is very stable and reliable.

RE: Lucene 2.x to 4.x upgrade possible?

2012-07-18 Thread Uwe Schindler
The tool docs can be found here: http://goo.gl/TbbxC

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Uwe Schindler [mailto:u...@thetaphi.de]
> Sent: Wednesday, July 18, 2012 11:13 AM
> To: java-user@luce

RE: Lucene 2.x to 4.x upgrade possible?

2012-07-18 Thread Uwe Schindler
Hi,

You first have to "convert" your indexes with version 3.x to migrate from 2.x to 4.0. This can be done with the new tool called "IndexUpgrader" (available since Lucene 3.2 or thereabouts). You can call it from the command line; it will upgrade all index segments to the latest version you are using t

Re: Lucene 2.x to 4.x upgrade possible?

2012-07-18 Thread Ian Lea
The release notice for 4.0-alpha sent to this list says "file format backwards compatibility is provided for indexes from the 3.0 series" so you won't be able to go straight from 2.x to 4.0. I'm sure that will remain true for all 4.x releases. The comments about waiting for a stable release of 4.

Re: Multiple sort field

2012-07-18 Thread Ian Lea
> Any thoughts on this?

Patience ...

> Is it good to use multiple sort fields?

Absolutely, if that's what you need. On the other hand, if you don't need it then it's a bad idea.

> Using sort on docid will consume any memory?

Don't know. Certainly won't use less than not sorting this way.

>