Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Yonik Seeley
OK, I see the issue - SingleFile doesn't have its own file pointer. I'll update the original issue. (for large files, this shouldn't change the times any). -Yonik http://www.lucidimagination.com On Tue, Sep 15, 2009 at 4:13 PM, Yonik Seeley wrote: > On Tue, Sep 15, 2009 at 4:12 PM, Yonik Seeley
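The shared-file-pointer problem mentioned here can be illustrated with the JDK alone. The following is a minimal sketch, not the benchmark's actual code: the class name `PositionalReadSketch` and its helpers are hypothetical. It relies on `FileChannel.read(ByteBuffer, long)`, which reads at an explicit offset and leaves the channel's own position untouched, so several threads can share one open file without disturbing each other's offsets.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PositionalReadSketch {

    // Sum all bytes in [start, end) using positional reads only.
    // read(dst, position) does not use or modify the channel's position,
    // so concurrent callers cannot corrupt each other's offsets.
    static long sumRange(FileChannel ch, long start, long end) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(1024);
        long sum = 0;
        long pos = start;
        while (pos < end) {
            buf.clear();
            buf.limit((int) Math.min(buf.capacity(), end - pos));
            int n = ch.read(buf, pos);
            if (n < 0) break; // end of file
            for (int i = 0; i < n; i++) sum += buf.get(i);
            pos += n;
        }
        return sum;
    }

    // Split the file into nThreads slices and sum them concurrently
    // over a single shared channel.
    static long parallelSum(Path file, int nThreads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long len = ch.size();
            long slice = (len + nThreads - 1) / nThreads;
            List<Future<Long>> parts = new ArrayList<>();
            for (int t = 0; t < nThreads; t++) {
                final long start = Math.min(len, t * slice);
                final long end = Math.min(len, start + slice);
                Callable<Long> task = () -> sumRange(ch, start, end);
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : parts) total += f.get();
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("pread-sketch", ".bin");
        tmp.toFile().deleteOnExit();
        byte[] data = new byte[100_000];
        long expected = 0;
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (i % 127);
            expected += data[i];
        }
        Files.write(tmp, data);
        System.out.println("sums match: " + (parallelSum(tmp, 4) == expected));
    }
}
```

By contrast, a single `RandomAccessFile` shared between threads would need external synchronization around every `seek` and `read` pair, because both calls go through the same file pointer.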

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Mark Miller
Okay - using a smaller file, I get better results. I had about 2+ gig available to cache the 700mb file, but I probably had fragmentation issues - I just grabbed the first big file I had. So it gets a little better for ChannelPread with the smaller file (approx 160mb vs approx 700mb for the old t

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Mark Miller
Disturbed reminds me of the owl from sword in the stone ;) That's a great one-liner - now I am completely disturbed. Sorry - I've been known to do that - The two results that I specifically say are from the hard disk - those are from the hard disk and are ext4. They are a tad slower than the ramdis

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Yonik Seeley
Note that when nthreads>1 I sometimes get wrong answers for SimpleFile... hopefully it's just a bug in the test... I'll look into it a little. -Yonik http://www.lucidimagination.com On Tue, Sep 15, 2009 at 4:00 PM, Mark Miller wrote: > I'm jealous of your 4 3.0Ghz to my 2.0Ghz. > > I was on dy

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Yonik Seeley
On Tue, Sep 15, 2009 at 4:12 PM, Yonik Seeley wrote: > Note that when nthreads>1 I sometimes get wrong answers for SimpleFile... s/SimpleFile/SingleFile/g

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Mark Miller
I'm jealous of your 4 3.0Ghz to my 2.0Ghz. I was on dynamic scaling frequency and switched to 2.0Ghz hard. On ramdisk, my puny 2.0's almost catch you and get a bit over 1800MB/s with SeparateFile. I'm smoked on PooledPread and ChannelPread though. Still sub 500 for both, even on the ramdisk. It

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Yonik Seeley
Here are my results on my quad-core Phenom, with ondemand CPU freq scaling disabled (clocks locked at 3GHz). Ubuntu 9.04, filesystem=ext4 on 7200RPM IDE drive, testfile=95MB fully cached. Linux odin 2.6.28-15-generic #49-Ubuntu SMP Tue Aug 18 19:25:34 UTC 2009 x86_64 GNU/Linux Java(TM) SE Runtime En

RE: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Uwe Schindler
Now I am completely disturbed. Which numbers come from which filesystem? Ext4 on HDD, tmpfs (which is a filesystem of its own), ext3 on HDD,... Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: ysee...

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Yonik Seeley
Remember to disable CPU frequency scaling when benchmarking... some things with IO cause the freq to drop, and when it's CPU bound again it takes a while for Linux to scale up the freq again. For example, on my ubuntu box, ChannelFile went from 100MB/sec to 388MB/sec. This effect probably won't b

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Mark Miller
I just realized I hadn't sent this one. Here are results from the hard drive: It looks like it's closer to the same speed on the hard drive once everything is loaded in the system cache (as you'd expect). SeparateFile was 1200 vs almost 1700 on RAMDISK. ChannelPread looked a lot closer though. - Mark

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Mark Miller
Michael McCandless wrote: > I don't like that the answer is different... but it's really really > odd that it's different-yet-almost-the-same. > > Mark, were these 4 results on a normal (ext4) filesystem, or tmpfs? > (Because the top 2 entries of your 4 results match the first set of 2 > entries yo

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Mark Miller
It's the same test file - everything the same except one file is on local ext4 hd and the copy is on ramdisk. I haven't yet looked into what the answer corresponds to. I wonder if the RAM disk is getting made as ext3? Note: also I give the JVM RAM a bit larger than the file size, and the OS has plen

Re: Field with reader limitation arbitrary

2009-09-15 Thread Glen Newton
I appreciate your explanation, but I think that the use case I described merits a deeper exploration: Scenario 1: 16 threads indexing; queue size = 1000; present api; need to store In this scenario, there are always 1000 Strings with all the contents of their respective files. Averaging 50k per do

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Michael McCandless
I don't like that the answer is different... but it's really really odd that it's different-yet-almost-the-same. Mark, were these 4 results on a normal (ext4) filesystem, or tmpfs? (Because the top 2 entries of your 4 results match the first set of 2 entries you sent... so I'm thinking these 4 wer

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Yonik Seeley
It's been a while since I wrote that benchmarker... is it OK that the answer is different? Did you use the same test file? -Yonik http://www.lucidimagination.com On Tue, Sep 15, 2009 at 2:18 PM, Mark Miller wrote: > The results: > > config: impl=SeparateFile serial=false nThreads=4 iterations

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Mark Miller
The results: config: impl=SeparateFile serial=false nThreads=4 iterations=100 bufsize=1024 poolsize=2 filelen=730554368 answer=-282295611, ms=173550, MB/sec=1683.7899579371938 config: impl=ChannelFile serial=false nThreads=4 iterations=100 bufsize=1024 poolsize=2 filelen=730554368 answer=-2822953
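The MB/sec figure in the first result line can be reproduced from the other reported values. This is a worked check rather than the benchmark's own code: the class and method names are hypothetical, and it assumes (as an inference from the numbers, not something stated in the thread) that each of the `nThreads` threads reads the whole file once per iteration and that "MB" means 10^6 bytes.

```java
public class ThroughputSketch {

    // Assumption: total bytes read = fileLen * iterations * nThreads.
    // Dividing bytes by milliseconds and then by 1000 gives 10^6 bytes per second.
    static double mbPerSec(long fileLen, int iterations, int nThreads, long ms) {
        double totalBytes = (double) fileLen * iterations * nThreads;
        return totalBytes / ms / 1000.0;
    }

    public static void main(String[] args) {
        // Values taken from the SeparateFile line above.
        System.out.println(mbPerSec(730554368L, 100, 4, 173550L));
    }
}
```

Under those assumptions the result agrees with the reported 1683.79 MB/sec: about 292 GB read in 173.55 seconds.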

Re: Field with reader limitation arbitrary

2009-09-15 Thread Chris Hostetter
: Someone has made the decision that we will not be interested in : storing files read using a Reader (at least not with these : constructors). : This is rather arbitrary. No, it was not arbitrary at all. The javadocs there are not a "decree" of what shall or shan't be supported, they are an ex

RE: New "Stream closed" exception with Java 6

2009-09-15 Thread Chris Hostetter
: "it's possibly you just have a simple bug where you are closing the reader before you pass it to Lucene, : : or maybe you are mistakenly adding the same field twice : : (or in two different documents)" : : Are you saying that if I were attempting to delete a doc and then add it : aga

Re: NumberFormatException when creating field cache

2009-09-15 Thread Chris Hostetter
: Would it be useful to allow some sort of data tolerance when creating these : caches? At least now the only solution is to delete that Document. Perhaps : the values could then be returned as 0 in the Parser implementations for : numeric failures. picking an arbitrary number wouldn't be very

RE: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Uwe Schindler
How does a conventional file system compare? - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Mark Miller [mailto:markrmil...@gmail.com] > Sent: Tuesday, September 15, 2009 7:15 PM > To: java-user@lucene.a

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Mark Miller
Mark Miller wrote: > Indeed - I just ran the FileReaderTest on a Linux tmpfs ramdisk - with > SeparateFile all 4 of my cores are immediately pinned and remain so. > With ChannelFile, all 4 cores hover 20-30%. > > It would appear it may not be a good idea to use NIOFSDirectory on ramdisks. > > Even

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Mark Miller
Indeed - I just ran the FileReaderTest on a Linux tmpfs ramdisk - with SeparateFile all 4 of my cores are immediately pinned and remain so. With ChannelFile, all 4 cores hover 20-30%. It would appear it may not be a good idea to use NIOFSDirectory on ramdisks. Even still though - it looks like yo

Concurrent indexing thread/process safety issue - can we have document-lock instead of directory-lock in future?

2009-09-15 Thread Zhang, Lisheng
Hi, I read through the lucene thread/process safety issue for concurrent indexing, my understanding is that each indexing through IndexWriter will lock the whole index directory. Now we need to index a community blog where many people add/update, so queuing all those indexing requests would be a

RE: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Uwe Schindler
Maybe Linux has some problems with NIO on tmpfs/other ramdisks. What Linux do you use, 64bit or 32bit JVM and kernel, ram fs type? If you have 64 bit and you stored your index in Linux tmpfs (not the old RAM fs), the fastest would be MMapDirectory, as the tmpfs RAM can be directly used when mapped
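For readers unfamiliar with memory mapping, the sketch below shows the underlying JDK facility, `FileChannel.map`, reading a file through a `MappedByteBuffer`. It is an illustration only and not Lucene's MMapDirectory implementation; the class name `MmapReadSketch` is hypothetical, and the temp file stands in for a file that would live on tmpfs.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapReadSketch {

    // Map the whole file read-only and sum its bytes.
    // A single mapping is limited to Integer.MAX_VALUE bytes, so larger
    // files would need several mappings (not shown here).
    static long sumMapped(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            long sum = 0;
            while (map.hasRemaining()) sum += map.get();
            return sum;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("mmap-sketch", ".bin");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, new byte[] {1, 2, 3, 4});
        System.out.println(sumMapped(tmp));
    }
}
```

Reads through the mapping go straight to the pages backing the file, with no read system call per access, which is why a mapped file on a RAM-backed filesystem can be attractive.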

Re: Run your Lucene Applications on Google AppEngine with GAELucene

2009-09-15 Thread Kerang Lv
>Do you plan to support in memory indexes using the memcache api? I'm afraid not, I prefer to do indexing on another machine until I have a plan that can finish indexing within 30s. - Original Message From: Erdinc Yilmazel To: java-user@lucene.apache.org Sent: Tuesday, September 15, 2

Re: Run your Lucene Applications on Google AppEngine with GAELucene

2009-09-15 Thread Kerang Lv
>I think I will try this today evening. Remember to update your local project from the svn, I fixed some mistakes just now. I apologize for my negligence. >I think we should put this as one of component in lucene-contrib. What do you >say? Yes, that's good news. - Original Message

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Thomas Becker
Hi Uwe, already done. See my last message. Cheers, Thomas Uwe Schindler wrote: > On 2.9. NIOFS is only used, if you use FSDirectory.open() instead of > FSDirectory.getDirectory (Deprecated). Can you compare when you use instead > of FSDirectory.open() the direct ctor of SimpleFSDir vs. NIOFSDir

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Thomas Becker
Mark Miller wrote: > Thomas Becker wrote: >> Hey Mark, >> >> yes. I'm running the app on unix. You see the difference between 2.9 and 2.4 >> here: >> >> http://ankeschwarzer.de/tmp/graph.jpg >> > Right - I know your measurements showed a difference (and will keep that > in mind) - but the pro

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Thomas Becker
Will do, tomorrow. Mark Miller wrote: > Can you run the following test on your RAMDISK? > > http://people.apache.org/~markrmiller/FileReadTest.java > > I've taken it from the following issue (in which NIOFSDirectory was > developed): > https://issues.apache.org/jira/browse/LUCENE-753 > -- Tho

RE: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Uwe Schindler
On 2.9. NIOFS is only used, if you use FSDirectory.open() instead of FSDirectory.getDirectory (Deprecated). Can you compare when you use instead of FSDirectory.open() the direct ctor of SimpleFSDir vs. NIOFSDir vs. MMapDir and compare? - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http:

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Mark Miller
Can you run the following test on your RAMDISK? http://people.apache.org/~markrmiller/FileReadTest.java I've taken it from the following issue (in which NIOFSDirectory was developed): https://issues.apache.org/jira/browse/LUCENE-753 -- - Mark http://www.lucidimagination.com ---

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Mark Miller
Thomas Becker wrote: > Hey Mark, > > yes. I'm running the app on unix. You see the difference between 2.9 and 2.4 > here: > > http://ankeschwarzer.de/tmp/graph.jpg > Right - I know your measurements showed a difference (and will keep that in mind) - but the profiling results then seem oddly sim

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Thomas Becker
Hey Mark, yes. I'm running the app on unix. You see the difference between 2.9 and 2.4 here: http://ankeschwarzer.de/tmp/graph.jpg 2.4 responds much quicker thus increasing throughput severely. I'm having a single segment only: -rw-r--r-- 1 asuser asgroup 20 Sep 9 16:40 segments.gen -r

Re: Counting search results

2009-09-15 Thread Simon Willnauer
Hmm, so if you want to use the Filter to narrow down the search results you could use it in the while loop like this: BitSet set = filter.bits(reader); int numDocs = 0; TermDocs termDocs = reader.termDocs(new Term("myField", "myTerm")); while (termDocs.next()) { if(set.get(termDocs.doc())) numDocs
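The counting idea in this message can be modelled without Lucene on the classpath. In the sketch below, an `int[]` of document ids stands in for the `TermDocs` enumeration and a `java.util.BitSet` stands in for the bits returned by the filter; the class name and both stand-ins are assumptions made for illustration, not Lucene API.

```java
import java.util.BitSet;

public class FilteredCountSketch {

    // Count the documents in the term's posting list that the filter accepts.
    // postings: doc ids containing the term (stand-in for TermDocs).
    // filterBits: bit i is set if doc i passes the filter.
    static int countFiltered(int[] postings, BitSet filterBits) {
        int numDocs = 0;
        for (int doc : postings) {
            if (filterBits.get(doc)) numDocs++;
        }
        return numDocs;
    }

    public static void main(String[] args) {
        int[] postings = {1, 4, 7, 9};
        BitSet filterBits = new BitSet();
        filterBits.set(4);
        filterBits.set(9);
        filterBits.set(10); // passes the filter but does not contain the term
        System.out.println(countFiltered(postings, filterBits)); // docs 4 and 9 match
    }
}
```

The cost is one bit lookup per posting, with no scoring or sorting involved.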

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Mark Miller
A few quick notes - Lucene 2.9 old api doesn't appear much worse than Lucene 2.4? You save a lot with the new Intern impl, because that's not a hotspot anymore. But then, RandomAccessFile seeks end up being a lot more of the pie. They look fairly similar in speed overall? It looks like the major

Re: Counting search results

2009-09-15 Thread Mathias Bank
Hello, This seems to be a solution similar to: Term t = new Term(fieldname, term); int count = searcher.docFreq(t); The problem is that in this situation it is not possible to apply a filter object. If I don't want to use this filter object, I would have to use a complex search query, which is -

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Mark Miller
Thomas Becker wrote: > Here's the results of profiling 10 different search requests: > > http://ankeschwarzer.de/tmp/lucene_24_oldapi.png > http://ankeschwarzer.de/tmp/lucene_29_oldapi.png > http://ankeschwarzer.de/tmp/lucene_29_newapi.png > > But you already gave me a good hint. The index being us

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Thomas Becker
Here are the results of profiling 10 different search requests: http://ankeschwarzer.de/tmp/lucene_24_oldapi.png http://ankeschwarzer.de/tmp/lucene_29_oldapi.png http://ankeschwarzer.de/tmp/lucene_29_newapi.png But you already gave me a good hint. The index being used is an old one built with lucen

Re: New "Stream closed" exception with Java 6

2009-09-15 Thread Grant Ingersoll
On Sep 15, 2009, at 9:26 AM, Chris Bamford wrote: Mark It appears you are right - it *IS* something tricky. My code is single threaded, so there is no contention. I still get intermittent "Stream Close" exceptions (about 1 in every 800 indexWriter.addDocument() calls) which I cannot ex

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Mark Miller
Thomas Becker wrote: > Hey Mark, > > thanks for your reply. Will do. Results will follow in a couple of minutes. > > > Thanks, awesome. Also, how many segments (approx) are in your index? If there are a lot, have you/can you try the same tests on an optimized index? Don't want to get ahead of t

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Thomas Becker
Hey Mark, thanks for your reply. Will do. Results will follow in a couple of minutes. Yes the custom sorts are doing something tricky. :) I'll try to explain them in a few words and paste the code. But even w/o them 2.9 is slower. Test cases 2 and 3 differ only in their lucene jars. CustomFieldComp

Re: Displaying search result data - stored fields vs external source

2009-09-15 Thread Erick Erickson
Categorically I store everything in the index unless/until I *know* it doesn't work. With some things, it's easy to know from the outset, like if I have 20T of data to store. First, storing fields has minimal impact on the search speed, the stored text isn't interleaved with the search tokens, so t

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Mark Miller
Hey Thomas - any chance you can do some quick profiling and grab the hotspots from the 3 configurations? Are your custom sorts doing anything tricky? -- - Mark http://www.lucidimagination.com Thomas Becker wrote: > Urm and uploaded here: > http://ankeschwarzer.de/tmp/graph.jpg > > Sorry. > >

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Thomas Becker
Urm and uploaded here: http://ankeschwarzer.de/tmp/graph.jpg Sorry. Thomas Becker wrote: > Missed the attachment, sorry. > > Thomas Becker wrote: >> Hi all, >> >> I'm experiencing a performance degradation after migrating to 2.9 and running >> some tests. I'm getting out of ideas and any help to

Re: lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Thomas Becker
Missed the attachment, sorry. Thomas Becker wrote: > Hi all, > > I'm experiencing a performance degradation after migrating to 2.9 and running > some tests. I'm getting out of ideas and any help to identify the reasons why > 2.9 is slower than 2.4 are highly appreciated. > > We've had some issue

lucene 2.9.0RC4 slower than 2.4.1?

2009-09-15 Thread Thomas Becker
Hi all, I'm experiencing a performance degradation after migrating to 2.9 and running some tests. I'm running out of ideas, and any help identifying why 2.9 is slower than 2.4 is highly appreciated. We've had some issues with custom sorting in lucene 2.4.1. We worked around them by so

RE: New "Stream closed" exception with Java 6

2009-09-15 Thread Chris Bamford
Mark It appears you are right - it *IS* something tricky. My code is single threaded, so there is no contention. I still get intermittent "Stream Close" exceptions (about 1 in every 800 indexWriter.addDocument() calls) which I cannot explain. By moving code around / recompiling, I have manag

Re: Counting search results

2009-09-15 Thread Simon Willnauer
Did you try: int numDocs = 0; TermDocs termDocs = reader.termDocs(new Term("myField", "myTerm")); while (termDocs.next()) { numDocs++; } simon On Tue, Sep 15, 2009 at 2:19 PM, Mathias Bank wrote: > Hello, > > I'm trying to find the number of documents for a specific term to > create text statistics.

Re: Field with reader limitation arbitrary

2009-09-15 Thread Glen Newton
OK, thanks. :-) Glen 2009/9/14 Anthony Urso : > It's best to file a feature request on the Lucene issue tracker if you > are interested in seeing this implemented. > > http://issues.apache.org/jira/browse/LUCENE > > Just cut and paste your description and attach a patch and/or tests if > you have

RE: Run your Lucene Applications on Google AppEngine with GAELucene

2009-09-15 Thread Allahbaksh Mohammedali Asadullah
Hi Mike, I think adding this in Lucene 3.0 contrib would be the best we could do. I think we could add it in Lucene 2.9 Release as it would grow the community and we would also be able to find some nice practices, bugs, improvements and that would make it better in upcoming release. Regards, Alla

Counting search results

2009-09-15 Thread Mathias Bank
Hello, I'm trying to find the number of documents for a specific term to create text statistics. I'm not interested in ordering the results or even receiving the first result. I just need the number of results. Currently, I'm trying to do this by using the lucene searcher class: IndexSearcher se

Re: Run your Lucene Applications on Google AppEngine with GAELucene

2009-09-15 Thread Erdinc Yilmazel
This is great news! Are you happy with the performance of the google data store? Do you plan to support in memory indexes using the memcache api? Thanks On Mon, Sep 14, 2009 at 5:04 PM, Kerang Lv wrote: > Hi Lucene users, > > Enlightened by the discussion "Can I run Lucene in google app eng

Re: Run your Lucene Applications on Google AppEngine with GAELucene

2009-09-15 Thread Michael McCandless
On Tue, Sep 15, 2009 at 12:39 AM, Allahbaksh Mohammedali Asadullah wrote: > I think we should put this as one of component in lucene-contrib. +1, this looks like a great contribution! Mike

Displaying search result data - stored fields vs external source

2009-09-15 Thread Joel Halbert
Hi, When using Lucene I always consider two approaches to displaying search result data to users: 1. Store any fields that we index and display to users in the Lucene Documents themselves. When we perform a search simply retrieve the data to be displayed from the Lucene documents themselves. or