OK, I see the issue - SingleFile doesn't have its own file pointer.
I'll update the original issue. (For large files, this shouldn't
change the times at all.)
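A shared file pointer is consistent with the wrong answers reported for nthreads>1. When several threads share one RandomAccessFile, `seek()` followed by `read()` is not atomic. The minimal sketch below is an assumption about how such a reader could be made correct, not the code in FileReadTest.java (class and method names are hypothetical): it serializes the seek+read pair with a lock. That restores correct results, but it also serializes the reads, which is exactly the contention that per-thread files or positional reads avoid.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical reader in the spirit of the "SingleFile" strategy: one
// RandomAccessFile, and therefore one file pointer, shared by all threads.
class SharedPointerReader {
    private final RandomAccessFile file;
    private final long length;

    SharedPointerReader(RandomAccessFile file) throws IOException {
        this.file = file;
        this.length = file.length();
    }

    // seek() moves the shared pointer and readFully() advances it, so the pair
    // must run as one atomic step. Without the lock, another thread can move
    // the pointer between the two calls and this thread reads the wrong bytes.
    private void readAt(long pos, byte[] buf, int len) throws IOException {
        synchronized (file) {
            file.seek(pos);
            file.readFully(buf, 0, len);
        }
    }

    // Sums every byte of the file, reading it in bufSize-byte chunks.
    long checksum(int bufSize) throws IOException {
        byte[] buf = new byte[bufSize];
        long sum = 0;
        for (long pos = 0; pos < length; pos += bufSize) {
            int len = (int) Math.min(bufSize, length - pos);
            readAt(pos, buf, len);
            for (int i = 0; i < len; i++) sum += buf[i];
        }
        return sum;
    }

    // Runs the checksum concurrently from nThreads threads sharing one handle;
    // with the lock in readAt, every thread should report the same sum.
    static List<Long> checksumsFromThreads(Path path, int nThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "r")) {
            SharedPointerReader reader = new SharedPointerReader(raf);
            Callable<Long> task = () -> reader.checksum(1024);
            List<Future<Long>> futures = new ArrayList<>();
            for (int t = 0; t < nThreads; t++) futures.add(pool.submit(task));
            List<Long> sums = new ArrayList<>();
            for (Future<Long> f : futures) sums.add(f.get());
            return sums;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Removing the `synchronized` block is an easy way to see the race: with several threads the sums then tend to disagree.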
-Yonik
http://www.lucidimagination.com
On Tue, Sep 15, 2009 at 4:13 PM, Yonik Seeley
wrote:
> On Tue, Sep 15, 2009 at 4:12 PM, Yonik Seeley
Okay - using a smaller file, I get better results. I had about 2+ GB
available to cache the 700MB file, but I probably had fragmentation
issues - I just grabbed the first big file I had.
So it gets a little better for ChannelPread with the smaller file
(approx 160MB vs approx 700MB for the old t
Disturbed reminds me of the owl from The Sword in the Stone ;)
That's a great one-liner - now I am completely disturbed.
Sorry - I've been known to do that -
The two results that I specifically said are from the hard disk are
indeed from the hard disk, on ext4. They are a tad slower than the
ramdis
Note that when nthreads>1 I sometimes get wrong answers for SimpleFile...
hopefully it's just a bug in the test... I'll look into it a little.
-Yonik
On Tue, Sep 15, 2009 at 4:00 PM, Mark Miller wrote:
> I'm jealous of your 4 3.0Ghz to my 2.0Ghz.
>
> I was on dy
On Tue, Sep 15, 2009 at 4:12 PM, Yonik Seeley
wrote:
> Note that when nthreads>1 I sometimes get wrong answers for SimpleFile...
s/SimpleFile/SingleFile/g
-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For add
I'm jealous of your four 3.0GHz cores compared to my 2.0GHz.
I was on dynamic frequency scaling and switched to a fixed 2.0GHz.
On the ramdisk, my puny 2.0s almost catch you and get a bit over 1800MB/s
with SeparateFile.
I'm smoked on PooledPread and ChannelPread, though: still under 500 for
both, even on the ramdisk.
It
Here are my results on my quad-core Phenom, with ondemand CPU frequency
scaling disabled (clocks locked at 3GHz):
Ubuntu 9.04, filesystem=ext4 on 7200RPM IDE drive, testfile=95MB fully cached.
Linux odin 2.6.28-15-generic #49-Ubuntu SMP Tue Aug 18 19:25:34 UTC
2009 x86_64 GNU/Linux
Java(TM) SE Runtime En
Now I am completely disturbed. Which numbers come from which filesystem?
Ext4 on HDD, tmpfs (which is a filesystem of its own), ext3 on HDD,...
Uwe
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
> -Original Message-
> From: ysee...
Remember to disable CPU frequency scaling when benchmarking... some
I/O-heavy work causes the frequency to drop, and when the process is CPU
bound again it takes a while for Linux to scale the frequency back up.
For example, on my Ubuntu box, ChannelFile went from 100MB/sec to
388MB/sec. This effect probably won't b
I just realized I hadn't sent this one. Here are the results from the hard drive:
It looks like it's closer to the same speed on the hard drive once
everything is loaded in the system cache (as you'd expect). SeparateFile
was 1200 vs almost 1700 on the RAMDISK. ChannelPread looked a lot closer, though.
- Mark
Michael McCandless wrote:
> I don't like that the answer is different... but it's really really
> odd that it's different-yet-almost-the-same.
>
> Mark, were these 4 results on a normal (ext4) filesystem, or tmpfs?
> (Because the top 2 entries of your 4 results match the first set of 2
> entries yo
It's the same test file - everything is the same except that one file is on the local
ext4 hard disk and the copy is on the ramdisk.
I haven't yet looked into what the answer corresponds to. I wonder if the
RAM disk is getting made as ext3?
Note: I also give the JVM a bit more RAM than the file size, and the OS
has plen
I appreciate your explanation, but I think that the use case I
described merits a deeper exploration:
Scenario 1: 16 threads indexing; queue size = 1000; present api; need to store
In this scenario, there are always 1000 Strings with all the contents
of their respective files.
Averaging 50k per do
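For a sense of scale, here is the arithmetic for Scenario 1, assuming "50k" means about 50,000 characters per document and that each String character takes 2 bytes (UTF-16, as on the JVMs of the time; newer JVMs may store Latin-1 text in 1 byte per character), ignoring object overhead:

```latex
1000\ \text{docs} \times 50{,}000\ \tfrac{\text{chars}}{\text{doc}} \times 2\ \tfrac{\text{bytes}}{\text{char}} = 10^{8}\ \text{bytes} \approx 100\ \text{MB}
```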
I don't like that the answer is different... but it's really really
odd that it's different-yet-almost-the-same.
Mark, were these 4 results on a normal (ext4) filesystem, or tmpfs?
(Because the top 2 entries of your 4 results match the first set of 2
entries you sent... so I'm thinking these 4 wer
It's been a while since I wrote that benchmarker... is it OK that the
answer is different? Did you use the same test file?
-Yonik
On Tue, Sep 15, 2009 at 2:18 PM, Mark Miller wrote:
> The results:
>
> config: impl=SeparateFile serial=false nThreads=4 iterations
The results:
config: impl=SeparateFile serial=false nThreads=4 iterations=100 bufsize=1024 poolsize=2 filelen=730554368
answer=-282295611, ms=173550, MB/sec=1683.7899579371938
config: impl=ChannelFile serial=false nThreads=4 iterations=100 bufsize=1024 poolsize=2 filelen=730554368
answer=-2822953
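The MB/sec figure above is consistent with counting every thread reading the whole file once per iteration and using decimal megabytes (10^6 bytes). That reading of the benchmark is an assumption inferred from the numbers, not from the FileReadTest.java source; the small check below (hypothetical class name) reproduces the SeparateFile figure from its config line:

```java
// Throughput as the benchmark appears to report it (an inferred formula):
// every thread reads the whole file once per iteration; MB = 10^6 bytes.
class Throughput {
    static double mbPerSec(int nThreads, int iterations, long fileLen, long ms) {
        double totalBytes = (double) nThreads * iterations * fileLen;
        double seconds = ms / 1000.0;
        return totalBytes / seconds / 1_000_000.0;
    }

    public static void main(String[] args) {
        // SeparateFile run: nThreads=4 iterations=100 filelen=730554368 ms=173550
        System.out.println(mbPerSec(4, 100, 730_554_368L, 173_550L)); // about 1683.79
    }
}
```

Under this reading, results with different nThreads are aggregate throughput, so they are only comparable at equal thread counts.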
: Someone has made the decision that we will not be interested in
: storing files read using a Reader (at least not with these
: constructors).
: This is rather arbitrary.
No, it was not arbitrary at all.
The javadocs there are not a "decree" of what shall or shan't be
supported, they are an ex
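If the goal is to store text that arrives as a Reader, one workaround is to read it into a String yourself and pass the String to a stored field, instead of the Reader-based constructors (which index the text without storing it). A minimal stdlib sketch; the helper name `readAll` is hypothetical and the Lucene Field call itself is not shown:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

class ReaderContents {
    // Reads everything the Reader supplies into one String, which can then be
    // used for a stored field. The whole document is held in memory.
    static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[8192];
        try {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

The trade-off is memory: every document being indexed is materialized as a String, which is the cost the queue scenario earlier in this thread is concerned with.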
: "it's possible you just have a simple bug where you are closing the reader
: before you pass it to Lucene,
:
: or maybe you are mistakenly adding the same field twice
:
: (or in two different documents)"
:
: Are you saying that if I were attempting to delete a doc and then add it
: aga
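Both suggested causes come down to a Reader being consumable only once. The plain-JDK sketch below (not Lucene code; `drain` is a hypothetical helper) shows what a second consumer of the same Reader sees, and what happens once it has been closed. The message shown is the JDK StringReader's; an error surfacing through Lucene may be worded differently.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class ReaderPitfalls {
    // Reads a Reader to its end and returns how many characters it supplied.
    static int drain(Reader r) throws IOException {
        int count = 0;
        while (r.read() != -1) count++;
        return count;
    }

    public static void main(String[] args) throws IOException {
        Reader r = new StringReader("some document text");
        System.out.println(drain(r)); // 18: the first consumer gets everything
        System.out.println(drain(r)); // 0: a second field/document reusing r gets nothing
        r.close();
        try {
            drain(r);
        } catch (IOException e) {
            System.out.println(e.getMessage()); // Stream closed
        }
    }
}
```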
: Would it be useful to allow some sort of data tolerance when creating these
: caches? At least now the only solution is to delete that Document. Perhaps
: the values could then be returned as 0 in the Parser implementations for
: numeric failures.
picking an arbitrary number wouldn't be very
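The proposal could be sketched as a parser that substitutes a default for unparseable values. To address the objection that any fixed number is arbitrary, this hypothetical version (not Lucene's FieldCache parser API) makes the default an explicit choice of the caller and counts the substitutions so they are not silent:

```java
// Hypothetical tolerant parser: returns a caller-chosen default for values
// that are not valid integers and records how often that happened.
class TolerantIntParser {
    private final int defaultValue;
    private int failures;

    TolerantIntParser(int defaultValue) {
        this.defaultValue = defaultValue;
    }

    int parse(String term) {
        try {
            return Integer.parseInt(term);
        } catch (NumberFormatException e) {
            failures++;          // make the substitution visible to the caller
            return defaultValue;
        }
    }

    int failures() {
        return failures;
    }
}
```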
How does a conventional file system compare?
Uwe Schindler
> -Original Message-
> From: Mark Miller [mailto:markrmil...@gmail.com]
> Sent: Tuesday, September 15, 2009 7:15 PM
> To: java-user@lucene.a
Indeed - I just ran the FileReadTest on a Linux tmpfs ramdisk - with
SeparateFile all 4 of my cores are immediately pinned and remain so.
With ChannelFile, all 4 cores hover at 20-30%.
It would appear it may not be a good idea to use NIOFSDirectory on ramdisks.
Even still though - it looks like yo
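The two strategies being compared differ in how threads get at the bytes. The sketch below only approximates them (class and method names are hypothetical; it does not reproduce the CPU measurements): SeparateFile-style code opens a RandomAccessFile per thread, while ChannelPread-style code shares one FileChannel and uses positional reads, `FileChannel.read(ByteBuffer, long)`, which do not move the channel's position. For the same file both should produce the same answer, here a byte checksum.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;

class ReadStrategies {
    // SeparateFile-style: each caller opens its own RandomAccessFile, so no
    // file pointer is shared and no locking is needed.
    static long separateFileChecksum(Path path, int bufSize) {
        try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "r")) {
            byte[] buf = new byte[bufSize];
            long sum = 0;
            int n;
            while ((n = raf.read(buf)) != -1) {
                for (int i = 0; i < n; i++) sum += buf[i];
            }
            return sum;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // ChannelPread-style: one FileChannel that many threads may share. Each
    // read names its own position, so the channel's position never changes.
    static long channelPreadChecksum(FileChannel ch, int bufSize) {
        try {
            ByteBuffer buf = ByteBuffer.allocate(bufSize);
            long size = ch.size();
            long sum = 0;
            for (long pos = 0; pos < size; ) {
                buf.clear();
                int n = ch.read(buf, pos);
                if (n < 0) break; // end of file reached early
                buf.flip();
                while (buf.hasRemaining()) sum += buf.get();
                pos += n;
            }
            return sum;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```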
Hi,
I read through the Lucene thread/process-safety issue for concurrent
indexing; my understanding is that indexing through an IndexWriter
locks the whole index directory.
Now we need to index a community blog where many people add/update posts,
so queuing all those indexing requests would be a
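One common way to handle many concurrent add/update requests is to let request threads enqueue work and have a single indexing thread own the writer. A minimal sketch of that pattern, with a hypothetical in-memory list standing in for the real index (no Lucene calls are shown). Note that Lucene's IndexWriter is itself thread-safe, so several threads can also share one writer; the directory lock prevents a second writer from being opened, not concurrent use of one writer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class SingleWriterQueue {
    // Sentinel compared by identity (==), so no real update string can match it.
    private static final String STOP = new String("stop");

    // Bounded queue: producers block in submit() when it is full (back-pressure).
    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(1000);
    // Stands in for the index; only the writer thread touches it.
    private final List<String> applied = new ArrayList<>();
    private final Thread writer = new Thread(this::drainLoop, "index-writer");

    void start() { writer.start(); }

    // Safe to call from any number of request threads.
    void submit(String update) throws InterruptedException { queue.put(update); }

    private void drainLoop() {
        try {
            while (true) {
                String update = queue.take();
                if (update == STOP) return;
                applied.add(update); // a real indexer would add/update the document here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Enqueues the sentinel, waits for the writer to apply everything queued
    // before it, and returns the applied updates in order.
    List<String> shutdown() throws InterruptedException {
        queue.put(STOP);
        writer.join();
        return applied;
    }
}
```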
Maybe Linux has some problems with NIO on tmpfs/other ramdisks. Which Linux
do you use: 64-bit or 32-bit JVM and kernel, and which RAM fs type?
If you are on 64-bit and you store your index in Linux tmpfs (not the old RAM
fs), the fastest would be MMapDirectory, as the tmpfs RAM can be used directly
when mapped
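MMapDirectory builds on the JDK's memory-mapping support (`FileChannel.map`). The sketch below shows only that underlying mechanism, not MMapDirectory itself, and assumes a file smaller than 2GB (a single mapping is limited to Integer.MAX_VALUE bytes, so larger files need several mappings and the address space of a 64-bit JVM). The class name is hypothetical. Reads come straight from the mapped pages instead of going through read() calls.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedRead {
    // Maps the whole file read-only and sums its bytes directly from the mapping.
    static long checksum(Path path) {
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            long sum = 0;
            while (map.hasRemaining()) sum += map.get();
            return sum;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```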
>Do you plan to support in memory indexes using the memcache api?
I'm afraid not. I prefer to do the indexing on another machine until I have a plan
that can finish indexing within 30s.
- Original Message
From: Erdinc Yilmazel
To: java-user@lucene.apache.org
Sent: Tuesday, September 15, 2
>I think I will try this today evening.
Remember to update your local project from SVN; I fixed some mistakes just
now. I apologize for my negligence.
>I think we should put this as one of component in lucene-contrib. What do you
>say?
Yes, that's good news.
- Original Message
Hi Uwe,
already done. See my last message.
Cheers,
Thomas
Uwe Schindler wrote:
> On 2.9. NIOFS is only used, if you use FSDirectory.open() instead of
> FSDirectory.getDirectory (Deprecated). Can you compare when you use instead
> of FSDirectory.open() the direct ctor of SimpleFSDir vs. NIOFSDir
Will do, tomorrow.
Mark Miller wrote:
> Can you run the following test on your RAMDISK?
>
> http://people.apache.org/~markrmiller/FileReadTest.java
>
> I've taken it from the following issue (in which NIOFSDirectory was
> developed):
> https://issues.apache.org/jira/browse/LUCENE-753
>
--
Tho
In 2.9, NIOFS is only used if you use FSDirectory.open() instead of
FSDirectory.getDirectory() (deprecated). Can you compare the results when, instead
of FSDirectory.open(), you use the direct constructors of SimpleFSDir vs. NIOFSDir vs.
MMapDir?
Uwe Schindler
Can you run the following test on your RAMDISK?
http://people.apache.org/~markrmiller/FileReadTest.java
I've taken it from the following issue (in which NIOFSDirectory was
developed):
https://issues.apache.org/jira/browse/LUCENE-753
--
- Mark
---
Thomas Becker wrote:
> Hey Mark,
>
> yes. I'm running the app on unix. You see the difference between 2.9 and 2.4
> here:
>
> http://ankeschwarzer.de/tmp/graph.jpg
>
Right - I know your measurements showed a difference (and will keep that
in mind) - but the profiling results then seem
oddly sim
Hey Mark,
Yes, I'm running the app on Unix. You can see the difference between 2.9 and 2.4
here:
http://ankeschwarzer.de/tmp/graph.jpg
2.4 responds much more quickly, thus increasing throughput significantly. I have only a
single segment:
-rw-r--r-- 1 asuser asgroup 20 Sep 9 16:40 segments.gen
-r
Hmm, so if you want to use the Filter to narrow down the search results,
you could use it in the while loop like this:
BitSet set = filter.bits(reader);  // documents accepted by the filter
int numDocs = 0;
TermDocs termDocs = reader.termDocs(new Term("myField", "myTerm"));
while (termDocs.next()) {
  // count only documents that contain the term AND pass the filter
  if (set.get(termDocs.doc()))
    numDocs++;
}
termDocs.close();
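The counting loop above can be modelled without Lucene to check its logic: an int array plays the role of the term's postings (what TermDocs steps through) and a java.util.BitSet plays the role of the filter's bits. This is a hypothetical stand-in, not Lucene code; the second method shows an equivalent count via set intersection (`BitSet.and` plus `cardinality()`).

```java
import java.util.BitSet;

class FilteredCount {
    // Walks the term's postings, counting only documents the filter accepts.
    static int countLoop(int[] postings, BitSet filterBits) {
        int numDocs = 0;
        for (int doc : postings) {
            if (filterBits.get(doc)) numDocs++;
        }
        return numDocs;
    }

    // Same count via intersection: mark the term's documents in a fresh
    // BitSet, AND it with the filter (the filter itself is left untouched),
    // and count the remaining bits.
    static int countIntersect(int[] postings, BitSet filterBits) {
        BitSet termBits = new BitSet();
        for (int doc : postings) termBits.set(doc);
        termBits.and(filterBits);
        return termBits.cardinality();
    }
}
```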
A few quick notes:
Lucene 2.9 with the old API doesn't appear much worse than Lucene 2.4?
You save a lot with the new intern implementation, because that's not a hotspot
anymore. But then
RandomAccessFile seeks end up being a much bigger piece of the pie. They look
fairly similar in speed overall?
It looks like the major
Hello,
This seems to be a similar solution to:
Term t = new Term(fieldname, term);
int count = searcher.docFreq(t);
The problem is that in this situation it is not possible to apply a
filter object. Without this filter object, I would have
to use a complex search query, which is -
Thomas Becker wrote:
> Here's the results of profiling 10 different search requests:
>
> http://ankeschwarzer.de/tmp/lucene_24_oldapi.png
> http://ankeschwarzer.de/tmp/lucene_29_oldapi.png
> http://ankeschwarzer.de/tmp/lucene_29_newapi.png
>
> But you already gave me a good hint. The index being us
Here are the results of profiling 10 different search requests:
http://ankeschwarzer.de/tmp/lucene_24_oldapi.png
http://ankeschwarzer.de/tmp/lucene_29_oldapi.png
http://ankeschwarzer.de/tmp/lucene_29_newapi.png
But you already gave me a good hint. The index being used is an old one built
with lucen
On Sep 15, 2009, at 9:26 AM, Chris Bamford wrote:
Mark
It appears you are right - it *IS* something tricky. My code is
single threaded, so there is no contention. I still get
intermittent "Stream Close" exceptions (about 1 in every 800
indexWriter.addDocument() calls) which I cannot ex
Thomas Becker wrote:
> Hey Mark,
>
> thanks for your reply. Will do. Results will follow in a couple of minutes.
>
>
>
Thanks, awesome.
Also, how many segments (approx) are in your index? If there are a lot,
have you tried (or can you try) the same tests on an optimized index? Don't want to
get ahead of t
Hey Mark,
thanks for your reply. Will do. Results will follow in a couple of minutes.
Yes, the custom sorts are doing something tricky. :) I'll try to explain them in
a few words and paste the code.
But even without them 2.9 is slower. Test cases 2 and 3 differ only in the Lucene
jars.
CustomFieldComp
Categorically, I store everything in the index unless/until I *know* it
doesn't work. With some things, it's easy to know from the outset, like if I
have 20TB of data to store.
First, storing fields has minimal impact on search speed: the stored text
isn't interleaved with the search tokens, so t
Hey Thomas - any chance you can do some quick profiling and grab the
hotspots from the 3 configurations?
Are your custom sorts doing anything tricky?
--
- Mark
Urm and uploaded here:
http://ankeschwarzer.de/tmp/graph.jpg
Sorry.
Missed the attachment, sorry.
Thomas Becker wrote:
> Hi all,
>
> I'm experiencing a performance degradation after migrating to 2.9 and running
> some tests. I'm getting out of ideas and any help to identify the reasons why
> 2.9 is slower than 2.4 are highly appreciated.
>
> We've had some issue
Hi all,
I'm experiencing a performance degradation after migrating to 2.9 and running
some tests. I'm running out of ideas, and any help identifying why
2.9 is slower than 2.4 is highly appreciated.
We've had some issues with custom sorting in Lucene 2.4.1. We worked around them
by so
Mark,
It appears you are right - it *IS* something tricky. My code is single-threaded,
so there is no contention. I still get intermittent "Stream Close"
exceptions (about 1 in every 800 indexWriter.addDocument() calls) which I
cannot explain. By moving code around / recompiling, I have manag
Did you try:
int numDocs = 0;
TermDocs termDocs = reader.termDocs(new Term("myField", "myTerm"));
while (termDocs.next()) { numDocs++; }
termDocs.close();
simon
On Tue, Sep 15, 2009 at 2:19 PM, Mathias Bank wrote:
> Hello,
>
> I'm trying to find the number of documents for a specific term to
> create text statistics.
OK, thanks. :-)
Glen
2009/9/14 Anthony Urso :
> It's best to file a feature request on the Lucene issue tracker if you
> are interested in seeing this implemented.
>
> http://issues.apache.org/jira/browse/LUCENE
>
> Just cut and paste your description and attach a patch and/or tests if
> you have
Hi Mike,
I think adding this to Lucene 3.0 contrib would be the best we could do. I
think we could add it in the Lucene 2.9 release, as it would grow the community, and
we would also be able to find good practices, bugs, and improvements that
would make it better in upcoming releases.
Regards,
Alla
Hello,
I'm trying to find the number of documents for a specific term to
create text statistics. I'm not interested in ordering the results or
even receiving the first result. I just need the number of results.
Currently, I'm trying to do this by using the lucene searcher class:
IndexSearcher se
This is great news! Are you happy with the performance of the Google
data store? Do you plan to support in-memory indexes using the memcache API?
Thanks
On Mon, Sep 14, 2009 at 5:04 PM, Kerang Lv wrote:
> Hi Lucene users,
>
> Enlightened by the discussion "Can I run Lucene in google app eng
On Tue, Sep 15, 2009 at 12:39 AM, Allahbaksh Mohammedali Asadullah
wrote:
> I think we should put this as one of component in lucene-contrib.
+1, this looks like a great contribution!
Mike
Hi,
When using Lucene I always consider two approaches to displaying search
result data to users:
1. Store any fields that we index and display to users in the Lucene
Documents themselves. When we perform a search, simply retrieve the data
to be displayed from the Lucene documents themselves.
or