Re: indexing performance 6.6 vs 7.1

2018-01-31 Thread Rob Audenaerde
> e a transaction log in parallel to indexing, so they commit very seldom. If the system crashes, the changes are replayed from the translog since the last commit. > Uwe
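
The point about committing rarely while logging every operation can be sketched with plain Java. This is a minimal illustration under stated assumptions: `BatchedCommitter` is a hypothetical class (not an Elasticsearch or Lucene API), the `Runnable` stands in for the expensive commit, and the "translog" is only an in-memory list here, whereas a real transaction log is written durably.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not an Elasticsearch or Lucene API): record every
 * operation in an append-only log, but run the expensive commit only once
 * per batch instead of once per document.
 */
final class BatchedCommitter {
    private final int maxUncommittedDocs;
    private final Runnable commitAction;                     // stand-in for e.g. IndexWriter.commit()
    private final List<String> translog = new ArrayList<>(); // ops since last commit (in memory here; a real translog is durable)
    private int commits = 0;

    BatchedCommitter(int maxUncommittedDocs, Runnable commitAction) {
        if (maxUncommittedDocs <= 0) {
            throw new IllegalArgumentException("batch size must be positive");
        }
        this.maxUncommittedDocs = maxUncommittedDocs;
        this.commitAction = commitAction;
    }

    /** Logs one operation; commits only when the batch threshold is reached. */
    void add(String docId) {
        translog.add(docId);
        if (translog.size() >= maxUncommittedDocs) {
            commit();
        }
    }

    /** Runs the commit and truncates the log, since committed ops need no replay. */
    void commit() {
        if (translog.isEmpty()) {
            return;
        }
        commitAction.run();
        commits++;
        translog.clear();
    }

    /** Operations that would have to be replayed if the process crashed right now. */
    List<String> pendingReplay() {
        return new ArrayList<>(translog);
    }

    int commitCount() {
        return commits;
    }
}
```

With a batch size of 1,000, adding 10,000 documents triggers 10 commits instead of 10,000; the price is that up to 999 logged operations may need replaying after a crash.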

Re: indexing performance 6.6 vs 7.1

2018-01-31 Thread Adrien Grand
> Uwe Schindler, Achterdiek 19, D-28357 Bremen, http://www.thetaphi.de, eMail: u...@thetaphi.de > -----Original Message----- > From: Rob Audenaerde [mailto:rob.audenae...@gmail.c

Re: indexing performance 6.6 vs 7.1

2018-01-31 Thread Rob Audenaerde
> -----Original Message----- From: Rob Audenaerde [mailto:rob.audenae...@gmail.com] Sent: Monday, January 29, 2018 11:29 AM To: java-user@lucene.apache.org Subject: Re: indexing performance 6.6 vs 7.1

Re: indexing performance 6.6 vs 7.1

2018-01-29 Thread Rob Audenaerde

RE: indexing performance 6.6 vs 7.1

2018-01-29 Thread Uwe Schindler

Re: indexing performance 6.6 vs 7.1

2018-01-29 Thread Rob Audenaerde
> create pivot tables on search results really fast. These tables have some overlapping columns, but also disjoint ones. We anticipated a decrease in index size because of the sparse docvalues. We see this happening, w

Re: indexing performance 6.6 vs 7.1

2018-01-18 Thread Erick Erickson
> search results really fast. These tables have some overlapping columns, but also disjoint ones. We anticipated a decrease in index size because of the sparse docvalues. We see this happening, with decreases to ~50%-80% of the original index size.

Re: indexing performance 6.6 vs 7.1

2018-01-18 Thread Adrien Grand
> x size. But we did not expect a drop in indexing performance (client systems' indexing time increased by 50% to 250%). (Our indexing speed used to be mainly bound by the speed at which the Taxonomy could deliver new ordinals for new values; currently we are investigating if this
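
The remark that indexing speed was bound by how fast the Taxonomy could hand out ordinals for new values is easier to picture with a simplified model: each distinct facet label receives a dense integer ordinal the first time it is seen, and every later occurrence is a cheap lookup. A minimal sketch, assuming a hypothetical in-memory `OrdinalDictionary` (not a Lucene class; Lucene's taxonomy writer also persists categories to a separate index, which this ignores), so it only illustrates the hit/miss split.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical in-memory stand-in for a taxonomy: label -> dense ordinal. */
final class OrdinalDictionary {
    private final ConcurrentHashMap<String, Integer> ordinals = new ConcurrentHashMap<>();
    private final AtomicInteger nextOrdinal = new AtomicInteger();
    private final AtomicInteger newOrdinals = new AtomicInteger();

    /** Returns the existing ordinal, or atomically assigns the next one for a new label. */
    int ordinalFor(String label) {
        Integer existing = ordinals.get(label);   // fast path: value seen before
        if (existing != null) {
            return existing;
        }
        // Slow path: computeIfAbsent runs the function at most once per key.
        return ordinals.computeIfAbsent(label, l -> {
            newOrdinals.incrementAndGet();
            return nextOrdinal.getAndIncrement();
        });
    }

    int size() {
        return ordinals.size();
    }

    int newOrdinalsAllocated() {
        return newOrdinals.get();
    }
}
```

In this model, data with many previously unseen values spends most of its time on the slow path, which is the situation the message describes.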

Re: indexing performance 6.6 vs 7.1

2018-01-18 Thread Robert Muir
> FacetFields as well. This allows us to create pivot tables on search results really fast. These tables have some overlapping columns, but also disjoint ones. We anticipated a decrease in index size because of the sparse docvalues. We see this hap

Re: indexing performance 6.6 vs 7.1

2018-01-18 Thread Erick Erickson
> a decrease in index size because of the sparse docvalues. We see this happening, with decreases to ~50%-80% of the original index size. But we did not expect a drop in indexing performance (client systems' indexing time increased by 50% to 250%). (Our indexing speed used t

indexing performance 6.6 vs 7.1

2018-01-18 Thread Rob Audenaerde
fast. These tables have some overlapping columns, but also disjoint ones. We anticipated a decrease in index size because of the sparse docvalues. We see this happening, with decreases to ~50%-80% of the original index size. But we did not expect a drop in indexing performance (client systems

Re: How to improve indexing performance for lucene 6

2016-06-14 Thread Hans Lund
Hi Mukul, there is not much information in your question. To make a guess, could you provide: 1) the time it takes to fetch the docs from SQL Server (without doing any indexing), 2) the size of the documents, 3) what kind of analysis is done, 4) why you are creating this merge policy: is this what
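
Question 1 is the one that separates the two possible bottlenecks: if fetching alone accounts for most of the 30-35 minutes, no IndexWriter setting will help. A minimal sketch of that measurement, assuming hypothetical `fetchAll` and `indexOne` callbacks that stand in for the SQL Server query and for building and adding one document; `PhaseTimer` is an illustrative name, not an existing API.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical harness: times the fetch phase and the indexing phase separately. */
final class PhaseTimer {
    static final class Timing {
        final int docCount;
        final long fetchNanos;
        final long indexNanos;

        Timing(int docCount, long fetchNanos, long indexNanos) {
            this.docCount = docCount;
            this.fetchNanos = fetchNanos;
            this.indexNanos = indexNanos;
        }
    }

    /**
     * @param fetchAll stand-in for running the SQL query and materializing all rows
     * @param indexOne stand-in for building one document and adding it to the index
     */
    static <T> Timing measure(Supplier<List<T>> fetchAll, Consumer<T> indexOne) {
        long t0 = System.nanoTime();
        List<T> docs = fetchAll.get();        // phase 1: fetch only, no indexing
        long t1 = System.nanoTime();
        for (T doc : docs) {
            indexOne.accept(doc);             // phase 2: indexing only
        }
        long t2 = System.nanoTime();
        return new Timing(docs.size(), t1 - t0, t2 - t1);
    }
}
```

Materializing all 150k rows before indexing is a simplification for measurement; if `fetchNanos` dominates, the query, network and row mapping are the place to look first.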

How to improve indexing performance for lucene 6

2016-06-14 Thread Mukul Ranjan
Hi, I have 150k documents in a Lucene index folder. It takes 30-35 minutes to rebuild the index. We are fetching this data from SQL Server. I have applied the parameters below while getting an instance of IndexWriter: IndexWriterConfig indexWriterConfig = new IndexWriterConfig(getAnalyzer(callerCon

Indexing performance on HDFS

2016-04-26 Thread KORTMANN Stefan (MORPHO)
Hi, can indexing on HDFS somehow be tuned using pluggable codecs or a customized PostingsFormat? What settings would you recommend for using Lucene 5.5 on HDFS? Regards, Stefan

Lucene Indexing performance issue

2014-10-22 Thread Jason Wu
Hi Team, I am a new user of Lucene 4.8.1. I encountered a Lucene indexing performance issue which slows down my application greatly. I tried several approaches from Google searches but still couldn't resolve it. Any suggestions from your experts would help me a lot. One of my applications uses the l

Re: Concurrent indexing performance problem

2013-03-07 Thread Simon Willnauer
On Thu, Mar 7, 2013 at 6:44 PM, Michael McCandless wrote: > This sounds reasonable (500 M docs / 50 GB index), though you'll need to test resulting search perf for what you want to do with it. To reduce merging time, maximize your IndexWriter RAM buffer (setRAMBufferSizeMB). You could als

Re: Concurrent indexing performance problem

2013-03-07 Thread Simon Willnauer
On Thu, Mar 7, 2013 at 7:06 PM, Jan Stette wrote: > Thanks for your suggestions, Mike. I'll experiment with the RAM buffer size and segments-per-tier settings and see what that does. The time spent merging seems to be so great, though, that I'm wondering if I'm actually better off doing the

Re: Concurrent indexing performance problem

2013-03-07 Thread Jan Stette
Thanks for your suggestions, Mike. I'll experiment with the RAM buffer size and segments-per-tier settings and see what that does. The time spent merging seems to be so great, though, that I'm wondering if I'm actually better off doing the indexing single-threaded. Am I right in thinking that no me

Re: Concurrent indexing performance problem

2013-03-07 Thread Michael McCandless
This sounds reasonable (500 M docs / 50 GB index), though you'll need to test resulting search perf for what you want to do with it. To reduce merging time, maximize your IndexWriter RAM buffer (setRAMBufferSizeMB). You could also increase the TieredMergePolicy.setSegmentsPerTier to allow more se
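
Why a bigger RAM buffer reduces merge time can be seen with a back-of-envelope model, which is an approximation and not how TieredMergePolicy actually selects merges: segments are flushed at roughly `flushDocs` documents each, each merge combines about `mergeFactor` segments, so every document is rewritten roughly once per merge level. `MergeEstimate` and the per-flush document counts below are hypothetical illustrations.

```java
/**
 * Back-of-envelope model (LogMergePolicy-style picture, not Lucene's actual
 * merge selection): each merge combines about mergeFactor segments, so a
 * document is rewritten roughly once per level between flush size and total size.
 */
final class MergeEstimate {
    static int mergeLevels(long totalDocs, long flushDocs, int mergeFactor) {
        if (flushDocs <= 0 || mergeFactor < 2) {
            throw new IllegalArgumentException("need flushDocs > 0 and mergeFactor >= 2");
        }
        int levels = 0;
        long segmentDocs = flushDocs;
        while (segmentDocs < totalDocs) {
            segmentDocs *= mergeFactor;   // one more round of merges
            levels++;
        }
        return levels;
    }
}
```

For 500M documents merged 10 segments at a time, flushing about 50,000 docs per segment gives 4 levels, while a buffer large enough for about 500,000 docs per flush gives 3, i.e. roughly one fewer rewrite of every document. How many documents fit in a flush depends on document size and the `setRAMBufferSizeMB` value.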

Concurrent indexing performance problem

2013-03-07 Thread Jan Stette
I'm seeing performance problems when indexing a certain set of data, and I'm looking for pointers on how to improve the situation. I've read the very helpful performance advice on the Wiki and I am carrying on doing experiments based on that, but I'd also like to ask for comments as to whether I'm heading i

RE: NumericField indexing performance

2010-04-15 Thread Uwe Schindler
, April 15, 2010 2:13 PM > To: java-user@lucene.apache.org > Subject: RE: NumericField indexing performance > Hi Tomislav, when reading your mail it's not 100% clear what you did wrong, but I think the following occurred (so it's not a GC problem): You reused

RE: NumericField indexing performance

2010-04-15 Thread Uwe Schindler
> -----Original Message----- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: Thursday, April 15, 2010 2:00 PM To: java-user@lucene.apache.org Subject: Re: NumericField indexing performance > Hi, I a

Re: NumericField indexing performance

2010-04-15 Thread Otis Gospodnetic
adoop ecosystem search :: http://search-hadoop.com/ > ----- Original Message ----- From: Tomislav Poljak To: java-user@lucene.apache.org Sent: Thu, April 15, 2010 7:41:02 AM Subject: RE: NumericField indexing performance > Hi Uwe, thank you very much for your answers. I

RE: NumericField indexing performance

2010-04-15 Thread Tomislav Poljak
> -----Original Message----- From: Uwe Schindler [mailto:u...@thetaphi.de] Sent: Wednesday, April 14, 2010 11:28 PM To: java-user@lucene.apache.org Subject: RE: NumericField indexing performance > Hi Tomislav,

RE: NumericField indexing performance

2010-04-14 Thread Uwe Schindler
E: NumericField indexing performance > Hi Tomislav, indexing with NumericField takes longer (at least with the default precision step of 4, which means each 32-bit integer is indexed as 8 subterms, one per 4-bit shift of the value). So you produce 8 times more terms during

RE: NumericField indexing performance

2010-04-14 Thread Uwe Schindler
indexing performance, try larger precision steps like 6 or 8. If you don't use NumericRangeQuery and only want to index each numeric value as *one* term, use precStep=Integer.MAX_VALUE. Also check your memory requirements, as the indexer may need more memory and GC may cost too much. Also the index size
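
Uwe's figures follow from trie encoding emitting one term per shift of `precisionStep` bits, i.e. ceil(valueBits / precisionStep) terms per value. A small sketch of that arithmetic; `PrecisionStep.termsPerValue` is a hypothetical helper, not part of the NumericField API.

```java
/** Hypothetical helper: terms indexed per value for a trie-encoded numeric field. */
final class PrecisionStep {
    /** One term per shift of precisionStep bits: ceil(valueBits / precisionStep). */
    static int termsPerValue(int valueBits, int precisionStep) {
        if (valueBits <= 0 || precisionStep <= 0) {
            throw new IllegalArgumentException("bits and step must be positive");
        }
        if (precisionStep >= valueBits) {
            return 1;   // e.g. precStep = Integer.MAX_VALUE: only the full-precision term
        }
        return (valueBits + precisionStep - 1) / precisionStep;
    }
}
```

For 32-bit ints this gives 8 terms at step 4, 6 at step 6, 4 at step 8 and 1 at `Integer.MAX_VALUE`; a 64-bit long at step 4 would produce 16.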

NumericField indexing performance

2010-04-14 Thread Tomislav Poljak
Hi, is it normal for indexing time to increase up to 10 times after introducing NumericField instead of Field (for two fields)? I've changed two date fields from String representation (Field) to NumericField, now it is: doc.add(new NumericField("time").setIntValue(date.getTime()/24/3600)) and a
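
Separately from the precision-step cost, note that `Date.getTime()` returns milliseconds, so dividing by 24 and by 3600 (86,400 in total) yields units of 1/1000 day rather than whole days; as quoted, the result is also a `long`, which `setIntValue(int)` would not accept without a cast. If day resolution is what was intended (an assumption), a minimal conversion could look like this; `DayNumber` is a hypothetical helper.

```java
import java.util.Date;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: converts a Date to a whole-day number that fits in an int. */
final class DayNumber {
    private static final long MILLIS_PER_DAY = TimeUnit.DAYS.toMillis(1); // 86,400,000

    /** Whole days since 1970-01-01T00:00:00Z, rounding down for pre-1970 dates. */
    static int daysSinceEpoch(Date date) {
        long days = Math.floorDiv(date.getTime(), MILLIS_PER_DAY); // getTime() is in milliseconds
        return Math.toIntExact(days); // throws instead of silently truncating to int
    }
}
```

The value could then be passed as `new NumericField("time").setIntValue(DayNumber.daysSinceEpoch(date))`.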

Re: indexing performance problems

2009-06-10 Thread Michael McCandless
Thanks for bringing closure! Mike On Wed, Jun 10, 2009 at 4:42 AM, Mateusz Berezecki wrote: > Hi list! I'm forwarding as somehow I did not put the list in the CC but the answer I think is noteworthy, so here it is. Please remember to use StringBuffer before blaming lucene ;-) Actual t

Re: indexing performance problems

2009-06-10 Thread Mateusz Berezecki
Hi list! I'm forwarding this because somehow I did not put the list in CC, but I think the answer is noteworthy, so here it is. Please remember to use StringBuffer before blaming Lucene ;-) Actual time consumed by Lucene is now ~130 minutes as opposed to 20 hours, which is neat. I can do many more passes
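
The fix here was on the application side, not in Lucene. The message does not show the offending code; a common pattern that this kind of advice targets (an assumption here) is building a large text by repeated `+=` on a `String`, which copies everything accumulated so far on each append and is therefore quadratic in the text length. A minimal illustration; `TextAccumulation` is a hypothetical name, and `StringBuilder` is the unsynchronized counterpart of the `StringBuffer` mentioned in the message.

```java
import java.util.List;

/** Hypothetical illustration of quadratic vs. linear text accumulation. */
final class TextAccumulation {
    /** Quadratic: each += allocates a new String and copies all previous text. */
    static String concatNaively(List<String> parts) {
        String text = "";
        for (String p : parts) {
            text += p;
        }
        return text;
    }

    /** Amortized linear: appends into one growable buffer, copying once at the end. */
    static String concatWithBuilder(List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) {
            sb.append(p);
        }
        return sb.toString();
    }
}
```

Both produce the same string; the difference only shows up when an article body is assembled from many small fragments, as when streaming a large XML dump.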

Re: indexing performance problems

2009-06-08 Thread Mateusz Berezecki
Hi Michael, thanks a lot for the hint. I'll test it out in a few hours and get back to you and/or the list. best, Mateusz On Mon, Jun 8, 2009 at 2:13 PM, Michael McCandless wrote: > On Mon, Jun 8, 2009 at 7:54 AM, Mateusz Berezecki wrote: >> Thanks for a prompt response. > You're welcome!

Re: indexing performance problems

2009-06-08 Thread Michael McCandless
On Mon, Jun 8, 2009 at 7:54 AM, Mateusz Berezecki wrote: > Thanks for a prompt response. You're welcome! > A mergeFactor of 150 is way too high; I'd put that back to 10 and see if the problem persists. Also make sure you're using autoCommit=false, and try the suggestions here: h

Re: indexing performance problems

2009-06-08 Thread Mateusz Berezecki
Hi Michael, thanks for the prompt response. On Mon, Jun 8, 2009 at 1:27 PM, Michael McCandless wrote: > This isn't normal. A mergeFactor of 150 is way too high; I'd put that back to 10 and see if the problem persists. Also make sure you're using autoCommit=false, and try the suggestions her

Re: indexing performance problems

2009-06-08 Thread Michael McCandless
This isn't normal. A mergeFactor of 150 is way too high; I'd put that back to 10 and see if the problem persists. Also make sure you're using autoCommit=false, and try the suggestions here: http://wiki.apache.org/lucene-java/ImproveIndexingSpeed You're sure the JRE's heap size is big enough

indexing performance problems

2009-06-08 Thread Mateusz Berezecki
Hi list, I'm having trouble achieving good performance when indexing the XML Wikipedia dump. The indexing process works as follows: 1. set up FSDirectory 2. set up IndexWriter 3. set up a custom analyzer chaining WikipediaTokenizer, LowerCaseFilter, Porter stemmer, StopFilter and LengthFilter 4. crea

Re: Improving Indexing Performance

2008-12-08 Thread buFka
It is interesting and I think it will help us :) Thanks! buFka -- View this message in context: http://www.nabble.com/Improving-Indexing-Performance-tp20890720p20891965.html Sent from the Lucene - Java Users mailing list archive at Nabble.com

Re: Improving Indexing Performance

2008-12-08 Thread Karsten F.
08/06/indexing-database-using-apache-lucene.html > The indexing takes about 4 hours. Can I speed up this process?

Improving Indexing Performance

2008-12-08 Thread buFka

Re: 2.3.2 Indexing Performance

2008-10-01 Thread Michael McCandless
Awesome! Thanks for following up. Mike Gary Moore wrote: > Finally got back to this. The great bulk of the time is spent parsing/tokenizing. So, using 10 threads parsing/analyzing the 4.5M docs and feeding them to an IndexWriter took 106 minutes including a final optimization. The ind

Re: 2.3.2 Indexing Performance

2008-10-01 Thread Gary Moore
Finally got back to this. The great bulk of the time is spent parsing/tokenizing. So, using 10 threads parsing/analyzing the 4.5M docs and feeding them to an IndexWriter took 106 minutes including a final optimization. The index is 5.6 GB. I'm tempted to try multiple indexing threads but
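
Since the bulk of the time was parsing and tokenizing, the setup described (10 threads parsing and feeding one IndexWriter, which can be shared among threads) has the shape of a worker pool that parses each record and hands the result to a single shared, thread-safe sink. A minimal sketch with plain `java.util.concurrent`; `ParallelFeeder`, `parse` and `sink` are hypothetical names, and in a Lucene application the sink would be something like `addDocument` on the one shared writer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical sketch: N parser threads feeding one shared, thread-safe sink. */
final class ParallelFeeder {
    static <R, D> void feed(List<R> records, int threads, Function<R, D> parse, Consumer<D> sink) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (R rec : records) {
                // Parsing (the expensive part) runs on a worker; the sink must be thread-safe.
                pending.add(pool.submit(() -> sink.accept(parse.apply(rec))));
            }
            for (Future<?> f : pending) {
                f.get(); // waits for completion and surfaces any parse/sink failure
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while feeding", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a record failed to parse or index", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Keeping a single writer and parallelizing only the producer side matches the observation that parsing, not the writer, was the bottleneck.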

Re: 2.3.2 Indexing Performance

2008-08-08 Thread Michael McCandless
Thanks for the data point! This is expected -- a lot of work went into increasing IndexWriter's throughput in 2.3. Actually, I'd expect even more speedup, if indeed Lucene is the bottleneck in your app. You could test how much time just creating/parsing & tokenizing the docs (from whatev

2.3.2 Indexing Performance

2008-08-08 Thread Gary Moore
Parsing and indexing 4.5 million MARC/XML bibliographic records required ~14 hrs using 2.2. The same job using 2.3 takes ~5 hrs on the same platform: a quad-processor Sun V440 w/8GB memory. I'm using the PerFieldAnalyzerWrapper (StandardAnalyzer and SnowballAnalyzer). I'm impress
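
For comparison with the docs/s figures quoted in other threads, the reported timings convert to throughput as follows (using the approximate hour figures as given):

```latex
\begin{aligned}
r_{2.2} &\approx \frac{4.5\times10^{6}\ \text{docs}}{14 \times 3600\ \text{s}}
         = \frac{4.5\times10^{6}}{50\,400} \approx 89\ \text{docs/s} \\
r_{2.3} &\approx \frac{4.5\times10^{6}\ \text{docs}}{5 \times 3600\ \text{s}}
         = \frac{4.5\times10^{6}}{18\,000} = 250\ \text{docs/s} \\
\frac{r_{2.3}}{r_{2.2}} &\approx \frac{14}{5} = 2.8
\end{aligned}
```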

Re: Typical Indexing performance

2008-06-06 Thread Konstantyn Smirnov

Re: Typical Indexing performance

2008-06-03 Thread Marcelo Ochoa
> he size of documents and number of fields, whether fields are stored or only indexed, the IndexWriter settings for segment merging and memory usage, and of course there is hardware, etc. > Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

Re: Typical Indexing performance

2008-06-03 Thread Otis Gospodnetic
Otis ----- Original Message ----- From: Simon Wistow To: Lucene Sent: Monday, June 2, 2008 7:40:52 PM Subject: Typical Indexing performance > I know this is one of those "How lo

Re: Typical Indexing performance

2008-06-03 Thread Grant Ingersoll
"How long is a piece of string?" questions but I'm curious as to the order of magnitude of indexing performance. http://lucene.apache.org/java/docs/benchmarks.html seems to indicate about 100-120 docs/s is pretty good for average sized documents (say, an email or someth

Typical Indexing performance

2008-06-03 Thread Simon Wistow
I know this is one of those "How long is a piece of string?" questions but I'm curious as to the order of magnitude of indexing performance. http://lucene.apache.org/java/docs/benchmarks.html seems to indicate about 100-120 docs/s is pretty good for average sized documents (s

Re: [Fwd: Re: indexing performance]

2007-03-01 Thread Mike Klaas
On 3/1/07, Saravana wrote: > Does this still hold good now? Thanks for your reply. Probably most of that still applies to some extent. However, it is unclear whether it will speed up your application. The first thing is to find out what your bottleneck is. Looking at the stats

Re: [Fwd: Re: indexing performance]

2007-03-01 Thread Saravana
cene.apache.org Date: Thu, 1 Mar 2007 10:28:07 +0200 Subject: Re: indexing performance On Tue, Feb 27, 2007, Saravana wrote about "indexing performance": > Hi, is it possible to scale Lucene indexing to 2000/3000 documents per second? I don't know about the actual

Re: indexing performance

2007-03-01 Thread Nadav Har'El
On Tue, Feb 27, 2007, Saravana wrote about "indexing performance": > Hi, is it possible to scale Lucene indexing to 2000/3000 documents per second? I don't know about the actual numbers, but one trick I've used in the past to get really fast indexing w

Re: indexing performance

2007-02-27 Thread Chris Hostetter
: : > I am trying to index the syslogs generated from one of my busy FTP servers so that I can get counts specific to a user within a given time frame. Since : My immediate thought when reading this is: is it really a text search engine you want to use for this? ditto ... if you a

Re: indexing performance

2007-02-27 Thread karl wettin
On 27 Feb 2007 at 16:49, Saravana wrote: > I am trying to index the syslogs generated from one of my busy FTP servers so that I can get counts specific to a user within a given time frame. Since my FTP server is very busy, it can generate a lot of syslog entries per second. And the important point her

Re: indexing performance

2007-02-27 Thread Saravana
Hi, I wanted to find out the maximum indexing rate of Lucene. I did the test with sample strings and am getting close to 600 documents/sec on a Linux machine with 512 MB RAM and a 1.9 GHz CPU. Searching is pretty fast, and I can create new index files based on user or on time, etc., so that I w

Re: indexing performance

2007-02-27 Thread Erick Erickson
How do you expect anyone to be able to answer such an open-ended question? What I'd do is create a test harness that generates a random set of strings and try it. Off the top of my head, this seems like a pretty steep requirement. And at 2,000 docs a second you're going to have a huge index prett
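
Erick's suggestion of a test harness that generates random strings can be sketched in plain Java for the record shape given in the original question: 10 fields of 20 bytes each. `SyntheticLoad`, the field names and the character set are assumptions for illustration; the `indexer` callback is where a real test would build and add the Lucene document.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.function.Consumer;

/** Hypothetical load generator for "10 fields x 20 bytes" records. */
final class SyntheticLoad {
    private static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789";

    /** Builds one record: fieldCount fields, each a random ASCII string of fieldLength chars. */
    static Map<String, String> randomDoc(Random rnd, int fieldCount, int fieldLength) {
        Map<String, String> doc = new LinkedHashMap<>();
        for (int f = 0; f < fieldCount; f++) {
            StringBuilder sb = new StringBuilder(fieldLength);
            for (int i = 0; i < fieldLength; i++) {
                sb.append(ALPHABET.charAt(rnd.nextInt(ALPHABET.length())));
            }
            doc.put("field" + f, sb.toString());
        }
        return doc;
    }

    /** Pre-generates n records, then times only the indexer callback; returns docs per second. */
    static double docsPerSecond(int n, long seed, Consumer<Map<String, String>> indexer) {
        Random rnd = new Random(seed);
        List<Map<String, String>> docs = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            docs.add(randomDoc(rnd, 10, 20));
        }
        long start = System.nanoTime();
        for (Map<String, String> doc : docs) {
            indexer.accept(doc);
        }
        long elapsedNanos = Math.max(1L, System.nanoTime() - start); // avoid dividing by zero
        return n / (elapsedNanos / 1_000_000_000.0);
    }
}
```

For scale: at the 2,000 docs/s target, a day of input is 2,000 × 86,400 = 172,800,000 documents, and at 10 × 20 bytes each that is about 34.6 GB of raw field data per day (raw input, not index size), which is the growth behind the warning about index size.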

indexing performance

2007-02-27 Thread Saravana
Hi, is it possible to scale Lucene indexing to 2000/3000 documents per second? I need to index 10 fields, each 20 bytes long. I should be able to search by giving any of the field values as criteria. I need to get the count of documents that have the same field values. Will it be possible? with rega

Re: Question about basic indexing performance improvements

2007-02-18 Thread Erick Erickson
e documents one by one using a single-threaded indexing program. Now we want to be able to index that same set of documents in much less time. I am new to Lucene, so I am just going by what I have found so far in the Lucene in Action book and on the internet. The section in the book on indexing c

Re: Question about basic indexing performance improvements

2007-02-18 Thread Nicolas Lalevée
> by what I have found so far in the Lucene in Action book and on the internet. The section in the book on indexing concurrency says that you can share an IndexWriter object among several threads and that the calls from these threads will be properly synchronized. Will this in itself im

Re: indexing performance issue

2006-11-30 Thread Antony Bowesman
spinergywmy wrote: > I have posted this question before, and this time I found that it could be a PDFBox problem; the PDFBox I downloaded doesn't use log4j.jar. Indexing an approx. 2.13 MB PDF file took me 17 s, and the total time to upload a file is 18 s. Re: PDFBox. I have a 2.5 MB test file that

Re: indexing performance issue

2006-11-30 Thread Antony Bowesman
> log4j.jar causes the poor indexing performance and takes up a lot of memory. However, the latest version of PDFBox doesn't need log4j.jar, and I thought that would speed up indexing, but it did not. I would isolate PDFBox and do some perfor

Re: indexing performance issue

2006-11-30 Thread Grant Ingersoll
> previous version of PDFBox integrates with log4j.jar, and I believe log4j.jar causes the poor indexing performance and takes up a lot of memory. However, the latest version of PDFBox doesn't need log4j.jar, and I thought that would actually speed up the ind

Re: indexing performance issue

2006-11-30 Thread spinergywmy
and I believe log4j.jar causes the poor indexing performance and takes up a lot of memory. However, the latest version of PDFBox doesn't need log4j.jar, and I thought that would speed up indexing, but it did not. Please correct me i

Re: indexing performance issue

2006-11-30 Thread Grant Ingersoll
> re any way, or any software other than PDFBox, to solve the performance issue? Thanks. regards, Wooi Meng

indexing performance issue

2006-11-30 Thread spinergywmy
than PDFBox to solve the performance issue. Thanks. regards, Wooi Meng

Re: Indexing Performance issue

2006-11-16 Thread Antony Bowesman
spinergywmy wrote: > Hi, I am having a performance issue indexing PDF files. It took me more than 10 sec to index a PDF file of about 200 KB. Is it because I only have one segment file? How can I improve indexing performance? If you're using the log4j PDFBox jar file, you must make

Re: Indexing Performance issue

2006-11-10 Thread Ioan Cocan
> file performance issue. It took me more than 10 sec to index a PDF file of about 200 KB. Is it because I only have one segment file? How can I improve indexing performance? Thanks, regards, Wooi Meng

Re: Indexing Performance issue

2006-11-10 Thread Erick Erickson
wrote: > I am having a performance issue indexing PDF files. It took me more than 10 sec to index a PDF file of about 200 KB. Is it because I only have one segment file? How can I improve indexing performance? PDFBox (which I assume you are using) can be quite slow converting lar

Re: Indexing Performance issue

2006-11-10 Thread Daniel Naber
On Friday 10 November 2006 12:18, spinergywmy wrote: > I am having a performance issue indexing PDF files. It took me more than 10 sec to index a PDF file of about 200 KB. Is it because I only have one segment file? How can I improve indexing performance? PDFBox (which I assum

Indexing Performance issue

2006-11-10 Thread spinergywmy
Hi, I am having a performance issue indexing PDF files. It took me more than 10 sec to index a PDF file of about 200 KB. Is it because I only have one segment file? How can I improve indexing performance? Thanks, regards, Wooi Meng

Tuning Indexing performance question ..

2006-04-10 Thread Mufaddal Khumri
they are created. Now, reading the Lucene docs, I understand the indexing performance can be further tweaked by playing with mergeFactor, maxMergeDocs and minMergeDocs. Am I understanding this right that these three parameters affect the writing of the index to the FSDirectory and not to the

Re: Indexing performance with Lucene 1.9

2006-03-01 Thread Eric Jain
Eric Jain wrote: > I'll rerun the indexing procedure with the old version overnight, just to be sure. Just to confirm: there no longer seems to be any difference in indexing performance between the nightly build and

Re: Indexing performance with Lucene 1.9

2006-02-28 Thread Eric Jain
Otis Gospodnetic wrote: > Regarding the performance fix: if you can be more precise (is it really just more or less, or is it as good as before?), that would be great for those of us itching to use 1.9. To be more precise: the patch reduced the time required to build one large index from 13 to 11 ho

Re: Indexing performance with Lucene 1.9

2006-02-28 Thread Eric Jain
Otis Gospodnetic wrote: > Regarding the performance fix: if you can be more precise (is it really just more or less, or is it as good as before?), that would be great for those of us itching to use 1.9. Yes, I can confirm that performance differs by no more than 3.1 fraggles. ;-) --

Re: Indexing performance with Lucene 1.9

2006-02-28 Thread Otis Gospodnetic
g Sent: Tue 28 Feb 2006 05:54:05 AM EST Subject: Re: Indexing performance with Lucene 1.9 Daniel Naber wrote: > A fix has now been committed to trunk in SVN; it should be part of the next 1.9 release. Performance seems to have recovered, more or

Re: Indexing performance with Lucene 1.9

2006-02-28 Thread Eric Jain
Daniel Naber wrote: > A fix has now been committed to trunk in SVN; it should be part of the next 1.9 release. Performance seems to have recovered, more or less, thanks!

Re: Indexing performance with Lucene 1.9

2006-02-26 Thread Daniel Naber
On Saturday 25 February 2006 14:20, Eric Jain wrote: > After upgrading to Lucene 1.9, an index that used to take about 9h to build now requires 13h. Has anyone else noticed a decrease in performance? A fix has now been committed to trunk in SVN; it should be part of the next 1.9 release. Regards D

Re: Indexing performance with Lucene 1.9

2006-02-25 Thread Daniel Naber
On Saturday 25 February 2006 14:20, Eric Jain wrote: > After upgrading to Lucene 1.9, an index that used to take about 9h to build now requires 13h. Has anyone else noticed a decrease in performance? Yes, I can reproduce this with the Lucene demo on a much smaller index of 2000 documents. It (partly

Indexing performance with Lucene 1.9

2006-02-25 Thread Eric Jain
After upgrading to Lucene 1.9, an index that used to take about 9h to build now requires 13h. Has anyone else noticed a decrease in performance? This is how I configure the IndexWriter: writer = new IndexWriter(dir, analyzer, false); writer.mergeFactor = 100; writer.minMergeDocs = 100; writ

RE: lucene indexing performance

2005-05-16 Thread Jayakumar.V
2005 1:58 AM To: java-user@lucene.apache.org Subject: Re: lucene indexing performance One immediate optimization would be to only close the writer and open the reader if the document is present. You can have a reader open and do searches while indexing (and optimization) are underway. It'

Re: lucene indexing performance

2005-04-23 Thread Chuck Williams
One immediate optimization would be to only close the writer and open the reader if the document is present. You can have a reader open and do searches while indexing (and optimization) are underway. It's just the delete operation that requires you to close the writer (so you don't have two d

lucene indexing performance

2005-04-23 Thread Jayakumar.V
Hi, maybe this query has been answered before. My first email to this user group did not generate any response. I had forwarded it to the following email addresses: [EMAIL PROTECTED] java-user@lucene.apache.org. This is my second email to this address. I hope I've reached the right place. We a

Re: indexing performance of little documents

2005-04-01 Thread Karl Øie
> ining some particular text. The natural way of doing it with Lucene would be to create one Lucene Document per line. It works well, except that it is too slow for my needs, even after tweaking all possible parameters of IndexWriter and using the CVS version of Lucene. I can get 10x the indexing performan

indexing performance of little documents

2005-04-01 Thread Fabien Le Floc'h
needs, even after tweaking all possible parameters of IndexWriter and using the CVS version of Lucene. I can get 10x the indexing performance by indexing the file as one Lucene Document. Lucene builds a good index with all the terms, and I am able to get the number of terms matching a query, but not the