> >> ... write a transaction log in parallel to indexing, so they commit
> >> very seldom. If the system crashes, the changes since the last
> >> commit are replayed from the transaction log.
> >>
> >> Uwe
> >>
> >> -
> >> Uwe Schindler
> >> Achterdiek 19, D-28357 Bremen
> >> http://www.thetaphi.de
> >> eMail: u...@thetaphi.de
> >>
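A minimal sketch of the pattern described above, not Uwe's actual implementation: append every change to a durable log before indexing it, commit the IndexWriter only rarely, and replay the log after a crash. The TransactionLog interface here is hypothetical.

```java
import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;

// hypothetical append-only log; any durable storage would do
interface TransactionLog {
    void append(Document doc) throws IOException;
    Iterable<Document> entriesSinceLastCheckpoint() throws IOException;
    void truncate() throws IOException;
}

class TranlogIndexer {
    private final IndexWriter writer;
    private final TransactionLog tranlog;

    TranlogIndexer(IndexWriter writer, TransactionLog tranlog) {
        this.writer = writer;
        this.tranlog = tranlog;
    }

    void index(Document doc) throws IOException {
        tranlog.append(doc);     // durable before Lucene sees it
        writer.addDocument(doc); // no commit here, so this stays cheap
    }

    void checkpoint() throws IOException {
        writer.commit();         // expensive fsync, done very seldom
        tranlog.truncate();      // everything before this point is now safe
    }

    void recover() throws IOException {
        // after a crash: re-apply whatever was logged after the last commit
        for (Document doc : tranlog.entriesSinceLastCheckpoint()) {
            writer.addDocument(doc);
        }
        checkpoint();
    }
}
```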
> -Original Message-
> From: Rob Audenaerde [mailto:rob.audenae...@gmail.com]
> Sent: Monday, January 29, 2018 11:29 AM
> To: java-user@lucene.apache.org
> Subject: Re: indexing performance 6.6 vs 7.1
>
> Hi,
>
> We use ... FacetFields as well. This allows us to create pivot tables
> on search results really fast.
>
> These tables have some overlapping columns, but also disjoint ones.
>
> We anticipated a decrease in index size because of the sparse
> docvalues. We see this happening, with decreases to ~50%-80% of the
> original index size. But we did not expect a drop in indexing
> performance (client systems' indexing time increased by +50% to +250%).
>
> (Our indexing speed used to be mainly bound by the speed at which the
> Taxonomy could deliver new ordinals for new values; currently we are
> investigating if this ...)
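For context, a document of the kind described above might be built like this; a minimal sketch against the Lucene 7.x facet API, with field names invented for illustration:

```java
import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.facet.FacetField;
import org.apache.lucene.facet.FacetsConfig;
import org.apache.lucene.facet.taxonomy.TaxonomyWriter;
import org.apache.lucene.index.IndexWriter;

class RowIndexer {
    // One "table row" becomes one document: sparse docvalues columns plus
    // facet fields whose ordinals come from the shared taxonomy.
    static void addRow(IndexWriter writer, TaxonomyWriter taxo,
                       FacetsConfig config, String dept, long amount)
            throws IOException {
        Document doc = new Document();
        doc.add(new NumericDocValuesField("amount", amount)); // present only for some tables
        doc.add(new FacetField("department", dept));
        // build() resolves facet ordinals via the taxonomy -- the step
        // that can bound indexing speed, as noted above
        writer.addDocument(config.build(taxo, doc));
    }
}
```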
Hi Mukul,

There is not much information in your question, so to make a guess, could
you provide:
1) the time it takes to fetch the docs from SQL Server (without doing any
indexing)
2) the size of the documents
3) what kind of analysis is done
4) why you are creating this merge policy - is this what ...
Hi,

I have 150k documents in a Lucene index folder. It is taking 30-35
minutes to rebuild the index. We are fetching this data from SQL Server.

I have applied the parameters below while getting an instance of
IndexWriter:

IndexWriterConfig indexWriterConfig = new
IndexWriterConfig(getAnalyzer(callerCon ...
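For a one-shot rebuild like this, the settings usually worth checking first look something like the sketch below (modern Lucene API; the values and path are illustrative, not a recommendation for this specific case):

```java
import java.io.IOException;
import java.nio.file.Paths;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

class RebuildSketch {
    static IndexWriter openForRebuild(String path) throws IOException {
        Directory dir = FSDirectory.open(Paths.get(path));
        IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer());
        cfg.setOpenMode(IndexWriterConfig.OpenMode.CREATE); // full rebuild: start fresh
        cfg.setRAMBufferSizeMB(256);                        // flush by RAM, not per-doc
        // add all documents with the returned writer, then commit/close
        // once at the end rather than per batch
        return new IndexWriter(dir, cfg);
    }
}
```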
Hi,
can indexing on HDFS somehow be tuned up using pluggable codecs / some
customized PostingsFormat? What settings would you recommend for using Lucene
5.5 on HDFS?
Regards,
Stefan
Hi Team,

I am a new user of Lucene 4.8.1. I encountered a Lucene indexing
performance issue which slows down my application greatly. I tried
several approaches from Google searches but still couldn't resolve it.
Any suggestions from you experts might help me a lot.

One of my applications uses the l ...
Thanks for your suggestions, Mike, I'll experiment with the RAM buffer
size and segments-per-tier settings and see what that does.

The time spent merging seems to be so great, though, that I'm wondering
if I'm actually better off doing the indexing single-threaded. Am I
right in thinking that no me ...
This sounds reasonable (500 M docs / 50 GB index), though you'll need
to test resulting search perf for what you want to do with it.

To reduce merging time, maximize your IndexWriter RAM buffer
(setRAMBufferSizeMB). You could also increase
TieredMergePolicy.setSegmentsPerTier to allow more segments per tier.
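The two knobs suggested above look roughly like this; a sketch against a recent API (in the 4.x line current at the time, the IndexWriterConfig constructor also took a Version argument), with illustrative values:

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;

class MergeTuningSketch {
    static IndexWriterConfig configure(Analyzer analyzer) {
        IndexWriterConfig cfg = new IndexWriterConfig(analyzer);
        cfg.setRAMBufferSizeMB(512); // bigger buffer => fewer, larger initial segments
        TieredMergePolicy mp = new TieredMergePolicy();
        mp.setSegmentsPerTier(20);   // default is 10; higher tolerates more
                                     // segments and therefore merges less often
        cfg.setMergePolicy(mp);
        return cfg;
    }
}
```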
I'm seeing performance problems when indexing a certain set of data, and
I'm looking for pointers on how to improve the situation. I've read the
very helpful performance advice on the Wiki and am carrying on doing
experiments based on that, but I'd also like to ask for comments as to
whether I'm heading in the right direction.
> Sent: Thursday, April 15, 2010 2:13 PM
> To: java-user@lucene.apache.org
> Subject: RE: NumericField indexing performance
>
> Hi Tomislav,
>
> when reading your mail it's not 100% clear what you did wrong, but I
> think the following occurred (so it's no GC problem):
>
> You reused ...
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
> Sent: Thursday, April 15, 2010 2:00 PM
> To: java-user@lucene.apache.org
> Subject: Re: NumericField indexing performance
>
> Hi,
>
> I a ...
Hadoop ecosystem search :: http://search-hadoop.com/

- Original Message
> From: Tomislav Poljak
> To: java-user@lucene.apache.org
> Sent: Thu, April 15, 2010 7:41:02 AM
> Subject: RE: NumericField indexing performance
>
> Hi Uwe,
>
> thank you very much for your answers. I ...
> > -Original Message-
> > From: Uwe Schindler [mailto:u...@thetaphi.de]
> > Sent: Wednesday, April 14, 2010 11:28 PM
> > To: java-user@lucene.apache.org
> > Subject: RE: NumericField indexing performance
> >
> > Hi Tomislav,
> >
> > indexing with NumericField takes longer (at least for the default
> > precision step of 4, which means out of 32-bit integers it makes 8
> > subterms with 4 bits of the value each). So you produce 8 times more
> > terms during indexing. To improve indexing performance, try larger
> > precision steps like 6 or 8. If you don't use NumericRangeQuery and
> > only want to index the numeric terms as *one* term, use
> > precStep=Integer.MAX_VALUE. Also check your memory requirements, as
> > the indexer may need more memory and GC costs too much. Also the
> > index size ...
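Uwe's suggestion translates to something like this sketch on the 2.9/3.x API under discussion; field names are invented for illustration:

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.NumericField;

class NumericFieldSketch {
    static void addTime(Document doc, int days) {
        // default precisionStep is 4: a 32-bit int becomes 8 trie terms.
        // precisionStep 8 halves that to 4 terms per value:
        doc.add(new NumericField("time", 8).setIntValue(days));
        // if the field is never used with NumericRangeQuery, index it as
        // a single term instead:
        doc.add(new NumericField("timeExact", Integer.MAX_VALUE).setIntValue(days));
    }
}
```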
Hi,

is it normal for indexing time to increase up to 10 times after
introducing NumericField instead of Field (for two fields)?

I've changed two date fields from String representation (Field) to
NumericField; now it is:

doc.add(new NumericField("time").setIntValue(date.getTime()/24/3600))

and a ...
Thanks for bringing closure!

Mike
Hi list!

I'm forwarding this as somehow I did not put the list in the CC, but the
answer I think is noteworthy, so here it is. Please remember to use
StringBuffer before blaming Lucene ;-)

Actual time consumed by Lucene is now ~130 minutes as opposed to 20
hours, which is neat. I can do many more passes ...
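The fix being alluded to is the classic quadratic-concatenation trap: building a large document body with String `+` in a loop copies the whole buffer each time. A sketch of the linear version (StringBuilder is the unsynchronized sibling of StringBuffer), using the Field API of that era:

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

class BodyBuilder {
    static Document build(Iterable<String> lines) {
        // one growing buffer: O(total length) instead of O(n^2)
        StringBuilder body = new StringBuilder();
        for (String line : lines) {
            body.append(line).append('\n');
        }
        Document doc = new Document();
        doc.add(new Field("body", body.toString(),
                Field.Store.NO, Field.Index.ANALYZED));
        return doc;
    }
}
```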
Hi Michael,

Thanks a lot for the hint. I'll test it out in a few hours and get back
to you and/or the list.

best,
Mateusz
On Mon, Jun 8, 2009 at 7:54 AM, Mateusz Berezecki wrote:
> Thanks for a prompt response.

You're welcome!
Hi Michael,

Thanks for a prompt response.
This isn't normal.

A mergeFactor of 150 is way too high; I'd put that back to 10 and see
if the problem persists. Also make sure you're using autoCommit=false,
and try the suggestions here:

http://wiki.apache.org/lucene-java/ImproveIndexingSpeed

You're sure the JRE's heap size is big enough ...
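On the 2.x API in use at the time, those suggestions look roughly like the sketch below (the autoCommit constructor argument was removed in later versions; values are illustrative):

```java
import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;

class WriterSetupSketch {
    static IndexWriter open(String path, boolean create) throws IOException {
        // autoCommit=false (2nd argument): flushed segments are not made
        // visible until close/commit, which is faster for bulk indexing
        IndexWriter writer = new IndexWriter(
                FSDirectory.getDirectory(path), false,
                new StandardAnalyzer(), create);
        writer.setMergeFactor(10);     // back to the default, down from 150
        writer.setRAMBufferSizeMB(64); // flush by RAM usage, not doc count
        return writer;
    }
}
```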
Hi list,

I'm having trouble achieving good performance when indexing an XML
wikipedia dump. The indexing process works as follows:

1. set up FSDirectory
2. set up IndexWriter
3. set up custom analyzer chaining wikipediatokenizer, lowercasefilter,
   porterstemmer, stopfilter and lengthfilter
4. crea ...
It is interesting and I think it will help us :)

Thanks!
buFka
> ... 08/06/indexing-database-using-apache-lucene.html
>
> The indexing takes about 4 hours. Can I speed up this process?
Awesome! Thanks for following up.

Mike
Finally got back to this. The great bulk of the time is spent
parsing/tokenizing. So, using 10 threads parsing/analyzing the 4.5M
docs and feeding them to an IndexWriter took 106 minutes, including a
final optimization. The index is 5.6 GB. I'm tempted to try multiple
indexing threads but ...
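The setup described reduces to a worker pool feeding one shared IndexWriter, whose addDocument calls are safe to make concurrently. A sketch in modern Java; the Parser interface is a stand-in for the MARC/XML-to-Document step, which is the expensive part here:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;

class ParallelFeed {
    // stand-in for the expensive parse/analyze step
    interface Parser { Document parse(String record) throws Exception; }

    static void indexAll(Iterable<String> records, Parser parser,
                         IndexWriter writer) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(10);
        for (final String record : records) {
            pool.submit(() -> {
                try {
                    // heavy lifting happens on the worker thread;
                    // addDocument may be called from many threads at once
                    writer.addDocument(parser.parse(record));
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.DAYS);
    }
}
```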
Thanks for the data point!

This is expected -- a lot of work went into increasing IndexWriter's
throughput in 2.3.

Actually, I'd expect even more speedup, if indeed Lucene is the
bottleneck in your app. You could test how much time just creating/
parsing & tokenizing the docs (from whatever ...
Parsing and indexing 4.5 million MARC/XML bibliographic records was
requiring ~14 hrs. using 2.2. The same job using 2.3 takes ~5 hrs. on
the same platform -- a quad-processor Sun V440 w/8GB memory. I'm using
the PerFieldAnalyzerWrapper (StandardAnalyzer and SnowballAnalyzer).

I'm impressed ...
> ... the size of documents and number of fields, whether fields are
> stored or only indexed, the IndexWriter settings for segment merging
> and memory usage; of course, there is hardware, etc.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message
> From: Simon Wistow <[EMAIL PROTECTED]>
> To: Lucene
> Sent: Monday, June 2, 2008 7:40:52 PM
> Subject: Typical Indexing performance
"How long is a piece of string?" questions
but I'm curious as to the order of magnitude of indexing performance.
http://lucene.apache.org/java/docs/benchmarks.html
seems to indicate about 100-120 docs/s is pretty good for average
sized
documents (say, an email or someth
I know this is one of those "How long is a piece of string?" questions
but I'm curious as to the order of magnitude of indexing performance.
http://lucene.apache.org/java/docs/benchmarks.html
seems to indicate about 100-120 docs/s is pretty good for average sized
documents (s
On 3/1/07, Saravana <[EMAIL PROTECTED]> wrote:
> Does this still hold good now? Thanks for your reply.

Probably most of that still applies to some extent. However, it is
unclear whether it will speed up your application.

First thing is to find out what your bottleneck is. Looking at the
stats ...
On Tue, Feb 27, 2007, Saravana wrote about "indexing performance":
> Hi,
>
> Is it possible to scale lucene indexing like 2000/3000 documents per
> second?

I don't know about the actual numbers, but one trick I've used in the
past to get really fast indexing w ...
: > I am trying to index the syslogs generated from one of my busy ftp
: > servers so that I can get counts specific to a user within a given
: > time frame. Since

: My immediate thought when reading this is if it really is a text
: search engine you want to use for this?

ditto ... if you a ...
On 27 Feb 2007 at 16:49, Saravana wrote:

I am trying to index the syslogs generated from one of my busy ftp
servers so that I can get counts specific to a user within a given time
frame. Since my ftp server is very busy, it can generate a lot of syslog
entries per second. And the important point her ...
Hi,

I thought of getting the maximum indexing rate out of Lucene. However, I
did a test with sample strings and I am getting close to 600
documents/sec on a 512 MB RAM, 1.9 GHz Linux machine. Searching is
pretty fast, and I can create new index files based on user or on time,
etc., so that I w ...
How do you expect anyone to be able to answer such an open-ended
question? What I'd do is create a test harness that generates a random
set of strings and try it.

Off the top of my head, this seems like a pretty steep requirement. And
at 2,000 docs a second you're going to have a huge index pretty quickly.
Hi,

Is it possible to scale Lucene indexing to 2000-3000 documents per
second? I need to index 10 fields, each 20 bytes long. I should be able
to search by just giving any of the field values as criteria. I need to
get the count of documents that have the same field values.

Will it be possible?

with regards,
... the documents one by one using a single-threaded indexing program.

Now we want to be able to index that same set of documents in much less
time. I am new to Lucene, so I am just going by what I have found so far
in the Lucene in Action book and on the internet. The section in the
book on indexing c ...
> ... by what I have found so far in the Lucene in Action book and on
> the internet. The section in the book on indexing concurrency says
> that you can share an IndexWriter object among several threads and
> that the calls from these threads will be properly synchronized. Will
> this in itself im ...
spinergywmy wrote:

> I have posted this question before, and this time I found that it
> could be a pdfbox problem, and the pdfbox I downloaded doesn't use the
> log4j.jar. To index the approx. 2.13 MB pdf file took me 17s, and the
> total time to upload a file is 18s.

Re: PDFBox.

I have a 2.5 MB test file that ...
> ... the log4j.jar caused the indexing performance problem and takes up
> a lot of memory resources. However, the latest version of pdfbox
> doesn't need to integrate with log4j.jar, and I thought that would
> actually speed up the indexing performance, but the result was no.

I would isolate PDFBox and do some performance ...
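Isolating PDFBox means timing the extraction alone, with no Lucene involved. A sketch against the current PDFBox 2.x API (the 2006-era classes lived under org.pdfbox.* instead, but the shape is the same):

```java
import java.io.File;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

public class PdfExtractTimer {
    public static void main(String[] args) throws Exception {
        long start = System.currentTimeMillis();
        PDDocument pdf = PDDocument.load(new File(args[0]));
        try {
            // pure text extraction -- if this alone takes ~10s, the
            // bottleneck is PDF parsing, not indexing
            String text = new PDFTextStripper().getText(pdf);
            System.out.println(text.length() + " chars extracted in "
                    + (System.currentTimeMillis() - start) + " ms");
        } finally {
            pdf.close();
        }
    }
}
```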
The previous version of pdfbox integrated with the log4j.jar file, and I
believe the log4j.jar caused the indexing performance problem and takes
up a lot of memory resources. However, the latest version of pdfbox
doesn't need to integrate with log4j.jar, and I thought that would
actually speed up the indexing performance, but the result was no.

Please correct me if ...
Is there any way, or other software than pdfbox, to solve the
performance issue?

Thanks.

regards,
Wooi Meng
spinergywmy wrote:

> Hi,
>
> I am having this pdf file indexing performance issue. It took me more
> than 10 sec to index a pdf file of about 200 KB. Is it because I only
> have a segment file? How can I make the indexing performance better?

If you're using the log4j PDFBox jar file, you must make ...
On Friday 10 November 2006 12:18, spinergywmy wrote:

> I am having this pdf file indexing performance issue. It took me more
> than 10 sec to index a pdf file of about 200 KB. Is it because I only
> have a segment file? How can I make the indexing performance better?

PDFBox (which I assume you are using) can be quite slow converting
large ...
Hi,

I am having this pdf file indexing performance issue. It took me more
than 10 sec to index a pdf file of about 200 KB. Is it because I only
have a segment file? How can I make the indexing performance better?

Thanks

regards,
Wooi Meng
... they are created.

Now, reading the Lucene docs, I understand the indexing performance can
be further tweaked by playing with mergeFactor, maxMergeDocs and
minMergeDocs. Am I understanding this right that these three parameters
affect the writing of the index to the FSDirectory and not to the ...
Eric Jain wrote:

I'll rerun the indexing procedure with the old version overnight, just
to be sure.

Just to confirm: there no longer seems to be any difference in indexing
performance between the nightly build and ...
Otis Gospodnetic wrote:

> Regarding the performance fix - if you can be more precise (is it
> really just more or less, or is it as good as before), that would be
> great for those of us itching to use 1.9.

To be more precise: the patch reduced the time required to build one
large index from 13 to 11 hours.
Otis Gospodnetic wrote:

> Regarding the performance fix - if you can be more precise (is it
> really just more or less, or is it as good as before), that would be
> great for those of us itching to use 1.9.

Yes, I can confirm that performance differs by no more than 3.1 fraggles.
;-)
Daniel Naber wrote:

> A fix has now been committed to trunk in SVN; it should be part of the
> next 1.9 release.

Performance seems to have recovered, more or less. Thanks!
On Saturday, 25 February 2006 14:20, Eric Jain wrote:

> After upgrading to Lucene 1.9, an index that used to take about 9h to
> build now requires 13h. Has anyone else noticed a decrease in
> performance?

A fix has now been committed to trunk in SVN; it should be part of the
next 1.9 release.

Regards
Daniel
On Saturday, 25 February 2006 14:20, Eric Jain wrote:

> After upgrading to Lucene 1.9, an index that used to take about 9h to
> build now requires 13h. Has anyone else noticed a decrease in
> performance?

Yes, I can reproduce this with the Lucene demo on a much smaller index
of 2000 documents. It (partly ...
After upgrading to Lucene 1.9, an index that used to take about 9h to
build now requires 13h. Has anyone else noticed a decrease in
performance?

This is how I configure the IndexWriter:

writer = new IndexWriter(dir, analyzer, false);
writer.mergeFactor = 100;
writer.minMergeDocs = 100;
writ ...
One immediate optimization would be to only close the writer and open
the reader if the document is present. You can have a reader open and
do searches while indexing (and optimization) are underway. It's just
the delete operation that requires you to close the writer (so you
don't have two d ...
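The suggested check amounts to asking the index whether the term exists before cycling the writer. A sketch on the 1.x-era API, where deletes lived on IndexReader; field names are illustrative:

```java
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;

class PresenceCheck {
    // cheap existence test: the writer can stay open while this reader
    // is used; only an actual delete forces the writer to be closed
    static boolean isPresent(String indexPath, String idField, String idValue)
            throws IOException {
        IndexReader reader = IndexReader.open(indexPath);
        try {
            return reader.docFreq(new Term(idField, idValue)) > 0;
        } finally {
            reader.close();
        }
    }
}
```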
Hi,

Maybe this query has been answered before. My first email to this user
group did not generate any response. I had forwarded it to the following
email ids:

[EMAIL PROTECTED]
java-user@lucene.apache.org

This is my second email to this mail id. Hope I've reached the right
place.

We a ...
... lines containing some particular text. The natural way of doing it
with Lucene would be to create one Lucene Document per line. It works
well, except it is too slow for my needs, even after tweaking all
possible parameters of IndexWriter and using the CVS version of Lucene.

I can get 10x the indexing performance by indexing the file as one
Lucene Document. Lucene builds a good index with all the terms, and I am
able to get the number of terms matching a query, but not the ...