Re: Improvement performance of my indexing with Lucene

2015-09-09 Thread Ian Lea
See also http://wiki.apache.org/lucene-java/ImproveIndexingSpeed

Also double check that it's Lucene that you should be concentrating
on.  In my experience it's often the reading of the data from a
database, if that's what you are doing, that is the bottleneck.


--
Ian.


On Wed, Sep 9, 2015 at 6:07 AM, Modassar Ather  wrote:
> There are a few things you can try to improve indexing performance.
>
> 1. Index documents in batches.
> 2. Try multi-threaded indexing, that is, feed the data to the indexer
> from multiple threads.
> 3. Analyze memory utilization and tune GC.
>
> The following links have some details on Solr indexing performance.
> http://wiki.apache.org/solr/SolrPerformanceFactors
> https://lucidworks.com/blog/indexing-performance-solr-5-2-now-twice-fast/
>
> Regards,
> Modassar
>
> On Wed, Sep 9, 2015 at 7:29 AM, Humberto Rocha  wrote:
>
>> Hi,
>>
>> I need to improve the performance of my indexing with Lucene.
>>
>> Is there any material (e.g., an article, book, or tutorial) that can be
>> used for this?
>>
>> Could anyone help me, please?
>>
>> Thanks a lot!
>>
>> --
>> Humberto
>>

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Improvement performance of my indexing with Lucene

2015-09-09 Thread Humberto Rocha
Thanks a lot!

But do you know of any links that help implement these optimizations
without Solr (using only Lucene)?

I am using Lucene 4.9.

More thanks.

Humberto




-- 
Humberto Rocha


Re: Improvement performance of my indexing with Lucene

2015-09-09 Thread Ian Lea
The link that I sent,
http://wiki.apache.org/lucene-java/ImproveIndexingSpeed is for Lucene,
not Solr.  The second item on the list is to make sure you are using
the latest version of Lucene, so that would be a good starting point.


--
Ian.





Re: Improvement performance of my indexing with Lucene

2015-09-09 Thread Humberto Rocha
Great! I will upgrade Lucene then.

I'm not using a database.

Is there any Java sample code?

Samples with:

1. Indexing documents in batches.
2. Multi-threaded indexing.

Thanks a lot.




-- 
Humberto Rocha


Re: Improvement performance of my indexing with Lucene

2015-09-09 Thread Ian Lea
> Great! I will upgrade Lucene then.

Good start.

> I'm not using a database.

Fine, but you must be getting your data from somewhere.  Maybe that is
blazingly fast, maybe it isn't.
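
A rough way to find out is to time the data source on its own, with no
indexing at all, and compare that with the total run time. The sketch below
assumes the source can be exposed as an Iterator; SourceTimer and drain are
made-up names for the example, not Lucene API.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical helper, not part of Lucene: pulls every item from a data
// source and throws it away, so the elapsed time is pure read cost.
class SourceTimer {

    // Drains the iterator and returns how many items it produced.
    static <T> long drain(Iterator<T> source) {
        long count = 0;
        while (source.hasNext()) {
            source.next();
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Placeholder data; in practice this would be your real reader.
        Iterator<String> source =
                Arrays.asList("doc one", "doc two", "doc three").iterator();

        long start = System.nanoTime();
        long count = drain(source);
        long elapsedMicros = (System.nanoTime() - start) / 1_000;

        System.out.println(count + " items read in " + elapsedMicros + " us");
    }
}
```

If reading alone takes most of the total time, tuning Lucene will not help
much.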

> Is there any Java sample code?
>
> Samples with:
>
> 1. Indexing documents in batches.

I think this means calling IndexWriter.commit() once every
some-large-number of docs rather than every some-small-number.
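
A minimal sketch of that idea follows. To keep it runnable without Lucene on
the classpath, it uses a small stand-in interface (CommitSink, invented for
this example) in place of IndexWriter; with Lucene, add() would correspond to
IndexWriter.addDocument() and commit() to IndexWriter.commit(). The batch
sizes are arbitrary.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the two IndexWriter calls used here. Not a Lucene type.
interface CommitSink {
    void add(String doc);
    void commit();
}

class BatchIndexer {

    // Adds every doc, committing once per commitEvery docs and once more at
    // the end if some docs are still uncommitted. Returns the commit count.
    static int indexAll(List<String> docs, CommitSink sink, int commitEvery) {
        if (commitEvery <= 0) {
            throw new IllegalArgumentException("commitEvery must be positive");
        }
        int commits = 0;
        int pending = 0;
        for (String doc : docs) {
            sink.add(doc);
            pending++;
            if (pending == commitEvery) {
                sink.commit();
                commits++;
                pending = 0;
            }
        }
        if (pending > 0) {
            sink.commit();
            commits++;
        }
        return commits;
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 25_000; i++) {
            docs.add("doc " + i);
        }
        // A sink that does nothing, just to show how the commit count changes.
        CommitSink noop = new CommitSink() {
            public void add(String doc) { }
            public void commit() { }
        };
        System.out.println("batch of 10000: " + indexAll(docs, noop, 10_000) + " commits");
        System.out.println("batch of 10: " + indexAll(docs, noop, 10) + " commits");
    }
}
```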

> 2. Multi-threaded indexing

I don't have examples, but pseudocode would look something like

 IndexWriter iw = whatever
 Thread t1 = whatever(iw, data-source-1)
 Thread t2 = whatever(iw, data-source-2)
 ...
 t1.start()
 t2.start()
 ...
 wait ...
 iw.close()
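
The pseudocode above could be fleshed out roughly as follows. IndexWriter is
thread-safe, so one instance can be shared by all threads. To stay runnable
without Lucene, the sketch uses a stand-in class (SharedWriter, invented for
this example) where a real program would use IndexWriter; the thread count,
timeout, and data sources are placeholders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Stand-in for a shared IndexWriter. Not a Lucene type; it only counts docs.
class SharedWriter {
    private final AtomicLong added = new AtomicLong();

    void addDocument(String doc) { added.incrementAndGet(); }
    long count() { return added.get(); }
    void close() { }
}

class ParallelIndexer {

    // Feeds each data source to the same writer from a pool of threads,
    // waits for all of them, then closes the writer. Returns docs added.
    static long indexAll(List<List<String>> sources, int threads)
            throws InterruptedException {
        SharedWriter iw = new SharedWriter();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (List<String> source : sources) {
            pool.execute(() -> {
                for (String doc : source) {
                    iw.addDocument(doc);
                }
            });
        }
        pool.shutdown();  // no new tasks; already submitted ones keep running
        if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
            throw new IllegalStateException("indexing did not finish in time");
        }
        long total = iw.count();
        iw.close();       // close only after every thread is done
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        List<List<String>> sources = new ArrayList<>();
        sources.add(Arrays.asList("a1", "a2", "a3"));
        sources.add(Arrays.asList("b1", "b2"));
        System.out.println("indexed " + indexAll(sources, 2) + " docs");
    }
}
```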


--
Ian.





Re: Problems with toString at TermsQuery

2015-09-09 Thread Robert Muir
I think it's a bug: https://issues.apache.org/jira/browse/LUCENE-6792

On Tue, Sep 8, 2015 at 10:35 AM, Ruslan Muzhikov  wrote:
> Hi!
> Sometimes the TermsQuery.toString() method fails with an exception:
>
> Exception in thread "main" java.lang.AssertionError
>   at org.apache.lucene.util.UnicodeUtil.UTF8toUTF16(UnicodeUtil.java:546)
>   at org.apache.lucene.util.BytesRef.utf8ToString(BytesRef.java:149)
>   at org.apache.lucene.queries.TermsQuery.toString(TermsQuery.java:190)
>   at org.apache.lucene.search.Query.toString(Query.java:67)
>   ...
>
>
> Here is an example of such a program:
>
> public static void main(String[] args) {
>     // toBytes(128) yields the four bytes 00 00 00 80
>     System.out.print(new TermsQuery(new Term("DATA",
>         new BytesRef(toBytes(128)))).toString());
> }
>
> // Encodes val as 4 bytes, most significant byte first.
> public static byte[] toBytes(int val) {
>     byte[] b = new byte[4];
>     for (int i = 3; i > 0; i--) {
>         b[i] = (byte) val;
>         val >>>= 8;
>     }
>     b[0] = (byte) val;
>     return b;
> }
>
>
> Are there any limits on BytesRef content?
>
> Thanks,
> Ruslan Muzhikov
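
The stack trace shows toString() decoding the term bytes as UTF-8
(BytesRef.utf8ToString), and the bytes produced by toBytes(128) are not valid
UTF-8: a lone 0x80 is a continuation byte with no lead byte. That is probably
why the assertion trips. The sketch below checks this with the JDK's strict
decoder only, no Lucene involved; Utf8Check and isValidUtf8 are names made up
for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8Check {

    // Same big-endian encoding as toBytes in the quoted message.
    static byte[] toBytes(int val) {
        return new byte[] {
            (byte) (val >>> 24), (byte) (val >>> 16), (byte) (val >>> 8), (byte) val
        };
    }

    // Returns true only if the bytes decode as UTF-8 without any error.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("127 valid UTF-8? " + isValidUtf8(toBytes(127)));
        System.out.println("128 valid UTF-8? " + isValidUtf8(toBytes(128)));
    }
}
```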




Re: Lucene 5.2.1: FSDirectory, is it possible to open existing output for append?

2015-09-09 Thread Vlad K
Hi Uwe,

Can you please share some details about the design decision "Whenever
Lucene updates something in the index, it creates a new file"? Is it right
to understand that while an IndexOutput is open, Lucene continues to use the
same output/file, but after it is closed (for instance, when the application
is restarted), Lucene creates a new output file? Does that mean that after
many restarts Lucene will have created many small files, or does it read the
previous one and write it into the newly created one (like a merge)?

I am asking because with Lucene 4.1 we used Lucene's approach and interfaces
to work with our own data files. We implemented a RepositoryDirectory (FS and
RAM) that implements the Directory interface and provides the IndexInput and
IndexOutput that we use to work with files. We write index and repository
data into a bucket directory and create a new bucket directory when index +
repository reaches 1 GB. That's why our raw data file is usually 300 MB,
and we append to it after a close/restart. Now, to upgrade to Lucene 5 and
higher, we have to make a decision: either use our own interface to work
with the repository (data files), or understand Lucene's internals and
motivation and continue to use its interfaces. I believe Lucene works with
Directory in an efficient way, and maybe we could continue to use it for the
"raw data directory" too, but as a result we would either produce many small
files (one for every restart) or need to merge files that are too big.
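
The trade-off described above (a new file per restart versus appending to one
large file) can be sketched outside Lucene with plain java.nio. This is only
an illustration of the write-once pattern, not Lucene's implementation; the
class name WriteOnceStore and the data_N.bin naming are invented for the
example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical write-once store: every write goes to a brand-new file.
class WriteOnceStore {
    private final Path dir;

    WriteOnceStore(Path dir) { this.dir = dir; }

    private Path file(int generation) {
        return dir.resolve("data_" + generation + ".bin");
    }

    // Counts existing generations by probing data_0.bin, data_1.bin, ...
    int generations() {
        int n = 0;
        while (Files.exists(file(n))) {
            n++;
        }
        return n;
    }

    // CREATE_NEW fails if the file already exists, so an existing
    // generation is never truncated or appended to.
    Path writeGeneration(byte[] data) throws IOException {
        Path target = file(generations());
        Files.write(target, data, StandardOpenOption.CREATE_NEW);
        return target;
    }

    // Reads all generations back in order, as one logical stream.
    byte[] readAll() throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int count = generations();
        for (int g = 0; g < count; g++) {
            out.write(Files.readAllBytes(file(g)));
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("write-once");
        new WriteOnceStore(dir).writeGeneration("session 1\n".getBytes(StandardCharsets.UTF_8));
        // A new instance on the same directory plays the role of a restart.
        WriteOnceStore restarted = new WriteOnceStore(dir);
        restarted.writeGeneration("session 2\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(restarted.generations() + " files");
        System.out.print(new String(restarted.readAll(), StandardCharsets.UTF_8));
    }
}
```

With this layout, many restarts do leave many small files behind, so a
separate merge step that rewrites several generations into one new file would
be needed to keep the count down.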

Can you point to some details about the internals?

Thanks!
Vladimir Kuzmin

On Wed, Sep 2, 2015 at 12:47 AM, Uwe Schindler  wrote:

> Hi,
>
> Lucene never appends to files, so this is something that is not used
> anywhere. Whenever Lucene updates something in the index, it creates a new
> file. Earlier Lucene versions supported seeking, but this has been
> removed since Lucene 4.7 (I think). It was just a hack around some
> problems (the requirement to modify the header after writing a file), but
> this is now solved, so seek() was removed completely. And it won't come back.
>
> Uwe
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> > -----Original Message-----
> > From: Vlad K [mailto:kuzmi...@gmail.com]
> > Sent: Wednesday, September 02, 2015 8:07 AM
> > To: java-user@lucene.apache.org
> > Subject: Lucene 5.2.1: FSDirectory, is it possible to open existing
> > output for append?
> >
> > FSDirectory.createOutput re-creates the file because it opens the stream
> > with TRUNCATE_EXISTING. What is the way to open an existing file and
> > append data? I used this in Lucene 4.1 to create a store with raw
> > messages. I could use Files.newOutputStream directly to do that, but I
> > want to understand the idea behind the design that prohibits appending to
> > existing data. I can't keep an IndexOutput open forever; at least after a
> > restart of the application I have to re-open the existing data and
> > continue to append. What way does Lucene suggest for that now?
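
For reference, appending with Files.newOutputStream directly, as mentioned in
the question, might look like the sketch below. It is plain java.nio and
bypasses Lucene's Directory abstraction entirely; AppendLog and append are
names made up for the example. When called with no options,
Files.newOutputStream defaults to CREATE, TRUNCATE_EXISTING and WRITE, so
APPEND has to be requested explicitly.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class AppendLog {

    // Appends data to the end of the file, creating the file if it is missing.
    static void append(Path path, byte[] data) throws IOException {
        try (OutputStream out = Files.newOutputStream(
                path, StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            out.write(data);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempDirectory("append-log").resolve("raw.dat");
        append(file, "before restart\n".getBytes(StandardCharsets.UTF_8));
        // A second call on the same path stands in for reopening after a restart.
        append(file, "after restart\n".getBytes(StandardCharsets.UTF_8));
        System.out.print(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
    }
}
```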