Re: LUCENE 6322 & Version 4.X

2015-09-29 Thread Adrien Grand
Patching Lucene 4 would be quite hard I'm afraid.


On Wed, Sep 16, 2015 at 16:58, Sascha Janz wrote:

> Hi,
>
> we use Lucene 4.6 in our project. We have some performance problems with
> IndexSearcher.doc(int docID, Set<String> fieldsToLoad). I found this issue:
> https://issues.apache.org/jira/browse/LUCENE-6322
>
> Maybe that is our problem. Is it possible to patch Lucene 4.x with the
> new source of CompressingStoredFieldsReader? Or what else can we do?
>
> greetings
>
> Sascha
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>


Re: Lucene 5 : any merge performance metrics compared to 4.x?

2015-09-29 Thread Adrien Grand
Indeed this is new but I'm a bit surprised this is the source of your
issues as it should be much faster than the merge itself. I don't
understand your proposal to check the index after merge: the goal is to
make sure that we do not propagate corruptions so it's better to check the
index before the merge starts so that we don't even try to merge if there
are corruptions?

On Tue, Sep 15, 2015 at 00:40, Selva Kumar wrote:

> It appears that the Lucene 5.2 index merge runs checkIntegrity on the
> existing index prior to merging additional indices.
> This seems to be new.
>
> We have an existing checkIndex step, but it runs after the index merge.
>
> Two follow-up questions:
> * Is there a way to turn off the built-in checkIntegrity? Just for my
> understanding; we have no plan to turn this off.
> * Is running checkIntegrity prior to the index merge better than running it
> after the merge?
>
>
> On Mon, Sep 14, 2015 at 12:24 PM, Selva Kumar <selva.kumar.at.w...@gmail.com> wrote:
>
> > We observe some merge slowness after we migrated from 4.10 to 5.2.
> > Is this expected? Any new tunable merge parameters in Lucene 5 ?
> >
> > -Selva
> >
> >
>


Re: 5.3.1 artifacts in maven central

2015-09-29 Thread Terry Smith
Noble,

Everything looks good now, thank you.

--Terry


On Tue, Sep 29, 2015 at 1:26 AM, Noble Paul  wrote:

> Please check now
>
> On Mon, Sep 28, 2015 at 8:42 PM, Noble Paul  wrote:
> > Looks like I missed it. I shall upload it soon.
> >
> >
> > On Mon, Sep 28, 2015 at 7:59 PM, Terry Smith  wrote:
> >> Guys,
> >>
> >> I'm unable to find the 5.3.1 artifacts in maven central. Here is the search
> >> url for org.apache.lucene:lucene-core, the most recent version listed is
> >> 5.3.0.
> >>
> >>
> >> http://search.maven.org/#search%7Cgav%7C1%7Cg%3A%22org.apache.lucene%22%20AND%20a%3A%22lucene-core%22
> >>
> >> Am I doing something wrong or are the artifacts not yet published?
> >>
> >> --Terry
> >
> >
> >
> > --
> > -
> > Noble Paul
>
>
>
> --
> -
> Noble Paul
>


RE: Lucene 5 : any merge performance metrics compared to 4.x?

2015-09-29 Thread will martin
So, if it's new, it adds to the pre-existing merge time? I think that is a cost
that needs to be understood.

And I'm really curious what happens to the result of the post-merge
checkIntegrity IFF (if and only if) there was corruption pre-merge. I mean, if
you let it merge anyway, could you get a false positive for integrity? [See the
concept of lazy evaluation.]

These are, imo, the kinds of engineering questions Selva's post raised in my
triage of the scenario.
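
The false-positive question above can be illustrated with a toy model. This is only a sketch under stated assumptions: each "segment" is a byte array with a CRC32 footer computed over its payload, loosely in the spirit of per-file checksums, and "merging" is plain concatenation. The class and method names (ChecksumPropagation, seal, verify, merge) are hypothetical and are not Lucene's file format or API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.CRC32;

// Toy model (not Lucene code): a "segment" is payload bytes plus an 8-byte CRC32 footer.
class ChecksumPropagation {

    /** Appends an 8-byte footer holding the CRC32 of the payload. */
    static byte[] seal(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return ByteBuffer.allocate(payload.length + 8)
                .put(payload)
                .putLong(crc.getValue())
                .array();
    }

    /** Returns true if the stored footer matches a fresh CRC32 of the payload. */
    static boolean verify(byte[] sealed) {
        if (sealed.length < 8) {
            return false;
        }
        int payloadLength = sealed.length - 8;
        CRC32 crc = new CRC32();
        crc.update(sealed, 0, payloadLength);
        long stored = ByteBuffer.wrap(sealed, payloadLength, 8).getLong();
        return stored == crc.getValue();
    }

    /** Strips the footer and returns only the payload bytes. */
    static byte[] payload(byte[] sealed) {
        return Arrays.copyOf(sealed, sealed.length - 8);
    }

    /**
     * "Merges" two sealed segments by concatenating their payloads and sealing
     * the result with a new footer. With checkInputsFirst, corrupt inputs are rejected.
     */
    static byte[] merge(byte[] a, byte[] b, boolean checkInputsFirst) {
        if (checkInputsFirst && !(verify(a) && verify(b))) {
            throw new IllegalStateException("corrupt input segment; refusing to merge");
        }
        byte[] pa = payload(a);
        byte[] pb = payload(b);
        byte[] merged = Arrays.copyOf(pa, pa.length + pb.length);
        System.arraycopy(pb, 0, merged, pa.length, pb.length);
        return seal(merged);
    }

    public static void main(String[] args) {
        byte[] good = seal("segment-1".getBytes(StandardCharsets.UTF_8));
        byte[] damaged = seal("segment-2".getBytes(StandardCharsets.UTF_8));
        damaged[0] ^= 0x01; // flip one payload bit after sealing

        System.out.println("damaged input verifies: " + verify(damaged));
        byte[] unchecked = merge(good, damaged, false);
        System.out.println("unchecked merge output verifies: " + verify(unchecked));
    }
}
```

In this model the damaged input fails verification, but the output of the unchecked merge verifies cleanly, because its footer is computed over whatever bytes were written, including the damaged ones. A real merge parses its inputs and may notice some kinds of damage on its own, so this only shows why a post-merge check alone cannot rule out corruption that was already present before the merge.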

 

 



Re: Lucene 5 : any merge performance metrics compared to 4.x?

2015-09-29 Thread McKinley, James T
Hi Adrien and Will,

Thanks for your responses.  I work with Selva and he's busy right now with 
other things, so I'll add some more context to his question in an attempt to 
improve clarity.

The merge in question is part of our batch indexing workflow wherein we index 
new content for a given partition and then merge this new index with the big 
index of everything that was previously loaded on the given partition.  The 
increase in merge time we've seen since upgrading from 4.10 to 5.2 is on the 
order of 25%.  It varies from partition to partition, but I think 25% is a good
ballpark estimate.  Maybe our case is non-standard: we have a large
number of fields (> 200).

The reason we perform an index check after the merge is that this is the final 
index state that will be used for a given batch.  Since we have a 
batch-oriented workflow we are able to roll back to a previous batch if we find 
a problem with a given batch (Lucene or other problem).  However, due to disk
space constraints we can only keep a couple of batches.  If our indexing
workflow completes without errors but the index is corrupt, we may not know
right away and we might delete the previous good batch thinking the latest
batch is OK, which would be very bad, requiring a full reload of all our content.

Checking the index prior to the merge would no doubt catch many issues, but it 
might not catch corruption that occurs during the merge step itself, so we 
implemented a check step once the index is in its final state to ensure that it 
is OK.

So, since we want to do the check post-merge, is there a way to disable the 
check during merge so we don't have to do two checks?

Thanks!

Jim 
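
The retention rule described in this message (never delete the previous good batch until the newest one has passed its post-merge check) can be sketched as follows. This is a minimal illustration, not the posters' actual workflow: the BatchRetention class, its method names, and the idea of passing the check result in as a boolean are all assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: tracks which batches are kept on disk, oldest first.
class BatchRetention {
    private final int maxKept;
    private final Deque<String> kept = new ArrayDeque<>();

    BatchRetention(int maxKept) {
        if (maxKept < 1) {
            throw new IllegalArgumentException("maxKept must be >= 1");
        }
        this.maxKept = maxKept;
    }

    /**
     * Registers a finished batch. checkPassed is the outcome of whatever
     * post-merge index check the workflow runs. Returns the names of the
     * batches that may now be deleted.
     */
    List<String> complete(String batch, boolean checkPassed) {
        List<String> toDelete = new ArrayList<>();
        if (!checkPassed) {
            // A failed batch is discarded; every earlier good batch is kept for rollback.
            toDelete.add(batch);
            return toDelete;
        }
        kept.addLast(batch);
        // Only a verified batch is allowed to push older batches out.
        while (kept.size() > maxKept) {
            toDelete.add(kept.removeFirst());
        }
        return toDelete;
    }

    /** Returns the batches currently kept, oldest first. */
    List<String> keptBatches() {
        return new ArrayList<>(kept);
    }
}
```

With maxKept set to 2, a batch that fails its check is the only thing scheduled for deletion, so both earlier batches stay available for rollback.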





RE: Lucene 5 : any merge performance metrics compared to 4.x?

2015-09-29 Thread will martin
This sounds robust. Is the index batch creation workflow a separate process?
Distributed shared filesystems?

--will




Re: Lucene 5 : any merge performance metrics compared to 4.x?

2015-09-29 Thread McKinley, James T
Yes, the indexing workflow is completely separate from the runtime system.  The 
file system is EMC Isilon via NFS.

Jim



RE: Lucene 5 : any merge performance metrics compared to 4.x?

2015-09-29 Thread will martin
OK, so I'm a little confused:

The 4.10 JavaDoc for LiveIndexWriterConfig supports volatile access to a
flag via setCheckIntegrityAtMerge ...

The method states that it controls the pre-merge cost.

Ref:

https://lucene.apache.org/core/4_10_0/core/org/apache/lucene/index/LiveIndexWriterConfig.html#setCheckIntegrityAtMerge%28boolean%29

And it seems to be gone in 5.3, folks? Meaning Adrien's comment is a whole
lot more significant? Merges ALWAYS run a pre-merge checkIntegrity? Is this a
5.0 feature drop? You can't deprecate, um, er, totally remove an index-time
audit feature in a point release of any level, IMHO.




Re: Lucene 5 : any merge performance metrics compared to 4.x?

2015-09-29 Thread Michael McCandless
No, it is not possible to disable, and, yes, we removed that API in
5.x because 1) the risk of silent index corruption is too high to
warrant this small optimization and 2) we re-worked how merging works
so that this checkIntegrity has IO locality with what's being merged
next.

There were other performance gains for merging in 5.x, e.g. using much
less memory in the many-fields case, not decompressing + recompressing
stored fields and term vectors, etc.

As Adrien pointed out, the cost should be much lower than 25% for a
local filesystem ... I suspect something about your NFS setup is
making it more costly.

NFS is in general a dangerous filesystem to use with Lucene (no delete
on last close, locking is tricky to get right, incoherent client file
contents and directory listing caching).

If you also want to check the integrity of the merged segment, you could,
e.g., install an IndexReaderWarmer in your IW and call
IndexReader.checkIntegrity.

Mike McCandless

http://blog.mikemccandless.com



Re: 5.3.1 artifacts in maven central

2015-09-29 Thread Michael McCandless
Hi Noble,

Is there something we could improve about the release checklist to
reduce the chance of these sorts of mistakes in the future?

Mike McCandless

http://blog.mikemccandless.com


On Tue, Sep 29, 2015 at 4:36 PM, Terry Smith  wrote:
> Noble,
>
> Everything looks good now, thank you.
>
> --Terry



Re: Need help in alphanumeric search

2015-09-29 Thread Bhaskar
Hi Uwe,

Below is my indexing code:

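import java.nio.file.Path;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

import org.apache.lucene.analysis.core.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;

public class SimpleDBIndexer {
    // JDBC_DRIVER, CONNECTION_URL, DBNAME, USER_NAME, PASSWORD and QUERY1 are
    // constants of this class that were not included in the post.
    public static final String INDEX_DIR = "c:/DBIndexAll/";

    public static void main(String[] args) throws Exception {
        final Path indexDir = Paths.get(INDEX_DIR);
        SimpleDBIndexer indexer = new SimpleDBIndexer();
        try {
            Class.forName(JDBC_DRIVER).newInstance();
            // SimpleAnalyzer splits text at non-letter characters and lowercases it.
            SimpleAnalyzer analyzer = new SimpleAnalyzer();
            IndexWriterConfig indexWriterConfig = new IndexWriterConfig(analyzer);
            int indexedDocumentCount;
            // try-with-resources closes the writer and the connection, even on failure.
            try (Connection conn = DriverManager.getConnection(CONNECTION_URL + DBNAME, USER_NAME, PASSWORD);
                 IndexWriter indexWriter = new IndexWriter(FSDirectory.open(indexDir), indexWriterConfig)) {
                System.out.println("Indexing to directory '" + indexDir + "'...");
                indexedDocumentCount = indexer.indexDocs(indexWriter, conn);
            }
            System.out.println(indexedDocumentCount + " records have been indexed successfully");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    int indexDocs(IndexWriter writer, Connection conn) throws Exception {
        int i = 0;
        // Resources close in reverse order: the ResultSet first, then the Statement.
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(QUERY1)) {
            while (rs.next()) {
                Document d = new Document();
                d.add(new TextField("cpn", rs.getString("cpn"), Field.Store.YES));
                writer.addDocument(d);
                i++;
            }
        }
        return i;
    }
}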


Searching code:

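import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryparser.classic.MultiFieldQueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class SimpleDBSearcher {
    // PLASTRON
    private static final String LUCENE_QUERY = "SD*";
    private static final int MAX_HITS = 500;
    private static final String INDEX_DIR = "C:/DBIndexAll/";

    public static void main(String[] args) throws Exception {
        // Search the same directory that the indexer wrote to.
        final Path indexDir = Paths.get(SimpleDBIndexer.INDEX_DIR);
        String query = LUCENE_QUERY;
        SimpleDBSearcher searcher = new SimpleDBSearcher();
        searcher.searchIndex(indexDir, query);
    }

    private void searchIndex(Path indexDir, String queryStr) throws Exception {
        System.out.println("The query string is " + queryStr);
        // Queries are parsed with StandardAnalyzer; the index was built with SimpleAnalyzer.
        MultiFieldQueryParser queryParser = new MultiFieldQueryParser(
                new String[] { "cpn" }, new StandardAnalyzer());
        // No effect: this only reads the setting; setAllowLeadingWildcard(true) would change it.
        queryParser.getAllowLeadingWildcard();

        // try-with-resources closes the reader and the directory when the search is done.
        try (Directory directory = FSDirectory.open(indexDir);
             IndexReader reader = DirectoryReader.open(directory)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            Query query = queryParser.parse(queryStr);
            TopDocs topDocs = searcher.search(query, MAX_HITS);

            ScoreDoc[] hits = topDocs.scoreDocs;
            System.out.println(hits.length + " Record(s) Found");
            for (int i = 0; i < hits.length; i++) {
                int docId = hits[i].doc;
                Document d = searcher.doc(docId);
                System.out.println("\"cpn value is:\" " + d.get("cpn"));
            }
            if (hits.length == 0) {
                System.out.println("No data found");
            }
        }
    }
}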


Please help here, thanks in advance.

Regards,
Bhaskar

On Tue, Sep 29, 2015 at 3:47 AM, Uwe Schindler  wrote:

> Hi Erick,
>
> This mail was in Lucene's user mailing list. This is not about Solr, so
> user cannot provide his Solr config! :-)
> In any case, it would be good to get the Analyzer + code you use while
> indexing and also the code (+ Analyzer) that creates the query while
> searching.
>
> Uwe
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
> > -Original Message-
> > From: Erick Erickson [mailto:erickerick...@gmail.com]
> > Sent: Monday, September 28, 2015 6:01 PM
> > To: java-user
> > Subject: Re: Need help in alphanumeric search
> >
> > You need to supply the definitions of this field from your schema.xml
> > file, both the <field> and <fieldType>
> >
> > Additionally, please provide the results of the query you're trying with
> > &debug=true appended.
> >
> > The adminUI/analysis page is very helpful in these situations as well.
> > Select the appropriate core from the drop-down on the left and you'll
> > see an "analysis" section appear that shows you exactly what happens
> > when the field is analyzed.
> >
> > Best,
> > Erick
> >
> > On Mon, Sep 28, 2015 at 5:01 AM, Bhaskar  wrote:
> > > Thanks Ian for the reply.
> > >
> > > cpn values are like 123-0049, 342-043, ab23-090, hedwsdg
> > >
> > > My application works when I search for the inputs below:
> > > 1) ab*
> > > 2) hedwsdg
> > > 3) hed*
> > >
> > > but it does not work for:
> > > 1) 123*
> > > 2) 123-0049
> > > 3) ab23*
> > >
> > >
> > > Note: if the search input contains a number, it does not work.
> > >
> > > Thanks in advance.
> > >
> > >
> > > On Mon, Sep 28, 2015 at 3:49 PM, Ian Lea  wrote:
> > >
> > >> Hi
> > >>
> > >>
> > >> Can you provide a few examples of values of cpn that a) are and b)
> > >> are not being found, for indexing and searching.
> > >>
> > > >> You may also find some of the tips at
> > > >>
> > > >> http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2F_incorrect_hits.3F
> > > >> useful.
> > >>
> > >> You haven't shown the code that created the IndexWriter so the tip
> > >> about using the same analyzer at index and search time might be
> > >> relevant.

RE: Need help in alphanumeric search

2015-09-29 Thread Uwe Schindler
Hi Bhaskar,

The answer is very simple: your analysis is not suited to the type of queries
and data you are using. You are using SimpleAnalyzer in your indexing code
(and StandardAnalyzer in your search code):

https://lucene.apache.org/core/5_3_1/analyzers-common/org/apache/lucene/analysis/core/SimpleAnalyzer.html
"An Analyzer that filters LetterTokenizer with LowerCaseFilter"

And LetterTokenizer does the following:
https://lucene.apache.org/core/5_3_1/analyzers-common/org/apache/lucene/analysis/core/LetterTokenizer.html
"A LetterTokenizer is a tokenizer that divides text at non-letters. That's to 
say, it defines tokens as maximal strings of adjacent letters, as defined by 
java.lang.Character.isLetter() predicate."

So it starts a new token at every non-letter boundary, and all non-letters
(digits, hyphens) are discarded because they are treated as token boundaries.
The digits of your cpn values never reach the index, so queries containing
numbers can never match.
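
To make this concrete, here is a rough plain-Java sketch of the "split at
non-letters, then lowercase" behaviour described above, applied to the cpn
values from this thread. It does not use Lucene at all; the class and method
names (LetterSplitSketch, tokenize) are made up for illustration, and it only
approximates what SimpleAnalyzer would do.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java imitation of "split at non-letters, then lowercase".
// Hypothetical helper for illustration only; this is not Lucene code.
class LetterSplitSketch {

    // Collects maximal runs of letters, lowercased; everything else is dropped.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // A non-letter ends the current token and is itself discarded.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String[] cpns = {"123-0049", "342-043", "ab23-090", "hedwsdg"};
        for (String cpn : cpns) {
            System.out.println(cpn + " -> " + tokenize(cpn));
        }
    }
}
```

Under this sketch "123-0049" and "342-043" produce no tokens at all, and
"ab23-090" produces only "ab". That would be consistent with ab* and hed*
finding documents while 123*, 123-0049 and ab23* find nothing.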

I'd suggest first reading up on analysis and choosing an analyzer that suits
your underlying data and the queries you want to run. Maybe use
WhitespaceAnalyzer or, better, StandardAnalyzer as a first step. Be sure to
reindex your data before querying. The Analyzer used on the indexing side must
be the same as the one used on the query side. If you want to use wildcards,
you have to take more care, because wildcards are not really natural for a
"full text search engine" and may cause inconsistent results.
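
As a small illustration of the "same analysis on both sides" rule, the
following plain-Java toy applies one analyze() function to the stored values
and to the search prefix, splitting on whitespace only so that digits and
hyphens survive. It is not Lucene code and says nothing about how Lucene is
implemented; SharedAnalysisSketch, analyze and prefixSearch are hypothetical
names, and the prefix argument stands for a query like 123* with the trailing
wildcard already removed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical toy "index", not Lucene: one analysis function is shared by
// the indexing side and the query side, and it splits on whitespace only.
class SharedAnalysisSketch {

    // The single analysis step used on both sides: whitespace split + lowercase.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                terms.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // Returns every stored value that has at least one term starting with the prefix.
    static List<String> prefixSearch(List<String> values, String prefix) {
        List<String> hits = new ArrayList<>();
        List<String> prefixTerms = analyze(prefix);
        if (prefixTerms.isEmpty()) {
            return hits; // nothing to search for
        }
        String normalizedPrefix = prefixTerms.get(0);
        for (String value : values) {
            for (String term : analyze(value)) {
                if (term.startsWith(normalizedPrefix)) {
                    hits.add(value);
                    break; // one matching term is enough for this value
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> cpns = Arrays.asList("123-0049", "342-043", "ab23-090", "hedwsdg");
        System.out.println(prefixSearch(cpns, "123"));
        System.out.println(prefixSearch(cpns, "AB23"));
        System.out.println(prefixSearch(cpns, "hed"));
    }
}
```

Because the digits are kept as part of each term and the prefix goes through
the same lowercasing, the prefixes 123 and AB23 find their values in this toy.
With a real analyzer the same idea applies: pick one that keeps the characters
you need, and pass that same analyzer to both IndexWriterConfig and the query
parser.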

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: Bhaskar [mailto:bhaskar1...@gmail.com]
> Sent: Wednesday, September 30, 2015 4:28 AM
> To: java-user@lucene.apache.org
> Subject: Re: Need help in alphanumeric search
> 
> Hi Uwe,
> 
> Below is my indexing code:
> 
> public static void main(String[] args) throws Exception { //Path indexDir =
> new Path(INDEX_DIR); public static final String INDEX_DIR = "c:/DBIndexAll/";
> final Path indexDir = Paths.get(INDEX_DIR); SimpleDBIndexer indexer = new
> SimpleDBIndexer(); try{
>Class.forName(JDBC_DRIVER).newInstance();
>Connection conn = DriverManager.getConnection(CONNECTION_URL +
> DBNAME, USER_NAME, PASSWORD);
>SimpleAnalyzer analyzer = new SimpleAnalyzer();
>IndexWriterConfig indexWriterConfig = new IndexWriterConfig(analyzer);
>IndexWriter indexWriter = new IndexWriter(FSDirectory.open(indexDir),
> indexWriterConfig);
>System.out.println("Indexing to directory '" + indexDir + "'...");
>int indexedDocumentCount = indexer.indexDocs(indexWriter, conn);
>indexWriter.close();
>System.out.println(indexedDocumentCount + " records have been indexed
> successfully"); } catch (Exception e) {
>e.printStackTrace();
> }
> }
> 
> int indexDocs(IndexWriter writer, Connection conn) throws Exception {
>   String sql = QUERY1;
>   Statement stmt = conn.createStatement();
>   ResultSet rs = stmt.executeQuery(sql);
>   int i=0;
>   while (rs.next()) {
>  Document d = new Document();
>  d.add(new TextField("cpn", rs.getString("cpn"), Field.Store.YES));
> 
>  writer.addDocument(d);
>  i++;
>  }
>   stmt.close();
>   rs.close();
> 
>   return i;
> }
> 
> 
> Searching code:
> 
> public class SimpleDBSearcher {
> // PLASTRON
> private static final String LUCENE_QUERY = "SD*"; private static final int
> MAX_HITS = 500; private static final String INDEX_DIR = "C:/DBIndexAll/";
> 
> public static void main(String[] args) throws Exception { // File indexDir = 
> new
> File(SimpleDBIndexer.INDEX_DIR); final Path indexDir =
> Paths.get(SimpleDBIndexer.INDEX_DIR);
> String query = LUCENE_QUERY;
> SimpleDBSearcher searcher = new SimpleDBSearcher();
> searcher.searchIndex(indexDir, query); }
> 
> private void searchIndex(Path indexDir, String queryStr) throws Exception {
> Directory directory = FSDirectory.open(indexDir); System.out.println("The
> query string is " + queryStr); MultiFieldQueryParser queryParser = new
> MultiFieldQueryParser(new String[] { "cpn" }, new StandardAnalyzer());
> IndexReader reader = DirectoryReader.open(directory); IndexSearcher
> searcher = new IndexSearcher(reader);
> queryParser.getAllowLeadingWildcard();
> 
> Query query = queryParser.parse(queryStr); TopDocs topDocs =
> searcher.search(query, MAX_HITS);
> 
> ScoreDoc[] hits = topDocs.scoreDocs;
> System.out.println(hits.length + " Record(s) Found"); for (int i = 0; i <
> hits.length; i++) { int docId = hits[i].doc; Document d = searcher.doc(docId);
> System.out.println("\"cpn value is:\" " + d.get("cpn")); } if (hits.length == 
> 0) {
> System.out.println("No Data Founds "); }
> 
> }
> }
> 
> 
> Please help here, thanks in advance.
> 
> Regards,
> Bhaskar
> 
> On Tue, Sep 29, 2015 at 3:47 AM, Uwe Schindler  wrote:
> 
> > Hi Erick,
> >
> > This mail was in Lucene's user mailing list. This is not about Solr,
> > so user cannot provide his Solr config! :-) In any case, it would be
> > good to get the Analyzer + code you use while indexing and also the
> > code (+ Analyzer) that creates the query while searching.
>