ANNOUNCE: Release of Lucene Java 3.0.1 and 2.9.2

2010-02-26 Thread Uwe Schindler
Hello Lucene users,

On behalf of the Lucene development community I would like to announce the 
release of Lucene Java versions 3.0.1 and 2.9.2:

Both releases fix bugs in the previous versions:

- 2.9.2 is a bugfix release for the Lucene Java 2.x series, based on Java 1.4 
- 3.0.1 has the same bug fix level but is for the Lucene Java 3.x series, based 
on Java 5.

New users of Lucene are advised to use version 3.0.1 for new developments, 
because it has a clean, type-safe API.

Important improvements in these releases include:

- An increased maximum number of unique terms in each index segment.
- Fixed experimental CustomScoreQuery to respect per-segment search. This 
introduced an API change!
- Important fixes to IndexWriter: a commit() thread-safety issue and lost document 
deletes in near real-time indexing.
- Bugfixes for Contrib's Analyzers package.
- Restoration of some public methods that were lost during deprecation removal. 
- The new Attribute-based TokenStream API now works correctly with different 
class loaders.

Both releases are fully compatible with the corresponding previous versions. We 
strongly recommend upgrading to 2.9.2 if you are using 2.9.1 or 2.9.0; and to 
3.0.1 if you are using 3.0.0.

See core changes at
http://lucene.apache.org/java/3_0_1/changes/Changes.html
http://lucene.apache.org/java/2_9_2/changes/Changes.html

and contrib changes at
http://lucene.apache.org/java/3_0_1/changes/Contrib-Changes.html
http://lucene.apache.org/java/2_9_2/changes/Contrib-Changes.html

Binary and source distributions are available at
http://www.apache.org/dyn/closer.cgi/lucene/java/

Lucene artifacts are also available in the Maven2 repository at
http://repo1.maven.org/maven2/org/apache/lucene/

-
Uwe Schindler
uschind...@apache.org 
Apache Lucene Java Committer
Bremen, Germany
http://lucene.apache.org/java/docs/



-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: problem about backup index file

2010-02-26 Thread Michael McCandless
Well, lucene is "write once" and then, eventually, "delete once" ;)

Ie files are eventually deleted (when they are merged away).

So when you do the incremental backup, any file not listed in the
current commit can be removed from your backup (assuming you only want
to backup the last commit).

Don't delete the files in your last backup... keep the ones that are
"in common" with your new backup as you don't need to recopy them
(they will not have changed).
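
FWIW, a rough sketch of that flow using SnapshotDeletionPolicy (2.9/3.0-style
APIs; copyToBackup() and the backed-up-file bookkeeping are placeholders you'd
have to supply yourself):

import java.io.IOException;
import java.util.Collection;
import java.util.Set;
import org.apache.lucene.index.IndexCommit;
import org.apache.lucene.index.SnapshotDeletionPolicy;

// The IndexWriter must have been opened with this same SnapshotDeletionPolicy instance.
void incrementalBackup(SnapshotDeletionPolicy snapshotter, Set<String> alreadyBackedUp) throws IOException {
  IndexCommit commit = snapshotter.snapshot();      // pin the files of the latest commit
  try {
    Collection<String> current = commit.getFileNames();
    for (String file : current) {
      if (!alreadyBackedUp.contains(file)) {
        copyToBackup(file);                         // write once: a file already copied never changes
        alreadyBackedUp.add(file);
      }
    }
    // anything in the backup that is not in 'current' was merged away and can be
    // deleted there (delete those files from the backup directory as well)
    alreadyBackedUp.retainAll(current);
  } finally {
    snapshotter.release();                          // let the writer delete merged-away files again
  }
}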

Mike

On Fri, Feb 26, 2010 at 1:47 AM,   wrote:
> Thanks for your paper, Michael McCandless. I have one question about this.
> "For all other files, Lucene is 'write once.' This makes doing incremental
> backups very easy: simply compare the file names. Once a file is written, it
> will never change; therefore, if you've already backed up that file, there's
> no need to copy it again." This means the number of files keeps growing and
> never gets smaller. Another question: should I delete the files from the
> previous backup when I back up again?
>
> - Original message -
> From: Michael McCandless
> Subject: Re: problem about backup index file
> Date: Feb 25, 2010 23:19:59
>
> This is likely happening because you're attempting to copy a file that
> IndexWriter is currently writing. You shouldn't do that (copy files that are
> still being written) -- that just wastes bytes (they aren't used by the
> index), and causes this failure on Windows. Instead, you should use
> SnapshotDeletionPolicy -- it tells you specifically which files make up the
> latest commit point.  Those files will not be opened for writing (only for
> reading, if you have an IndexReader open on that commit) and they should copy
> just fine on Windows. The "Hot backups with Lucene" article (NOTE: I'm the
> author) in the upcoming Lucene in Action 2 revision shows how to do this --
> it's available for download from http://manning.com/hatcher3.
>
> Mike
>
> On Thu, Feb 25, 2010 at 3:15 AM,   wrote:
>> I want to back up my index file, but I get the following error:
>>
>> java.io.IOException: another program lock the file!
>>   at java.io.FileInputStream.readBytes(Native Method)
>>   at java.io.FileInputStream.read(Unknown Source)
>>   at com.common.Utils.copyDirectory(Utils.java:149)
>>   at com.common.Utils.copyDirectory(Utils.java:138)
>>   at com.common.Utils.copyDirectory(Utils.java:138)
>>   at com.index.IndexManager.backDataPolicy(IndexManager.java:398)
>>   at com.index.IndexManager.indexLoop(IndexManager.java:222)
>>   at com.Main$1.run(Main.java:48)
>>   at java.lang.Thread.run(Unknown Source)
>>
>> How can I back up the Lucene files in the IR thread?

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: If you could have one feature in Lucene...

2010-02-26 Thread Paul Taylor

Glen Newton wrote:

+2

On 25 February 2010 04:45, Avi Rosenschein  wrote:
  

Similarity can only be set per index, but I want to adjust scoring
behaviour at a field level. To facilitate this, could we make the field name
available to all score methods?
Currently it is only passed to some, such as lengthNorm(), but not to others
such as tf().

+1
  

-- Avi


I had already raised this in 
https://issues.apache.org/jira/browse/LUCENE-2236 if you want to add 
anything to it.
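
For the methods that do already receive the field name, such as lengthNorm(),
a per-field tweak is possible today by subclassing Similarity. A minimal
sketch (the "title" check and the formula are purely illustrative):

import org.apache.lucene.search.DefaultSimilarity;

public class PerFieldLengthSimilarity extends DefaultSimilarity {
  @Override
  public float lengthNorm(String fieldName, int numTerms) {
    if ("title".equals(fieldName)) {
      // stronger length normalization for the title field only
      return (float) (2.0 / Math.sqrt(numTerms));
    }
    return super.lengthNorm(fieldName, numTerms);
  }
}

// installed once on the writer and the searcher, e.g.:
// writer.setSimilarity(new PerFieldLengthSimilarity());
// searcher.setSimilarity(new PerFieldLengthSimilarity());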







  



-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: ANNOUNCE: Release of Lucene Java 3.0.1 and 2.9.2

2010-02-26 Thread Paul Taylor

Uwe Schindler wrote:

Hello Lucene users,

On behalf of the Lucene development community I would like to announce the 
release of Lucene Java versions 3.0.1 and 2.9.2:

Both releases fix bugs in the previous versions:

- 2.9.2 is a bugfix release for the Lucene Java 2.x series, based on Java 1.4 
- 3.0.1 has the same bug fix level but is for the Lucene Java 3.x series, based on Java 5.
Hmm, I don't really agree with deprecating Version.LUCENE_CURRENT 
(http://issues.apache.org/jira/browse/LUCENE-2080)


I'm sure in many projects, especially ones that are not yet released, 
developers would expect to pick up the latest features when they 
download the latest version of Lucene without having to update the value 
of the Version constant (it just makes development a bit more tiresome), 
and they would realize that code should be rebuilt and that indexes should be 
built with the same version used to search them. I mean, whenever you update 
the version of a library you should really test your code anyway; after all, Lucene 
3.0.0 unwittingly changed some things from 2.9, hence the need for 3.0.1.


Paul




-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: ANNOUNCE: Release of Lucene Java 3.0.1 and 2.9.2

2010-02-26 Thread Robert Muir
such projects can do this, in one place:

public static final Version MY_APP_CURRENT = Version.LUCENE_30;

then later

StandardAnalyzer analyzer = new StandardAnalyzer(MY_APP_CURRENT);

then they have complete control of this, independent of when they upgrade
Lucene's jar file!

On Fri, Feb 26, 2010 at 5:12 AM, Paul Taylor  wrote:

> Uwe Schindler wrote:
>
>> Hello Lucene users,
>>
>> On behalf of the Lucene development community I would like to announce the
>> release of Lucene Java versions 3.0.1 and 2.9.2:
>>
>> Both releases fix bugs in the previous versions:
>>
>> - 2.9.2 is a bugfix release for the Lucene Java 2.x series, based on Java
>> 1.4 - 3.0.1 has the same bug fix level but is for the Lucene Java 3.x
>> series, based on Java 5.
>>
> Hmm , I don't really agree with deprecating Version.LUCENE_CURRENT (
> http://issues.apache.org/jira/browse/LUCENE-2080)
>
> I'm sure in many projects, especially ones that are not yet released
> developers would expect to pick up the latest features when they download
> the latest version of Lucene without having to update the value of the
> Version constant (its just make devlopment a bit more tiresome) and would
> realize that code should be rebuilt and  indexes should be built with same
> version as searching indexes. I mean whenever you update the version of a
> library should really test your code, after all Lucene 3.0.0 changed some
> things in 2.9.2 unwittingly, hence the need for 3.0.1.
>
> Paul
>
>
>
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>


-- 
Robert Muir
rcm...@gmail.com


Re: ANNOUNCE: Release of Lucene Java 3.0.1 and 2.9.2

2010-02-26 Thread Paul Taylor

Robert Muir wrote:

such projects can do this, in one place:

public static final Version MY_APP_CURRENT = Version.LUCENE_30;

then later

StandardAnalyzer analyzer = new StandardAnalyzer(MY_APP_CURRENT);

then they have complete control of this, independent of when the 
upgrade lucene's jar file!
Not quite true because you still need to update MY_APP_CURRENT when 
there is a new version, but yes, that's more manageable.


Paul

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: ANNOUNCE: Release of Lucene Java 3.0.1 and 2.9.2

2010-02-26 Thread Johannes Zillmann
Just one thought...

For me it would be natural never to be confronted with the Version.xx thing in 
the API unless you really need it.
So e.g. having
new QueryParser("", new KeywordAnalyzer()).parse("content: the");
as a default (probably using Version.LUCENE_CURRENT under the hood), but having 
new QueryParser(Version.XXX, "", new KeywordAnalyzer()).parse("content: the");
as well.

Of course this would require a lot of method/constructor overloading, but it would 
make the API more user friendly for those who write code where the version 
doesn't matter...
Johannes

On Feb 26, 2010, at 11:27 AM, Paul Taylor wrote:

> Robert Muir wrote:
>> such projects can do this, in one place:
>> 
>> public static final Version MY_APP_CURRENT = Version.LUCENE_30;
>> 
>> then later
>> 
>> StandardAnalyzer analyzer = new StandardAnalyzer(MY_APP_CURRENT);
>> 
>> then they have complete control of this, independent of when the upgrade 
>> lucene's jar file!
> Not quite true because you still need to update MY_APP_CURRENT when there is 
> a new version, but yes thats more mangeable
> 
> Paul
> 
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
> 


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: ANNOUNCE: Release of Lucene Java 3.0.1 and 2.9.2

2010-02-26 Thread Michael McCandless
That would be more natural/convenient, but it'd unfortunately defeat
the whole reason Version was added in the first place.

By making Version required, we force callers to be explicit to Lucene
about what level of back compat is required.

This then enables Lucene to improve its defaults with each release,
without breaking users that need to keep backwards compatibility.

Mike

On Fri, Feb 26, 2010 at 5:42 AM, Johannes Zillmann
 wrote:
> Just one thought...
>
> For me it would be natural to be never confronted with the Version.xx thing 
> in the api unless you really need.
> so f.e. having
>        new QueryParser("", new KeywordAnalyzer()).parse("content: the");
> as a default (probably using Version.LUCENE_CURRENT under the hood), but 
> having
>        new QueryParser(Version.XXX,"", new KeywordAnalyzer()).parse("content: 
> the");
> as well.
>
> Of cause this would require a lot of method/constructor overloading, but 
> would make the api more user friendly for those who write some code where the 
> version don't matter...
> Johannes
>
> On Feb 26, 2010, at 11:27 AM, Paul Taylor wrote:
>
>> Robert Muir wrote:
>>> such projects can do this, in one place:
>>>
>>> public static final Version MY_APP_CURRENT = Version.LUCENE_30;
>>>
>>> then later
>>>
>>> StandardAnalyzer analyzer = new StandardAnalyzer(MY_APP_CURRENT);
>>>
>>> then they have complete control of this, independent of when the upgrade 
>>> lucene's jar file!
>> Not quite true because you still need to update MY_APP_CURRENT when there is 
>> a new version, but yes thats more mangeable
>>
>> Paul
>>
>> -
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: NAS vs SAN vs Server Disk RAID

2010-02-26 Thread Marcelo Ochoa
Hi Ian:
  Only as curiosity ;)
  Which distributed file system are you using on top of your NAS storage?
  Best regards, Marcelo.

On Thu, Feb 25, 2010 at 6:54 AM, Ian Lea  wrote:
> We've run lucene on NAS, although not with indexes anything like as
> large as 1Tb, and gave up because NFS and lucene don't really work
> very well together. Google for "lucene nfs" for some details, and some
> workarounds.
>
> I'd second Kay Kay's suggestion to look at a distributed solution such as 
> Katta.
>
>
> --
> Ian.
>
>
> On Wed, Feb 24, 2010 at 11:54 PM, Andrew Bruno  wrote:
>> Hello,
>>
>> I am working with an application that offers its customers their own index,
>> primarily two indexes for different needs per customer.
>>
>> As our business keeps growing, I now have a situation where the web
>> application has its customers' indexes on one volume, and it's getting close to
>> 1 TB.
>>
>> There are lots of updates and inserts, and plenty of searches.  As you can
>> imagine, the application is starting to slow down heavily, especially during
>> high traffic time when documents (PDF, DOCs, etc) are indexed.
>>
>> Since the disk IO on the server is high, our datacenter engineers suggested
>> we look at NAS or SAN, for performance gain, and for future growth.
>>
>> Has anyone had any experience in running Lucene2.0/Compass in these
>> environments?  Do you know of any case studies, whitepapers, web sites?
>>
>> Thanks
>> Andrew
>>
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>



-- 
Marcelo F. Ochoa
http://marceloochoa.blogspot.com/
http://mochoa.sites.exa.unicen.edu.ar/
__
Want to integrate Lucene and Oracle?
http://marceloochoa.blogspot.com/2007/09/running-lucene-inside-your-oracle-jvm.html
Is Oracle 11g REST ready?
http://marceloochoa.blogspot.com/2008/02/is-oracle-11g-rest-ready.html

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: NAS vs SAN vs Server Disk RAID

2010-02-26 Thread Ian Lea
NFS.  It works fine for simple, essentially static Lucene indexes and
we still use it for that, but things tended to fall apart with dynamic
indexes.


--
Ian.


On Fri, Feb 26, 2010 at 11:06 AM, Marcelo Ochoa  wrote:
> Hi Ian:
>  Only as curiosity ;)
>  Which distributed file system are you using on top of your NAS storage?
>  Best regards, Marcelo.
>
> On Thu, Feb 25, 2010 at 6:54 AM, Ian Lea  wrote:
>> We've run lucene on NAS, although not with indexes anything like as
>> large as 1Tb, and gave up because NFS and lucene don't really work
>> very well together. Google for "lucene nfs" for some details, and some
>> workarounds.
>>
>> I'd second Kay Kay's suggestion to look at a distributed solution such as 
>> Katta.
>>
>>
>> --
>> Ian.
>>
>>
>> On Wed, Feb 24, 2010 at 11:54 PM, Andrew Bruno  
>> wrote:
>>> Hello,
>>>
>>> I am working with an application that offers its customers their own index,
>>> primarily two indexes for different needs per customer.
>>>
>>> As our business keeps growing, I now have a situation where the web
>>> application has its customers' indexes on one volume, and it's getting close to
>>> 1 TB.
>>>
>>> There are lots of updates and inserts, and plenty of searches.  As you can
>>> imagine, the application is starting to slow down heavily, especially during
>>> high traffic time when documents (PDF, DOCs, etc) are indexed.
>>>
>>> Since the disk IO on the server is high, our datacenter engineers suggested
>>> we look at NAS or SAN, for performance gain, and for future growth.
>>>
>>> Has anyone had any experience in running Lucene2.0/Compass in these
>>> environments?  Do you know of any case studies, whitepapers, web sites?
>>>
>>> Thanks
>>> Andrew
>>>
>>
>> -
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>>
>
>
>
> --
> Marcelo F. Ochoa
> http://marceloochoa.blogspot.com/
> http://mochoa.sites.exa.unicen.edu.ar/
> __
> Want to integrate Lucene and Oracle?
> http://marceloochoa.blogspot.com/2007/09/running-lucene-inside-your-oracle-jvm.html
> Is Oracle 11g REST ready?
> http://marceloochoa.blogspot.com/2008/02/is-oracle-11g-rest-ready.html
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: ANNOUNCE: Release of Lucene Java 3.0.1 and 2.9.2

2010-02-26 Thread Ian Lea
Could there be a Version value called
LUCENE_LATEST_DANGER_USE_AT_YOUR_OWN_RISK, or whatever you want to call
it?

I understand the argument about backwards compatibility but I'm with
Johannes on making things easier for those who have code which doesn't
require the compatibility.  Like me.  I've been using lucene since the
very beginning and don't recall ever having been bitten by any back
compatibility problems (another reason for praise for the committers)
and would rather not have to start changing literals on upgrades.

Is the plan to remove LUCENE_CURRENT altogether or to leave it in,
permanently deprecated?
If the latter we could carry on using it although living with
deprecations isn't great.


Doing simple things in lucene does in general seem to be getting
harder.  Off the top of my head ...

  IndexSearcher s = new IndexSearcher("/my/index");
  QueryParser qp = new QueryParser("", new StandardAnalyzer());
  Query q = qp.parse("field:value");
  Hits h = s.search(q);
  for (int i = 0; i < h.length(); i++) {
    System.out.println(h.doc(i).get("field"));
  }

used to work.  It won't now of course, and I'd have to look at the
javadocs to come up with alternatives.
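
For the record, a rough sketch of what the 3.0-style equivalent might look
like (untested; imports from org.apache.lucene.* omitted, and the index path
and field name are just placeholders):

  IndexSearcher s = new IndexSearcher(FSDirectory.open(new File("/my/index")), true);  // read-only
  QueryParser qp = new QueryParser(Version.LUCENE_30, "field",
      new StandardAnalyzer(Version.LUCENE_30));
  Query q = qp.parse("field:value");
  ScoreDoc[] hits = s.search(q, 10).scoreDocs;   // top 10 hits
  for (ScoreDoc hit : hits) {
    System.out.println(s.doc(hit.doc).get("field"));
  }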

Keep APIs simple!



--
Ian.


On Fri, Feb 26, 2010 at 10:50 AM, Michael McCandless
 wrote:
> That would be more natural/convenient, but it'd unfortunately defeat
> the whole reason Version was added in the first place.
>
> By making Version required, we force callers to be explicit to Lucene
> about what level of back compat is required.
>
> This then enables Lucene to improve its defaults with each release,
> without breaking users that need to keep backwards compatibility.
>
> Mike
>
> On Fri, Feb 26, 2010 at 5:42 AM, Johannes Zillmann
>  wrote:
>> Just one thought...
>>
>> For me it would be natural to be never confronted with the Version.xx thing 
>> in the api unless you really need.
>> so f.e. having
>>        new QueryParser("", new KeywordAnalyzer()).parse("content: the");
>> as a default (probably using Version.LUCENE_CURRENT under the hood), but 
>> having
>>        new QueryParser(Version.XXX,"", new 
>> KeywordAnalyzer()).parse("content: the");
>> as well.
>>
>> Of cause this would require a lot of method/constructor overloading, but 
>> would make the api more user friendly for those who write some code where 
>> the version don't matter...
>> Johannes
>>
>> On Feb 26, 2010, at 11:27 AM, Paul Taylor wrote:
>>
>>> Robert Muir wrote:
 such projects can do this, in one place:

 public static final Version MY_APP_CURRENT = Version.LUCENE_30;

 then later

 StandardAnalyzer analyzer = new StandardAnalyzer(MY_APP_CURRENT);

 then they have complete control of this, independent of when the upgrade 
 lucene's jar file!
>>> Not quite true because you still need to update MY_APP_CURRENT when there 
>>> is a new version, but yes thats more mangeable
>>>
>>> Paul
>>>
>>> -
>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>
>>
>>
>> -
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>>
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: IndexWriter.getReader.getVersion behavior

2010-02-26 Thread Michael McCandless
OK -- I can now see what happened.

There was a merge still running when you called IW.commit (Lucene
Merge Thread #0).  Because IW.commit does not wait for BG merges to
finish, but IW.close does (by default), you'll pick up an
extra version whenever a merge is running when you call close.

Mike

On Thu, Feb 25, 2010 at 2:52 PM, Peter Keegan  wrote:
> I'm pretty sure this output occurred when the version number skipped +1.
> The line containing ''. separates the close/open IndexWriter.
>
> IFD [Indexer]: setInfoStream
> deletionpolicy=org.apache.lucene.index.keeponlylastcommitdeletionpol...@646f9dd9
> IW 9 [Indexer]: setInfoStream:
> dir=org.apache.lucene.store.SimpleFSDirectory@ pathname>\lresumes1.search.main.1 autoCommit=false
> mergepolicy=org.apache.lucene.index.logbytesizemergepol...@5be44512 mergescheduler=org.apache.lucene.index.concurrentmergescheduler@6772cfdf ramBufferSizeMB=16.0
> maxBufferedDocs=-1 maxBuffereDeleteTerms=-1
> maxFieldLength=10 index=_a:C9780 _b:C1204->_b _c:C717->_b _d:C1220->_d
> _e:C778->_d _f:C1173->_f _g:C858->_f _h:C1291->_h _i:C703->_h
> IW 9 [Indexer]: flush at getReader
> IW 9 [Indexer]:   flush: segment=null docStoreSegment=null docStoreOffset=0
> flushDocs=false flushDeletes=true flushDocStores=false numDocs=0
> numBufDelTerms=0
> IW 9 [Indexer]:   index before flush _a:C9780 _b:C1204->_b _c:C717->_b
> _d:C1220->_d _e:C778->_d _f:C1173->_f _g:C858->_f _h:C1291->_h _i:C703->_h
> IW 9 [UpdWriterBuild : 9]: DW:   RAM: now flush @ usedMB=15.816
> allocMB=15.816 deletesMB=0.203 triggerMB=16
> IW 9 [UpdWriterBuild : 9]:   flush: segment=_j docStoreSegment=_j
> docStoreOffset=0 flushDocs=true flushDeletes=false flushDocStores=false
> numDocs=1456 numBufDelTerms=1456
> IW 9 [UpdWriterBuild : 9]:   index before flush _a:C9780 _b:C1204->_b
> _c:C717->_b _d:C1220->_d _e:C778->_d _f:C1173->_f _g:C858->_f _h:C1291->_h
> _i:C703->_h
> IW 9 [UpdWriterBuild : 9]: DW: flush postings as segment _j numDocs=1456
> IW 9 [UpdWriterBuild : 9]: DW:   oldRAMSize=16584704 newFlushedSize=4969789
> docs/MB=307.202 new/old=29.966%
> IFD [UpdWriterBuild : 9]: now checkpoint "segments_b" [10 segments ;
> isCommit = false]
> IFD [UpdWriterBuild : 9]: now checkpoint "segments_b" [10 segments ;
> isCommit = false]
> IW 9 [UpdWriterBuild : 9]: LMP: findMerges: 10 segments
> IW 9 [UpdWriterBuild : 9]: LMP:   level 6.863048 to 7.613048: 1 segments
> IW 9 [UpdWriterBuild : 9]: LMP:   level 6.2247195 to 6.696363: 9 segments
> IW 9 [UpdWriterBuild : 9]: CMS: now merge
> IW 9 [UpdWriterBuild : 9]: CMS:   index: _a:C9780 _b:C1204->_b _c:C717->_b
> _d:C1220->_d _e:C778->_d _f:C1173->_f _g:C858->_f _h:C1291->_h _i:C703->_h
> _j:C1456->_j
> IW 9 [UpdWriterBuild : 9]: CMS:   no more merges pending; now return
> IW 9 [Indexer]: prepareCommit: flush
> IW 9 [Indexer]:   flush: segment=_k docStoreSegment=_j docStoreOffset=1456
> flushDocs=true flushDeletes=true flushDocStores=true numDocs=509
> numBufDelTerms=509
> IW 9 [Indexer]:   index before flush _a:C9780 _b:C1204->_b _c:C717->_b
> _d:C1220->_d _e:C778->_d _f:C1173->_f _g:C858->_f _h:C1291->_h _i:C703->_h
> _j:C1456->_j
> IW 9 [Indexer]:   flush shared docStore segment _j
> IW 9 [Indexer]: DW: closeDocStore: 2 files to flush to segment _j
> numDocs=1965
> IW 9 [Indexer]: DW: flush postings as segment _k numDocs=509
> IW 9 [Indexer]: DW:   oldRAMSize=7483392 newFlushedSize=1854970
> docs/MB=287.727 new/old=24.788%
> IFD [Indexer]: now checkpoint "segments_b" [11 segments ; isCommit = false]
> IW 9 [Indexer]: DW: apply 1965 buffered deleted terms and 0 deleted docIDs
> and 0 deleted queries on 11 segments.
> IFD [Indexer]: now checkpoint "segments_b" [11 segments ; isCommit = false]
> IW 9 [Indexer]: LMP: findMerges: 11 segments
> IW 9 [Indexer]: LMP:   level 6.863048 to 7.613048: 1 segments
> IW 9 [Indexer]: LMP:   level 6.2247195 to 6.696363: 10 segments
> IW 9 [Indexer]: LMP:     1 to 11: add this merge
> IW 9 [Indexer]: add merge to pendingMerges: _b:C1204->_b _c:C717->_b
> _d:C1220->_d _e:C778->_d _f:C1173->_f _g:C858->_f _h:C1291->_h _i:C703->_h
> _j:C1456->_j _k:C509->_j [total 1 pending]
> IW 9 [Indexer]: CMS: now merge
> IW 9 [Indexer]: CMS:   index: _a:C9780 _b:C1204->_b _c:C717->_b _d:C1220->_d
> _e:C778->_d _f:C1173->_f _g:C858->_f _h:C1291->_h _i:C703->_h _j:C1456->_j
> _k:C509->_j
> IW 9 [Indexer]: CMS:   consider merge _b:C1204->_b _c:C717->_b _d:C1220->_d
> _e:C778->_d _f:C1173->_f _g:C858->_f _h:C1291->_h _i:C703->_h _j:C1456->_j
> _k:C509->_j into _l [mergeDocStores]
> IW 9 [Indexer]: CMS:     launch new thread [Lucene Merge Thread #0]
> IW 9 [Indexer]: CMS:   no more merges pending; now return
> IW 9 [Indexer]: startCommit(): start sizeInBytes=0
> IW 9 [Indexer]: startCommit index=_a:C9780 _b:C1204->_b _c:C717->_b
> _d:C1220->_d _e:C778->_d _f:C1173->_f _g:C858->_f _h:C1291->_h _i:C703->_h
> _j:C1456->_j _k:C509->_j changeCount=7
> IW 9 [Lucene Merge Thread #0]: CMS:   merge thread: start
> IW 9 [Indexer]: now sync _k.f

NumericField exact match

2010-02-26 Thread Ivan Vasilev

Hi Guys,

Is it possible to make exact searches on fields that are of type 
NumericField and if yes how?
In the LIA book part 2 I found only information about Range searches on 
such fields and how to Sort them.


Example - I have field "size" that can take integers as values.
I want to get docs that are with "size:100".
For the regular fields "size:100" is OK to pass to Parser but with 
NumericField it does not work.
The only approach to support such fields that I can see is - to have 
parallel casual Field (example "size2") and to index the same data there.

And then when user wants exact search on "size" I to perform "size2:100".

Is this the most appropriate way for my case on your opinion?

Thanks,
Ivan



-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: IndexWriter.getReader.getVersion behavior

2010-02-26 Thread Peter Keegan
Is there a way for the application to wait for the BG commit to finish
before it calls IW.close? If so, would this prevent the extra version? The
extra version causes the app to think that the external data it committed
is out of synch with the index, which requires the app to do extra
processing to re-synch.

Thanks,
Peter


On Fri, Feb 26, 2010 at 12:40 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> OK -- I can now see what happened.
>
> There was a merge still running, when you called IW.commit (Lucene
> Merge Thread #0).  Because IW.commit does not wait for BG merges to
> finish, but IW.close does (by default), this means you'll pick up an
> extra version whenever a merge is running when you call close.
>
> Mike
>
> On Thu, Feb 25, 2010 at 2:52 PM, Peter Keegan 
> wrote:
> > I'm pretty sure this output occurred when the version number skipped +1.
> > The line containing ''. separates the close/open IndexWriter.
> >

RE: NumericField exact match

2010-02-26 Thread Uwe Schindler
It's very easy: NumericRangeQuery.newXxxRange(field, val, val, true, true) - 
val is the exact match. This is not slower, as it automatically rewrites to a 
non-scoring TermQuery. If you have already subclassed QueryParser, you can also override 
the method for exact matches (newTermQuery).
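
For the "size" example quoted below, that would be (int variant, both ends inclusive):

  Query exactSize = NumericRangeQuery.newIntRange("size", 100, 100, true, true);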

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: Ivan Vasilev [mailto:ivasi...@sirma.bg]
> Sent: Friday, February 26, 2010 8:21 PM
> To: LUCENE MAIL LIST
> Subject: NumericField exact match
> 
> Hi Guys,
> 
> Is it possible to make exact searches on fields that are of type
> NumericField and if yes how?
> In the LIA book part 2 I found only information about Range searches on
> such fields and how to Sort them.
> 
> Example - I have field "size" that can take integers as values.
> I want to get docs that are with "size:100".
> For the regular fields "size:100" is OK to pass to Parser but with
> NumericField it does not work.
> The only approach to support such fields that I can see is - to have
> parallel casual Field (example "size2") and to index the same data
> there.
> And then when user wants exact search on "size" I to perform
> "size2:100".
> 
> Is this the most appropriate way for my case on your opinion?
> 
> Thanks,
> Ivan
> 
> 
> 
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org



-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: IndexWriter.getReader.getVersion behavior

2010-02-26 Thread Michael McCandless
Note that it's a BG merge (not commit)...

You can use the new (as of 2.9 I think) IndexWriter.waitForMerges API?
 If you call that, then call .getReader().getVersion(), then close &
open the writer, I think (but you better test to be sure!) the next
.getReader().getVersion() should always match.
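
Roughly, assuming 'dir' and 'analyzer' are your existing Directory and
Analyzer (reader closing and error handling left out; worth testing):

  writer.waitForMerges();                              // wait for any background merges to finish
  long before = writer.getReader().getVersion();
  writer.close();                                      // close waits for merges by default anyway

  IndexWriter reopened = new IndexWriter(dir, analyzer,
      false, IndexWriter.MaxFieldLength.UNLIMITED);    // open the existing index
  long after = reopened.getReader().getVersion();      // should now match 'before'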

Mike

On Fri, Feb 26, 2010 at 2:40 PM, Peter Keegan  wrote:
> Is there a way for the application to wait for the BG commit to finish
> before it calls IW.close? If so, would this prevent the extra version? The
> extra version causes the app. to think that the external data it committed
> is out of synch with the index, which requires the app to do extra
> processing to re-synch.
>
> Thanks,
> Peter
>
>
> On Fri, Feb 26, 2010 at 12:40 PM, Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>> OK -- I can now see what happened.
>>
>> There was a merge still running, when you called IW.commit (Lucene
>> Merge Thread #0).  Because IW.commit does not wait for BG merges to
>> finish, but IW.close does (by default), this means you'll pick up an
>> extra version whenever a merge is running when you call close.
>>
>> Mike
>>
>> On Thu, Feb 25, 2010 at 2:52 PM, Peter Keegan 
>> wrote:
>> > I'm pretty sure this output occurred when the version number skipped +1.
>> > The line containing ''. separates the close/open IndexWriter.
>> >

Re: IndexWriter.getReader.getVersion behavior

2010-02-26 Thread Peter Keegan
Great, I'll give it a try.
Thanks!

On Fri, Feb 26, 2010 at 3:11 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> Note that it's a BG merge (not commit)...
>
> You can use the new (as of 2.9 I think) IndexWriter.waitForMerges API?
>  If you call that, then call .getReader().getVersion(), then close &
> open the writer, I think (but you better test to be sure!) the next
> .getReader().getVersion() should always match.
>
> Mike
>
> On Fri, Feb 26, 2010 at 2:40 PM, Peter Keegan 
> wrote:
> > Is there a way for the application to wait for the BG commit to finish
> > before it calls IW.close? If so, would this prevent the extra version?
> The
> > extra version causes the app. to think that the external data it
> committed
> > is out of synch with the index, which requires the app to do extra
> > processing to re-synch.
> >
> > Thanks,
> > Peter
> >
> >
> > On Fri, Feb 26, 2010 at 12:40 PM, Michael McCandless <
> > luc...@mikemccandless.com> wrote:
> >
> >> OK -- I can now see what happened.
> >>
> >> There was a merge still running, when you called IW.commit (Lucene
> >> Merge Thread #0).  Because IW.commit does not wait for BG merges to
> >> finish, but IW.close does (by default), this means you'll pick up an
> >> extra version whenever a merge is running when you call close.
> >>
> >> Mike
> >>
> >> On Thu, Feb 25, 2010 at 2:52 PM, Peter Keegan 
> >> wrote:
> >> > I'm pretty sure this output occurred when the version number skipped
> +1.
> >> > The line containing ''. separates the close/open
> IndexWriter.
> >> >

Re: IndexWriter.getReader.getVersion behavior

2010-02-26 Thread Peter Keegan
Can  IW.waitForMerges be called between 'prepareCommit' and 'commit'? That's
when the app calls 'getReader' to create external data.

Peter

On Fri, Feb 26, 2010 at 3:15 PM, Peter Keegan wrote:

> Great, I'll give it a try.
> Thanks!
>
>
> On Fri, Feb 26, 2010 at 3:11 PM, Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>> Note that it's a BG merge (not commit)...
>>
>> You can use the new (as of 2.9 I think) IndexWriter.waitForMerges API?
>>  If you call that, then call .getReader().getVersion(), then close &
>> open the writer, I think (but you better test to be sure!) the next
>> .getReader().getVersion() should always match.
>>
>> Mike
>>
>> On Fri, Feb 26, 2010 at 2:40 PM, Peter Keegan 
>> wrote:
>> > Is there a way for the application to wait for the BG commit to finish
>> > before it calls IW.close? If so, would this prevent the extra version?
>> The
>> > extra version causes the app. to think that the external data it
>> committed
>> > is out of synch with the index, which requires the app to do extra
>> > processing to re-synch.
>> >
>> > Thanks,
>> > Peter
>> >
>> >
>> > On Fri, Feb 26, 2010 at 12:40 PM, Michael McCandless <
>> > luc...@mikemccandless.com> wrote:
>> >
>> >> OK -- I can now see what happened.
>> >>
>> >> There was a merge still running, when you called IW.commit (Lucene
>> >> Merge Thread #0).  Because IW.commit does not wait for BG merges to
>> >> finish, but IW.close does (by default), this means you'll pick up an
>> >> extra version whenever a merge is running when you call close.
>> >>
>> >> Mike
>> >>
>> >> On Thu, Feb 25, 2010 at 2:52 PM, Peter Keegan 
>> >> wrote:
>> >> > I'm pretty sure this output occurred when the version number skipped
>> +1.
>> >> > The line containing ''. separates the close/open
>> IndexWriter.
>> >> >

Re: IndexWriter.getReader.getVersion behavior

2010-02-26 Thread Michael McCandless
That should be fine!

Mike

On Fri, Feb 26, 2010 at 3:26 PM, Peter Keegan  wrote:
> Can  IW.waitForMerges be called between 'prepareCommit' and 'commit'? That's
> when the app calls 'getReader' to create external data.
>
> Peter
>
> On Fri, Feb 26, 2010 at 3:15 PM, Peter Keegan wrote:
>
>> Great, I'll give it a try.
>> Thanks!
>>
>>
>> On Fri, Feb 26, 2010 at 3:11 PM, Michael McCandless <
>> luc...@mikemccandless.com> wrote:
>>
>>> Note that it's a BG merge (not commit)...
>>>
>>> You can use the new (as of 2.9 I think) IndexWriter.waitForMerges API?
>>>  If you call that, then call .getReader().getVersion(), then close &
>>> open the writer, I think (but you better test to be sure!) the next
>>> .getReader().getVersion() should always match.
>>>
>>> Mike
>>>
>>> On Fri, Feb 26, 2010 at 2:40 PM, Peter Keegan 
>>> wrote:
>>> > Is there a way for the application to wait for the BG commit to finish
>>> > before it calls IW.close? If so, would this prevent the extra version?
>>> The
>>> > extra version causes the app. to think that the external data it
>>> committed
>>> > is out of synch with the index, which requires the app to do extra
>>> > processing to re-synch.
>>> >
>>> > Thanks,
>>> > Peter
>>> >
>>> >
>>> > On Fri, Feb 26, 2010 at 12:40 PM, Michael McCandless <
>>> > luc...@mikemccandless.com> wrote:
>>> >
>>> >> OK -- I can now see what happened.
>>> >>
>>> >> There was a merge still running, when you called IW.commit (Lucene
>>> >> Merge Thread #0).  Because IW.commit does not wait for BG merges to
>>> >> finish, but IW.close does (by default), this means you'll pick up an
>>> >> extra version whenever a merge is running when you call close.
>>> >>
>>> >> Mike
>>> >>
>>> >> On Thu, Feb 25, 2010 at 2:52 PM, Peter Keegan 
>>> >> wrote:
>>> >> > I'm pretty sure this output occurred when the version number skipped
>>> +1.
>>> >> > The line containing ''. separates the close/open
>>> IndexWriter.
>>> >> >

Re: NumericField exact match

2010-02-26 Thread Ivan Vasilev

Thanks for the answer Uwe,

Does the precision step matter when I use NumericRangeQuery for exact 
matches? I mean, if I use the default precision step when indexing those 
fields, is it guaranteed that:
1. With this query I will always hit the docs that contain "val" in the 
"field";
2. I will never hit docs that have a different "val" in the 
"field"?


Ivan


Uwe Schindler wrote:

It's very easy: NumericRangeQuery.nexXxxRange(field, val, val, true, true) - 
val is the exact match. This is not slower as this automatically rewrites to a 
non-scored TermQuery. If you already changed QueryParser, you can also override 
the method for exactMatches (newTermQuery).

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

  

-Original Message-
From: Ivan Vasilev [mailto:ivasi...@sirma.bg]
Sent: Friday, February 26, 2010 8:21 PM
To: LUCENE MAIL LIST
Subject: NumericField exact match

Hi Guys,

Is it possible to make exact searches on fields that are of type
NumericField and if yes how?
In the LIA book part 2 I found only information about Range searches on
such fields and how to Sort them.

Example - I have field "size" that can take integers as values.
I want to get docs that are with "size:100".
For the regular fields "size:100" is OK to pass to Parser but with
NumericField it does not work.
The only approach to support such fields that I can see is - to have
parallel casual Field (example "size2") and to index the same data
there.
And then when user wants exact search on "size" I to perform
"size2:100".

Is this the most appropriate way for my case on your opinion?

Thanks,
Ivan



-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org





-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org





-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



recovering payload from fields

2010-02-26 Thread Christopher Condit
I'm trying to store semantic information in payloads at index time. I believe 
this part is successful - but I'm having trouble getting access to the payload 
locations after the index is created. I'd like to know the offset in the 
original text for the token with the payload - and get this information for all 
payloads that are set in a Field even if they don't relate to the query. I 
tried (from the highlighting filter):
TokenStream tokens = TokenSources.getTokenStream(reader, 0, "body");
while (tokens.incrementToken()) {
  TermAttribute term = tokens.getAttribute(TermAttribute.class);
  if (tokens.hasAttribute(PayloadAttribute.class)) {
    PayloadAttribute payload = tokens.getAttribute(PayloadAttribute.class);
    OffsetAttribute offset = tokens.getAttribute(OffsetAttribute.class);
  }
}
But the OffsetAttribute never seems to contain any information.
In my token filter do I need to do more than:
offsetAtt = addAttribute(OffsetAttribute.class);
during construction in order to store Offset information?

Thanks,
-Chris


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: recovering payload from fields

2010-02-26 Thread Christopher Tignor
Hello,

To my knowledge, the character position of the tokens is not preserved by
Lucene - only the ordinal position of tokens within a document / field is
preserved.  Thus you need to store this character offset information
separately, say, as Payload data.
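
A minimal sketch of such a filter with the 2.9/3.0 attribute API (the class
name and the 4-byte encoding are just one way to do it):

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;
import org.apache.lucene.index.Payload;

public final class OffsetPayloadFilter extends TokenFilter {
  private final OffsetAttribute offsetAtt = addAttribute(OffsetAttribute.class);
  private final PayloadAttribute payloadAtt = addAttribute(PayloadAttribute.class);

  public OffsetPayloadFilter(TokenStream input) {
    super(input);
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (!input.incrementToken()) {
      return false;
    }
    // encode the token's start offset as a 4-byte payload so it is stored with the posting
    int start = offsetAtt.startOffset();
    byte[] bytes = new byte[] {
        (byte) (start >>> 24), (byte) (start >>> 16), (byte) (start >>> 8), (byte) start };
    payloadAtt.setPayload(new Payload(bytes));
    return true;
  }
}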

best,

C>T>

On Fri, Feb 26, 2010 at 3:41 PM, Christopher Condit  wrote:

> I'm trying to store semantic information in payloads at index time. I
> believe this part is successful - but I'm having trouble getting access to
> the payload locations after the index is created. I'd like to know the
> offset in the original text for the token with the payload - and get this
> information for all payloads that are set in a Field even if they don't
> relate to the query. I tried (from the highlighting filter):
> TokenStream tokens = TokenSources.getTokenStream(reader, 0, "body");
>  while (tokens.incrementToken()) {
>TermAttribute term = tokens.getAttribute(TermAttribute.class);
>if (toker.hasAttribute(PayloadAttribute.class)) {
>  PayloadAttribute payload =
> tokens.getAttribute(PayloadAttribute.class);
>  OffsetAttribute offset = toker.getAttribute(OffsetAttribute.class);
>}
>  }
> But the OffsetAttribute never seems to contain any information.
> In my token filter do I need to do more than:
> offsetAtt = addAttribute(OffsetAttribute.class);
> during construction in order to store Offset information?
>
> Thanks,
> -Chris
>
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>


-- 
TH!NKMAP

Christopher Tignor | Senior Software Architect
155 Spring Street NY, NY 10012
p.212-285-8600 x385 f.212-285-8999


RE: recovering payload from fields

2010-02-26 Thread Christopher Condit
Hi Chris-
> To my knoweldge, the character position of the tokens is not preserved by
> Lucene - only the ordinal postion of token's within a document / field is
> preserved.  Thus you need to store this character offset information
> separately, say, as Payload data.

Thanks for the information. So adding the OffsetAttribute at index time doesn't 
embed the offset information in the index - it just makes it available to the 
TokenFilter? I'll try adding the offset from the attribute to the payload.

In terms of getting access to the payloads, is the best way to reconstruct the 
token stream (as the Highlighter does)? Or is there an easier way to just get 
access to the payloads?

Thanks,
-Chris


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: NAS vs SAN vs Server Disk RAID

2010-02-26 Thread Petite Abeille

On Feb 25, 2010, at 12:54 AM, Andrew Bruno wrote:

> Since the disk IO on the server is high, our datacenter engineers suggested
> we look at NAS or SAN, for performance gain, and for future growth.

Alternatively, get a stack of RamSan and call it a day:

http://www.ramsan.com/products/products.htm

If you cannot afford these, something like Sun's storage server is a pretty 
cost effective solution:

http://www.oracle.com/us/products/servers-storage/servers/x64/031210.htm

Personally, I would stay away from a SAN based solution and favor local 
storage. As always, YMMV.




-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Infinite loop when searching empty index

2010-02-26 Thread Justin
Is this a bug in Lucene Java as of tr...@915399?

int numDocs = reader.numDocs(); // = 0 (empty index)
TopDocsCollector collector = TopScoreDocCollector.create(numDocs, true);
searcher.search(new MatchAllDocsQuery(), collector); // never returns

// Searcher
public void search(Query query, Collector collector)
  throws IOException {
  search(createWeight(query), null, collector); // never returns
}

// extends IndexSearcher
public void search(Weight weight, Filter filter, final Collector collector)
    throws IOException {
  boolean topScorer = (filter == null) ? true : false;
  Scorer scorer = weight.scorer(reader, true, topScorer);
  if (scorer != null && topScorer) {
    scorer.score(collector); // never returns

// Scorer
public void score(Collector collector) throws IOException {
  collector.setScorer(this);
  int doc;
  while ((doc = nextDoc()) != NO_MORE_DOCS) { // doc = 0 (infinite)
collector.collect(doc);
  }
}


Thanks for any feedback,
Justin


  

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



RE: recovering payload from fields

2010-02-26 Thread Christopher Condit
> Payload data is accessed through PayloadSpans, so using SpanQueries is the
> entry point, it seems.  There are tools like PayloadSpanUtil that convert other
> queries into SpanQueries for this purpose if needed, but the bottom line is that
> the API for payloads goes through Spans.
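
(For reference, the Spans route described above looks roughly like this in
2.9/3.0; "body" and "someterm" are placeholders and 'reader' is an open
IndexReader:)

  SpanTermQuery query = new SpanTermQuery(new Term("body", "someterm"));
  Spans spans = query.getSpans(reader);
  while (spans.next()) {
    if (spans.isPayloadAvailable()) {
      for (byte[] payload : spans.getPayload()) {
        // decode the payload stored at index time (e.g. the character offset)
      }
    }
  }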

So there's no way to iterate through all the payloads for a given field? I 
can't use the SpanQuery mechanism because in this case the entire field will be 
displayed - and I can't search for "*". Is there some trick I'm not thinking of?

> this is the tip of the iceberg; a big dangerous iceberg...

Yes - I'm beginning to see that...

Thanks,
-Chris

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org