ANNOUNCE: Release of Lucene Java 3.0.1 and 2.9.2
Hello Lucene users,

On behalf of the Lucene development community I would like to announce the release of Lucene Java versions 3.0.1 and 2.9.2.

Both releases fix bugs in the previous versions:

- 2.9.2 is a bugfix release for the Lucene Java 2.x series, based on Java 1.4.
- 3.0.1 has the same bug fix level but is for the Lucene Java 3.x series, based on Java 5.

New users of Lucene are advised to use version 3.0.1 for new development, because it has a clean, type-safe API.

Important improvements in these releases include:

- An increased maximum number of unique terms in each index segment.
- Fixed the experimental CustomScoreQuery to respect per-segment search. This introduced an API change!
- Important fixes to IndexWriter: a commit() thread-safety issue, and lost document deletes in near-real-time indexing.
- Bugfixes for contrib's Analyzers package.
- Restoration of some public methods that were lost during deprecation removal.
- The new attribute-based TokenStream API now works correctly with different class loaders.

Both releases are fully compatible with the corresponding previous versions. We strongly recommend upgrading to 2.9.2 if you are using 2.9.1 or 2.9.0, and to 3.0.1 if you are using 3.0.0.

See core changes at
http://lucene.apache.org/java/3_0_1/changes/Changes.html
http://lucene.apache.org/java/2_9_2/changes/Changes.html

and contrib changes at
http://lucene.apache.org/java/3_0_1/changes/Contrib-Changes.html
http://lucene.apache.org/java/2_9_2/changes/Contrib-Changes.html

Binary and source distributions are available at
http://www.apache.org/dyn/closer.cgi/lucene/java/

Lucene artifacts are also available in the Maven2 repository at
http://repo1.maven.org/maven2/org/apache/lucene/

- Uwe Schindler
uschind...@apache.org
Apache Lucene Java Committer
Bremen, Germany
http://lucene.apache.org/java/docs/
Re: problem about backup index file
Well, lucene is "write once" and then, eventually, "delete once" ;) I.e. files are eventually deleted (when they are merged away). So when you do the incremental backup, any file not listed in the current commit can be removed from your backup (assuming you only want to back up the last commit).

Don't delete the files in your last backup... keep the ones that are "in common" with your new backup, as you don't need to recopy them (they will not have changed).

Mike

On Fri, Feb 26, 2010 at 1:47 AM, wrote:
> Thanks for your paper, Michael McCandless. I have one question about this:
> "For all other files, Lucene is 'write once.' This makes doing incremental
> backups very easy: Simply compare the file names. Once a file is written,
> it will never change; therefore, if you've already backed up that file,
> there's no need to copy it again." This means the number of files keeps
> growing, never getting smaller. Another question: should I delete the
> files from my previous backup when I back up again?
>
> - Original message -
> From: Michael McCandless
> Subject: Re: problem about backup index file
> Date: 2010-02-25 23:19:59
>
> This is likely happening because you're attempting to copy a file that
> IndexWriter is currently writing. You shouldn't do that (copy files that
> are still being written) -- that just wastes bytes (they aren't used by
> the index), and causes this failure on Windows.
>
> Instead, you should use SnapshotDeletionPolicy -- it tells you
> specifically which files make up the latest commit point. Those files
> will not be opened for writing (only for reading, if you have an
> IndexReader open on that commit) and they should copy just fine on
> Windows.
>
> The "Hot backups with Lucene" article (NOTE: I'm the author) in the
> upcoming Lucene in Action 2 revision shows how to do this -- it's
> available for download from http://manning.com/hatcher3.
>
> Mike
>
> On Thu, Feb 25, 2010 at 3:15 AM, wrote:
> > I want to back up my index files, but I get the following error:
> >
> > java.io.IOException: another program lock the file!
> >     at java.io.FileInputStream.readBytes(Native Method)
> >     at java.io.FileInputStream.read(Unknown Source)
> >     at com.common.Utils.copyDirectory(Utils.java:149)
> >     at com.common.Utils.copyDirectory(Utils.java:138)
> >     at com.common.Utils.copyDirectory(Utils.java:138)
> >     at com.index.IndexManager.backDataPolicy(IndexManager.java:398)
> >     at com.index.IndexManager.indexLoop(IndexManager.java:222)
> >     at com.Main$1.run(Main.java:48)
> >     at java.lang.Thread.run(Unknown Source)
> >
> > How can I back up the Lucene files in the IR thread?
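For reference, a minimal sketch of the SnapshotDeletionPolicy approach Mike describes (Lucene 2.9/3.0 API; dir, analyzer, backupDir and the two copy helpers are assumptions for illustration, not code from the thread):

    IndexDeletionPolicy primary = new KeepOnlyLastCommitDeletionPolicy();
    SnapshotDeletionPolicy snapshotter = new SnapshotDeletionPolicy(primary);
    IndexWriter writer = new IndexWriter(dir, analyzer, snapshotter,
                                         IndexWriter.MaxFieldLength.UNLIMITED);
    // ... indexing continues in other threads ...
    try {
      IndexCommit commit = snapshotter.snapshot(); // pins the current commit's files
      for (String fileName : commit.getFileNames()) {
        if (!existsInBackup(backupDir, fileName))   // hypothetical helper: compare by name
          copyFile(dir, backupDir, fileName);       // hypothetical helper: plain file copy
      }
    } finally {
      snapshotter.release(); // allow merged-away files to be deleted again
    }

Files present in the backup but absent from commit.getFileNames() can then be pruned, which is exactly the incremental scheme described above.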
Re: If you could have one feature in Lucene...
Glen Newton wrote:
> +2
>
> On 25 February 2010 04:45, Avi Rosenschein wrote:
>> > Similarity can only be set per index, but I want to adjust scoring
>> > behaviour at a field level. To facilitate this, could we make the
>> > field name available to all score methods? Currently it is only
>> > passed to some, such as lengthNorm(), but not others, such as tf().
>>
>> +1
>> -- Avi

I had already raised this in https://issues.apache.org/jira/browse/LUCENE-2236 if you want to add anything to it.
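To make the limitation concrete, here is a small sketch (against the 2.9/3.0 Similarity API; "title" is a hypothetical field name): lengthNorm() already receives the field name and can branch on it, while tf() does not, which is what the request above would change.

    Similarity sim = new DefaultSimilarity() {
      @Override
      public float lengthNorm(String fieldName, int numTerms) {
        // per-field behaviour is possible here because the field name is passed in
        if ("title".equals(fieldName)) return 1.0f; // ignore length for titles
        return super.lengthNorm(fieldName, numTerms);
      }
      // tf(float freq) has no field argument, so tf cannot be tuned per field
    };
    searcher.setSimilarity(sim);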
Re: ANNOUNCE: Release of Lucene Java 3.0.1 and 2.9.2
Uwe Schindler wrote:
> Both releases fix bugs in the previous versions:
>
> - 2.9.2 is a bugfix release for the Lucene Java 2.x series, based on Java 1.4.
> - 3.0.1 has the same bug fix level but is for the Lucene Java 3.x series, based on Java 5.

Hmm, I don't really agree with deprecating Version.LUCENE_CURRENT (http://issues.apache.org/jira/browse/LUCENE-2080).

I'm sure in many projects, especially ones that are not yet released, developers would expect to pick up the latest features when they download the latest version of Lucene, without having to update the value of the Version constant (it just makes development a bit more tiresome), and would realize that code should be rebuilt and indexes should be built with the same version used for searching them. Whenever you update the version of a library you should really test your code anyway; after all, Lucene 3.0.0 unwittingly changed some things relative to 2.9.x, hence the need for 3.0.1.

Paul
Re: ANNOUNCE: Release of Lucene Java 3.0.1 and 2.9.2
Such projects can do this, in one place:

    public static final Version MY_APP_CURRENT = Version.LUCENE_30;

then later

    StandardAnalyzer analyzer = new StandardAnalyzer(MY_APP_CURRENT);

Then they have complete control of this, independent of when they upgrade Lucene's jar file!

On Fri, Feb 26, 2010 at 5:12 AM, Paul Taylor wrote:
> Hmm, I don't really agree with deprecating Version.LUCENE_CURRENT
> (http://issues.apache.org/jira/browse/LUCENE-2080).

--
Robert Muir
rcm...@gmail.com
Re: ANNOUNCE: Release of Lucene Java 3.0.1 and 2.9.2
Robert Muir wrote:
> Such projects can do this, in one place:
>
>     public static final Version MY_APP_CURRENT = Version.LUCENE_30;
>
> then later
>
>     StandardAnalyzer analyzer = new StandardAnalyzer(MY_APP_CURRENT);
>
> Then they have complete control of this, independent of when they
> upgrade Lucene's jar file!

Not quite true, because you still need to update MY_APP_CURRENT when there is a new version, but yes, that's more manageable.

Paul
Re: ANNOUNCE: Release of Lucene Java 3.0.1 and 2.9.2
Just one thought...

For me it would be natural to never be confronted with the Version.XX thing in the API unless you really need it. So e.g. having

    new QueryParser("", new KeywordAnalyzer()).parse("content: the");

as a default (probably using Version.LUCENE_CURRENT under the hood), but having

    new QueryParser(Version.XXX, "", new KeywordAnalyzer()).parse("content: the");

as well.

Of course this would require a lot of method/constructor overloading, but it would make the API more user friendly for those who write code where the version doesn't matter...

Johannes

On Feb 26, 2010, at 11:27 AM, Paul Taylor wrote:
> Not quite true, because you still need to update MY_APP_CURRENT when
> there is a new version, but yes, that's more manageable.
Re: ANNOUNCE: Release of Lucene Java 3.0.1 and 2.9.2
That would be more natural/convenient, but it'd unfortunately defeat the whole reason Version was added in the first place.

By making Version required, we force callers to be explicit to Lucene about what level of back compat is required.

This then enables Lucene to improve its defaults with each release, without breaking users that need to keep backwards compatibility.

Mike

On Fri, Feb 26, 2010 at 5:42 AM, Johannes Zillmann wrote:
> For me it would be natural to never be confronted with the Version.XX
> thing in the API unless you really need it.
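In 3.0.x the explicit-Version pattern looks roughly like this (a minimal sketch; "content" is a hypothetical field name):

    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
    QueryParser parser = new QueryParser(Version.LUCENE_30, "content", analyzer);
    // Bumping LUCENE_30 to a newer constant opts in to that release's improved
    // defaults; leaving it pinned preserves the old analysis behavior, so
    // existing indexes keep matching.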
Re: NAS vs SAN vs Server Disk RAID
Hi Ian:

Only as curiosity ;) Which distributed file system are you using on top of your NAS storage?

Best regards, Marcelo.

On Thu, Feb 25, 2010 at 6:54 AM, Ian Lea wrote:
> We've run lucene on NAS, although not with indexes anything like as
> large as 1Tb, and gave up because NFS and lucene don't really work
> very well together. Google for "lucene nfs" for some details, and some
> workarounds.
>
> I'd second Kay Kay's suggestion to look at a distributed solution such
> as Katta.
>
> --
> Ian.
>
> On Wed, Feb 24, 2010 at 11:54 PM, Andrew Bruno wrote:
>> Hello,
>>
>> I am working with an application that offers its customers their own
>> index, primarily two indexes for different needs per customer.
>>
>> As our business is growing and growing, I now have a situation where
>> the web application has its customers' indexes on one volume, and it's
>> getting close to 1Tbyte.
>>
>> There are lots of updates and inserts, and plenty of searches. As you
>> can imagine, the application is starting to slow down heavily,
>> especially during high-traffic times when documents (PDF, DOC, etc.)
>> are indexed.
>>
>> Since the disk IO on the server is high, our datacenter engineers
>> suggested we look at NAS or SAN, for performance gain and for future
>> growth.
>>
>> Has anyone had any experience running Lucene 2.0/Compass in these
>> environments? Do you know of any case studies, whitepapers, web sites?
>>
>> Thanks
>> Andrew

--
Marcelo F. Ochoa
http://marceloochoa.blogspot.com/
http://mochoa.sites.exa.unicen.edu.ar/
__
Want to integrate Lucene and Oracle?
http://marceloochoa.blogspot.com/2007/09/running-lucene-inside-your-oracle-jvm.html
Is Oracle 11g REST ready?
http://marceloochoa.blogspot.com/2008/02/is-oracle-11g-rest-ready.html
Re: NAS vs SAN vs Server Disk RAID
NFS. It works fine for simple, essentially static lucene indexes, and we still use it for that, but things tended to fall apart with dynamic indexes.

--
Ian.

On Fri, Feb 26, 2010 at 11:06 AM, Marcelo Ochoa wrote:
> Hi Ian:
> Only as curiosity ;) Which distributed file system are you using on top
> of your NAS storage?
> Best regards, Marcelo.
Re: ANNOUNCE: Release of Lucene Java 3.0.1 and 2.9.2
Could there be a Version value called LUCENE_LATEST_DANGER_USE_AT_YOUR_OWN_RISK, or whatever you want to make it?

I understand the argument about backwards compatibility, but I'm with Johannes on making things easier for those who have code which doesn't require the compatibility. Like me. I've been using lucene since the very beginning and don't recall ever having been bitten by any back-compatibility problems (another reason for praise for the committers), and would rather not have to start changing literals on upgrades.

Is the plan to remove LUCENE_CURRENT altogether, or to leave it in, permanently deprecated? If the latter, we could carry on using it, although living with deprecations isn't great.

Doing simple things in lucene does in general seem to be getting harder. Off the top of my head ...

    IndexSearcher s = new IndexSearcher("/my/index");
    QueryParser qp = new QueryParser("", new StandardAnalyzer());
    Query q = qp.parse("field: value");
    Hits h = s.search(q);
    for (int i = 0; i < h.length(); i++) {
      System.out.println(h.doc(i).get("field"));
    }

used to work. It won't now of course, and I'd have to look at the javadocs to come up with alternatives.

Keep APIs simple!

--
Ian.

On Fri, Feb 26, 2010 at 10:50 AM, Michael McCandless wrote:
> That would be more natural/convenient, but it'd unfortunately defeat
> the whole reason Version was added in the first place.
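For the record, a rough 3.0.x equivalent might look like this (a sketch, assuming default analysis and that the top 10 hits are enough; Hits is gone, so an explicit result count is required):

    IndexSearcher s = new IndexSearcher(FSDirectory.open(new File("/my/index")));
    QueryParser qp = new QueryParser(Version.LUCENE_30, "",
                                     new StandardAnalyzer(Version.LUCENE_30));
    Query q = qp.parse("field:value");
    TopDocs hits = s.search(q, 10);       // ask for the top n explicitly
    for (ScoreDoc sd : hits.scoreDocs) {
      System.out.println(s.doc(sd.doc).get("field"));
    }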
Re: IndexWriter.getReader.getVersion behavior
OK -- I can now see what happened.

There was a merge still running when you called IW.commit (Lucene Merge Thread #0). Because IW.commit does not wait for BG merges to finish, but IW.close does (by default), this means you'll pick up an extra version whenever a merge is running when you call close.

Mike

On Thu, Feb 25, 2010 at 2:52 PM, Peter Keegan wrote:
> I'm pretty sure this output occurred when the version number skipped +1.
> The line containing '' separates the close/open IndexWriter.
>
> IFD [Indexer]: setInfoStream deletionpolicy=org.apache.lucene.index.keeponlylastcommitdeletionpolicy@646f9dd9
> IW 9 [Indexer]: setInfoStream: dir=org.apache.lucene.store.SimpleFSDirectory@<pathname>\lresumes1.search.main.1 autoCommit=false mergepolicy=org.apache.lucene.index.logbytesizemergepolicy@5be44512 mergescheduler=org.apache.lucene.index.concurrentmergescheduler@6772cfdf ramBufferSizeMB=16.0 maxBufferedDocs=-1 maxBuffereDeleteTerms=-1 maxFieldLength=10 index=_a:C9780 _b:C1204->_b _c:C717->_b _d:C1220->_d _e:C778->_d _f:C1173->_f _g:C858->_f _h:C1291->_h _i:C703->_h
> IW 9 [Indexer]: flush at getReader
> IW 9 [Indexer]: flush: segment=null docStoreSegment=null docStoreOffset=0 flushDocs=false flushDeletes=true flushDocStores=false numDocs=0 numBufDelTerms=0
> IW 9 [Indexer]: index before flush _a:C9780 _b:C1204->_b _c:C717->_b _d:C1220->_d _e:C778->_d _f:C1173->_f _g:C858->_f _h:C1291->_h _i:C703->_h
> IW 9 [UpdWriterBuild : 9]: DW: RAM: now flush @ usedMB=15.816 allocMB=15.816 deletesMB=0.203 triggerMB=16
> IW 9 [UpdWriterBuild : 9]: flush: segment=_j docStoreSegment=_j docStoreOffset=0 flushDocs=true flushDeletes=false flushDocStores=false numDocs=1456 numBufDelTerms=1456
> IW 9 [UpdWriterBuild : 9]: index before flush _a:C9780 _b:C1204->_b _c:C717->_b _d:C1220->_d _e:C778->_d _f:C1173->_f _g:C858->_f _h:C1291->_h _i:C703->_h
> IW 9 [UpdWriterBuild : 9]: DW: flush postings as segment _j numDocs=1456
> IW 9 [UpdWriterBuild : 9]: DW: oldRAMSize=16584704 newFlushedSize=4969789 docs/MB=307.202 new/old=29.966%
> IFD [UpdWriterBuild : 9]: now checkpoint "segments_b" [10 segments ; isCommit = false]
> IFD [UpdWriterBuild : 9]: now checkpoint "segments_b" [10 segments ; isCommit = false]
> IW 9 [UpdWriterBuild : 9]: LMP: findMerges: 10 segments
> IW 9 [UpdWriterBuild : 9]: LMP: level 6.863048 to 7.613048: 1 segments
> IW 9 [UpdWriterBuild : 9]: LMP: level 6.2247195 to 6.696363: 9 segments
> IW 9 [UpdWriterBuild : 9]: CMS: now merge
> IW 9 [UpdWriterBuild : 9]: CMS: index: _a:C9780 _b:C1204->_b _c:C717->_b _d:C1220->_d _e:C778->_d _f:C1173->_f _g:C858->_f _h:C1291->_h _i:C703->_h _j:C1456->_j
> IW 9 [UpdWriterBuild : 9]: CMS: no more merges pending; now return
> IW 9 [Indexer]: prepareCommit: flush
> IW 9 [Indexer]: flush: segment=_k docStoreSegment=_j docStoreOffset=1456 flushDocs=true flushDeletes=true flushDocStores=true numDocs=509 numBufDelTerms=509
> IW 9 [Indexer]: index before flush _a:C9780 _b:C1204->_b _c:C717->_b _d:C1220->_d _e:C778->_d _f:C1173->_f _g:C858->_f _h:C1291->_h _i:C703->_h _j:C1456->_j
> IW 9 [Indexer]: flush shared docStore segment _j
> IW 9 [Indexer]: DW: closeDocStore: 2 files to flush to segment _j numDocs=1965
> IW 9 [Indexer]: DW: flush postings as segment _k numDocs=509
> IW 9 [Indexer]: DW: oldRAMSize=7483392 newFlushedSize=1854970 docs/MB=287.727 new/old=24.788%
> IFD [Indexer]: now checkpoint "segments_b" [11 segments ; isCommit = false]
> IW 9 [Indexer]: DW: apply 1965 buffered deleted terms and 0 deleted docIDs and 0 deleted queries on 11 segments.
> IFD [Indexer]: now checkpoint "segments_b" [11 segments ; isCommit = false]
> IW 9 [Indexer]: LMP: findMerges: 11 segments
> IW 9 [Indexer]: LMP: level 6.863048 to 7.613048: 1 segments
> IW 9 [Indexer]: LMP: level 6.2247195 to 6.696363: 10 segments
> IW 9 [Indexer]: LMP: 1 to 11: add this merge
> IW 9 [Indexer]: add merge to pendingMerges: _b:C1204->_b _c:C717->_b _d:C1220->_d _e:C778->_d _f:C1173->_f _g:C858->_f _h:C1291->_h _i:C703->_h _j:C1456->_j _k:C509->_j [total 1 pending]
> IW 9 [Indexer]: CMS: now merge
> IW 9 [Indexer]: CMS: index: _a:C9780 _b:C1204->_b _c:C717->_b _d:C1220->_d _e:C778->_d _f:C1173->_f _g:C858->_f _h:C1291->_h _i:C703->_h _j:C1456->_j _k:C509->_j
> IW 9 [Indexer]: CMS: consider merge _b:C1204->_b _c:C717->_b _d:C1220->_d _e:C778->_d _f:C1173->_f _g:C858->_f _h:C1291->_h _i:C703->_h _j:C1456->_j _k:C509->_j into _l [mergeDocStores]
> IW 9 [Indexer]: CMS: launch new thread [Lucene Merge Thread #0]
> IW 9 [Indexer]: CMS: no more merges pending; now return
> IW 9 [Indexer]: startCommit(): start sizeInBytes=0
> IW 9 [Indexer]: startCommit index=_a:C9780 _b:C1204->_b _c:C717->_b _d:C1220->_d _e:C778->_d _f:C1173->_f _g:C858->_f _h:C1291->_h _i:C703->_h _j:C1456->_j _k:C509->_j changeCount=7
> IW 9 [Lucene Merge Thread #0]: CMS: merge thread: start
> IW 9 [Indexer]: now sync _k.f
NumericField exact match
Hi Guys,

Is it possible to make exact searches on fields that are of type NumericField, and if yes, how? In the LIA book, part 2, I found only information about range searches on such fields and how to sort on them.

Example - I have a field "size" that can take integers as values, and I want to get docs with "size:100". For regular fields, "size:100" is OK to pass to the parser, but with NumericField it does not work.

The only approach to support such fields that I can see is to have a parallel regular Field (for example "size2") and to index the same data there. Then, when a user wants an exact search on "size", I perform "size2:100".

Is this the most appropriate way for my case, in your opinion?

Thanks,
Ivan
Re: IndexWriter.getReader.getVersion behavior
Is there a way for the application to wait for the BG commit to finish before it calls IW.close? If so, would this prevent the extra version? The extra version causes the app to think that the external data it committed is out of sync with the index, which requires the app to do extra processing to re-sync.

Thanks,
Peter

On Fri, Feb 26, 2010 at 12:40 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
> OK -- I can now see what happened.
>
> There was a merge still running when you called IW.commit (Lucene
> Merge Thread #0). Because IW.commit does not wait for BG merges to
> finish, but IW.close does (by default), this means you'll pick up an
> extra version whenever a merge is running when you call close.
>
> Mike
RE: NumericField exact match
It's very easy: NumericRangeQuery.newXxxRange(field, val, val, true, true) - val is the exact match. This is not slower, as this automatically rewrites to a non-scored TermQuery. If you already changed QueryParser, you can also override the method for exact matches (newTermQuery).

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Ivan Vasilev [mailto:ivasi...@sirma.bg]
> Sent: Friday, February 26, 2010 8:21 PM
> To: LUCENE MAIL LIST
> Subject: NumericField exact match
>
> Is it possible to make exact searches on fields that are of type
> NumericField, and if yes, how?
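Concretely, for the "size" example above, a minimal sketch (Lucene 2.9/3.0; note that if the field was indexed with a non-default precision step, the newIntRange overload that takes a precisionStep should be used instead):

    // exact match on the int field "size" == 100, both endpoints inclusive
    Query exact = NumericRangeQuery.newIntRange("size", 100, 100, true, true);
    TopDocs hits = searcher.search(exact, 10);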
Re: IndexWriter.getReader.getVersion behavior
Note that it's a BG merge (not commit)...

You can use the new (as of 2.9 I think) IndexWriter.waitForMerges API? If you call that, then call .getReader().getVersion(), then close & open the writer, I think (but you better test to be sure!) the next .getReader().getVersion() should always match.

Mike

On Fri, Feb 26, 2010 at 2:40 PM, Peter Keegan wrote:
> Is there a way for the application to wait for the BG commit to finish
> before it calls IW.close? If so, would this prevent the extra version?
> The extra version causes the app to think that the external data it
> committed is out of sync with the index, which requires the app to do
> extra processing to re-sync.
Re: IndexWriter.getReader.getVersion behavior
Great, I'll give it a try.
Thanks!

On Fri, Feb 26, 2010 at 3:11 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
> Note that it's a BG merge (not commit)...
>
> You can use the new (as of 2.9 I think) IndexWriter.waitForMerges API?
> If you call that, then call .getReader().getVersion(), then close &
> open the writer, I think (but you better test to be sure!) the next
> .getReader().getVersion() should always match.
>
> Mike
Re: IndexWriter.getReader.getVersion behavior
Can IW.waitForMerges be called between 'prepareCommit' and 'commit'? That's when the app calls 'getReader' to create external data.

Peter

On Fri, Feb 26, 2010 at 3:15 PM, Peter Keegan wrote:
> Great, I'll give it a try.
> Thanks!
Re: IndexWriter.getReader.getVersion behavior
That should be fine!

Mike

On Fri, Feb 26, 2010 at 3:26 PM, Peter Keegan wrote:
> Can IW.waitForMerges be called between 'prepareCommit' and 'commit'?
> That's when the app calls 'getReader' to create external data.
>
> Peter
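Putting the thread together, the sequence under discussion would look roughly like this (a sketch, untested, per Mike's own caveat above):

    writer.prepareCommit();
    writer.waitForMerges();                 // let any background merges finish first
    IndexReader reader = writer.getReader();
    long version = reader.getVersion();     // record this alongside the external data
    reader.close();
    writer.commit();
    writer.close();                         // no merges pending, so no extra version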
Re: NumericField exact match
Thanks for the answer Uwe,

Does the precision step matter when I use NumericRangeQuery for exact matches? I mean, if I use the default precision step when indexing those fields, is it guaranteed that:

1. With this query I will always hit the docs that contain "val" for the "field";
2. I will never hit docs that have a different "val" for the "field"?

Ivan

Uwe Schindler wrote:
> It's very easy: NumericRangeQuery.newXxxRange(field, val, val, true,
> true) - val is the exact match. This is not slower, as this
> automatically rewrites to a non-scored TermQuery.
recovering payload from fields
I'm trying to store semantic information in payloads at index time. I believe this part is successful - but I'm having trouble getting access to the payload locations after the index is created. I'd like to know the offset in the original text for the token with the payload - and get this information for all payloads that are set in a Field, even if they don't relate to the query. I tried (from the highlighting filter):

    TokenStream tokens = TokenSources.getTokenStream(reader, 0, "body");
    while (tokens.incrementToken()) {
      TermAttribute term = tokens.getAttribute(TermAttribute.class);
      if (tokens.hasAttribute(PayloadAttribute.class)) {
        PayloadAttribute payload = tokens.getAttribute(PayloadAttribute.class);
        OffsetAttribute offset = tokens.getAttribute(OffsetAttribute.class);
      }
    }

But the OffsetAttribute never seems to contain any information.

In my token filter, do I need to do more than:

    offsetAtt = addAttribute(OffsetAttribute.class);

during construction in order to store offset information?

Thanks,
-Chris
Re: recovering payload from fields
Hello,

To my knowledge, the character position of the tokens is not preserved by Lucene - only the ordinal position of tokens within a document / field is preserved. Thus you need to store this character offset information separately, say, as Payload data.

best,
C>T>

On Fri, Feb 26, 2010 at 3:41 PM, Christopher Condit wrote:
> I'm trying to store semantic information in payloads at index time. I
> believe this part is successful - but I'm having trouble getting access
> to the payload locations after the index is created.

--
TH!NKMAP

Christopher Tignor | Senior Software Architect
155 Spring Street NY, NY 10012
p.212-285-8600 x385 f.212-285-8999
RE: recovering payload from fields
Hi Chris-

> To my knowledge, the character position of the tokens is not preserved
> by Lucene - only the ordinal position of tokens within a document /
> field is preserved. Thus you need to store this character offset
> information separately, say, as Payload data.

Thanks for the information. So adding the OffsetAttribute at index time doesn't embed the offset information in the index - it just makes it available to the TokenFilter? I'll try adding the offset from the attribute to the payload.

In terms of getting access to the payloads, is the best way to reconstruct the token stream (as the Highlighter does)? Or is there an easier way to just get access to the payloads?

Thanks,
-Chris
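A minimal sketch of the "offset into payload" idea (Lucene 2.9/3.0 attribute API; the filter name and the 8-byte big-endian encoding are illustrative choices, not anything from the thread):

    import java.io.IOException;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
    import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;
    import org.apache.lucene.index.Payload;

    public final class OffsetPayloadFilter extends TokenFilter {
      private final OffsetAttribute offsetAtt = addAttribute(OffsetAttribute.class);
      private final PayloadAttribute payloadAtt = addAttribute(PayloadAttribute.class);

      public OffsetPayloadFilter(TokenStream input) {
        super(input);
      }

      @Override
      public boolean incrementToken() throws IOException {
        if (!input.incrementToken()) {
          return false;
        }
        // pack startOffset and endOffset into an 8-byte payload
        byte[] data = new byte[8];
        writeInt(data, 0, offsetAtt.startOffset());
        writeInt(data, 4, offsetAtt.endOffset());
        payloadAtt.setPayload(new Payload(data));
        return true;
      }

      private static void writeInt(byte[] b, int pos, int v) {
        b[pos]     = (byte) (v >>> 24);
        b[pos + 1] = (byte) (v >>> 16);
        b[pos + 2] = (byte) (v >>> 8);
        b[pos + 3] = (byte) v;
      }
    }

If the payload already carries semantic data, the offsets would of course have to share the same byte[] under some agreed layout.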
Re: NAS vs SAN vs Server Disk RAID
On Feb 25, 2010, at 12:54 AM, Andrew Bruno wrote:
> Since the disk IO on the server is high, our datacenter engineers
> suggested we look at NAS or SAN, for performance gain, and for future
> growth.

Alternatively, get a stack of RamSan and call it a day:
http://www.ramsan.com/products/products.htm

If you cannot afford these, something like Sun's storage server is a pretty cost-effective solution:
http://www.oracle.com/us/products/servers-storage/servers/x64/031210.htm

Personally, I would stay away from a SAN-based solution and favor local storage. As always, YMMV.
Infinite loop when searching empty index
Is this a bug in Lucene Java as of trunk@915399?

    int numDocs = reader.numDocs(); // = 0 (empty index)
    TopDocsCollector collector = TopScoreDocCollector.create(numDocs, true);
    searcher.search(new MatchAllDocsQuery(), collector); // never returns

    // Searcher
    public void search(Query query, Collector collector) throws IOException {
      search(createWeight(query), null, collector); // never returns
    }

    // extends IndexSearcher
    public void search(Weight weight, Filter filter, final Collector collector)
        throws IOException {
      boolean topScorer = (filter == null) ? true : false;
      Scorer scorer = weight.scorer(reader, true, topScorer);
      if (scorer != null && topScorer) {
        scorer.score(collector); // never returns
      }
    }

    // Scorer
    public void score(Collector collector) throws IOException {
      collector.setScorer(this);
      int doc;
      while ((doc = nextDoc()) != NO_MORE_DOCS) { // doc = 0 (infinite)
        collector.collect(doc);
      }
    }

Thanks for any feedback,
Justin
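If the zero-sized collector is indeed the trigger (an assumption - numDocs is 0 here, so the collector is created to hold zero hits), a defensive workaround sketch would be to skip the search on an empty index:

    int numDocs = reader.numDocs();
    if (numDocs > 0) {
      TopDocsCollector collector = TopScoreDocCollector.create(numDocs, true);
      searcher.search(new MatchAllDocsQuery(), collector);
    }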
RE: recovering payload from fields
> Payload data is accessed through PayloadSpans, so using SpanQueries is
> the entry point, it seems. There are tools like PayloadSpanUtil that
> convert other queries into SpanQueries for this purpose if needed, but
> the bottom line is that the API for Payloads looks like it goes through
> Spans.

So there's no way to iterate through all the payloads for a given field? I can't use the SpanQuery mechanism because in this case the entire field will be displayed - and I can't search for "*". Is there some trick I'm not thinking of?

> this is the tip of the iceberg; a big dangerous iceberg...

Yes - I'm beginning to see that...

Thanks,
-Chris
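One avenue the thread does not mention (so treat this as a suggestion, not the list's answer): the low-level TermPositions API in 2.9/3.0 exposes payloads directly, so every term of a field can be walked and each payload pulled without any query at all. A sketch:

    // iterate all payloads for field "body" by walking its terms and positions
    TermEnum terms = reader.terms(new Term("body", ""));
    try {
      do {
        Term t = terms.term();
        if (t == null || !"body".equals(t.field())) break; // past the field
        TermPositions tp = reader.termPositions(t);
        try {
          while (tp.next()) {
            for (int i = 0; i < tp.freq(); i++) {
              int position = tp.nextPosition();
              if (tp.isPayloadAvailable()) {
                byte[] payload = tp.getPayload(new byte[tp.getPayloadLength()], 0);
                // decode the payload (e.g. offsets stored at index time)
              }
            }
          }
        } finally {
          tp.close();
        }
      } while (terms.next());
    } finally {
      terms.close();
    }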