RE: recovering payload from fields

2010-02-26 Thread Christopher Condit
> Payload Data is accessed through PayloadSpans so using SpanQueries is the > entry point it seems. There are tools like PayloadSpanUtil that convert other > queries into SpanQueries for this purpose if needed but the api for Payloads > looks like it goes through Spans is the bottom line. So t

Infinite loop when searching empty index

2010-02-26 Thread Justin
Is this a bug in Lucene Java as of tr...@915399?
    int numDocs = reader.numDocs(); // = 0 (empty index)
    TopDocsCollector collector = TopScoreDocCollector.create(numDocs, true);
    searcher.search(new MatchAllDocsQuery(), collector); // never returns
    // Searcher
    public void searc

Re: NAS vs SAN vs Server Disk RAID

2010-02-26 Thread Petite Abeille
On Feb 25, 2010, at 12:54 AM, Andrew Bruno wrote: > Since the disk IO on the server is high, our datacenter engineers suggested > we look at NAS or SAN, for performance gain, and for future growth. Alternatively, get a stack of RamSan and call it a day: http://www.ramsan.com/products/products.h

RE: recovering payload from fields

2010-02-26 Thread Christopher Condit
Hi Chris- > To my knowledge, the character position of the tokens is not preserved by > Lucene - only the ordinal position of tokens within a document / field is > preserved. Thus you need to store this character offset information > separately, say, as Payload data. Thanks for the information. S

Re: recovering payload from fields

2010-02-26 Thread Christopher Tignor
Hello, To my knowledge, the character position of the tokens is not preserved by Lucene - only the ordinal position of tokens within a document / field is preserved. Thus you need to store this character offset information separately, say, as Payload data. best, C>T> On Fri, Feb 26, 2010 at 3:

recovering payload from fields

2010-02-26 Thread Christopher Condit
I'm trying to store semantic information in payloads at index time. I believe this part is successful - but I'm having trouble getting access to the payload locations after the index is created. I'd like to know the offset in the original text for the token with the payload - and get this inform

Re: NumericField exact match

2010-02-26 Thread Ivan Vasilev
Thanks for the answer Uwe, Does the precision step matter when I use NumericRangeQuery for exact matches? I mean, if I use the default precision step when indexing those fields, is it guaranteed that: 1. With this query I will always hit the docs that contain "val" for the "field"; 2. I will never

Re: IndexWriter.getReader.getVersion behavior

2010-02-26 Thread Michael McCandless
That should be fine! Mike On Fri, Feb 26, 2010 at 3:26 PM, Peter Keegan wrote: > Can IW.waitForMerges be called between 'prepareCommit' and 'commit'? That's > when the app calls 'getReader' to create external data. > > Peter > > On Fri, Feb 26, 2010 at 3:15 PM, Peter Keegan wrote: > >> Great, I

Re: IndexWriter.getReader.getVersion behavior

2010-02-26 Thread Peter Keegan
Can IW.waitForMerges be called between 'prepareCommit' and 'commit'? That's when the app calls 'getReader' to create external data. Peter On Fri, Feb 26, 2010 at 3:15 PM, Peter Keegan wrote: > Great, I'll give it a try. > Thanks! > > > On Fri, Feb 26, 2010 at 3:11 PM, Michael McCandless < > luc

Re: IndexWriter.getReader.getVersion behavior

2010-02-26 Thread Peter Keegan
Great, I'll give it a try. Thanks! On Fri, Feb 26, 2010 at 3:11 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > Note that it's a BG merge (not commit)... > > You can use the new (as of 2.9 I think) IndexWriter.waitForMerges API? > If you call that, then call .getReader().getVersion(

Re: IndexWriter.getReader.getVersion behavior

2010-02-26 Thread Michael McCandless
Note that it's a BG merge (not commit)... You can use the new (as of 2.9 I think) IndexWriter.waitForMerges API? If you call that, then call .getReader().getVersion(), then close & open the writer, I think (but you better test to be sure!) the next .getReader().getVersion() should always match.

RE: NumericField exact match

2010-02-26 Thread Uwe Schindler
It's very easy: NumericRangeQuery.newXxxRange(field, val, val, true, true) - val is the exact match. This is not slower as this automatically rewrites to a non-scored TermQuery. If you already changed QueryParser, you can also override the method for exactMatches (newTermQuery). - Uwe Schin

Re: IndexWriter.getReader.getVersion behavior

2010-02-26 Thread Peter Keegan
Is there a way for the application to wait for the BG commit to finish before it calls IW.close? If so, would this prevent the extra version? The extra version causes the app. to think that the external data it committed is out of synch with the index, which requires the app to do extra processing

NumericField exact match

2010-02-26 Thread Ivan Vasilev
Hi Guys, Is it possible to make exact searches on fields that are of type NumericField and if yes how? In the LIA book part 2 I found only information about Range searches on such fields and how to Sort them. Example - I have field "size" that can take integers as values. I want to get docs t

Re: IndexWriter.getReader.getVersion behavior

2010-02-26 Thread Michael McCandless
OK -- I can now see what happened. There was a merge still running, when you called IW.commit (Lucene Merge Thread #0). Because IW.commit does not wait for BG merges to finish, but IW.close does (by default), this means you'll pick up an extra version whenever a merge is running when you call clo

Re: ANNOUNCE: Release of Lucene Java 3.0.1 and 2.9.2

2010-02-26 Thread Ian Lea
Could there be a Version value called LUCENE_LATEST_DANGER_USE_AT_YOUR_OWN_RISK, or whatever you want to call it? I understand the argument about backwards compatibility but I'm with Johannes on making things easier for those who have code which doesn't require the compatibility. Like me. I've be

Re: NAS vs SAN vs Server Disk RAID

2010-02-26 Thread Ian Lea
NFS. It works fine for simple essentially static lucene indexes and we still use it for that, but things tended to fall apart with dynamic indexes. -- Ian. On Fri, Feb 26, 2010 at 11:06 AM, Marcelo Ochoa wrote: > Hi Ian: >  Only as curiosity ;) >  Which distributed file system are you using o

Re: NAS vs SAN vs Server Disk RAID

2010-02-26 Thread Marcelo Ochoa
Hi Ian: Only as curiosity ;) Which distributed file system are you using on top of your NAS storage? Best regards, Marcelo. On Thu, Feb 25, 2010 at 6:54 AM, Ian Lea wrote: > We've run lucene on NAS, although not with indexes anything like as > large as 1Tb, and gave up because NFS and lucen

Re: ANNOUNCE: Release of Lucene Java 3.0.1 and 2.9.2

2010-02-26 Thread Michael McCandless
That would be more natural/convenient, but it'd unfortunately defeat the whole reason Version was added in the first place. By making Version required, we force callers to be explicit to Lucene about what level of back compat is required. This then enables Lucene to improve its defaults with each

Re: ANNOUNCE: Release of Lucene Java 3.0.1 and 2.9.2

2010-02-26 Thread Johannes Zillmann
Just one thought... For me it would be natural never to be confronted with the Version.xx thing in the API unless you really need it. So, for example, having new QueryParser("", new KeywordAnalyzer()).parse("content: the"); as a default (probably using Version.LUCENE_CURRENT under the hood), but ha

Re: ANNOUNCE: Release of Lucene Java 3.0.1 and 2.9.2

2010-02-26 Thread Paul Taylor
Robert Muir wrote: such projects can do this, in one place: public static final Version MY_APP_CURRENT = Version.LUCENE_30; then later StandardAnalyzer analyzer = new StandardAnalyzer(MY_APP_CURRENT); then they have complete control of this, independent of when they upgrade Lucene's jar f

Re: ANNOUNCE: Release of Lucene Java 3.0.1 and 2.9.2

2010-02-26 Thread Robert Muir
such projects can do this, in one place: public static final Version MY_APP_CURRENT = Version.LUCENE_30; then later StandardAnalyzer analyzer = new StandardAnalyzer(MY_APP_CURRENT); then they have complete control of this, independent of when they upgrade Lucene's jar file! On Fri, Feb 26,

Re: ANNOUNCE: Release of Lucene Java 3.0.1 and 2.9.2

2010-02-26 Thread Paul Taylor
Uwe Schindler wrote: Hello Lucene users, On behalf of the Lucene development community I would like to announce the release of Lucene Java versions 3.0.1 and 2.9.2: Both releases fix bugs in the previous versions: - 2.9.2 is a bugfix release for the Lucene Java 2.x series, based on Java 1.4

Re: If you could have one feature in Lucene...

2010-02-26 Thread Paul Taylor
Glen Newton wrote: +2 On 25 February 2010 04:45, Avi Rosenschein wrote: Similarity can only be set per index, but I want to adjust scoring behaviour at a field level. To facilitate this, could we make the field name available to all score methods? Currently it is only passed to some such as

Re: problem about backup index file

2010-02-26 Thread Michael McCandless
Well, Lucene is "write once" and then, eventually, "delete once" ;) I.e., files are eventually deleted (when they are merged away). So when you do the incremental backup, any file not listed in the current commit can be removed from your backup (assuming you only want to back up the last commit). Do
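
The pruning rule described in this message (drop any backed-up file that the current commit no longer lists) can be sketched with plain set arithmetic. This is only an illustration under stated assumptions: the class name BackupPruner, both method names, and the sample file names are hypothetical, and the two sets are passed in directly. In a real application the commit's file list would presumably come from Lucene itself (for example from IndexCommit.getFileNames()), which is not shown here.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Lucene: compares a backup directory
// listing against the file list of the current commit.
public class BackupPruner {

    // Files held in the backup that the current commit no longer references.
    // These are the ones that can be deleted from the backup.
    public static Set<String> staleFiles(Set<String> backupFiles, Set<String> currentCommitFiles) {
        Set<String> stale = new TreeSet<>(backupFiles);
        stale.removeAll(currentCommitFiles);
        return stale;
    }

    // Files referenced by the current commit that the backup does not hold yet.
    // Since index files are written once, only these need to be copied.
    public static Set<String> filesToCopy(Set<String> backupFiles, Set<String> currentCommitFiles) {
        Set<String> missing = new TreeSet<>(currentCommitFiles);
        missing.removeAll(backupFiles);
        return missing;
    }

    public static void main(String[] args) {
        // Made-up file names: two old segments were merged into a new one.
        Set<String> backup = new TreeSet<>(Arrays.asList("_0.cfs", "_1.cfs", "segments_2"));
        Set<String> commit = new TreeSet<>(Arrays.asList("_2.cfs", "segments_3"));
        System.out.println("delete from backup: " + staleFiles(backup, commit));
        System.out.println("copy to backup: " + filesToCopy(backup, commit));
    }
}
```

Because files are never modified after they are written, comparing names is enough in this sketch; no checksums or timestamps are consulted. That shortcut only holds if the backup is taken from a single, stable commit point.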

ANNOUNCE: Release of Lucene Java 3.0.1 and 2.9.2

2010-02-26 Thread Uwe Schindler
Hello Lucene users, On behalf of the Lucene development community I would like to announce the release of Lucene Java versions 3.0.1 and 2.9.2: Both releases fix bugs in the previous versions: - 2.9.2 is a bugfix release for the Lucene Java 2.x series, based on Java 1.4 - 3.0.1 has the same bu