Re: Lucene 3.4 : shift bug in possibly invalid use of NumericTokenStream

2011-12-19 Thread Peter Karich
Hi, I was hitting a similar exception (for me it was of type 'long'). But I thought it was because I had a programming mistake. termAtt is reused. Couldn't problems occur when two threads access the incrementToken method at the same time? This exception disappeared when I fixed

RE: Lucene 3.4 : shift bug in possibly invalid use of NumericTokenStream

2011-12-19 Thread Uwe Schindler
Hi, > I was hitting a similar exception (for me it was of type 'long'). But I thought it > was because I had a programming mistake. termAtt is reused. > Couldn't problems occur when two threads access the incrementToken method at > the same time? If it is not a problem in the

Re: Lucene 3.4 : shift bug in possibly invalid use of NumericTokenStream

2011-12-19 Thread Peter Karich
> I assume it was a bug like noted before? Exactly. Nothing to do with Lucene IMHO Peter. -- http://jetsli.de news reader for geeks - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-

Query that returns all docs that contain a field

2011-12-19 Thread Paul Taylor
I was looking for a Query that returns all documents that contain a particular field; it doesn't matter what the value of the field is, just that the document contains the field. Paul

Re: Query that returns all docs that contain a field

2011-12-19 Thread Trejkaz
On Mon, Dec 19, 2011 at 9:05 PM, Paul Taylor wrote: > I was looking for a Query that returns all documents that contain a > particular field; it doesn't matter what the value of the field is, just that > the document contains the field. If you don't care about performance (or if it runs fast enough

Re: Lucene 3.4 : shift bug in possibly invalid use of NumericTokenStream

2011-12-19 Thread Peter Karich
BTW: how can I use NumericUtils.longToPrefixCoded in 4.0 ? Peter.

RE: Lucene 3.4 : shift bug in possibly invalid use of NumericTokenStream

2011-12-19 Thread Uwe Schindler
Hi, NumericUtils is an internal implementation class; you should not use it. What do you want to do? There is no need to call any of its methods during indexing or searching. Everything else is advanced. In the latter case you should RTFM of BytesRef and related classes (possibly watch the flexible

Re: Query that returns all docs that contain a field

2011-12-19 Thread Michael McCandless
You could also use FieldCache.getDocsWithField; it returns a bit set where the bit is set if that document had that field. Mike McCandless http://blog.mikemccandless.com On Mon, Dec 19, 2011 at 7:32 AM, Trejkaz wrote: > On Mon, Dec 19, 2011 at 9:05 PM, Paul Taylor wrote: >> I was looking for a
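
Editor's sketch: the idea Mike describes (a bit set with one bit per document, set when that document has the field) can be modelled in plain Java as below. This does not use Lucene at all; the `docsWithField` helper and the map-based documents are hypothetical stand-ins for illustration, not the `FieldCache` API, and a real index would be consulted through Lucene itself.

```java
import java.util.BitSet;
import java.util.List;
import java.util.Map;

public class DocsWithFieldSketch {

    // Hypothetical helper (not a Lucene method): bit i is set when
    // document i contains fieldName, whatever the field's value is.
    public static BitSet docsWithField(List<Map<String, String>> docs, String fieldName) {
        BitSet bits = new BitSet(docs.size());
        for (int docId = 0; docId < docs.size(); docId++) {
            if (docs.get(docId).containsKey(fieldName)) {
                bits.set(docId);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        // Toy "index": each document is a map from field name to value.
        List<Map<String, String>> docs = List.of(
                Map.of("title", "a", "isbn", "123"),
                Map.of("title", "b"),
                Map.of("isbn", ""));

        BitSet bits = docsWithField(docs, "isbn");

        // Walk the set bits to list the matching document ids.
        for (int docId = bits.nextSetBit(0); docId >= 0; docId = bits.nextSetBit(docId + 1)) {
            System.out.println("doc " + docId + " has the field");
        }
    }
}
```

The third toy document has an empty value for the field and still matches, which is the behaviour Paul asked for: only the presence of the field matters.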

RE: Query that returns all docs that contain a field

2011-12-19 Thread Uwe Schindler
Hi, There is also a Query/Filter based on that FieldCache: o.a.l.search.FieldValueFilter, possibly wrapped with ConstantScoreQuery Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Michael McCandless

Re: Query that returns all docs that contain a field

2011-12-19 Thread Paul Taylor
On 19/12/2011 13:39, Uwe Schindler wrote: Hi, There is also a Query/Filter based on that FieldCache: o.a.l.search.FieldValueFilter, possibly wrapped with ConstantScoreQuery Uwe Okay, thanks for all the options. Paul

Lucene 4.0 questions, was: shift bug in possibly invalid use of NumericTokenStream

2011-12-19 Thread Peter Karich
Hi Uwe, thanks for the talk suggestion(s)*. I was using it for faster term lookups of a long 'id'. How would this be done with 4.0? Before I did it via Term: new Term(fieldName, NumericUtils.longToPrefixCoded(longValue)); How should I generally do "term lookup" in 4.0 as you said in the video t

RE: Lucene 4.0 questions, was: shift bug in possibly invalid use of NumericTokenStream

2011-12-19 Thread Uwe Schindler
> Hi Uwe, > > thanks for the talk suggestion(s)*. > > I was using it for faster term lookups of a long 'id'. How would this be done with > 4.0? Before I did it via Term: > > new Term(fieldName, NumericUtils.longToPrefixCoded(longValue)); If you want to query on a single numeric term value, use

inspecting chinese index using luke

2011-12-19 Thread Peyman Faratin
Hi, we are indexing some Chinese text (using the following OutputStreamWriter with UTF-8 encoding). OutputStreamWriter outputFileWriter = new OutputStreamWriter(new FileOutputStream(outputFile), "utf8"); We are trying to inspect the index in Luke 3.4.0 (have chosen the UTF-8 option in Luke)

RE: Using Lucene to match document sets to each other

2011-12-19 Thread Paul Allan Hill
I'm not sure I understand what your field arrangement would be when you say "[T]he items I'm pulling in from the web contain large bodies of text (descriptions) whereas the products in my catalog consist of shorter fields such as product name, manufacturer, product code, etc. So using the smaller

RE: inspecting chinese index using luke

2011-12-19 Thread Uwe Schindler
Hi, Please look at: http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if you change the subject line of your email, other mail header

Re: Lucene 4.0 questions, was: shift bug in possibly invalid use of NumericTokenStream

2011-12-19 Thread Simon Willnauer
On Mon, Dec 19, 2011 at 5:03 PM, Peter Karich wrote: > Hi Uwe, > > thanks for the talk suggestion(s)*. > > I was using it for faster term lookups of a long 'id'. How would this be > done with 4.0? Before I did it via Term: > > new Term(fieldName, NumericUtils.longToPrefixCoded(longValue)); > > How

Re: Lucene 4.0 questions, was: shift bug in possibly invalid use of NumericTokenStream

2011-12-19 Thread Simon Willnauer
On Mon, Dec 19, 2011 at 9:04 PM, Simon Willnauer wrote: > On Mon, Dec 19, 2011 at 5:03 PM, Peter Karich wrote: >> Hi Uwe, >> >> thanks for the talk suggestion(s)*. >> >> I was using it for faster term lookups of a long 'id'. How would this be >> done with 4.0? Before I did it via Term: >> >> new

Re: Table Defn and/or ER Diagram of Segment files

2011-12-19 Thread Simon Willnauer
I think you are confusing something here. BDB can be used as a "Directory" implementation, but a Directory is a simple "blob" store. BDB only stores a binary BLOB, which corresponds to a file. AFAIK we dropped the BDB support entirely a couple of releases ago. In Lucene you can think of one large table

Re: Lucene 3.4 : shift bug in possibly invalid use of NumericTokenStream

2011-12-19 Thread Thushara Wijeratna
Actually, a single timestamp field is being used by several threads. Sorry, I missed that, and thanks to both Peter and Uwe for the explanations. [In my code snippet, I was trying to simplify, so I missed this. I'm constructing one timestamp field and passing it to all threads in the ctor.] On Mon, Dec
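
Editor's sketch: the problem described above is one reusable, mutable field object shared by several threads. A common remedy is to give each thread its own instance. The plain-Java example below shows that pattern with `ThreadLocal`; `ReusableLongField` and `roundTrip` are hypothetical names standing in for a reusable field object, not Lucene classes, and the thread does not say which fix was actually applied.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PerThreadFieldSketch {

    // Hypothetical stand-in for a reusable, mutable field object (not a Lucene class).
    static final class ReusableLongField {
        private long value;
        void setLongValue(long v) { value = v; }
        long longValue() { return value; }
    }

    // One instance per thread, instead of one instance handed to every thread.
    private static final ThreadLocal<ReusableLongField> FIELD =
            ThreadLocal.withInitial(ReusableLongField::new);

    // Sets and reads back the value on the calling thread's own instance.
    public static long roundTrip(long timestamp) {
        ReusableLongField field = FIELD.get();
        field.setLongValue(timestamp);
        return field.longValue();
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<Boolean>> results = new ArrayList<>();
        for (int t = 0; t < 4; t++) {
            final long base = t * 1_000_000L;
            results.add(pool.submit(() -> {
                // Each worker only ever touches its own field instance.
                for (long i = 0; i < 100_000; i++) {
                    if (roundTrip(base + i) != base + i) {
                        return false;
                    }
                }
                return true;
            }));
        }
        for (Future<Boolean> result : results) {
            System.out.println("values stayed consistent: " + result.get());
        }
        pool.shutdown();
    }
}
```

Creating a fresh field object per document would also avoid the sharing; the per-thread instance keeps the reuse while removing the cross-thread writes.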

Payload filtering

2011-12-19 Thread Kyley Jex
I'm working on providing search for medical documents annotated with UIMA. In the context of the annotated document, we identify relevant medical terms. We also identify the negation of certain terms. From what I've read and seen in examples, I'm using payloads to associate the annotation