Re: How to use TokenStream to build two fields

2013-04-23 Thread 808
Hello! Thank you for your reply. It was my oversight that I did not append the code at (AnalyzeContext.java:124). But when I try to use the StandardAnalyzer to do the same thing, I get the same Exception. Here is my code (the IndexWriter has already been initialized): private static void indexFile(Index…
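The preview cuts the code off, so the following is only a minimal sketch of what such an indexFile method might look like on Lucene 4.2, assuming the IndexWriter was initialized with a StandardAnalyzer; the parameters and field names are guesses, not the poster's actual code:

import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;

// Let the writer's analyzer (e.g. StandardAnalyzer) tokenize both fields.
// A hand-built TokenStream can only be consumed once, so it must never be
// shared between two fields; passing plain strings avoids that trap.
private static void indexFile(IndexWriter writer, String title, String body) throws IOException {
    Document doc = new Document();
    doc.add(new TextField("title", title, Field.Store.YES));
    doc.add(new TextField("body", body, Field.Store.NO));
    writer.addDocument(doc); // one fresh TokenStream is created per field here
}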

Re: Reading Payloads

2013-04-23 Thread Carsten Schnober
On 23.04.2013 16:17, Alan Woodward wrote: > It doesn't sound as though an inverted index is really what you want to be > querying here, if I'm reading you right. You want to get the payloads for > spans at a specific position, but you don't particularly care about the > actual term at that p…

Re: Reading Payloads

2013-04-23 Thread Alan Woodward
Hi Carsten, It doesn't sound as though an inverted index is really what you want to be querying here, if I'm reading you right. You want to get the payloads for spans at a specific position, but you don't particularly care about the actual term at that position? You might find that BinaryDocV…
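A minimal sketch of the doc-values alternative on the Lucene 4.2 API; the field name "payloads", the encodedPayloads byte[], and the writer, context and docId variables are assumed placeholders, not anything from the thread:

// Index time: store the per-document payload bytes as a binary doc value.
Document doc = new Document();
doc.add(new BinaryDocValuesField("payloads", new BytesRef(encodedPayloads)));
writer.addDocument(doc);

// Search time: fetch the bytes directly by document id, no postings involved.
AtomicReader leaf = context.reader();                     // per-segment reader
BinaryDocValues values = leaf.getBinaryDocValues("payloads");
BytesRef scratch = new BytesRef();
values.get(docId, scratch);                               // 4.x signature: fills the scratch ref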

Re: Reading Payloads

2013-04-23 Thread Carsten Schnober
On 23.04.2013 15:27, Alan Woodward wrote: > There's the SpanPositionCheckQuery family - SpanRangeQuery, SpanFirstQuery, > etc. Is that the sort of thing you're looking for? Hi Alan, thanks for the pointer, this is the right direction indeed. However, these queries are based on a SpanQuery which…

Re: Reading Payloads

2013-04-23 Thread Alan Woodward
There's the SpanPositionCheckQuery family - SpanRangeQuery, SpanFirstQuery, etc. Is that the sort of thing you're looking for? Alan Woodward www.flax.co.uk On 23 Apr 2013, at 13:36, Carsten Schnober wrote: > On 23.04.2013 13:47, Carsten Schnober wrote: >> I'm trying to figure out a way to use…
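A rough illustration of that direction on the 4.x spans API, pulling payloads straight from the matching spans; the field name, the term and the reader variable are placeholders:

// classes from org.apache.lucene.search.spans, org.apache.lucene.index and java.util
SpanQuery q = new SpanFirstQuery(new SpanTermQuery(new Term("text", "sometoken")), 5);
for (AtomicReaderContext leaf : reader.leaves()) {        // spans are per segment in 4.x
    Map<Term, TermContext> termContexts = new HashMap<Term, TermContext>();
    Spans spans = q.getSpans(leaf, leaf.reader().getLiveDocs(), termContexts);
    while (spans.next()) {
        if (spans.isPayloadAvailable()) {
            for (byte[] payload : spans.getPayload()) {
                // payload bytes for the current span
            }
        }
    }
}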

org.apache.lucene.classification - bug in SimpleNaiveBayesClassifier

2013-04-23 Thread Alexey Anatolevitch
Hi, is anybody actively working on the classification package? I was trying it with 4.2.1, and SimpleNaiveBayesClassifier seems to have a bug - the local copy of the BytesRef referenced by foundClass is affected by subsequent TermsEnum.iterator.next() calls as the shared BytesRef.bytes changes... I can…
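For reference, the usual guard against that reuse is to deep-copy the BytesRef before advancing the enum; this is a generic sketch (isBestSoFar and the terms variable are hypothetical), not the classifier's actual code:

TermsEnum termsEnum = terms.iterator(null);      // 4.x: pass null when not reusing an enum
BytesRef best = null;
BytesRef current;
while ((current = termsEnum.next()) != null) {
    if (isBestSoFar(current)) {
        // next() reuses the returned BytesRef's byte[], so anything that must
        // survive the next iteration needs its own copy.
        best = BytesRef.deepCopyOf(current);
    }
}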

Re: Reading Payloads

2013-04-23 Thread Carsten Schnober
On 23.04.2013 13:47, Carsten Schnober wrote: > I'm trying to figure out a way to use a query as Uwe suggested. My > scenario is to perform a query and then retrieve some of the payloads > upon user request, so there is no obvious way to wrap this into a query as > I can't know what (terms) to query…

Re: How to use TokenStream to build two fields

2013-04-23 Thread Simon Willnauer
Hey there, I think your English is perfectly fine! Given the info you provided it's very hard to answer your question... I can't look into org.wltea.analyzer.core.AnalyzeContext.fillBuffer(AnalyzeContext.java:124), but apparently there is a NullPointerException happening there. Maybe you can track that down…

How to use TokenStream to build two fields

2013-04-23 Thread 808
I am a Lucene user from China, so my English is bad. I will try my best to explain my problem. The version I use is 4.2. I have a problem while using Lucene. Here is my code: public void testIndex() throws IOException, SQLException { NewsDao ndao = new NewsDao(); Lis…
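The snippet breaks off right after the DAO call; a rough sketch of how such a test usually continues (NewsDao, News, findAll, getTitle/getContent and the writer variable are guesses, not the poster's real classes):

public void testIndex() throws IOException, SQLException {
    NewsDao ndao = new NewsDao();
    List<News> newsList = ndao.findAll();                 // hypothetical DAO method
    for (News news : newsList) {
        Document doc = new Document();
        // Two analyzed fields on one document; the IndexWriter's analyzer
        // creates a separate TokenStream for each field during addDocument.
        doc.add(new TextField("title", news.getTitle(), Field.Store.YES));
        doc.add(new TextField("content", news.getContent(), Field.Store.NO));
        writer.addDocument(doc);
    }
    writer.commit();
}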

Re: Reading Payloads

2013-04-23 Thread Carsten Schnober
On 23.04.2013 13:21, Michael McCandless wrote: > Actually, term vectors can store payloads now (LUCENE-1888), so if that > field was indexed with FieldType.setStoreTermVectorPayloads they should be > there. > > But I suspect the TokenSources.getTokenStream API (which I think un-inverts > the ter…

Re: Reading Payloads

2013-04-23 Thread Michael McCandless
Actually, term vectors can store payloads now (LUCENE-1888), so if that field was indexed with FieldType.setStoreTermVectorPayloads they should be there. But I suspect the TokenSources.getTokenStream API (which I think un-inverts the term vectors to recreate the token stream = very slow?) wasn't f…
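For context, payloads in term vectors have to be enabled on the FieldType at index time; a sketch assuming Lucene 4.2, where the field name, the analyzedText value and the doc variable are arbitrary placeholders:

FieldType ft = new FieldType(TextField.TYPE_NOT_STORED);
ft.setStoreTermVectors(true);
ft.setStoreTermVectorPositions(true);   // payloads require positions
ft.setStoreTermVectorOffsets(true);
ft.setStoreTermVectorPayloads(true);    // the LUCENE-1888 addition
ft.freeze();
doc.add(new Field("term", analyzedText, ft));   // analyzedText: the field's text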

RE: Reading Payloads

2013-04-23 Thread Uwe Schindler
TermVectors are per-document and do not contain payloads. You are reading the per-document TermVectors, which are a "small index" *stored* for each document as a binary blob. This blob only contains the terms of this document with their positions/offsets, but no payloads (offsets are used e.g. for highlighting…
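When the payloads are needed from the main inverted index rather than from term vectors, they can be read through the postings API; a minimal 4.x sketch in which the field name "term", the token and the per-segment context variable are placeholders:

AtomicReader leaf = context.reader();                     // per-segment reader
Terms terms = leaf.terms("term");                         // null if the field does not exist
TermsEnum termsEnum = terms.iterator(null);
if (termsEnum.seekExact(new BytesRef("sometoken"))) {
    DocsAndPositionsEnum postings = termsEnum.docsAndPositions(
            leaf.getLiveDocs(), null, DocsAndPositionsEnum.FLAG_PAYLOADS);
    while (postings.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
        for (int i = 0; i < postings.freq(); i++) {
            postings.nextPosition();
            BytesRef payload = postings.getPayload();     // null if this position has none
        }
    }
}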

Reading Payloads

2013-04-23 Thread Carsten Schnober
Hi, I'm trying to extract payloads from an index for specific tokens in the following way (inserting a sample document number and term): Terms terms = reader.getTermVector(16504, "term"); TokenStream tokenstream = TokenSources.getTokenStream(terms); while (tokenstream.incrementToken()) { OffsetAttrib…
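For reference, the loop the snippet starts would typically read the payload (and offsets) through attributes, roughly as below, assuming the stream produced by TokenSources actually carries payloads, which the rest of this thread calls into question:

// attributes from org.apache.lucene.analysis.tokenattributes, TokenSources from the highlighter module
Terms terms = reader.getTermVector(16504, "term");
TokenStream tokenstream = TokenSources.getTokenStream(terms);
CharTermAttribute termAtt = tokenstream.addAttribute(CharTermAttribute.class);
OffsetAttribute offsetAtt = tokenstream.addAttribute(OffsetAttribute.class);
PayloadAttribute payloadAtt = tokenstream.addAttribute(PayloadAttribute.class);
tokenstream.reset();
while (tokenstream.incrementToken()) {
    BytesRef payload = payloadAtt.getPayload();   // may be null if payloads were not restored
    // termAtt and offsetAtt describe the current token
}
tokenstream.end();
tokenstream.close();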