Hi all! Before diving into the core Lucene code, I would like to ask once again whether there are any detailed tutorials on the SpanQuery execution algorithm, covering postings retrieval and positional data matching.
Best Regards,
Igor

03.06.13, 21:15, "Igor Shalyminov" <ishalymi...@yandex-team.ru>:
>
> Hello!
>
> I've implemented a SpanQuery class that acts like SpanPositionCheckQuery but
> also matches payloads.
> For example, here is the "gram" field in a single indexed document:
>
>     "gram": N|1|1 sg|1|0 A|2|0 pl|2|0 A|3|0 sg|3|0
>
> Each token has the form annotation|parse|increment. For "N|1|1":
>
>     N - grammatical annotation
>     1 - parse number (stored as the payload)
>     1 - position increment
>
> So the document has a single word position with 3 ambiguous parses, #1, #2,
> and #3. Each parse has 2 annotations: "N, sg", "A, pl", and "A, sg".
> My SpanQuery is not supposed to match annotations from different parses:
> e.g. "sg & pl" should not match, but "N & sg" should.
>
> The logic is:
>
>     @Override
>     protected AcceptStatus acceptPosition(Spans spans) throws IOException {
>       // Accept only if every payload in the match decodes to the same parse number.
>       if (spans.isPayloadAvailable()) {
>         Collection<byte[]> payloads = spans.getPayload();
>         if (!payloads.isEmpty()) {
>           int firstPayload = PayloadHelper.decodeInt(payloads.iterator().next(), 0);
>           for (byte[] payload : payloads) {
>             int decodedPayload = PayloadHelper.decodeInt(payload, 0);
>             if (decodedPayload != firstPayload) {
>               return AcceptStatus.NO;
>             }
>           }
>         }
>       }
>       return AcceptStatus.YES;
>     }
>
> Then, for the query "sg & pl", which is a wrapped unordered SpanNearQuery:
>
>     ParseMatchingSpanQuery(SpanNearQuery("gram:sg", "gram:pl", false, -1))
>
> acceptPosition is called the first time with a payloads array containing
> ['1', '2'], and the second time with just ['3']. The second call actually
> produces a match, which is totally unintuitive to me.
> To my understanding, it should be called with pairs of spans, ideally
> ['1', '2'] and ['3', '2']. Why is it not? :)
> Could you please explain the logic of matching with payload checking?
>
> --
> Best Regards,
> Igor
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
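The pairwise semantics the quoted question expects (a conjunction such as "sg & pl" matches only if some occurrence of each term sits at the same word position and carries the same parse number) can be sketched without Lucene at all. This is a minimal illustration under stated assumptions, not how SpanNearQuery or SpanPositionCheckQuery actually enumerate matches: the class ParsePairCheck, the Token record, and the methods parseField, candidatePairs and matches are hypothetical names, and parseField is a stand-in for the real analyzer, which the post does not show. It requires Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Lucene-free sketch of the intended "same parse" semantics.
 * Hypothetical helper, not part of any Lucene API.
 */
public class ParsePairCheck {

    /** One token: annotation, parse number (the payload), absolute word position. */
    record Token(String annotation, int parse, int position) {}

    /**
     * Hypothetical parser for whitespace-separated "annotation|parse|posInc"
     * tokens, as in the example "gram" field.
     */
    static List<Token> parseField(String field) {
        List<Token> tokens = new ArrayList<>();
        int position = -1; // the first increment of 1 moves us to position 0
        for (String raw : field.trim().split("\\s+")) {
            String[] parts = raw.split("\\|");
            position += Integer.parseInt(parts[2]);
            tokens.add(new Token(parts[0], Integer.parseInt(parts[1]), position));
        }
        return tokens;
    }

    /** Every (a, b) pair of distinct tokens at the same position, as [parseOfA, parseOfB]. */
    static List<int[]> candidatePairs(List<Token> tokens, String a, String b) {
        List<int[]> pairs = new ArrayList<>();
        for (Token ta : tokens) {
            if (!ta.annotation().equals(a)) continue;
            for (Token tb : tokens) {
                if (tb != ta && tb.annotation().equals(b) && tb.position() == ta.position()) {
                    pairs.add(new int[] {ta.parse(), tb.parse()});
                }
            }
        }
        return pairs;
    }

    /** The intended check: accept if at least one candidate pair comes from a single parse. */
    static boolean matches(List<Token> tokens, String a, String b) {
        for (int[] pair : candidatePairs(tokens, a, b)) {
            if (pair[0] == pair[1]) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<Token> doc = parseField("N|1|1 sg|1|0 A|2|0 pl|2|0 A|3|0 sg|3|0");
        for (int[] p : candidatePairs(doc, "sg", "pl")) {
            System.out.println("sg/pl candidate parses: " + p[0] + ", " + p[1]);
        }
        System.out.println("sg & pl matches: " + matches(doc, "sg", "pl")); // false
        System.out.println("N & sg matches: " + matches(doc, "N", "sg"));   // true
    }
}
```

Running main lists the two candidate (sg, pl) pairs, with parses (1, 2) and (3, 2); neither comes from a single parse, so "sg & pl" is rejected while "N & sg" is accepted via parse 1. The sketch deliberately checks every pair, whereas the quoted report shows acceptPosition being called only twice, with ['1', '2'] and then ['3'], which is exactly the gap the question asks about.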