Hoss: It was late this afternoon and I was square-eyed, so I didn't add the detail. The app we're working on first returns a summary list of all the books that match a query, with no hit information. Next, the user clicks on a returned title and we show the hits by chapter, that is, a list of chapters and the count of hits for each. The index is nearing 15G at present, so I *assumed* that I really didn't want to re-query the entire index when I already know the particular document I care about. But what do I know?
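To make the "hits by chapter" step concrete, here is a minimal sketch in plain Java. It assumes you already have the token positions of the hits for the one document, and that chapter boundaries are known as token positions too. Neither assumption comes from the thread, and the names ChapterHitCounter and countByChapter are made up for illustration.

```java
import java.util.Arrays;

// Hypothetical helper: buckets hit positions from ONE document into chapters.
final class ChapterHitCounter {

    // chapterStarts: ascending token positions at which each chapter begins
    //                (an assumption about how chapters are recorded).
    // hitPositions:  token positions of the hits within the same document.
    // Returns one count per chapter, in chapter order.
    static int[] countByChapter(int[] chapterStarts, int[] hitPositions) {
        int[] counts = new int[chapterStarts.length];
        for (int pos : hitPositions) {
            int idx = Arrays.binarySearch(chapterStarts, pos);
            // When pos is not itself a chapter start, binarySearch returns
            // -(insertionPoint) - 1; the containing chapter is the one just
            // before the insertion point.
            int chapter = idx >= 0 ? idx : -idx - 2;
            if (chapter >= 0) {
                counts[chapter]++; // hits before the first chapter are ignored
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = countByChapter(new int[] {0, 100, 250},
                                      new int[] {5, 99, 100, 260, 300});
        System.out.println(Arrays.toString(counts));
    }
}
```

The binary search keeps the cost proportional to the number of hits rather than the length of the book, which matters little for one document but costs nothing either.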
Mark: Very most excellent. I'll give it a look in the morning. I hope that the class doesn't need the raw text since I don't have it any more, but your comment "Give it a query it will give you the spans" makes me hopeful.

The real issue is that it looks like I'm reverting to my old "C" days. The code I was writing the last couple of days started to look like a program from...well...a long time ago. So I *know* it must be wrong <G>...... It's a real pain in the neck to *think* in Java terms when much of my training was before this new-fangled way of looking at programming problems happened. I suppose I could go into management, but that would be giving in to the dark side....

Thanks all
Erick

On 1/18/07, Mark Miller <[EMAIL PROTECTED]> wrote:
Just threw together a highlighter that can handle spans (combining a rewrite with dumpSpans from LIA) and used this: http://issues.apache.org/bugzilla/attachment.cgi?id=15568

Nice spans extractor from Mark (not me <G>). Give it a query it will give you the spans.

- Mark

Erick Erickson wrote:
> Hi again.
>
> I've been struggling for the last couple of days and getting nowhere, so it's time to swallow my pride and say "Help"....
>
> OK, let's say I have a document indexed and I do NOT have access to the raw text. I need to find the offset of all the hits for a query on a single document. Advice was offered a while ago to use getSpans from a spanquery, but for the life of me I don't see how to make this work. As I remember, Erik was talking about rewriting the original query as a set of spans.
>
> The trouble I'm having is that I sure don't see how to rewrite the standard query as a span query, then feed that back into my index for a particular document (that I have a unique ID for). It seems that the getSpans looks through my entire index, which is *probably* prohibitive.
>
> I can make each part of the query into a SpanTermQuery. I can assemble these together into a bunch of nested span queries. At the end of this, I have a single Span query that I can call getSpans on. But what now? I don't see how the spans relate to the document I need to focus on. From what I see of the Spans interface, it's intended to look at the entire index rather than be confined to a subset of the documents (in this case, exactly one. Guaranteed).
>
> I've thought about putting the documentID in a MUST clause of a BooleanQuery, and adding my span query to that, but it doesn't look like getSpans does me any good there.
>
> I looked at the SrndQuery family and don't see anything there that lets me get the offsets of my matches.
>
> I don't have the text, so I can't highlight all the hits and count.
>
> The code I've been writing feels like the wrong solution to the wrong problem at the wrong time. Given that I know the document ID on the way in, is my best bet to roll my own? That is, enumerate the relevant terms in my document and measure the distance between the terms and aggregate the results myself? I'd rather not do that, of course, but can if necessary.
>
> I *want* someone to say "just call <fill in magic method here>"....
>
> Any help greatly appreciated...
>
> Thanks
> Erick
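For the "roll my own" fallback described in the quoted message, a rough sketch of measuring the distance between terms within one document might look like the following. It is plain Java with no Lucene calls, and it assumes the sorted token positions of each term in that document have already been obtained somehow (for instance from the index's term positions). ProximityMatcher and orderedNear are hypothetical names, and the matching rule is a simplified ordered-proximity check, not a reproduction of what a span query would return.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: pairs up positions of two terms in ONE document.
final class ProximityMatcher {

    // first, second: ascending token positions of the two terms.
    // maxGap:        largest allowed (second - first) distance.
    // Returns {start, end} pairs where the second term follows the first
    // within maxGap positions.
    static List<int[]> orderedNear(int[] first, int[] second, int maxGap) {
        List<int[]> matches = new ArrayList<>();
        int j = 0;
        for (int a : first) {
            // Skip occurrences of the second term at or before position a.
            // j never moves backwards because 'first' is ascending.
            while (j < second.length && second[j] <= a) {
                j++;
            }
            // Collect every following occurrence that is close enough.
            for (int k = j; k < second.length && second[k] - a <= maxGap; k++) {
                matches.add(new int[] {a, second[k]});
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<int[]> m = orderedNear(new int[] {3, 10, 40},
                                    new int[] {5, 12, 13, 60}, 3);
        for (int[] span : m) {
            System.out.println(span[0] + "-" + span[1]);
        }
    }
}
```

The start positions of the returned pairs could then be aggregated per chapter to produce the counts the application needs.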