I'm posting this primarily hoping to give back a tiny bit to a very helpful community. More likely, however, someone else will open my eyes to an easier approach than what I outline below...

I've come up with a very ugly conversion approach from regular Query objects into SpanQuery objects. I then use the converted SpanQuery to get span positions (currently both token #, and start/end position). In effect, I have highlighting for simple queries with a very inefficient approach (yea for me!).

The goal I am trying to accomplish is rather specific, I think, so I imagine my hacking is of limited use (i.e. just to me).

At the moment my code:

   * parses the search text (i.e. user entered query)
   * rewrites the resulting query against the index to expand wildcards
     and such
   * calls a recursive conversion function with a very basic
     understanding of the query types:
         o TermQuery -> SpanTerm
         o PhraseQuery -> SpanNear
         o others in progress as time permits
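
To make the conversion step above concrete, here is a minimal sketch of the recursion. It is not my actual code and does not use Lucene at all: the classes below are simplified, hypothetical stand-ins named after the query types in the list (Lucene's real classes are SpanTermQuery, SpanNearQuery and SpanOrQuery), and BooleanOrQuery assumes every clause is optional.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical stand-ins for the regular query types (not Lucene's API).
@dataclass
class TermQuery:
    field: str
    text: str

@dataclass
class PhraseQuery:
    field: str
    terms: List[str]
    slop: int = 0

@dataclass
class BooleanOrQuery:  # assumption: all clauses are optional (pure OR)
    clauses: list

# Hypothetical stand-ins for the span query types.
@dataclass
class SpanTerm:
    field: str
    text: str

@dataclass
class SpanNear:
    clauses: list
    slop: int
    in_order: bool

@dataclass
class SpanOr:
    clauses: list

def to_span(query):
    """Recursively convert a regular query into its span equivalent."""
    if isinstance(query, TermQuery):
        return SpanTerm(query.field, query.text)
    if isinstance(query, PhraseQuery):
        # A phrase becomes an ordered "near" over its terms, keeping the slop.
        terms = [SpanTerm(query.field, t) for t in query.terms]
        return SpanNear(terms, query.slop, in_order=True)
    if isinstance(query, BooleanOrQuery):
        return SpanOr([to_span(c) for c in query.clauses])
    # Anything else is still "in progress": fail loudly instead of guessing.
    raise NotImplementedError(f"no span conversion for {type(query).__name__}")
```

Raising on unknown query types is a deliberate choice in this sketch: a silently dropped clause would produce highlighting that no longer matches the search.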

Currently, I only process simple query strings like:
"blue green yellow" => SpanOrQuery
"luce* acti*" => SpanOrQuery with wildcards expanded,
    e.g.: lucene lucent action acting ... all or'ed together in a braindead fashion
"luce* acti* \"book rocks\"" => SpanOrQuery combining SpanTerms and SpanNear (no slop)

Er, hopefully you get the picture; I'm not up to showing a vector of this one... :-)
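
The "braindead" wildcard expansion in the examples above can be sketched roughly as follows. This is only an illustration under stated assumptions: the index is a plain set of terms rather than a real Lucene index, only a trailing * is handled, quoted phrases are not parsed, and expand_prefix and expand_query are hypothetical helper names.

```python
def expand_prefix(pattern, index_terms):
    """Return the sorted index terms matched by a trailing-* pattern.

    A pattern without a trailing * is returned unchanged, whether or not
    it occurs in the index.
    """
    if not pattern.endswith("*"):
        return [pattern]
    prefix = pattern[:-1]
    return sorted(t for t in index_terms if t.startswith(prefix))

def expand_query(query_string, index_terms):
    """Expand every whitespace-separated word and OR the results together.

    The returned flat list plays the role of the clauses of one big SpanOr.
    """
    clauses = []
    for word in query_string.split():
        clauses.extend(expand_prefix(word, index_terms))
    return clauses

# Toy data standing in for the terms of an index.
index_terms = {"lucene", "lucent", "action", "acting", "book", "rocks"}
print(expand_query("luce* acti*", index_terms))
# ['lucene', 'lucent', 'acting', 'action']
```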

I would be happy to discuss my approach if anyone is interested. I assume I am pretty much alone in finding this inefficient approach useful. For me, the functionality outweighs the performance issues. I have something which can take user search strings and do hit highlighting for the exact hit found. This is really only useful for "termA near 'some phrase'" at the moment, but might become more advanced in the next 2-3 months.
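
For anyone wondering what "highlighting the exact hit" means for a "termA near 'some phrase'" query, here is a small self-contained sketch. It does not use Lucene: the regex tokenizer, the function names and the [[ ]] markers are all assumptions made for illustration. The idea is the same, though: find matching token spans, then map them back to start/end character offsets.

```python
import re

def tokenize(text):
    """Return (token, start_offset, end_offset) triples, lowercased."""
    return [(m.group().lower(), m.start(), m.end())
            for m in re.finditer(r"\w+", text)]

def phrase_spans(tokens, phrase):
    """Token spans (start, end), end exclusive, where phrase occurs exactly."""
    words = [t[0] for t in tokens]
    n = len(phrase)
    return [(i, i + n) for i in range(len(words) - n + 1)
            if words[i:i + n] == phrase]

def near_spans(tokens, term, phrase, slop):
    """Spans covering term and phrase when at most `slop` tokens separate them."""
    hits = []
    for ts, te in phrase_spans(tokens, [term]):
        for ps, pe in phrase_spans(tokens, phrase):
            gap = max(ps - te, ts - pe)  # negative when the two overlap
            if 0 <= gap <= slop:
                hits.append((min(ts, ps), max(te, pe)))
    return sorted(hits)

def highlight(text, tokens, spans):
    """Wrap each span in [[ ]] via character offsets; spans must not overlap."""
    out, last = [], 0
    for start_tok, end_tok in spans:
        start, end = tokens[start_tok][1], tokens[end_tok - 1][2]
        out.append(text[last:start])
        out.append("[[" + text[start:end] + "]]")
        last = end
    out.append(text[last:])
    return "".join(out)

text = "Lucene rocks. This book rocks, but that book is old."
tokens = tokenize(text)
spans = near_spans(tokens, "this", ["book", "rocks"], slop=1)
print(highlight(text, tokens, spans))
# Lucene rocks. [[This book rocks]], but that book is old.
```

Note that the stray "rocks" and the second "book" are left alone, which is the whole point: only the text that actually satisfied the query gets marked.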

Sean


Paul Elschot wrote:

On Thursday 20 October 2005 00:40, Sean O'Connor wrote:
Hello,
I have user entered search commands which I want to convert to SpanQueries. I have seen in the book "Lucene in Action" that no parser existed at time of publication, but there was someone working on a SpanQuery parser. Can anyone point me to that code, or provide any suggestions?

I want to use SpanQueries for their detail on the number of hits from a query, and more importantly, the location (position start and end) of each hit. My application requires me to do precise hit highlighting. I also need to perform calculations on the number of hits per document, as well as per query (sum of document hits).

You may want to use the getSpans() method of SpanQuery and operate
on the result directly.
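
As a rough illustration of operating on the spans directly to get the per-document and per-query counts: the sketch below assumes the spans have already been read out as (doc, start, end) tuples. The data is made up, and hits_per_document is a hypothetical helper, not part of Lucene.

```python
from collections import Counter

def hits_per_document(spans):
    """Count span matches per document id from (doc, start, end) tuples."""
    return Counter(doc for doc, _start, _end in spans)

# Made-up spans: two hits in document 0, one hit in document 3.
spans = [(0, 2, 5), (0, 11, 14), (3, 7, 10)]

per_doc = hits_per_document(spans)
total_hits = sum(per_doc.values())  # hits for the whole query

print(dict(per_doc))  # {0: 2, 3: 1}
print(total_hits)     # 3
```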

It is fairly critical that I highlight the hits, and only the hits. From what I've read, SpanQueries (with dumpSpans) are a better approach than using 'regular' queries. I _think_ regular queries currently use a highlighter which shows all terms highlighted. This can give more highlighting than actual hits (i.e. false positives).

So, that being said, should I stick with SpanQueries? Is there any current work on a parser to convert a string, or regular (Token, Boolean, Phrase, Prefix,...) query into a SpanQuery?

I have written some very duct tape-ish code which will convert basic booleanOR and prefix queries into SpanQueries. I just realized I'm in deeper water than I expected when I tried converting my first query string containing several boolean queries, AND a phrase query. So now I am looking to either help an existing effort, or just continue with my own hacking.

:)

Have a look at the surround query parser in the svn trunk:
http://svn.apache.org/viewcvs.cgi/lucene/java/trunk/contrib/surround/

There is also some code that does highlighting based on Spans,
but I don't know where that is. Hopefully someone else can point you at that.

Regards,
Paul Elschot



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




