Re: SpanQuery parser? Update (ugly hack inside...)

Sean O'Connor Fri, 04 Nov 2005 15:31:33 -0800

I'm posting this primarily hoping to give back a tiny bit to a very helpful community. More likely, however, someone else will open my eyes to an easier approach than what I outline below...

I've come up with a very ugly conversion approach from regular Query objects into SpanQuery objects. I then use the converted SpanQuery to get span positions (currently both token #, and start/end position). In effect, I have highlighting for simple queries with a very inefficient approach (yea for me!).

The goal(s) I am trying to accomplish are rather specific, I think, so I imagine the use of my hacking is rather limited (i.e. just to me).


At the moment my code:

   * parses the search text (i.e. user entered query)
   * rewrites the resulting query to expand wildcards and such against
     the index
   * calls a recursive conversion function with very basic conversion
     understanding
         o TermQuery -> SpanTerm
         o PhraseQuery -> SpanNear
         o others in progress as time permits

Currently, I only process simple query strings like:
"blue green yellow" => SpanOrQuery
"luce* acti*" => SpanOrQuery with wild cards expanded

    e.g.: lucene lucent action acting ... all or'ed together in a braindead fashion
"luce* acti* \"book rocks\"" => SpanOrQuery combining SpanTerms and SpanNear (no slop)
    er, hopefully you get the picture, I'm not up to showing a vector of this one... :-)
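
The conversion steps above could be sketched roughly as follows. This is only an illustration under assumptions: the Term, Phrase, and BooleanOr types are hypothetical stand-ins for an already parsed and rewritten query (they are not Lucene's classes), and the converter only returns a printable description of the span structure it would build instead of creating real SpanQuery objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in types for illustration; these are NOT Lucene's classes.
class SpanConversionSketch {

    // Minimal model of a "regular" query tree after parsing and rewriting.
    interface Query {}

    static final class Term implements Query {
        final String text;
        Term(String text) { this.text = text; }
    }

    static final class Phrase implements Query {
        final List<String> terms;
        Phrase(String... terms) { this.terms = Arrays.asList(terms); }
    }

    static final class BooleanOr implements Query {
        final List<Query> clauses;
        BooleanOr(Query... clauses) { this.clauses = Arrays.asList(clauses); }
    }

    // Recursive conversion: each known query type maps to a span equivalent,
    // described here as text. Unknown types fail loudly ("others in progress").
    static String toSpan(Query q) {
        if (q instanceof Term) {
            return "spanTerm(" + ((Term) q).text + ")";
        }
        if (q instanceof Phrase) {
            // A phrase becomes an ordered "near" over its terms with no slop.
            List<String> parts = new ArrayList<>();
            for (String t : ((Phrase) q).terms) {
                parts.add("spanTerm(" + t + ")");
            }
            return "spanNear([" + String.join(", ", parts) + "], slop=0)";
        }
        if (q instanceof BooleanOr) {
            // Convert every clause recursively and OR the results together.
            List<String> parts = new ArrayList<>();
            for (Query clause : ((BooleanOr) q).clauses) {
                parts.add(toSpan(clause));
            }
            return "spanOr(" + String.join(", ", parts) + ")";
        }
        throw new UnsupportedOperationException(
            "no conversion yet for " + q.getClass().getSimpleName());
    }

    public static void main(String[] args) {
        // Roughly what "luce* \"book rocks\"" might look like once wildcards are expanded.
        Query q = new BooleanOr(
            new Term("lucene"), new Term("lucent"), new Phrase("book", "rocks"));
        System.out.println(toSpan(q));
    }
}
```

Throwing on unsupported query types keeps the gaps visible while the remaining conversions are still being written.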

I would be happy to discuss my approach if anyone is interested. I assume I am pretty much alone in finding this inefficient approach useful. For me, it is the functionality that overrides performance issues. I have something which can take user search strings and do hit highlighting for the exact hit found. This is really only useful for "termA near 'some phrase'" at the moment, but might become more advanced in the next 2-3 months.
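
To illustrate what highlighting only the exact hit means, here is a minimal sketch that marks up a text from start/end character offsets. The Hit class, the highlight method, and the square-bracket markers are all hypothetical and chosen for illustration; the offsets are assumed to have been obtained already from the span positions, and nothing here calls Lucene.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper for illustration; it does not use any Lucene API.
class SpanHighlightSketch {

    // One hit, given as character offsets into the original text.
    static final class Hit {
        final int start; // inclusive
        final int end;   // exclusive
        Hit(int start, int end) { this.start = start; this.end = end; }
    }

    // Wraps each hit in [ ] and leaves all other text untouched, so a term
    // that occurs outside a real hit is not highlighted.
    static String highlight(String text, List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingInt(h -> h.start));

        StringBuilder out = new StringBuilder();
        int cursor = 0;
        for (Hit h : sorted) {
            // Skip hits that are empty, out of range, or overlap an earlier hit.
            if (h.start < cursor || h.end > text.length() || h.start >= h.end) {
                continue;
            }
            out.append(text, cursor, h.start);
            out.append('[').append(text, h.start, h.end).append(']');
            cursor = h.end;
        }
        out.append(text, cursor, text.length());
        return out.toString();
    }

    public static void main(String[] args) {
        // Only the phrase hit at offsets 10..20 is marked; the first "rocks" is not.
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit(10, 20));
        System.out.println(highlight("rocks and book rocks", hits));
    }
}
```

Working from offsets rather than from the query terms is what avoids the extra highlighting: the stray "rocks" at the start of the example text matches a query term but is not part of any hit, so it stays unmarked.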


Sean


Paul Elschot wrote:

On Thursday 20 October 2005 00:40, Sean O'Connor wrote:
Hello,
I have user entered search commands which I want to convert to SpanQueries. I have seen in the book "Lucene in Action" that no parser existed at time of publication, but there was someone working on a SpanQuery parser. Can anyone point me to that code, or provide any suggestions?
I want to use SpanQueries for their detail on the number of hits from a query, and more importantly, the location (position start and end) of each hit. My application requires me to do precise hit highlighting. I also need to perform calculations on the number of hits per document, as well as per query (sum of document hits).
You may want to use the getSpans() method of SpanQuery and operate
on the result directly.
It is fairly critical I highlight the hits, and only the hits. From what I've read, SpanQueries (with dumpSpans) are a better approach than using 'regular' queries. I _think_ regular queries currently use a highlighter which shows all terms highlighted. This can give more highlighting than actual hits (i.e. false positives).
So, that being said, should I stick with SpanQueries? Is there any current work on a parser to convert a string, or regular (Token, Boolean, Phrase, Prefix,...) query into a SpanQuery?
I have written some very duct tape-ish code which will convert basic boolean OR and prefix queries into SpanQueries. I just realized I'm in deeper water than I expected when I tried converting my first query string containing several boolean queries, AND a phrase query. So now I am looking to either help an existing effort, or just continue with my own hacking.
:)

Have a look at the surround query parser in the svn trunk:
http://svn.apache.org/viewcvs.cgi/lucene/java/trunk/contrib/surround/

There is also some code that does highlighting based on Spans,
but I don't know where that is. Hopefully someone else can point you at that.

Regards,
Paul Elschot



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




