Re: Using Lucene with a rather simplistic scoring system?

Ian Lea Fri, 11 Jun 2010 06:52:42 -0700

Others can comment on how to customize scoring, but I wonder if
Lucene's default scoring might do the job as is.


If you've got a document in the index (simple translation from your JSON)

class: my.ExampleClass
extends: the.SuperClass
overrides: the.SuperClass.method1() the.SuperClass.method2()
used types: a.Type1 a.Type2
used methods: a.Type1.method32() a.Type1.method23()

then sample queries

"extends: the.SuperClass"
"extends: the.SuperClass overrides: the.SuperClass.method1()"

would both match, but the second one should score higher because it
matches more terms.  The weighting could be done with boosts, e.g. if
you care more about overrides:

"extends: the.SuperClass overrides: the.SuperClass.method1()^2"
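
As a rough sketch of how such a boosted query string could be assembled
from the extracted features, the plain-Java helper below builds clauses in
the classic query parser syntax (field:term^boost). The class FeatureQuery
and its methods are hypothetical, not part of Lucene, and it assumes field
names without spaces (so "used types" would have to become something like
used_types). The brackets in method names are query syntax characters and
need a backslash escape; Lucene ships QueryParser.escape for this, and the
hand-written version here only keeps the sketch free of dependencies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a Lucene class): collects optional clauses
// and renders them as a classic query-parser string.
final class FeatureQuery {

    // Characters the classic query parser treats as syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    private final List<String> clauses = new ArrayList<>();

    // Prefix every syntax character with a backslash.
    static String escape(String term) {
        StringBuilder sb = new StringBuilder();
        for (char c : term.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Add an optional ("should") clause; a boost of 1 is left implicit.
    FeatureQuery should(String field, String term, float boost) {
        String clause = field + ":" + escape(term);
        if (boost != 1.0f) {
            clause += "^" + boost;
        }
        clauses.add(clause);
        return this;
    }

    String build() {
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        String q = new FeatureQuery()
                .should("extends", "the.SuperClass", 1.0f)
                .should("overrides", "the.SuperClass.method1()", 2.0f)
                .build();
        System.out.println(q);
    }
}
```

For the two features above this yields
extends:the.SuperClass overrides:the.SuperClass.method1\(\)^2.0
Whether those terms then match what is in the index still depends on the
analyzer used at index and query time.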


Whatever you do, you'll need to play with analyzers if you need to keep
the dots, brackets and case-sensitivity.  You will also need to make sure
you've got the right must/should/and/or logic in place.
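
To illustrate why the analyzer matters, the dependency-free sketch below
imitates two tokenization strategies in plain Java. It is not Lucene code;
the class and method names are made up for illustration. Splitting on
whitespace only (roughly what Lucene's WhitespaceAnalyzer does) keeps each
qualified method name intact, while splitting at non-letters and
lowercasing (roughly what SimpleAnalyzer does) discards dots, brackets,
digits and case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Illustration only: imitates two tokenization strategies without Lucene.
final class TokenizeDemo {

    // Split on whitespace; dots, brackets and case survive.
    static List<String> whitespaceTokens(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    // Split at every non-letter and lowercase what remains.
    static List<String> letterLowercaseTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String overrides = "the.SuperClass.method1() the.SuperClass.method2()";
        System.out.println(whitespaceTokens(overrides));
        System.out.println(letterLowercaseTokens(overrides));
    }
}
```

With the second strategy, method1() and method2() both collapse to the
token "method", so a query could no longer tell them apart. That is the
kind of loss to watch for when picking an analyzer for these fields.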

Good luck.  Sounds like an interesting project.


--
Ian.

On Fri, Jun 11, 2010 at 2:35 PM, Marcel Bruch <br...@cs.tu-darmstadt.de> wrote:
> Hi!
>
> We are working on an experimental code-search engine that helps users to
> find example code snippets based on what a developer already typed inside
> her editor. Our “homemade search engine” produces some cool results but its
> performance is somewhat limited :-) Thus, we are evaluating whether Lucene
> can solve our performance issues. However, we are not familiar with Lucene
> and I wonder if some of you could help me to learn whether Lucene fits our
> problem well. Thanks in advance for your comments.
>
> The situation is as follows. For each source code file we extract some code
> properties like which types are used inside the code, which methods are
> overridden or which methods are called inside a method body etc. For each
> source code file we get a JSON structure similar to this:
> {
>     “class” : my.ExampleClass
>     “extends” : the.SuperClass
>     “overrides” :
>         - the.SuperClass.method1()
>         - the.SuperClass.method2()
>     “used types”:
>         - a.Type1
>         - a.Type2
>         -   ...
>     “used methods”:
>         - a.Type1.method32()
>         - a.Type1.method23()
>         - ...
> <few more things>
> }
> The scoring function we use is rather simplistic. Given a query (which looks
> much like the document above) we determine for each feature (i.e.
> “used methods”, “used types”, “overrides” etc.) a simple matching strategy:
> the percentage of overlap between each query-document feature and
> db-document feature. Then we simply multiply each feature-score f_i with an
> individual feature-weight w_i and sum it all up into one overall score.
>
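
The scheme described in the quoted paragraph above can be sketched in plain
Java, independent of Lucene, which may help when comparing it with Lucene's
default scoring. This is only one reading of it: "percentage of overlap" is
assumed here to mean the fraction of the query's values for a feature that
also occur in the document, and a missing weight is assumed to default to 1.
The class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of a weighted feature-overlap score: sum over features of w_i * f_i.
final class OverlapScorer {

    // f_i: share of the query's values that the document also has
    // (assumed definition; the original message does not pin this down).
    static double overlap(Set<String> query, Set<String> doc) {
        if (query.isEmpty()) {
            return 0.0;
        }
        Set<String> common = new HashSet<>(query);
        common.retainAll(doc);
        return (double) common.size() / query.size();
    }

    // Overall score: each feature's overlap times its weight, summed.
    static double score(Map<String, Set<String>> query,
                        Map<String, Set<String>> doc,
                        Map<String, Double> weights) {
        double total = 0.0;
        for (Map.Entry<String, Set<String>> e : query.entrySet()) {
            Set<String> docValues =
                    doc.getOrDefault(e.getKey(), Collections.<String>emptySet());
            double w = weights.getOrDefault(e.getKey(), 1.0);
            total += w * overlap(e.getValue(), docValues);
        }
        return total;
    }
}
```

Only features present in the query contribute, so a document is not
penalised for having extra types or methods the query does not mention.
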
> My questions are: Is it meaningful to use Lucene in this setup or, put
> differently, can I implement that scoring scheme with Lucene easily?  What
> would such a solution look like? By just subclassing Scorer?
>
> Many thanks in advance for advice
>
> All the best,
> Marcel
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
