Others can comment on how to customize scoring, but I wonder if Lucene's default scoring might do the job as is.
If you've got a document in the index (a simple translation from your JSON)

class: my.ExampleClass
extends: the.SuperClass
overrides: the.SuperClass.method1() the.SuperClass.method2()
used types: a.Type1 a.Type2
used methods: a.Type1.method32() a.Type1.method23()

then the sample queries

"extends: the.SuperClass"
"extends: the.SuperClass overrides: the.SuperClass.method1()"

would both match, but the second one should score higher because it matches more terms.

The weighting could be done with boosts, e.g. if you care more about overrides:

"extends: the.SuperClass overrides: the.SuperClass.method1()^2"

Whatever you do, you'll need to play with analyzers if you need to keep the dots, brackets and case sensitivity. You'll also need to make sure you've got the right must/should/and/or logic in place.

Good luck. Sounds like an interesting project.

--
Ian.

On Fri, Jun 11, 2010 at 2:35 PM, Marcel Bruch <br...@cs.tu-darmstadt.de> wrote:
> Hi!
>
> We are working on an experimental code-search engine that helps users to
> find example code snippets based on what a developer already typed inside
> her editor. Our "homemade search engine" produces some cool results but its
> performance is somewhat limited :-) Thus, we are evaluating whether Lucene
> can solve our performance issues. However, we are not familiar with Lucene
> and I wonder if some of you could help me to learn whether Lucene fits our
> problem well. Thanks in advance for your comments.
>
> The situation is as follows. For each source code file we extract some code
> properties, like which types are used inside the code, which methods are
> overridden or which methods are called inside a method body, etc. For each
> source code file we get a JSON structure similar to this:
> {
> “class” : my.ExampleClass
> “extends” : the.SuperClass
> “overrides” :
> - the.SuperClass.method1()
> - the.SuperClass.method2()
> “used types”:
> - a.Type1
> - a.Type2
> - ...
> “used methods”:
> - a.Type1.method32()
> - a.Type1.method23()
> - ...
> <few more things>
> }
> The scoring function we use is rather simplistic. Given a query (which looks
> more or less identical to the document above) we determine for each feature (i.e.
> “used methods”, “used types”, “overrides” etc.) a simple matching strategy:
> the percentage of overlap between each query-document feature and
> db-document feature. Then we simply multiply each feature-score f_i with an
> individual feature-weight w_i and sum it all up into one overall score.
>
> My questions are: Is it meaningful to use Lucene in this setup, or, put
> differently, can I implement that scoring scheme with Lucene easily? What
> would such a solution look like? By just subclassing Scorer?
>
> Many thanks in advance for any advice
>
> All the best,
> Marcel
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
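To make the scoring scheme Marcel describes concrete, here is a minimal plain-Java sketch (no Lucene involved) of score = sum_i w_i * f_i. It assumes that "percentage of overlap" means the fraction of the query's values for a feature that also occur in the document's values for that feature; the class name FeatureOverlapScore and the example weights are hypothetical. The per-feature weights play roughly the role that boosts like ^2 would play in a Lucene query.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: a sketch of the weighted feature-overlap score, not Lucene code.
public class FeatureOverlapScore {

    /** f_i (assumed definition): fraction of the query's values for one feature found in the document. */
    public static double overlap(List<String> queryValues, List<String> docValues) {
        if (queryValues.isEmpty()) {
            return 0.0; // nothing was asked for this feature, so it contributes nothing
        }
        Set<String> doc = new HashSet<>(docValues);
        long hits = queryValues.stream().filter(doc::contains).count();
        return (double) hits / queryValues.size();
    }

    /** Overall score: sum over the query's features of w_i * f_i (missing weights default to 1.0). */
    public static double score(Map<String, List<String>> query,
                               Map<String, List<String>> doc,
                               Map<String, Double> weights) {
        double total = 0.0;
        for (Map.Entry<String, List<String>> e : query.entrySet()) {
            double w = weights.getOrDefault(e.getKey(), 1.0);
            List<String> docValues = doc.getOrDefault(e.getKey(), List.of());
            total += w * overlap(e.getValue(), docValues);
        }
        return total;
    }

    public static void main(String[] args) {
        // Document features, taken from the example structure in the thread.
        Map<String, List<String>> doc = new LinkedHashMap<>();
        doc.put("extends", List.of("the.SuperClass"));
        doc.put("overrides", List.of("the.SuperClass.method1()", "the.SuperClass.method2()"));
        doc.put("used types", List.of("a.Type1", "a.Type2"));

        // Query features: a.Type3 is a hypothetical value that does not occur in the document.
        Map<String, List<String>> query = new LinkedHashMap<>();
        query.put("extends", List.of("the.SuperClass"));
        query.put("overrides", List.of("the.SuperClass.method1()"));
        query.put("used types", List.of("a.Type1", "a.Type3"));

        // Hypothetical weights: overrides count double, mirroring the ^2 boost example.
        Map<String, Double> weights = Map.of("extends", 1.0, "overrides", 2.0, "used types", 1.0);

        System.out.println(score(query, doc, weights)); // prints 3.5 (1.0*1 + 2.0*1 + 1.0*0.5)
    }
}
```

This brute-force version scores one document at a time; the point of moving to Lucene would be that only documents matching at least one query term are scored at all. Note that Lucene's default scores will not equal these numbers, since its default similarity also factors in term frequency, inverse document frequency and length normalization.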