Hi!
We are working on an experimental code-search engine that helps developers
find example code snippets based on what they have already typed in their
editor. Our homemade search engine produces some cool results, but its
performance is somewhat limited :-) We are therefore evaluating whether
Lucene can solve our performance issues. However, we are not familiar with
Lucene, so I wonder if some of you could help me figure out whether Lucene
fits our problem well. Thanks in advance for your comments.
The situation is as follows. For each source code file we extract some
code properties, such as which types are used in the code, which methods
are overridden, or which methods are called inside a method body. For
each source code file we get a JSON structure similar to this:
{
  "class": "my.ExampleClass",
  "extends": "the.SuperClass",
  "overrides": [
    "the.SuperClass.method1()",
    "the.SuperClass.method2()"
  ],
  "used types": [
    "a.Type1",
    "a.Type2",
    ...
  ],
  "used methods": [
    "a.Type1.method32()",
    "a.Type1.method23()",
    ...
  ],
  <few more things>
}
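
For reference, my naive idea (based only on a quick look at the Lucene
javadocs, so it may well be wrong) is that such a record could be indexed
as one Lucene document per source file, with one multi-valued, untokenized
field per feature, roughly like this (field names are made up by me):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;

// One Lucene document per source file; every feature value becomes its
// own entry of a multi-valued keyword-like field.
Document doc = new Document();
doc.add(new StringField("class", "my.ExampleClass", Field.Store.YES));
doc.add(new StringField("extends", "the.SuperClass", Field.Store.NO));
doc.add(new StringField("overrides", "the.SuperClass.method1()", Field.Store.NO));
doc.add(new StringField("overrides", "the.SuperClass.method2()", Field.Store.NO));
doc.add(new StringField("used_types", "a.Type1", Field.Store.NO));
doc.add(new StringField("used_types", "a.Type2", Field.Store.NO));
doc.add(new StringField("used_methods", "a.Type1.method32()", Field.Store.NO));
doc.add(new StringField("used_methods", "a.Type1.method23()", Field.Store.NO));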
The scoring function we use is rather simplistic. Given a query (which
looks essentially like the document above), we compute for each feature
(i.e. "used methods", "used types", "overrides", etc.) a simple match
score: the percentage of overlap between the query's values and the
document's values for that feature. Then we multiply each feature score
f_i by an individual feature weight w_i and sum everything up into one
overall score, i.e. score = sum_i (w_i * f_i).
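
To make the question concrete: my (possibly wrong) guess is that this
weighted-overlap score could be approximated with one BooleanQuery per
feature, built from constant-score term clauses, along the lines of the
sketch below (the field names, variable names, and the whole approach are
just my assumption, not something I found documented for this use case):

import java.util.List;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.BoostQuery;
import org.apache.lucene.search.ConstantScoreQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

// One SHOULD clause per query value of a feature. Each matching value
// contributes weight / queryValues.size() to the document score, so the
// sum over all matches is roughly weight * (fraction of query values found).
static Query featureQuery(String field, List<String> queryValues, float weight) {
    BooleanQuery.Builder builder = new BooleanQuery.Builder();
    float perValueBoost = weight / queryValues.size();
    for (String value : queryValues) {
        Query match = new ConstantScoreQuery(new TermQuery(new Term(field, value)));
        builder.add(new BoostQuery(match, perValueBoost), BooleanClause.Occur.SHOULD);
    }
    return builder.build();
}

The per-feature queries would then be combined into one outer BooleanQuery
with SHOULD clauses so that the feature scores add up. I am not sure,
though, whether anything like this is idiomatic Lucene.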
My questions are: Does it make sense to use Lucene in this setup? Or, put
differently, can I implement this scoring scheme easily with Lucene? What
would such a solution look like? Would I just subclass Scorer?
Many thanks in advance for any advice.
All the best,
Marcel