Many excellent ideas here, thanks. Replying to the points raised, in reverse order:

# Depending on Solr

Yes, I'm indeed very happy to have removed this dependency. Not least, from a product perspective it forced us to address security issues in its various web/servlet components: strict QA checks apply to anything we need to ship, but these components were irrelevant to how we expect people to use our framework, so they were just a parasite on our time, which can otherwise be dedicated to more interesting aspects.

If some complex machinery from Solr is needed to take full advantage of DisMax even outside the scope of the Solr server, ultimately we should propose patches to Apache Lucene to move it into a better suited lucene-query.jar, but we can certainly start playing with it by reimplementing or copying a couple of classes.

# Providing DSL support for DisjunctionMaxQuery

Yes, I agree it's very interesting and not necessarily coupled to MLT: it has its own issue, HSEARCH-665, and I didn't mean to suggest that MLT requires DisMax, sorry for the confusion. Let's treat HSEARCH-665 independently: it's not a blocker for MLT.

Guillaume: it sounds like you have solid experience with this feature. If you are still considering coaching an intern on such a subject, note that the Hibernate project participates in GSOC [1], so we could get a smart, paid student. It's a bit late, but we still have time to suggest subjects for this year: if you or anyone else is interested in being a mentor this year, please get in touch with me. I don't think implementing DisMax support alone has enough meat to keep a good student busy for months, but it could be one aspect of a slightly more complex goal.

# Bringing MLT home

I still suspect that a DisMax approach would provide a better scoring model, but this is an implementation detail we should iterate on in a second phase.

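
For reference, here is a toy sketch in plain Java (no Lucene dependency) of the textbook "albino elephant" case. It relies on the documented DisjunctionMaxQuery rule: the score is the best sub-score plus the tie-breaker times the remaining sub-scores. The class name, the per-field scores and the 0.1 tie-breaker are all made up for illustration; real Lucene scores depend on the similarity in use.

```java
import java.util.Arrays;

/** Toy comparison of summed field scores vs. DisMax-style scoring. All numbers are hypothetical. */
final class DisMaxSketch {

    /** DisMax combination: best sub-score plus tieBreaker times the other sub-scores. */
    static double disMax(double[] fieldScores, double tieBreaker) {
        double max = 0.0; // scores are assumed to be non-negative
        double sum = 0.0;
        for (double s : fieldScores) {
            max = Math.max(max, s);
            sum += s;
        }
        return max + tieBreaker * (sum - max);
    }

    /** One DisMax per term across fields, then summed across terms. */
    static double disMaxPerTerm(double[][] termByField, double tieBreaker) {
        double total = 0.0;
        for (double[] fieldScores : termByField) {
            total += disMax(fieldScores, tieBreaker);
        }
        return total;
    }

    /** Plain sum of every term/field score, as a boolean OR of all clauses would do. */
    static double sumPerTerm(double[][] termByField) {
        double total = 0.0;
        for (double[] fieldScores : termByField) {
            total += Arrays.stream(fieldScores).sum();
        }
        return total;
    }

    public static void main(String[] args) {
        // Rows: terms "albino", "elephant". Columns: fields "title", "body".
        double[][] albinoInBothFields = { {0.5, 0.5}, {0.0, 0.0} };
        double[][] oneTermPerField = { {0.5, 0.0}, {0.0, 0.5} };

        System.out.printf("sum:    both=%.2f split=%.2f%n",
                sumPerTerm(albinoInBothFields), sumPerTerm(oneTermPerField));
        System.out.printf("dismax: both=%.2f split=%.2f%n",
                disMaxPerTerm(albinoInBothFields, 0.1), disMaxPerTerm(oneTermPerField, 0.1));
    }
}
```

With these made-up numbers the plain sum cannot tell the two documents apart, while the per-term DisMax ranks the document matching both terms above the one matching "albino" twice.
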
Taking the "albino elephants" example: I agree with the behaviour you described, but I think there are some additional aspects to consider when evaluating how a partial match "albino" scores against a full match "albino elephant" in a single field rather than split across fields, or how "albino" could score lower in field A than in field B, so that even swapping the positions of terms across fields could produce a less valuable match. This would probably be better explained with an example on a larger data set, but alas I won't be able to craft one soon. Still, it's not a blocker at all: in this first phase I think we should 1) have a working solution and 2) focus on API effectiveness. Performance and a sophisticated scoring system will necessarily have to follow: I'm unpacking a large data set to play with, and I'm pretty sure we'll have plenty of follow-up improvements.

Emmanuel: if you can address the TODOs in the pull request I'd merge it; if you don't have time for that, could we work on top of your commits?

-- Sanne

[1] https://community.jboss.org/wiki/GSOC13Ideas#jive_content_id_Hibernate