Re: [hibernate-dev] Hibernate web search

2008-09-23 Thread Adam Warski

Hello,


Writing such a query parser is really easy if you consider it acceptable
to catch "QueryParseException"s.
What I usually do in case of such an exception is report a JSF
validation error with some instructions about
query syntax.
Well, Google doesn't show validation exceptions if you mistype the
query ;) But you get the results you expected anyway. So your query is
"fixed" if it contains a syntax error.



The default query parser already supports the AND and OR keywords, and by
extending it and overriding a simple method you can disable
the similarity symbol "~" if you want to (I don't remember the method
name now but can look it up if you need it); this way the default
QueryParser already supports all of Google's basic syntax (excluding
the advanced features, leaving out stuff like language and website
keywords...).
Yes, you are right, and I suppose writing a simple version which would
escape any mistyped characters shouldn't be very hard.
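
A minimal sketch of what such a "simple version" might look like, in plain Java with no Lucene dependency. It only handles two kinds of mistyped input (an unmatched double quote and unmatched parentheses) by prefixing them with a backslash, which is the escape character in Lucene's query syntax. The class name LenientQuerySanitizer and its method are hypothetical, not part of Lucene or Hibernate Search. Lucene's own QueryParser.escape(String) escapes every special character, which would also neutralise well-formed quotes and parentheses, so this sketch is deliberately narrower.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper: escapes only the characters that make a query unbalanced.
final class LenientQuerySanitizer {

    private LenientQuerySanitizer() {}

    static String sanitize(String query) {
        boolean[] escape = new boolean[query.length()];

        // Pass 1: if the number of unescaped quotes is odd, the last one is unmatched.
        int quoteCount = 0;
        int lastQuote = -1;
        for (int i = 0; i < query.length(); i++) {
            char c = query.charAt(i);
            if (c == '\\') {
                i++; // skip a character the user already escaped
            } else if (c == '"') {
                quoteCount++;
                lastQuote = i;
            }
        }
        if (quoteCount % 2 == 1) {
            escape[lastQuote] = true;
        }

        // Pass 2: match parentheses outside of (balanced) quoted phrases.
        Deque<Integer> openParens = new ArrayDeque<>();
        boolean inQuotes = false;
        for (int i = 0; i < query.length(); i++) {
            char c = query.charAt(i);
            if (c == '\\') {
                i++;
            } else if (c == '"' && !escape[i]) {
                inQuotes = !inQuotes;
            } else if (!inQuotes && c == '(') {
                openParens.push(i);
            } else if (!inQuotes && c == ')') {
                if (openParens.isEmpty()) {
                    escape[i] = true; // closing parenthesis with no opener
                } else {
                    openParens.pop();
                }
            }
        }
        for (int open : openParens) {
            escape[open] = true; // opening parenthesis that was never closed
        }

        // Rebuild the query, adding a backslash before each marked character.
        StringBuilder out = new StringBuilder(query.length() + 4);
        for (int i = 0; i < query.length(); i++) {
            if (escape[i]) {
                out.append('\\');
            }
            out.append(query.charAt(i));
        }
        return out.toString();
    }
}
```

With this sketch, a well-formed query such as (a OR b) passes through unchanged, while a) AND (b comes out with both stray parentheses escaped. Other syntax errors (a trailing backslash, a dangling operator) are not handled and would still need the try/catch approach.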


--
Adam
___
hibernate-dev mailing list
hibernate-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hibernate-dev


Re: [hibernate-dev] Hibernate web search

2008-09-23 Thread Adam Warski

Hello,

I like the idea of a parser using the Google syntax (you don't have
to disable explicit fields BTW - recognizing a term:term syntax
should be doable). The hard problem to crack is what's behind it. As I
explain in Hibernate Search in Action, a lot of good search
engines do searches in tiers:

- exact search
- phonetic search
- fuzzy search
- replace ANDs with ORs

In one application, where I indexed relatively short names (but they
could contain spaces etc.), along with each name I also indexed a
special version, which was one term, with all spaces converted to _
(so the name "Foo Bar" became "foo_bar"). When the user submitted the
query, I also created a special version of the query in the same way,
and added a fuzzy search on that - and this gave quite good results.
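
The single-term trick above can be sketched roughly as follows, in plain Java. The normalisation (lower-casing, spaces to underscores) follows the "Foo Bar" to "foo_bar" example; the class and method names are hypothetical, and a plain Levenshtein edit distance stands in here for the fuzzy search that the index itself would perform.

```java
import java.util.Locale;

// Hypothetical helper illustrating the "one term per name" idea.
final class SingleTermNames {

    private SingleTermNames() {}

    // "Foo Bar" -> "foo_bar": applied to indexed names and to user queries alike.
    static String normalize(String name) {
        return name.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", "_");
    }

    // Classic Levenshtein distance (insert, delete, substitute), two-row version.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Stand-in for a fuzzy match on the normalised single-term field.
    static boolean fuzzyMatches(String indexedName, String userQuery, int maxEdits) {
        return editDistance(normalize(indexedName), normalize(userQuery)) <= maxEdits;
    }
}
```

Because the whole name is compared as one term, a typo anywhere in "Foo Bar" (including a missing or extra space) costs only an edit or two, instead of causing one of the separate words to miss entirely.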


I guess you can simulate part of it by boosting exact fields as
opposed to approximation fields in the multi-field query parser.
This was not really possible until recently, but
SearchFactory.getAnalyzer(MyEntity.class) makes it much easier.
Right :) I think it would require some fine-tuning, but once we have  
the query parsed to "our" representation (or maybe we could reuse the  
parsed tree Lucene produces) we can manipulate it as we want. And the  
output we produce could be configurable (which tiers to include, with  
what boost factors etc.)
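
As a rough illustration of "configurable tiers with boost factors", the sketch below builds a Lucene-syntax query string from a list of tiers, each being a field plus a boost. It is only an assumption about how such output could be configured: the class TieredQueryBuilder and the field names used in the usage note are hypothetical, and the user query is assumed to be already sanitized. It relies on the standard query syntax for field grouping, field:(...), and boosting, ^factor.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical builder: one OR-ed clause per configured tier.
final class TieredQueryBuilder {

    // A tier is a field to search (e.g. exact or phonetic) and its boost factor.
    private static final class Tier {
        final String field;
        final float boost;

        Tier(String field, float boost) {
            this.field = field;
            this.boost = boost;
        }
    }

    private final List<Tier> tiers = new ArrayList<>();

    TieredQueryBuilder addTier(String field, float boost) {
        tiers.add(new Tier(field, boost));
        return this;
    }

    // Produces e.g. "name:(foo bar)^4.0 OR name_phonetic:(foo bar)^2.0".
    String build(String sanitizedUserQuery) {
        StringBuilder sb = new StringBuilder();
        for (Tier tier : tiers) {
            if (sb.length() > 0) {
                sb.append(" OR ");
            }
            sb.append(tier.field)
              .append(":(")
              .append(sanitizedUserQuery)
              .append(")^")
              .append(tier.boost);
        }
        return sb.toString();
    }
}
```

For example, new TieredQueryBuilder().addTier("name", 4.0f).addTier("name_phonetic", 2.0f).build("foo bar") gives a query in which a hit on the exact field outweighs a hit on the approximation field. Which tiers to include, and with what boosts, is then just configuration.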


We should add the Google-like feature to the 3.2 list amongst other
higher-level query enhancements like spell checking.
Who wants to take the lead? I have always considered grammar and
parser development awkward for my tastes :)
Well, I could do it, if my boss (Mark Newton) and you agree, unless  
there's somebody from the Hibernate team taking care of it :)


--
Adam
