On Mon, Jun 21, 2010 at 1:37 PM, Henrik Genssen <henrik.gens...@miadi.net> wrote:
> Hi,
>
> I am trying to optimize the search of users, as Google does.
> My idea is to add a second column in the DB with a "normalized" version
> of the string. By "normalize" I mean converting ä to ae, etc.
>
> Has anyone done this before?
>
> While searching for a solution, I found this library for a "did you mean"
> search:
> http://norvig.com/spell-correct.html
> Can we integrate this into the ORM, and build the "big.txt" corpus from
> the data in the DB instead of from disk? Or is that too much overhead?
>
> I know Haystack can do this for you, but it brings in a lot of overhead
> (the search engines it wraps), along with a lot of features not everyone
> may need.
>
> Does anyone have another idea?
>
> regards
>
> Henrik
>
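The umlaut folding Henrik describes could look something like the following minimal sketch. The `normalize` helper and the `GERMAN_MAP` table are hypothetical names, not part of any library; German characters get their conventional two-letter transliterations explicitly (plain Unicode decomposition would turn ä into a, not ae), and everything else falls back to stripping combining accents:

```python
import unicodedata

# Hypothetical transliteration table: German umlauts and ß need explicit
# two-letter replacements, since NFKD decomposition alone would map
# "ä" -> "a" rather than the expected "ae".
GERMAN_MAP = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
})

def normalize(text):
    """Return a lowercase, ASCII-folded version of text for the search column."""
    text = text.translate(GERMAN_MAP)
    # Decompose remaining accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()
```

The result of `normalize()` would be stored in the second column at save time and queried with a plain `icontains`/`startswith` lookup.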
This kind of normalization is called stemming or tokenization in search
terminology, and it is quite an in-depth topic. If you take a look at the
number of tokenizers available in the stock version of Solr [1], you'll see
why.

You rule this out as an option, but you should take another look at the
overhead of haystack/solr. The interface to Google is simple, and this makes
people think that the backend is easy too; it's not. Either plan on spending
a lot of time researching search technologies and developing your own
solution, or settle for the overhead.

Cheers,
Tom

[1] http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To post to this group, send email to django-us...@googlegroups.com.
To unsubscribe from this group, send email to django-users+unsubscr...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/django-users?hl=en.
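For a sense of what even the simplest analyzer in that Solr list does, here is a minimal sketch of a fold/lowercase/split pipeline in plain Python. The `analyze` name is hypothetical, and this covers only a tiny fraction of a real tokenizer chain (no stemming, stop words, synonyms, or n-grams):

```python
import re
import unicodedata

def analyze(text):
    """Split text into lowercase, accent-folded tokens (a toy analyzer)."""
    # Fold accents: decompose with NFKD, then drop the combining marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in text if not unicodedata.combining(c))
    # Lowercase and split on runs of non-alphanumeric characters.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]
```

Each extra feature (stemming "searching" to "search", handling "ae" vs. "ä", synonym expansion) adds another stage to this pipeline, which is exactly the complexity the Solr tokenizer list reflects.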