On Mon, Jun 21, 2010 at 1:37 PM, Henrik Genssen
<henrik.gens...@miadi.net> wrote:
> Hi,
>
> I am trying to optimize the search for users, the way Google does.
> So I thought of adding a 2nd column to the DB with a "normalized" version
> of the string.
> By "normalize" I mean converting ä to ae, etc.
>
> Someone done this before?
>
> While searching for a solution, I found this library for a "did you mean"
> search:
> http://norvig.com/spell-correct.html
> Can we integrate this into the ORM, and build the "big.txt" corpus from the
> DB instead of from disk?
> Or is it too much overhead?
>
> I know Haystack can do this for you, but it brings in a lot of overhead
> (such as the search engines), along with a lot of features not everyone
> may need.
>
> Anyone out there with another idea?
>
> regards
>
> Henrik
>
>

This normalization is called stemming or tokenization in search
terminology, and it is quite an in-depth topic. If you take a look at the
number of tokenizers available in the stock version of Solr[1], you'll
see why.
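To sketch what the normalized shadow column could look like: this is a
minimal plain-Python version, not anything from Solr or Haystack. Note that
Unicode NFKD decomposition alone would fold "ä" to "a", not "ae", so the
German-specific cases need an explicit map first (the function name
`normalize` and the map are mine, purely illustrative):

```python
import unicodedata

# German-specific transliterations; NFKD alone would map "ä" -> "a".
TRANSLITERATIONS = {
    "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
}

def normalize(text):
    """Return a search-normalized version of text for a shadow column."""
    for src, dst in TRANSLITERATIONS.items():
        text = text.replace(src, dst)
    # Decompose and strip remaining combining marks (e.g. é -> e).
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.lower()
```

You would fill the second column with normalize(name) on save, normalize
the incoming query the same way, and compare normalized-to-normalized.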

You rule Haystack out as an option, but you should weigh its overhead
against the alternative. Google's interface is simple, and this makes
people think the backend is easy too; it's not. Either plan on spending a
lot of time researching search technologies and developing your own
solution, or accept the overhead.
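On the Norvig question: if you do roll your own, his corrector doesn't
actually need big.txt on disk. Its word-frequency table can be built from
any iterable of strings, e.g. a values_list(..., flat=True) queryset. A
rough sketch (the helper name build_word_counts is mine, not part of
Norvig's code):

```python
import re
from collections import Counter

def build_word_counts(strings):
    """Build Norvig-style word frequencies from an iterable of strings
    (e.g. a flat values_list() queryset) instead of the big.txt file."""
    counts = Counter()
    for s in strings:
        counts.update(re.findall(r"[a-z]+", s.lower()))
    return counts
```

The resulting Counter can be dropped in wherever Norvig's code uses its
NWORDS dictionary; whether recomputing it is too much overhead depends on
how often your user table changes.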

Cheers

Tom

[1] http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.