Re: Most efficient and scalable search strategy

Avraham Serour Wed, 24 May 2017 07:54:06 -0700

take a look at the postgres fulltext search capabilities, I can't make any
promises for your old postgres version


another common approach is to index the search into elasticsearch and
querying elasticsearch for the searches, it should handles misspellings and
return the search results fast, but you may need to have a cluster of them
with at least a couple of machines. I worked on a system once with single
digits millions of entries with 6 machines (VM) running elasticsearch in a
cluster, search results weren't fast but was reasonable. Once we had to
re-index everything and it took a couple of days. We used django haystack,
you should take a look at that. The problem with this approach is that you
basically have a hook on the django model which copies some data over
elasticsearch everytime you save()


On Tue, May 23, 2017 at 12:14 PM, Mike Dewhirst <mi...@dewhirst.com.au>
wrote:

> Django 1.10, Postgres 9.1 (for the moment), Python 2.7 (for the moment) on
> Ubuntu 16.04
>
> The search I need is an intersection hit.
>
> We have a substance model with fields for ...
>
> - name
> - molecular formula with subscripted unicode numbers
> - CAS number
>
> plus a Synonym model in 1:n with substance and there might typically be
> half a dozen synonyms per substance.
>
> We also have a separateReference_Ozone table containing all the ozone
> depleting substances listed in the Montreal Protocol imported from a csv
> file. Fields include ...
>
> - abbreviation,
> - molecular formula with subscripted unicode numbers,
> - chemical isomer name
> - common chemical name
>
> If any of those Substance fields matches any of those Reference_Ozone
> field we have a hit and that substance is subject to the Montreal Protocol.
>
> In addition, we have to incude ingredient substances and their synonyms if
> our substance is a mixture.
>
> I have looked at SearchVector and thought about SearchVectorField. I have
> also looked at the Postgres docs regarding triggers and decided against
> that path. If we do it it will have to be done on save() in the app.
>
> Since we import the Montreal Protocol substances and there are fewer than
> 400 of them it occurs to me that during import I could make a single
> (large) field which contains the contents of those four other fields. The
> data is almost static for my purposes. There might be new substances added
> but only rarely. It is a read-only table.
>
> I could do the same for substances but the idea is that they will scale
> beyond millions. I'd prefer not to do  a similar dedicated search field for
> them because it would need updating on every save.
>
> We might also have to deal with English/US spelling issues such as 'ph'
> versus 'f'. I haven't examined the content in detail yet.
>
> So the question is, how to quickly/efficiently detect that a substance is
> subject to the Montreal Protocol?
>
> Thanks for any pointers
>
> Mike
>
> --
> You received this message because you are subscribed to the Google Groups
> "Django users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to django-users+unsubscr...@googlegroups.com.
> To post to this group, send email to django-users@googlegroups.com.
> Visit this group at https://groups.google.com/group/django-users.
> To view this discussion on the web visit https://groups.google.com/d/ms
> gid/django-users/065647f9-7347-7d75-d817-83ea795006ed%40dewhirst.com.au.
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-users+unsubscr...@googlegroups.com.
To post to this group, send email to django-users@googlegroups.com.
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-users/CAFWa6tJu9wH5ngszDseFgVWOQwYNi52uOmZa%2Bk1Q8OBeR3eYzg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: Most efficient and scalable search strategy

Reply via email to