Re: Parsing a complex query with many parentheses, ORs, ANDs, and (hopefully) NOTs

Malcolm Tredinnick Fri, 21 Nov 2008 20:46:54 -0800


On Fri, 2008-11-21 at 22:37 -0500, Michael Keselman wrote:
> Hello,
> 
> http://pastie.org/private/xvqf5tgjnimiolakhawgg (relevant code)
> 
> Django's QuerySets allow you to do ORs (unions) pretty nicely with  
> Model.manager.filter(Q(...) | Q(...)). However, ANDs don't work quite  
> as nicely in my situation because I want AND to do what  
> Model.manager.filter(Q(...)).filter(Q(...)) would do, but  
> Model.manager.filter(Q(...) & Q(...)) does something very different.  
> It does not compare fields with other rows in the table but rather  
> with the same row against itself. For example, I have a Tag model with  
> a CharField I call 'name'. If I were to do  
> Tag.objects.filter(Q(name='text') & Q(name='password')), it would  
> compare each tag to check if the tag's name equals "text" AND that the  
> same tag's name equals "password".


Yeah, this is the edge-case that isn't handled out of the box. On the
(likely) probability that it's much less common than the alternative
behaviour, we decided to go with the current approach. You use-case
isn't invalid, but it's never the goal to support every possible
use-case.

Speaking as an implementor, I'll also note that it's the really, really
hard case to make it work efficiently in all cases (particularly with
highly nested trees of ANDs and ORs), so it falls close to, or possibly
over the line marked "out of scope for the ORM; custom SQL is your
friend".

I can think of four(!) possible approaches for your problem. In rough
order from least intrusive to the one requiring the most changes (and
roughly least efficient to most efficient), we have the following. I
haven't written code for any of these, so I'm using English rather than
Python. But my intuition is that they should all work.

(1) After you have parsed the query, rewrite it in conjunctive normal
form (CNF). That's a fairly routine transformation. Then each clause in
the CNF can be written as Q(...)|Q(...)|... and can be put in a separate
call to filter() (so you end up with one filter() call per clause in the
CNF). Guaranteed to give the right answer and requires no understanding
of Django's internals. The downside is that more comparisons than are
possibly necessary will be made. Still, in practice, the difference
possibly won't be noticed on a well-index table.

(2) Write your own variation on Q() -- let's call it Q1 here, for
brevity. This requires understanding what's going on when you call
filter() a bit, but since you are looking at a case that isn't normally
handled, you need to roll up your sleeves and dive in. It's not that
bad, really.

Basically, filter() wraps up its parameters in a Q() object and then
calls Query.add_q(). So calling Queryset.filter() is equivalent to an
outer call to Query.add_q(). The way a single call to add_q() knows that
it can (and should) reuse certain aliases is the used_aliases parameter
passed into add_q(). So you could force a particular alias not to be
reused if you adjusted used_aliases. Notice that used_aliases is also
passed to the add_to_query() method of any Q-like object you might pass
in. 

In practice, that means you write a class Q1 that subclasses Q and which
has an add_to_query() method. In your code that constructs the filter
call, you use Q1 when you are creating a conjunction of terms, rather
than Q -- for example, Q1(Q(name="a") & Q(name="b")). Your
Q1.add_to_query() method will then generally call Q.add_q(), except that
after each return it will remove any newly added aliases from
used_aliases so that they aren't reused inside that Q1. You can do this
by deep copying used_aliases before and restoring it afterwards. Just
before finally returning from your add_to_query() method, you might want
to add back all the used aliases so that any outer calls can reuse them.
It's this nested removing and readding of aliases that makes this a hard
problem (I would strongly suggest experimenting with something like (A &
(B|(C&D)) to make sure things work as expected).

(3) You could write your own equivalent to add_q() and call that for
your case. Since filter() calls add_q(), subclass QuerySet and add your
own disjoint_filter() method, say, that calls your own add_q() method --
which need not even be a separate method if you don't want to subclass
Query as well and don't mind calling setup_filter() directly. In your
own add_q() version, you might choose to not update used_aliases for
some or all situations. I think not updating used_aliases at all will
lead to very inefficient queries -- lots of unnecessary tables involved
-- so this could well reduce to option (2), above, in which case I'd go
with option (2). But have a think about it and see.

(4) Since it looks like you're doing the equivalent of text searching
with boolean operators, my fourth solution would be to use a text
searching tool. Solr or Sphinx or something like that which indexes the
tag field in question. The reason for suggesting this option is that
most of those search engines have built-in support for boolean logic
combinators. It would require custom SQL to do the searching (or you
could look at the djangosearch project), but might well save a lot of
effort.

Maybe that will give you some ideas.

Regards,
Malcolm


--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en
-~----------~----~----~----~------~----~------~--~---

Re: Parsing a complex query with many parentheses, ORs, ANDs, and (hopefully) NOTs

Reply via email to