On 11 juil, 15:57, Michel30 <forerunn...@gmail.com> wrote:
> Hi all,
>
> I have a basic search function that uses Q objects.
> After profiling it I found that the actual (mysql) database query
> finishes in fractions of seconds but the iterating after this can take
> up to 50 seconds per 10.000 results.
>
> I have been trying to speed it up but I have had not much results..
>
> My query is this one:
>
>        found_entries = model.objects.filter((Q-objects),
> obsolete=0).order_by('-version','docid')
>
> So far so good, but then I need a dictionary to retrieve only unique
> 'documentid's'.
>
>     rev_dict = {}
>
> This is the part that hurts:
>
>     for d in found_entries:
>         rev_dict[d.documentid] = d



> And then some more sorting and filtering:
>
>     filtered_entries = rev_dict.values()
>     filtered_entries.sort(key=lambda d: d.revisiondate, reverse=True)
>
> Does anyone have some better ideas to achieve this?

Ok, so what you want is a queryset of "model" with distinct docid
having the highest version number, sorted by revisiondate ? The
cleanest solution would be to first write the appropriate SQL query,
then find out how to express it using django. FWIW, the raw SQL query
might look like this (MySQL):

mysql> select f1.pk, f1.docid, f1.version, f1.revisiondate from foo f1
where f1.version=(select max(f2.version) from foo f2 where
f2.docid=f1.docid) order by revisiondate desc;
+----+-------+---------+--------------+
| pk | docid | version | revisiondate |
+----+-------+---------+--------------+
|  7 |     3 |       2 | 2000-02-12   |
|  6 |     1 |       2 | 2000-02-11   |
|  5 |     2 |       3 | 2000-02-10   |
+----+-------+---------+--------------+
3 rows in set (0.00 sec)

NB: table definition and date being:

mysql> explain foo;
+--------------+---------+------+-----+---------+----------------+
| Field        | Type    | Null | Key | Default | Extra          |
+--------------+---------+------+-----+---------+----------------+
| pk           | int(11) | NO   | PRI | NULL    | auto_increment |
| docid        | int(11) | NO   |     | NULL    |                |
| version      | int(11) | NO   |     | NULL    |                |
| revisiondate | date    | NO   |     | NULL    |                |
+--------------+---------+------+-----+---------+----------------+

mysql> select * from foo;
+----+-------+---------+--------------+
| pk | docid | version | revisiondate |
+----+-------+---------+--------------+
|  1 |     1 |       1 | 2000-01-01   |
|  2 |     2 |       1 | 2000-01-02   |
|  3 |     3 |       1 | 2000-01-02   |
|  4 |     2 |       2 | 2000-02-10   |
|  5 |     2 |       3 | 2000-02-10   |
|  6 |     1 |       2 | 2000-02-11   |
|  7 |     3 |       2 | 2000-02-12   |
+----+-------+---------+--------------+
7 rows in set (0.00 sec)


NB: Adding indexes on docid and/or version might help (or damage)
performances, depending on your data set.

Now you just have to learn how to use django's orm F, max and extra.
Or, for a simpler solution (but less portable) solution, use a raw SQL
query to only retrieve distinct pk then a second query to build the
whole queryset, ie:

from django.db import connection

def get_latest_revisions():
    cursor = connection.cursor()
    cursor.execute("""
        SELECT DISTINCT f1.pk  FROM foo as f1
        WHERE f1.version=(SELECT max(f2.version) FROM foo f2 WHERE
f2.docid=f1.docid)
        """)
    ids = [row[0] for row in cursor] # or is it "row['pk']" ???
    cursor.close()
    return model.objects.filter(pk__in=ids).order_by("-revision_date")

This might possibly be done using
manager.raw(your_sql_here).values_list("pk", flat=True) to build the
pk list, but I have never used RawQuerySets so far and the doc doesn't
tell if RawQuerySet supports values_list so you'll have to try by
yourself.

HTH

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com.
To unsubscribe from this group, send email to 
django-users+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en.

Reply via email to