So,... On Monday 17 June 2013 16:02:05 Shai Berger wrote: > On Thursday 16 May 2013 03:51:05 Shai Berger wrote: > > Ticket #20414[0] and PR #1071[1]. > > Over a month, and no comment from anyone on these; I take it as sign that > nobody has time for it. Unless someone wakes up quite quickly to oppose, > I'm going to commit essentially the PR [...]
Nobody opposed. Tim Graham reviewed the PR and made some helpful comments -- thanks, Tim. But before committing, I wanted to run again some benchmarks, to see how the recent changes in query processing affect these performance enhancements. I found what Anssi had found, way back when[2]: Switching to outputtypehandler makes almost no difference for ints, makes float columns (FloatFields) about 2.5 times faster, and makes decimal columns (DecimalFields and raw expressions) about 2 times slower. I found that baffling, and investigated. Some background: Oracle number columns support 38 significant digits, which is significantly more than supported by Python floats (=C doubles). cx_Oracle supports decimal.Decimal as an input type, but not as an output type (as far as I could see) -- so to get accurate results when selecting such columns, one must read these numbers as strings and convert them to numbers later. There are two ways to do this -- either use an outputtypehandler, which lets us define how we want to process each column; or use numbersAsStrings, which tells cx_Oracle to return _all_ numbers as strings, to be processed later in Python. The part where ints do not improve is explained quite easily: cx_Oracle returns ints as (small, C) ints only if it knows they will fit -- which means, if they have less than 10 digits. IntegerFields (incl. AutoField, OneToOneField, and, surprisingly, even SmallIntegerField) are all NUMBER(11), and INTEGER is actually NUMBER(38); and so they get converted to a python long int. Python makes this transparent, so nobody notices, but the conversion through string happens whether numbersAsString is used or not. If we had chosen NUMBER(9), some C ints would not fit in, but everything would be a lot faster. But that is mostly water under the bridge now[3]. The part where decimals become slower -- I'm still baffled by that. Anssi claimed it was due to calling a function for each value separately (with numbersAsStrings, a function is only called per row). But this makes little sense -- we see the slow-down in a benchmark with one-column rows. I tried to get the best of both world by only using numbersAsStrings when NUMBER() columns are present -- but that doesn't work; the setting needs to be made before the execute() call in order to take effect, and at that time we don't know what columns we are expecting. Using an outputtypehandler opens the door to some user-initiated optimizations; one thing I noticed is that for large queries (returning thousands of rows), performance improves when using a larger cursor arraysize (the default is 100). This is not the typical query for a web application, but the Django ORM is used in batch processing as well. As I said, I prepared a new PR, 1279[4]; the options, as I see them, are: a) Make this change, sacrificing decimals for floats b) Reject the change, sacrificing floats for decimals c) Somehow let users choose (either globally or for specific queries) -- but then we still need to choose the default Your advice is appreciated, Shai. [0] https://code.djangoproject.com/ticket/20414 [1] https://github.com/django/django/pull/1071 [2] https://groups.google.com/d/topic/django-developers/4BNkJyGez9A/discussion [3] This optimization opportunity is no longer even available with Python 3, at least with cx_Oracle 5.1.2 [4] https://github.com/django/django/pull/1279 -- You received this message because you are subscribed to the Google Groups "Django developers" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected]. Visit this group at http://groups.google.com/group/django-developers. For more options, visit https://groups.google.com/groups/opt_out.
