On Apr 23, 10:48 am, Lukas Zilka <lu...@zilk4.eu> wrote:
> Hey Russ,
>
> thanks for your reply. I looked at the bulk insertion problem, and
> that made me think that probably using the variables themselves is the
> problem. It does not really make sense to me why anybody bothers to
> use variables in a long SQL query for the IN clause, let alone a bulk
> insertion? Does that benefit anything?
>
> Anyway, I conducted an experiment and tried a simple SQL query with
> lots of items in the IN clause to prove that its the variables
> themselves, not the limitation of IN clause in SQLite, is whats
> causing the problem. I have been successfully able to execute a query
> as "SELECT * FROM projects WHERE id IN (...)" where on the place of
> `...` there were 15,000 numbers. So SQLite has no problem with lot of
> stuff as an argument of the IN operator. Now, the only limitation is
> the length of the SQL query itself. According to the SQLite
> documentation it is 1megabyte, and that suffices for 100k+ elements.
> With that big query, you are right, it would be very impractical and
> slow for the end users to interact with the application (sending 1M of
> data to the webserver will probably be very unresponsive), so I think
> this is a fair limitation that should never be exceeded for this use.
>
> In Django code it seems that it would suffice to make a change in the
> file 'db/models/sql/where.py'. Particularly, condition on the number
> of elements of IN, and, if it is over let's say 100 of them, put them
> into the SQL query directly as a string (e.g. '1,2,3') - not as
> variables('?, ?, ?').
>
> Though, for the future, I still believe there should be a better
> solution than to rely on not reaching this limit. I would propose that
> a temporary table should be created, filled up with the right
> arguments of IN operator, and the query rewritten as a JOIN on the
> original left argument of IN (or for simplicity, and perhaps worse
> performance, a nested query) and this temporary table. That of course
> only if the number of elements on the right side of IN is more than
> some number. But this is for another discussion.
>
> My question is therefore: Will the change in Django code that I
> propose have any bad consequences or do you think it might actually
> work satisfactorily?

First the "write the query as "1, 2, 3", not as "%s, %s, %s", (1, 2,
3). The problem is SQL injection. You could do that for integer
parameters easily, but on the whole it is not a nice way.

The create temp table + join seems hard. But you could do "exists"
query instead. A query like "select * from tbl where id in (a list)"
could be rewritten to "select * from tbl where exists (select 1 from
temp_table where temp_table.id = tbl.id)". This could be handled in
sql/where.py somewhat easily.

For me this issue isn't that important. I don't use SQLite except for
testing. If somebody wants to work on this issue, I must warn that it
is possible (if not likely) that some core developer will say "too
ugly" to this solution. I might be willing to accept the solution if
it was clean enough, as this would nicely abstract away this limit of
SQLite.

So, in short: this idea is definitely worth more investigation.

 - Anssi

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com.
To unsubscribe from this group, send email to 
django-users+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en.

Reply via email to