On Apr 23, 10:48 am, Lukas Zilka <lu...@zilk4.eu> wrote: > Hey Russ, > > thanks for your reply. I looked at the bulk insertion problem, and > that made me think that probably using the variables themselves is the > problem. It does not really make sense to me why anybody bothers to > use variables in a long SQL query for the IN clause, let alone a bulk > insertion? Does that benefit anything? > > Anyway, I conducted an experiment and tried a simple SQL query with > lots of items in the IN clause to prove that its the variables > themselves, not the limitation of IN clause in SQLite, is whats > causing the problem. I have been successfully able to execute a query > as "SELECT * FROM projects WHERE id IN (...)" where on the place of > `...` there were 15,000 numbers. So SQLite has no problem with lot of > stuff as an argument of the IN operator. Now, the only limitation is > the length of the SQL query itself. According to the SQLite > documentation it is 1megabyte, and that suffices for 100k+ elements. > With that big query, you are right, it would be very impractical and > slow for the end users to interact with the application (sending 1M of > data to the webserver will probably be very unresponsive), so I think > this is a fair limitation that should never be exceeded for this use. > > In Django code it seems that it would suffice to make a change in the > file 'db/models/sql/where.py'. Particularly, condition on the number > of elements of IN, and, if it is over let's say 100 of them, put them > into the SQL query directly as a string (e.g. '1,2,3') - not as > variables('?, ?, ?'). > > Though, for the future, I still believe there should be a better > solution than to rely on not reaching this limit. I would propose that > a temporary table should be created, filled up with the right > arguments of IN operator, and the query rewritten as a JOIN on the > original left argument of IN (or for simplicity, and perhaps worse > performance, a nested query) and this temporary table. That of course > only if the number of elements on the right side of IN is more than > some number. But this is for another discussion. > > My question is therefore: Will the change in Django code that I > propose have any bad consequences or do you think it might actually > work satisfactorily?
First the "write the query as "1, 2, 3", not as "%s, %s, %s", (1, 2, 3). The problem is SQL injection. You could do that for integer parameters easily, but on the whole it is not a nice way. The create temp table + join seems hard. But you could do "exists" query instead. A query like "select * from tbl where id in (a list)" could be rewritten to "select * from tbl where exists (select 1 from temp_table where temp_table.id = tbl.id)". This could be handled in sql/where.py somewhat easily. For me this issue isn't that important. I don't use SQLite except for testing. If somebody wants to work on this issue, I must warn that it is possible (if not likely) that some core developer will say "too ugly" to this solution. I might be willing to accept the solution if it was clean enough, as this would nicely abstract away this limit of SQLite. So, in short: this idea is definitely worth more investigation. - Anssi -- You received this message because you are subscribed to the Google Groups "Django users" group. To post to this group, send email to django-users@googlegroups.com. To unsubscribe from this group, send email to django-users+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-users?hl=en.