On Thu, Dec 10, 2020 at 8:03 PM Chris Angelico <[email protected]> wrote:
> > Here is an example of how I use it to build an arbitrarily long SQL
> > request without having to pay for long intermediate strings, both in
> > computation and memory.
>
<snip>
> > I often had such cases, but ended up using the more costly str.join.
>
> Is it really more costly? With strings the size of SQL queries
> (keeping in mind that these strings (correctly) contain no actual
> data, just placeholders), I doubt you'll see any significant
> performance hit from this.
I had the same thought -- expecting that it would take some pretty darn big
intermediate strings for the iterator version to be any faster at all. I was
not quite right: if you replace the "join" iterator with plain str.join(), it
does run a touch slower:
def join_iterable(table, columns, values):
    request = ''.join(chain(
        ('INSERT INTO ', table, '('),
        join(', ', columns),
        (') VALUES (',),
        chain.from_iterable(join(('), (',), (join(', ', ('%s' for v in value)) for value in values))),
        (') ON DUPLICATE KEY UPDATE ',),
        chain.from_iterable(join((', '), ((c, '=VALUES(', c, ')') for c in columns))),
    ))
    return request
def join_str(table, columns, values):
    request = ''.join(chain(
        ('INSERT INTO ', table, '('),
        ', '.join(columns),
        (') VALUES (',),
        chain.from_iterable('), ('.join(', '.join('%s' for v in value) for value in values)),
        (') ON DUPLICATE KEY UPDATE ',),
        chain.from_iterable(', '.join(f"{c}=VALUES({c})" for c in columns)),
    ))
    return request
In [31]: %timeit join_iterable(table, columns, values)
8.65 µs ± 154 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [32]: %timeit join_str(table, columns, values)
13 µs ± 38.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
But if you get rid of the whole pile of nested iterators and chain() calls,
and build each piece with a plain str.join(), it's faster still:
def join_str_simple(table, columns, values):
    request = "".join(["INSERT INTO ",
                       f"{table}({', '.join(columns)})",
                       " VALUES (",
                       "), (".join(', '.join('%s' for v in value) for value in values),
                       ") ON DUPLICATE KEY UPDATE ",
                       ", ".join(f"{c}=VALUES({c})" for c in columns),
                       ])
    return request
In [33]: %timeit join_str_simple(table, columns, values)
5.09 µs ± 26.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
And, I would argue, it's more readable (though the nested comprehensions are
not great ...).
Granted, you might lose that advantage with a lot more values -- but I expect
it wouldn't be a real issue until you had hundreds of rows or more.
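A rough, untested sketch of how one might check that, reusing the functions
and the table/columns from the enclosed code and just growing the row count
(the row counts and timing harness here are only illustrative, not something
I measured):

import timeit

for n_rows in (3, 100, 1000):
    big_values = [(0, 1, 2)] * n_rows  # same shape as `values` below, just more rows
    t_iter = timeit.timeit(lambda: join_iterable(table, columns, big_values), number=1000)
    t_simple = timeit.timeit(lambda: join_str_simple(table, columns, big_values), number=1000)
    print(f"{n_rows} rows: iterable {t_iter:.4f}s  simple str.join {t_simple:.4f}s")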
> Also, I would be VERY surprised if the cost
> of in-memory string manipulation exceeds the cost of an actual
> database transaction.
>
Well, yeah -- these are all very fast.
> But more importantly: Is it any more readable? What you have there is
> pretty opaque. Is the str.join version worse than that?
You be the judge :-)
Code enclosed.
-CHB
--
Christopher Barker, PhD
Python Language Consulting
- Teaching
- Scientific Software Development
- Desktop GUI and Web Development
- wxPython, numpy, scipy, Cython
from itertools import chain #, join
def join(sep, iterable):
    # Lazily yield the items of `iterable`, with `sep` yielded between them
    # (an element-wise analogue of str.join that never builds a big string).
    notfirst = False
    for i in iterable:
        if notfirst:
            yield sep
        else:
            notfirst = True
        yield i
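# Illustration (not part of the original script) of what join() produces:
#   list(join(', ', ['a', 'b', 'c']))     -> ['a', ', ', 'b', ', ', 'c']
#   ''.join(join(', ', ['a', 'b', 'c']))  -> 'a, b, c'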
def join_iterable(table, columns, values):
    request = ''.join(chain(
        ('INSERT INTO ', table, '('),
        join(', ', columns),
        (') VALUES (',),
        chain.from_iterable(join(('), (',), (join(', ', ('%s' for v in value)) for value in values))),
        (') ON DUPLICATE KEY UPDATE ',),
        chain.from_iterable(join((', '), ((c, '=VALUES(', c, ')') for c in columns))),
    ))
    return request
def join_str(table, columns, values):
    request = ''.join(chain(
        ('INSERT INTO ', table, '('),
        ', '.join(columns),
        (') VALUES (',),
        chain.from_iterable('), ('.join(', '.join('%s' for v in value) for value in values)),
        (') ON DUPLICATE KEY UPDATE ',),
        chain.from_iterable(', '.join(f"{c}=VALUES({c})" for c in columns)),
    ))
    return request
def join_str_simple(table, columns, values):
    # table = 'mytable'
    # columns = ('id', 'v1', 'v2')
    # values = [(0, 1, 2), (3, 4, 5), (6, 7, 8)]
    request = "".join(["INSERT INTO ",
                       f"{table}({', '.join(columns)})",
                       " VALUES (",
                       "), (".join(', '.join('%s' for v in value) for value in values),
                       ") ON DUPLICATE KEY UPDATE ",
                       ", ".join(f"{c}=VALUES({c})" for c in columns),
                       ])
    return request
# args = list(chain.from_iterable(values))
table = 'mytable'
columns = ('id', 'v1', 'v2')
values = [(0, 1, 2), (3, 4, 5), (6, 7, 8)]
request = join_iterable(table, columns, values)
print(request)
print()
request = join_str(table, columns, values)
print(request)
print()
request = join_str_simple(table, columns, values)
print(request)
assert join_iterable(table, columns, values) == join_str(table, columns, values)
assert join_iterable(table, columns, values) == join_str_simple(table, columns, values)
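# Sketch of actual use (not part of the thread): the %s placeholders get
# filled in by the DB-API driver, with the row values flattened as in the
# commented-out `args` line above, e.g.:
#     args = list(chain.from_iterable(values))
#     cursor.execute(request, args)  # `cursor` from whatever DB connection you have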
# > INSERT INTO mytable(id, v1, v2) VALUES (%s, %s, %s), (%s, %s, %s), (%s, %s, %s) ON DUPLICATE KEY UPDATE id=VALUES(id), v1=VALUES(v1), v2=VALUES(v2)