On Wed, Aug 10, 2011 at 01:32:06PM +0100, Chris Angelico wrote:
> On Wed, Aug 10, 2011 at 12:17 PM, <przemol...@poczta.fm> wrote:
> > Hello,
> >
> > I'd like to write a python (2.6/2.7) script which connects to database,
> > fetches hundreds of thousands of rows, concat them (basically: create XML)
> > and then put the result into another table. Do I have any choice
> > regarding string concatenation in Python from the performance point of view ?
> > Since the number of rows is big I'd like to use the fastest possible library
> > (if there is any choice). Can you recommend me something ?
>
> First off, I have no idea why you would want to create an XML dump of
> hundreds of thousands of rows, only to store it in another table.
> However, if that is your intention, list joining is about as efficient
> as you're going to get in Python:
>
> lst=["asdf","qwer","zxcv"] # feel free to add 399,997 more list entries
> xml="<foo>"+"</foo><foo>".join(lst)+"</foo>"
>
> This sets xml to '<foo>asdf</foo><foo>qwer</foo><foo>zxcv</foo>' which
> may or may not be what you're after.
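Chris, thanks -- the join approach makes sense. Just to check that I understand
it, here is a rough sketch of how I would apply it to rows coming from a DB-API
cursor (the <row> element name, the one-column rows and the escaping are my own
assumptions, not something from your example):

from xml.sax.saxutils import escape

def rows_to_xml(rows):
    # rows: an iterable of one-column tuples, e.g. from cursor.fetchall()
    # build all the pieces first, then join once instead of repeated +=
    parts = ["<row>%s</row>" % escape(str(value)) for (value,) in rows]
    return "".join(parts)

That is also what I would do inside each subset described below.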
Since this process (XML building) currently runs inside the database (using
native SQL commands) as a single-threaded task, it is quite slow. What I would
like to do instead is spawn several Python subprocesses in parallel, each
concatenating a subset of the whole table, and then merge the results at the
end. Basically:

- fetch all rows from the database (up to 1 million): what is the recommended
  data type to hold them?
- spawn X Python processes, each one concatenating its own subset
- merge the results from all the subprocesses (a rough sketch of this is in
  the P.S. below)

This task runs on a server which has many, but slow, cores, and I am trying to
divide the work into many subtasks.

Regards,
Przemyslaw Bak (przemol)
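P.S. Here is a rough sketch of the split-and-merge idea, in case it makes the
plan clearer. The chunk size, worker count and <row> element are placeholders,
and I have not yet measured whether the cost of shipping the rows to the worker
processes eats the gain:

from multiprocessing import Pool
from xml.sax.saxutils import escape

def concat_chunk(chunk):
    # each worker turns its subset of rows into one XML fragment
    return "".join("<row>%s</row>" % escape(str(v)) for (v,) in chunk)

def build_xml(rows, nworkers=8):
    # rows: the full list of one-column tuples fetched with cursor.fetchall()
    size = max(1, len(rows) // nworkers)
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    pool = Pool(processes=nworkers)
    try:
        fragments = pool.map(concat_chunk, chunks)  # concat subsets in parallel
    finally:
        pool.close()
        pool.join()
    return "".join(fragments)                       # merge in the parent process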