On 25/04/18 03:26, Evuraan wrote:
> Please consider this situation:
> Each line in "massive_input.txt" needs to be churned by the
> "time_intensive_stuff" function, so I am trying to background it.
What kind of "churning" is involved? If it's compute intensive, threading
may not be the right answer, but if it's I/O bound then threading is
probably OK.

> import threading
>
> def time_intensive_stuff(arg):
>     # some code, some_conditional
>     return (some_conditional)

What exactly do you mean by some_conditional? Is it some kind of big
decision tree, or if/else network? Or is it dependent on external data
(from where? a database? the network?)

And you return it - but what is returned? An expression, a boolean result?

It's not clear what the nature of the task is, but that makes a big
difference to how best to parallelise the work.

> with open("massive_input.txt") as fobj:
>     for i in fobj:
>         thread_thingy = threading.Thread(target=time_intensive_stuff,
>                                          args=(i,))
>         thread_thingy.start()
>
> With above code, it still does not feel like it is backgrounding at
> scale,

Can you say why you feel that way? What measurements have you done? What
system observations (CPU, memory, network, etc.)? What did you expect to
see, and what did you see?

Also consider that processing a huge number of lines will generate a huge
number of subprocesses or threads. There is an overhead to each thread,
and your computer may not have enough resources to run them all
efficiently.

It may be better to batch the lines so each subprocess or thread handles
10, or 50, or 100 lines (whatever makes sense). Put a loop into your
time-intensive function to process a list of input values and return a
list of outputs, and give your outer loop an inner loop that creates the
batches. The number of entries in a batch can be parametrized so that you
can experiment to find the most cost-effective size. (There is a rough
sketch of this in the P.S. below.)

> I am sure there is a better pythonic way.

I suspect the issues are not Python specific but are more generally about
parallelising large jobs.

> How do I achieve something like this bash snippet below in python:
>
> time_intensive_stuff_in_bash(){
>     # some code
>     :
> }
>
> for i in $(< massive_input.file); do
>     time_intensive_stuff_in_bash "$i" & disown
>     :
> done

It's the same idea, except that in bash you start a whole new process, so
instead of the threading module you would use concurrent.futures (see the
second sketch in the P.S.). But did you try this in bash? Was it faster
than using Python? I would expect the same issue of too many processes to
arise in bash.

HTH

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos
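P.S. Here is a rough, untested sketch of the batching idea using
concurrent.futures.ThreadPoolExecutor (so it assumes the per-line work is
mostly I/O bound). The helper names, the worker count and the batch size
are all made up for illustration - substitute your real code and
experiment with the numbers:

    from concurrent.futures import ThreadPoolExecutor

    BATCH_SIZE = 50        # tune this to find a cost-effective size

    def churn_one_line(line):
        # stand-in for the real time-intensive work on a single line
        return line.strip().upper()

    def time_intensive_stuff(lines):
        # process a whole batch of lines and return a list of results
        return [churn_one_line(line) for line in lines]

    def batches(fobj, size):
        # yield successive lists of 'size' lines from the open file
        batch = []
        for line in fobj:
            batch.append(line)
            if len(batch) == size:
                yield batch
                batch = []
        if batch:
            yield batch

    with open("massive_input.txt") as fobj:
        with ThreadPoolExecutor(max_workers=8) as pool:
            futures = [pool.submit(time_intensive_stuff, batch)
                       for batch in batches(fobj, BATCH_SIZE)]
            results = [f.result() for f in futures]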
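If the work turns out to be CPU bound instead, the same module gives you
ProcessPoolExecutor, which starts separate worker processes - the closest
Python equivalent of the bash "& disown" version. Again only a sketch,
with a made-up placeholder for the real computation; the chunksize
argument does the batching for you:

    from concurrent.futures import ProcessPoolExecutor

    def time_intensive_stuff(line):
        # placeholder for the real computation on one line
        return line.strip().upper()

    if __name__ == "__main__":   # guard needed when using worker processes
        with open("massive_input.txt") as fobj:
            with ProcessPoolExecutor() as pool:
                # chunksize batches the lines sent to each worker process
                for result in pool.map(time_intensive_stuff, fobj,
                                       chunksize=50):
                    print(result)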