Multiprocessing and file I/O
Hi All,

I am trying to speed up some code which reads a bunch of data from a disk file. Just for the fun of it, I thought I would try parallel I/O, splitting the reading of the file between multiple processes. Although I have been warned that concurrent access by multiple processes to the same file may actually slow down the reading, I was curious to try some timings while varying the number of processes which read the file.

I know almost nothing about multiprocessing, so I was wondering if anyone had a very simple snippet of code which demonstrates how to read a file using multiprocessing.

My idea was to create a "big" file by doing:

    fid = open("somefile.txt", "wb")
    fid.write(b"HELLO\n" * 10**7)  # the repeat count must be an int; 1e7 is a float
    fid.close()

and then to use fid.seek() to point every process I start to a position inside the file and start reading from there. For example, with 4 processes and a 10 MB file, I would tell the first process to read from byte 0 to byte 2.5 million, the second one from 2.5 million to 5 million, and so on. I just have an academic curiosity :-D

Any suggestion is very welcome, either on the approach or on the actual implementation.

Thank you for your help.

Andrea.

--
http://mail.python.org/mailman/listinfo/python-list
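The seek-and-read idea described above can be sketched roughly as follows. This is only a minimal illustration, assuming Python 3 and the standard multiprocessing.Pool; the helper names (chunk_ranges, read_chunk, parallel_read) are made up for this example and do not come from the thread. Each worker opens the file on its own, seeks to its offset and returns only the number of bytes it read, so that the data itself is not shipped back to the parent process.

```python
import os
import tempfile
import time
from multiprocessing import Pool


def chunk_ranges(file_size, n_chunks):
    """Split [0, file_size) into n_chunks contiguous (offset, length) pairs."""
    base, extra = divmod(file_size, n_chunks)
    ranges = []
    offset = 0
    for i in range(n_chunks):
        # The first `extra` chunks get one additional byte each.
        length = base + (1 if i < extra else 0)
        ranges.append((offset, length))
        offset += length
    return ranges


def read_chunk(args):
    """Open the file independently, seek to the offset, read `length` bytes."""
    path, offset, length = args
    with open(path, "rb") as fid:
        fid.seek(offset)
        return len(fid.read(length))


def parallel_read(path, n_procs):
    """Read the whole file with n_procs workers; return total bytes read."""
    size = os.path.getsize(path)
    tasks = [(path, off, length) for off, length in chunk_ranges(size, n_procs)]
    with Pool(n_procs) as pool:
        return sum(pool.map(read_chunk, tasks))


if __name__ == "__main__":
    # Build a throwaway test file (about 6 MB here; raise the count as needed).
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"HELLO\n" * 10**6)
    try:
        for n in (1, 2, 4):
            start = time.perf_counter()
            total = parallel_read(tmp.name, n)
            elapsed = time.perf_counter() - start
            print(n, "process(es):", total, "bytes in", round(elapsed, 3), "s")
    finally:
        os.remove(tmp.name)
```

Timings from a sketch like this should be read with care: a freshly written file is likely to be served from the operating system's cache, and process start-up cost can easily outweigh the read itself for small files.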
Re: Multiprocessing and file I/O
Hi Igor,

On May 24, 1:10 pm, Igor Katson wrote:
> Infinity77 wrote:
> > Hi All,
> >
> > I am trying to speed up some code which reads a bunch of data from
> > a disk file. Just for the fun of it, I thought to try and use parallel
> > I/O to split the reading of the file between multiple processes.
> > Although I have been warned that concurrent access by multiple
> > processes to the same file may actually slow down the reading of the
> > file, I was curious to try some timings by varying the number of
> > processes which read the file. I know almost nothing of
> > multiprocessing, so I was wondering if anyone had some very simple
> > snippet of code which demonstrates how to read a file using
> > multiprocessing.
> >
> > My idea was to create a "big" file by doing:
> >
> > fid = open("somefile.txt", "wb")
> > fid.write("HELLO\n"*1e7)
> > fid.close()
> >
> > and then using fid.seek() to point every process I start to a position
> > inside the file and start reading from there. For example, with 4
> > processes and a 10 MB file, I would tell the first process to read
> > from byte 0 to byte 2.5 million, the second one from 2.5 million to 5
> > million and so on. I just have an academic curiosity :-D
> >
> > Any suggestion is very welcome, either to the approach or to the
> > actual implementation. Thank you for your help.
> >
> > Andrea.
>
> If the thing you would want to speed up is the processing of the file
> (and not the IO), I would make one process actually read the file, and
> feed the other processes with the data from the file through a queue.

No, the processing of the data is fast enough, as it is very simple. What I was asking is whether anyone could share an example of using multiprocessing to read a file, along the lines I described above.

Andrea.
Re: Multiprocessing and file I/O
Hi Paul & All,

On May 24, 4:16 pm, Paul Boddie wrote:
> On 24 Mai, 16:13, Infinity77 wrote:
> >
> > No, the processing of the data is fast enough, as it is very simple.
> > What I was asking is if anyone could share an example of using
> > multiprocessing to read a file, along the lines I described above.
>
> Take a look at this section in an article about multi-threaded
> processing of large files:
>
> http://effbot.org/zone/wide-finder.htm#a-multi-threaded-python-solution

Thank you for the pointer, I have read the article and the follow-ups with much interest... it's unfortunate that Python is no longer in first place, though :-D

I'll see if I can come up with a faster implementation of my (f2py/Fortran-based) Python module using multiprocessing.

Thank you.

Andrea.
docstrings => RestructuredText => Sphinx => Nice html docs (?)
Hi All,

I apologize in advance if my question sounds dumb. I googled back and forth but my google-fu is not working very well today...

I have seen the new-style Python HTML documentation, which is extremely nice, and by reading here and there I have seen that it has been generated using Georg Brandl's Sphinx package (http://pypi.python.org/pypi/Sphinx). Now, I have an open source application I am documenting, and I would like to use the new HTML style, so I thought I would use Sphinx to obtain the same result.

The problem is, I don't know if there is a tool out there which will extract docstrings from my module and convert them to a reStructuredText format which may be (almost) directly fed into Sphinx to get the HTML help.

Maybe I am asking the impossible, or maybe it is much easier than I thought, but I can't find a solution, so I thought to ask here...

Thank you in advance for your suggestions.

Andrea.
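To make the "docstrings to reStructuredText" step concrete, here is a small hand-rolled sketch using only the standard inspect module. It is an illustration under assumptions, not a recommendation from the thread: the function name module_to_rst and the heading layout are invented for this example. Sphinx also ships an autodoc extension (sphinx.ext.autodoc, with directives such as automodule) that pulls docstrings in at build time, which may make a separate extraction step unnecessary.

```python
import inspect


def module_to_rst(module):
    """Return a reStructuredText string built from a module's docstrings."""
    title = module.__name__
    lines = [title, "=" * len(title), ""]
    module_doc = inspect.getdoc(module)
    if module_doc:
        lines += [module_doc, ""]
    for name, obj in inspect.getmembers(module):
        # Keep only public functions and classes defined in this module.
        if name.startswith("_"):
            continue
        if not (inspect.isfunction(obj) or inspect.isclass(obj)):
            continue
        if getattr(obj, "__module__", None) != module.__name__:
            continue
        doc = inspect.getdoc(obj) or "(no docstring)"
        # One section per object: name, underline, blank line, docstring.
        lines += [name, "-" * len(name), "", doc, ""]
    return "\n".join(lines)
```

The resulting string could be written to a .rst file and listed in the Sphinx toctree like any hand-written page. Methods inside classes are not walked here; that would need a second loop over each class.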
Where is PyMethod_GET_CLASS in Python 3?
Hi All,

When building C extensions in Python 2.X, there was a magical PyMethod_GET_CLASS implemented like this:

    #define PyMethod_GET_CLASS(meth) \
        (((PyMethodObject *)meth) -> im_class)

It looks like Python 3 has wiped out the "im_class" attribute. What is the alternative way to handle this case in Python 3? How do I find out which class a particular method belongs to?

BTW, it's very, very, *very* hard to find any reference to help with migrating existing C extensions from Python 2.X to Python 3.

Thank you for your suggestions.

Andrea.
Re: Where is PyMethod_GET_CLASS in Python 3?
Hi,

On Dec 15, 9:22 pm, Terry Reedy wrote:
> On 12/15/2009 11:08 AM, Infinity77 wrote:
>
> > Hi All,
> >
> > When building C extensions In Python 2.X, there was a magical
> > PyMethod_GET_CLASS implemented like this:
> >
> > #define PyMethod_GET_CLASS(meth) \
> > (((PyMethodObject *)meth) -> im_class)
> >
> > It looks like Python 3 has wiped out the "im_class" attribute.
>
> For bound methods, renamed to __class__ to be consistent with other
> objects. Unbound methods were eliminated as extra cruft.

First of all, thank you for your answer. However, being a complete newbie at writing C extensions, I can't seem to find a way to do what I asked in the first place:

Try 1:

    # define PyMethod_GET_CLASS(meth) \
        (((PyMethodObject *)meth) -> __class__)

    error C2039: '__class__' : is not a member of 'PyMethodObject'

Try 2:

    PyObject * magicClass = method -> __class__

    error C2039: '__class__' : is not a member of '_object'

I know I am doing something stupid, please be patient :-D . Any suggestion is more than welcome.

Andrea.
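The Python-level picture behind this question can be sketched as follows. In Python 3 a bound method object carries __self__ (the instance) and __func__ (the underlying function), and the class is reached through the instance rather than through a struct member. The two helper names below are invented for this illustration, and the sketch only covers ordinary instance methods. On the C side, the PyMethod_GET_SELF macro combined with Py_TYPE looks like the natural counterpart, but that mapping is an assumption and is not confirmed anywhere in this thread.

```python
def class_of_bound_method(meth):
    """Return the class of the instance a bound method is attached to."""
    return type(meth.__self__)


def defining_class(meth):
    """Return the class in the MRO that actually defines the method.

    Works for ordinary instance methods only; returns None if no class
    in the MRO holds the very same function object.
    """
    func = meth.__func__
    for cls in type(meth.__self__).__mro__:
        if cls.__dict__.get(func.__name__) is func:
            return cls
    return None


class Base:
    def greet(self):
        return "hi"


class Child(Base):
    pass


bound = Child().greet
print(class_of_bound_method(bound).__name__)  # class of the instance
print(defining_class(bound).__name__)         # class where greet is defined
```

The two answers differ on purpose: the old im_class question can mean either "which class is this instance" or "which class defined this function", and the second needs a walk over the MRO.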
Re: WxPython versus Tkinter.
On Jan 24, 9:57 pm, Robin Dunn wrote:
> On Jan 23, 4:31 pm, "Martin v. Loewis" wrote:
>
> > > WxPython Challenge 1 code updated...
> >
> > > * Fixed tab traversal
> > > * Removed hand-holding code
> > > * Removed some cruft
> >
> > > https://sites.google.com/site/thefutureofpython/home/code-challenges
> >
> > > Good luck!
> >
> > Still crashes the interpreter.
>
> The crash on Linux is due to SetSingleStyle removing all the items and
> the columns when the mode of the listctrl is changed, and then the
> code continues on with the assumption that the columns still exist and
> the crash happens when an item is added to column zero and there is no
> column zero. Apparently the native widget on Windows doesn't have
> that limitation. BTW, if the linux wx packages had the runtime
> assertions turned on then you would have seen a Python exception with
> some clues that would probably help solve the problem. I don't know
> about others but on Ubuntu you can install the *wx*-dbg packages to
> get a version with the assertions turned on. Hopefully that will
> change starting with 2.9 as wx now turns on the assertions by default
> for builds configured normally, and the wx-dev team recommends that
> the assertions are not turned off, except in rare circumstances.
>
> BTW, on behalf of the wxPython community I'd like to apologize for the
> havoc caused by the flaming troll escaping from his cage. In general
> wxPython users are much less militant and zealotty and honor
> everyone's freedom to choose which ever UI tool kit works the best for
> their own needs.

I have been involved in wxPython development for many years (mostly on the implementation of custom widgets, in the AGW library), and I share Robin's concerns about this kind of "publicity" given to wxPython.

Python comes with Tk as a "batteries included" UI toolkit. I'm perfectly fine with this, as I am not going to use it anyway. Whether Tk is replaced in the standard library by PyGTK, PyQt, PySide, etc. in the future, it won't make any difference to those aficionado developers who use wxPython. We'll still download the wxPython binaries/sources/whatever and use them to develop our own GUIs. There is simply no match between wxPython and X (substitute X with whatever GUI toolkit you like). This is obviously my very biased opinion.

It is very unfortunate that this topic, "wxPython vs. Tkinter", has drifted into another flame war, as there is really no point in this kind of discussion. As a general rule, a GUI newbie should try all the GUI toolkits out there and settle on the one which looks easier/nicer/more convenient/more feature-rich. As usual, it is a matter of personal taste.

For those experimenting with wxPython for the first time, I highly recommend joining our wxPython mailing list: you'll find a friendly place, with many experienced developers answering questions and a BDFL who's there (almost) every day offering solutions for the toughest problems.

Andrea.
Linux servers, network and file names
Hi All,

I apologize in advance if this sounds like a stupid question, but I am really no expert at all in network things, and I may be looking in the wrong direction altogether.

At work we have a bunch of Linux servers, and we can connect to them with our Windows PCs over a network. Now, let's assume we only have one Linux server, let's call it SERVER. Every one of us, on our Windows PC, can map this server as a network drive, choosing whatever Windows "drive letter" he wants. For example, I could map SERVER to be "Y:/", my colleague might call it "Z:/" and so on.

The problem arises in one of my little applications, which allows the user to choose a file living on SERVER and do some calculations with it; then, this file name gets saved in a common database (common in the sense that it is shared between us Windows users). Now, if I choose this file myself, the FileDialog (a window representing a file selector dialog) will return something like this (let's ignore the back/forward slashes, this is not an issue):

    Y:/Folder/FileName.txt

If my colleague does it, he will get:

    Z:/Folder/FileName.txt

Even if I am able to work out the server name using the Windows drive letter (and I was able to do it, though I don't remember anymore how), what I get is:

    For me:    //SERVER/gavana/Folder/FileName.txt
    Colleague: //SERVER/Colleague/Folder/FileName.txt

So, no matter what I do, the file name stored in the database is user-dependent and not universal and common to all of us.

Am I missing something fundamental? I appreciate any suggestion; even a Windows-only solution (i.e., based on PyWin32) would be perfect.

Thank you in advance for your help.

Andrea.
Re: Linux servers, network and file names
Hi Tim,

On Apr 22, 4:04 pm, Tim Golden wrote:
> On 22/04/2010 15:13, Infinity77 wrote:
> > [I] choose this file myself, the FileDialog (a window representing a file
> > selector dialog) will return something like this (let's ignore the
> > back/forward slashes, this is not an issue):
> >
> > Y:/Folder/FileName.txt
> >
> > If my colleague does it, he will get:
> >
> > Z:/Folder/FileName.txt
> >
> > Even if I am able to work out the server name using the Windows drive
> > letter (and I was able to do it, now I don't remember anymore how to
> > do it), what I get is:
> >
> > For me: //SERVER/gavana/Folder/FileName.txt
> > Colleague: //SERVER/Colleague/Folder/FileName.txt
>
> Why do you and your colleagues have different share names?

I have no idea, I will ask our IT, they set it up :-)

> Can you just use a UNC like the above and end up with the same path?

From the few tests I have run so far, it seems like I can... It seems odd, however, for a file name stored somewhere (I mean, the string representing the file name) to have my name in it. It feels fragile and incorrect. But whatever ;-)

Thank you Tim for your help.

Andrea.
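Storing the UNC form instead of the drive-letter form, as discussed above, could look roughly like this. It is a minimal sketch under assumptions: the function name to_unc and the drive_map dictionary are hypothetical, and the mapping from drive letter to share is simply passed in by hand. On a real Windows machine that mapping would have to come from the system (for example via PyWin32), which this sketch does not attempt. The standard ntpath module is used so that Windows-style paths are parsed the same way on any platform.

```python
import ntpath


def to_unc(path, drive_map):
    """Replace a mapped drive letter with its UNC share root.

    drive_map is a hypothetical dict such as {"Y:": "//SERVER/gavana"}.
    Paths whose drive is not in the map (including paths that are
    already UNC) are returned unchanged.
    """
    drive, rest = ntpath.splitdrive(path)
    root = drive_map.get(drive.upper())
    if root is None:
        return path
    return root.rstrip("/\\") + rest


# Each user supplies the mapping for his own PC before saving to the database.
my_drives = {"Y:": "//SERVER/gavana"}
print(to_unc("Y:/Folder/FileName.txt", my_drives))
```

With this approach the database always holds the same kind of string regardless of which letter a user picked; whether a colleague can actually open a path under someone else's share still depends on how the server permissions are set up.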
Re: Linux servers, network and file names
Hi Martin & All,

On Apr 23, 9:50 am, "Martin P. Hellwig" wrote:
> On 04/22/10 15:13, Infinity77 wrote:
> >
> > For me: //SERVER/gavana/Folder/FileName.txt
> > Colleague: //SERVER/Colleague/Folder/FileName.txt
> >
> > So, no matter what I do, the file name stored in the database is user-
> > dependent and not universal and common to all of us.
>
> If that user dependent part happens to be equal to the login name, then
> what you could do is replace it with the username variable (I believe
> %USERNAME% on windows) instead.

The funny thing is that the user-dependent part *is* the login name, but not the Windows one: it is the *Linux SERVER* one, and no mapping has been done between Windows logins and Linux usernames. IT...

Anyway, it seems like Tim's suggestion is working (for the moment), so I'll stick with it, as this network/filenames issue has already taken me a ridiculous amount of time to fix :-)

Thank you guys for your help!

Andrea.