Script Optimization
Can anyone provide advice/suggestions to make a script more precise/efficient/concise, etc.? The script verifies checksums, renames dirs/files, and edits checksum filenames and htm page filenames, converting everything from mp3cd format ([0-9][0-9] [0-9]xxx.yyy) to something more usable, for ripping an mp3cd to disk. Everything works, but I think it's a bit bulky. It's too long to post here (160 lines), so here's the link:

http://uppit.com/d2/CKOYHE/af78a6bd3e21a19d5871abb9b879/utils.py

(if that doesn't work: http://uppit.com/CKOYHE)

Thanks in advance,
lev
--
http://mail.python.org/mailman/listinfo/python-list
Re: Script Optimization
> * Remove newlines introduced by email
> * Move imports to start of file
Used the imports of the edited script you sent.

> * Change indentation from 8 spaces to 4
I like using tabs because of the text editor I use; the script at the end uses 4, though.

> * Move main() to bottom of script
> * Remove useless "pass" and "return" lines
I replaced the return-nothing lines with passes, but I like keeping them in case the indentation is ever lost - it makes it easy to go back to the original indentation.

> * Temporarily change broken "chdir" line
Removed as many instances of chdir as possible (a few useless ones to accommodate the functions - changed the functions to not chdir as much). That line seems to work... I made it in case the script is launched with, say, 'python somedir\someotherdir\script.py' rather than 'python script.py', because I need it to work in its own and parent directory.

> * Split lines so they fit into 80 chars
> * Add spaces after commas
> * Use path.join instead of string interpolation
Done, in all cases where possible.

> * rename rename() to rename_md5() because rename() shadows a function
> imported from os.
Renamed all functions to more understandable names (without collisions).

> * Rename vars shadowing imported names
Renamed almost all vars to more understandable names.

> * Improve logic for checking when to print help
The example you gave me does pretty much the exact same thing as before... (The options are either false or true depending on whether the argument was used; if both are false, then nothing was done and help is shown, which would be exactly the same as if the did_something var remained false.)

> * Create empty md5 listing file if one doesn't exist
I intended it to be a script to help rip a specific mp3cd to disk, not necessarily to create checksum files, because I intend to include the checksums file.
> * Add a comment for a dodgy-looking section
The 4 folders to be renamed are intentional (this is for a specific mp3cd with 4 album folders). I added comments to explain what I was doing with the dictionary[x][1] [1][0], and also what the indexes for the strings are used for ([3:] to remove the 001 in 001Track.mp3, etc.)

Thanks for the advice so far,
lev

#!/usr/bin/env python
import md5
from glob import glob
from optparse import OptionParser
from os import chdir, path, rename, remove
from sys import argv, exit

def verify_checksum_set(checksums):
    checksums = open(checksums, 'r')
    changed_files = {}
    missing_files = []
    for fline in checksums.readlines():
        line = fline.split(' *')
        original_sum = line[0].upper()
        try:
            new_sum = calculate_checksum(line[1].strip())
            if new_sum == original_sum:
                print '.',
                pass
            else:
                changed_files[line[1]] = (original_sum, new_sum)
                pass
        except IOError:
            missing_files.append(line[1])
            pass
        pass
    checksums.close()
    changed_files_keys = changed_files.keys()
    changed_files_keys.sort()
    missing_files.sort()
    print '\n'
    if len(changed_files) != 0:
        print 'File(s) changed:'
        for key in changed_files_keys:
            print key.strip('\n'), 'changed from:\n\t', changed_files[key][0], \
                'to\n\t', changed_files[key][1]
            pass
        print '\n\t', len(changed_files), 'file(s) changed.\n'
        pass
    if len(missing_files) != 0:
        print 'File(s) not found:'
        for x in range(len(missing_files)):
            print '\t', missing_files[x]
            pass
        print '\n\t', len(missing_files), 'file(s) not found.\n'
        pass
    if not len(changed_files) and not len(missing_files):
        print "\n\tChecksums Verified\n"
        pass
    pass

def calculate_checksum(file_name):
    file_to_check = open(file_name, 'rb')
    chunk = 8196
    checksum = md5.new()
    while (True):
        chunkdata = file_to_check.read(chunk)
        if not chunkdata:
            break
        checksum.update(chunkdata)
        pass
    file_to_check.close()
    return checksum.hexdigest().upper()

def rename_file_set(new_dir_names, checksums):
    file_info = md5format(checksums)
    dirlist = glob('00[1-4]Volume [1-4]')
    dirlist.sort()
    for x in range(4):
        rename(dirlist[x], new_dir_names[x])
        print '\t', dirlist[x], 'renamed to:', new_dir_names[x]
        chdir(new_dir_names[x])
        for old_file_name in glob('*.mp3'):
            # old_file_name[3:] is part of removing numbering: '001Track ...'
            new_file_name = old_file_name[3:]
            rename(old_f
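For what it's worth, the calculate_checksum() function above could also be written with the hashlib module (which replaced the deprecated md5 module in Python 2.5+) and a with-block; this is only a sketch of that variant, intended to behave identically to the original:

```python
import hashlib

def calculate_checksum(file_name, chunk_size=8192):
    # Read the file in fixed-size chunks so large files never
    # have to sit wholly in memory.
    checksum = hashlib.md5()
    with open(file_name, 'rb') as file_to_check:
        while True:
            data = file_to_check.read(chunk_size)
            if not data:
                break
            checksum.update(data)
    # Upper-case to match the case used in the .md5 listing files.
    return checksum.hexdigest().upper()
```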
Re: Script Optimization
On May 4, 10:04 pm, "Gabriel Genellina" <[EMAIL PROTECTED]> wrote:
> En Sun, 04 May 2008 17:01:15 -0300, lev <[EMAIL PROTECTED]> escribió:
>
> >> * Change indentation from 8 spaces to 4
> > I like using tabs because of the text editor I use, the script at
> > the end is with 4 though.
>
> Can't you configure it to use 4 spaces per indent - and not use "hard"
> tabs?
>
> >> * Remove useless "pass" and "return" lines
> > I replaced the return nothing lines with passes, but I like
> > keeping them in case the indentation is ever lost - makes it easy to
> > go back to original indentation
>
> I can't think of a case when only indentation "is lost" - if you have a
> crash or something, normally you lose much more than indentation... Simple
> backups or an SCM system like cvs/svn will help. So I don't see the
> usefulness of those "pass" statements; I think that after some time using
> Python you'll consider them just garbage, as everyone else does.
>
> >> * Temporarily change broken "chdir" line
> > removed as many instances of chdir as possible (a few useless ones
> > to accommodate the functions - changed functions to not chdir as much),
> > that line seems to work... I made it in case the script is launched
> > with say: 'python somedir\someotherdir\script.py' rather than 'python
> > script.py', because I need it to work in its own and parent directory.
>
> You can determine the directory where the script resides using:
>
> import os
> basedir = os.path.dirname(os.path.abspath(__file__))
>
> This way it doesn't matter how it was launched. But execute the above code
> as soon as possible (before any chdir).
>
> > checksums = open(checksums, 'r')
> > for fline in checksums.readlines():
>
> You can directly iterate over the file:
>
> for fline in checksums:
>
> (readlines() reads the whole file contents into memory; I guess this is
> not an issue here, but in other cases it may be an important difference.)
> Although it's perfectly valid, I would not recommend using the same name
> for two different things (checksums refers to the file name *and* the
> file itself).
>
> > changed_files_keys = changed_files.keys()
> > changed_files_keys.sort()
> > missing_files.sort()
> > print '\n'
> > if len(changed_files) != 0:
> >     print 'File(s) changed:'
> >     for key in changed_files_keys:
>
> You don't have to copy the keys and sort; use the sorted() builtin:
>
> for key in sorted(changed_files.iterkeys()):
>
> Also, "if len(changed_files) != 0" is usually written as:
>
> if changed_files:
>
> The same for missing_files.
>
> > for x in range(len(missing_files)):
> >     print '\t', missing_files[x]
>
> That construct range(len(somelist)) is very rarely used. Either you don't
> need the index, and write:
>
> for missing_file in missing_files:
>     print '\t', missing_file
>
> Or you want the index too, and write:
>
> for i, missing_file in enumerate(missing_files):
>     print '%2d: %s' % (i, missing_file)
>
> > def calculate_checksum(file_name):
> >     file_to_check = open(file_name, 'rb')
> >     chunk = 8196
>
> Any reason to use such a number? 8K is 8192; you could use 8*1024 if you
> don't remember the value. I usually write 1024*1024 when I want exactly 1M.
>
> --
> Gabriel Genellina

Thank you, Gabriel. I did not know about a number of the constructs you posted; the use of 8196 was an error on my part. I will change the script to reflect your corrections later tonight. I have another project I need to finish/comment/submit for corrections later on, so I will be using the version of the script that I come up with tonight.
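Gabriel's sorted()/enumerate/truthiness suggestions, taken together, might look like this sketch (format_missing() is a hypothetical helper name; it returns lines instead of printing so the pieces are easy to see):

```python
def format_missing(missing_files):
    # "if not missing_files:" replaces "if len(missing_files) != 0".
    if not missing_files:
        return []
    # sorted() avoids copying and sorting a key list by hand;
    # enumerate() supplies the index without range(len(...)).
    return ['%2d: %s' % (i, name)
            for i, name in enumerate(sorted(missing_files))]
```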
Thank you for your invaluable advice. The Python community is the first online community from which I have received this much help. Thank you all.
Ironpython experience
I'm an on-and-off Python developer and use it as one of my tools - never for writing "full-blown" applications, but rather for small, "one-of-a-kind" utilities. This time I needed some sort of backup and reporting utility, to be used by the members of our team once or twice a day. Execution time is supposed to be negligible. The project was an ideal candidate to be implemented in Python. As expected, the whole script was about 200 lines and was ready in about 2 hours (the power of Python!). Then I downloaded IronPython and relatively painlessly (except for the absence of zlib) converted the Python code to IronPython. It works fine, and IronPython really is Python. But... The CPython 2.6 script runs in 0.1 seconds, while IronPython 2.6 takes about 10 seconds. The difference comes from start-up, when all those numerous dlls/assemblies are loaded and JITed. Is there any way to speed up the process?
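To confirm that the difference really is start-up cost rather than script run time, one can time an empty program under each interpreter. A sketch (here 'ipy' is an assumed name for the IronPython launcher; adjust it for your installation):

```python
import subprocess
import sys
import time

def startup_seconds(interpreter):
    # Run an empty program: virtually all elapsed time is
    # interpreter start-up (loading/JITing assemblies, etc.).
    start = time.time()
    subprocess.call([interpreter, '-c', 'pass'])
    return time.time() - start

if __name__ == '__main__':
    print('CPython start-up: %.3fs' % startup_seconds(sys.executable))
    # print('IronPython start-up: %.3fs' % startup_seconds('ipy'))
```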
M2Crypto for 2.4
People! Has anybody built M2Crypto for Python 2.4 on Windows? If yes, please tell me whether there were any problems.
Re: How to distribute a Python app together with its dependencies?
For Python on Windows you can use the py2exe package. It works very well in simple cases and requires only a few tweaks to make it recognize some dependencies.
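For a simple console script, the py2exe setup file is short. A rough sketch ('myscript.py' is a placeholder for your entry-point module):

```python
# setup.py - build with:  python setup.py py2exe
from distutils.core import setup
import py2exe  # importing registers the "py2exe" command with distutils

setup(console=['myscript.py'])  # use windows=[...] instead for a GUI app
```

The resulting exe and its supporting files land in a dist\ subdirectory.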
Re: stable algorithm with complexity O(n)
1. Comparison sorts have O(n*log n) complexity - won't do.
2. Counting sort has complexity O(d), where d is the size of the domain (in our case n^2) - won't do.
3. Radix sorts have complexity O(n*k), where k is the number of bits in an integer (32?). There are 2 variants:
   a. most significant digit (MSD),
   b. least significant digit (LSD).
The LSD radix sort is stable. Good luck.
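A sketch of the LSD variant for non-negative integers, processing 8 bits per pass; it is stable because each pass appends equal keys in their input order:

```python
def lsd_radix_sort(nums, bits=32, radix_bits=8):
    # Stable sort of non-negative integers, least significant digit first.
    mask = (1 << radix_bits) - 1
    for shift in range(0, bits, radix_bits):
        buckets = [[] for _ in range(1 << radix_bits)]
        for n in nums:
            # Appending preserves input order within a bucket -> stability.
            buckets[(n >> shift) & mask].append(n)
        # Concatenate buckets in digit order for the next pass.
        nums = [n for bucket in buckets for n in bucket]
    return nums
```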
Re: Newbie Threading Question
On Jul 13, 8:33 am, Sparky <[EMAIL PROTECTED]> wrote:
> It seems strange, but I can't find a list of operating systems which
> support / don't support threading in Python. Can anyone point me in
> the right direction?
>
> Thanks,
> Sam

Here is the list (from the Python documentation of the thread module):

==
7.4 thread -- Multiple threads of control

This module provides low-level primitives for working with multiple threads
(a.k.a. light-weight processes or tasks) -- multiple threads of control
sharing their global data space. For synchronization, simple locks (a.k.a.
mutexes or binary semaphores) are provided.

The module is optional. It is supported on Windows, Linux, SGI IRIX, Solaris
2.x, as well as on systems that have a POSIX thread (a.k.a. ``pthread'')
implementation. For systems lacking the thread module, the dummy_thread
module is available. It duplicates this module's interface and can be used
as a drop-in replacement.
==
email module windows and suse
Hi, all! I have to make a custom email module, based on the standard one. The custom module has to be able to work with extremely large mails (1GB+) while having a much smaller memory "footprint". The modified program has to work in a SUSE environment, while the development is done under Windows. I'm not too good with Linux and do not know if a speedup in Windows translates one-to-one into a speedup in SUSE. For example, if the bottleneck is IO, in Windows I can spawn a separate thread or two to do "read-ahead". Are threads available, and as effective, in SUSE as they are in Windows? I'd appreciate any suggestions concerning the modifications of the module and concerning cross-platform development.
Re: email module windows and suse
On Apr 13, 3:55 pm, Tim Roberts <[EMAIL PROTECTED]> wrote:
> Lev Elbert <[EMAIL PROTECTED]> wrote:
>
> >I have to make a custom email module, based on the standard one. The
> >custom module has to be able to work with extremely large mails (1GB
> >+), having memory "footprint" much smaller.
>
> Then you have a design problem right from the start. It is extremely rare
> to find a mail server today that will transmit email messages larger than
> a few dozen megabytes. Even on a 100 megabit network, it takes a minute
> and a half for a 1GB message to go from the server to the user's
> workstation.
>
> What are you really trying to do here? In most cases, you would be better
> off storing your attachments on a web server and transmitting links in
> the email.
>
> >The modified program has to work in SUSE environment, while the
> >development is done under Windows. I'm not too good with linux and do
> >not know if speedup in Windows translates one-to-one into speedup in
> >SUSE. For example, if the bottleneck is IO, in windows I can spawn a
> >separate thread or 2 to do "read-ahead".
>
> We would need more information on your processing to advise you on this.
> Disk I/O is slow, network I/O is slower. You can't go any faster than
> your slowest link.
>
> >Are threads available and as effective in SUSE as they are in Windows?
>
> Threads are available in Linux. There is considerable debate over the
> relative performance improvement.
> --
> Tim Roberts, [EMAIL PROTECTED]
> Providenza & Boekelheide, Inc.

Thank you. I have a 100MB mail file. I just ran a very simple experiment: the message_from_file method boils down to a loop:

1 while True:
2     data = fp.read(block_size)
3     if not data:
4         break
5     feedparser.feed(data)
6

Total time is 21 seconds (lines 1-6), while the processing (non-IO) part, lines 3-5, takes 20 seconds. This means that no IO optimization would help. This also explains the following fact: changing the block_size from 8K to 1M has almost no impact on processing time.
Also, multithreading wouldn't help. I believe I have to change the Message class (more exactly: derive another class, which would store pieces on disk).
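Such a derived class might look roughly like the sketch below. All names here are hypothetical (DiskBackedMessage is not part of the email package), and a real implementation would also need to clean up its spill files and handle multipart payloads:

```python
import email.message
import os
import tempfile

class DiskBackedMessage(email.message.Message):
    """Sketch: spill large payloads to a temp file instead of
    keeping them in memory (hypothetical class name)."""

    SPILL_THRESHOLD = 1024 * 1024  # spill payloads larger than 1 MB

    def set_payload(self, payload, charset=None):
        if isinstance(payload, str) and len(payload) > self.SPILL_THRESHOLD:
            # Write the body to a temp file and remember its path.
            fd, spill_path = tempfile.mkstemp()
            with os.fdopen(fd, 'w') as spill:
                spill.write(payload)
            self._spill_path = spill_path
            email.message.Message.set_payload(self, None, charset)
        else:
            self._spill_path = None
            email.message.Message.set_payload(self, payload, charset)

    def get_payload(self, i=None, decode=False):
        if getattr(self, '_spill_path', None):
            # Re-read the spilled body from disk on demand.
            with open(self._spill_path) as spill:
                return spill.read()
        return email.message.Message.get_payload(self, i, decode)
```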
Re: Server applications - avoiding sleep
You can make it a service, which has the advantage that it survives logouts. Some programming is required. If I need something running the "fast and dirty" way, I run a regular Python application as a window application (start pythonw.exe). As a means of communication (start, stop, pause) I use the tray. If you need an example of such a program, I can send one.

"rodmc" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
> I have written a small server application (for Windows) which handles
> sending and receiving information from an instant messaging client and
> a database. This server needs to run 24/7, however it stops when the
> computer screen is locked.
>
> I assume there is a way to make it run in the background 24/7 but how
> do I go about doing this?
>
> At present the application runs from within a wxPython GUI, however
> this is only used to start and stop it. It could be entirely faceless
> and the GUI only used to execute it.
>
> Best,
>
> rod
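The "faceless" part - a worker loop that a GUI, tray icon, or service wrapper merely starts and stops - can be sketched in a platform-neutral way. The names below are hypothetical; a real Windows service would wrap something like this in the pywin32 service framework:

```python
import threading

class BackgroundServer:
    """Sketch of a faceless worker that a GUI or service wrapper
    starts and stops (hypothetical class, not from the original post)."""

    def __init__(self, poll_interval=0.01):
        self._stop = threading.Event()
        self._poll = poll_interval
        self._thread = None
        self.ticks = 0

    def _run(self):
        while not self._stop.is_set():
            self.ticks += 1              # stand-in for the real work
            self._stop.wait(self._poll)  # sleep, but wake promptly on stop()

    def start(self):
        self._thread = threading.Thread(target=self._run)
        self._thread.daemon = True       # don't block interpreter exit
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()
```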