Script Optimization

2008-05-03 Thread lev
Can anyone provide some advice/suggestions to make a script more
precise/efficient/concise, etc.?
The script verifies checksums, renames dirs/files, and edits checksum
filenames and htm page filenames, converting everything from mp3cd
format ([0-9][0-9][0-9]xxx.yyy) to something more usable when ripping
an mp3cd to disk. Everything works, but I think it's a bit bulky.

It's too long to post here (160 lines) so here's the link:
http://uppit.com/d2/CKOYHE/af78a6bd3e21a19d5871abb9b879/utils.py
(if that doesn't work: http://uppit.com/CKOYHE)

Thanks in advance,
lev
--
http://mail.python.org/mailman/listinfo/python-list


Re: Script Optimization

2008-05-04 Thread lev
> * Remove newlines introduced by email
> * Move imports to start of file
    I used the imports from the edited script you sent.
> * Change indentation from 8 spaces to 4
    I like using tabs because of the text editor I use, but the script
at the end uses 4 spaces.
> * Move main() to bottom of script
> * Remove useless "pass" and "return" lines
    I replaced the bare "return" lines with "pass", but I like keeping
them in case the indentation is ever lost - it makes it easy to go
back to the original indentation.
> * Temporarily change broken "chdir" line
    removed as many instances of chdir as possible (a few useless ones
to accommodate the functions - I changed the functions to not chdir as
much). That line seems to work... I added it in case the script is
launched with, say, 'python somedir\someotherdir\script.py' rather than
'python script.py', because I need it to work in both its own and its
parent directory.
> * Split lines so they fit into 80 chars
> * Add spaces after commas
> * Use path.join instead of string interpolation
in all cases when possible - done
> * rename rename() to rename_md5() because rename() shadows a function
> imported from os.
renamed all functions to more understandable names (without
collisions)
> * Rename vars shadowing imported names
renamed almost all vars to more understandable names
> * Improve logic for checking when to print help
    the example you gave me does pretty much the exact same thing as
before... (the options are either False or True depending on whether
the argument was used; if both are False then no work was done and the
help is shown, which is exactly the same as when the did_something var
stays False). There's a small sketch of this below.
> * Create emtpy md5 listing file if one doesn't exist
    I intended it to be a script to help rip a specific mp3cd to
disk, not necessarily to create checksum files, because I intend to
include the checksums file.
> * Add a comment for a dodgy-looking section
The 4 folders to be renamed are intentional (this is for a
specific mp3cd with 4 album folders)

I added comments to explain what I was doing with the dictionary[x][1]
[1][0], and also what the indexes for the strings are used for ([3:]
to remove the 001 in 001Track.mp3, etc.)
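
For reference, here is a minimal sketch of that help logic (the option
names are placeholders, not necessarily the ones in the actual script):

import sys
from optparse import OptionParser

parser = OptionParser()
parser.add_option('-c', '--check', action='store_true', default=False,
                  help='verify the md5 checksum file')
parser.add_option('-r', '--rename', action='store_true', default=False,
                  help='rename dirs/files out of mp3cd format')
options, args = parser.parse_args()

if not (options.check or options.rename):  # neither action was requested
    parser.print_help()
    sys.exit()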


Thanks for the advice so far,
lev

#!/usr/bin/env python

import md5
from glob import glob
from optparse import OptionParser
from os import chdir, path, rename, remove
from sys import argv, exit

def verify_checksum_set(checksums):
    checksums = open(checksums, 'r')
    changed_files = {}
    missing_files = []
    for fline in checksums.readlines():
        line = fline.split(' *')
        original_sum = line[0].upper()
        try:
            new_sum = calculate_checksum(line[1].strip())
            if new_sum == original_sum:
                print '.',
                pass
            else:
                changed_files[line[1]] = (original_sum, new_sum)
                pass
        except IOError:
            missing_files.append(line[1])
            pass
        pass
    checksums.close()
    changed_files_keys = changed_files.keys()
    changed_files_keys.sort()
    missing_files.sort()
    print '\n'
    if len(changed_files) != 0:
        print 'File(s) changed:'
        for key in changed_files_keys:
            print key.strip('\n'), 'changed from:\n\t', changed_files[key][0], \
                'to\n\t', changed_files[key][1]
            pass
        print '\n\t', len(changed_files), 'file(s) changed.\n'
        pass
    if len(missing_files) != 0:
        print 'File(s) not found:'
        for x in range(len(missing_files)):
            print '\t', missing_files[x]
            pass
        print '\n\t', len(missing_files), 'file(s) not found.\n'
        pass
    if not len(changed_files) and not len(missing_files):
        print "\n\tChecksums Verified\n"
        pass
    pass

def calculate_checksum(file_name):
    file_to_check = open(file_name, 'rb')
    chunk = 8196
    checksum = md5.new()
    while (True):
        chunkdata = file_to_check.read(chunk)
        if not chunkdata:
            break
        checksum.update(chunkdata)
        pass
    file_to_check.close()
    return checksum.hexdigest().upper()

def rename_file_set(new_dir_names, checksums):
    file_info = md5format(checksums)
    dirlist = glob('00[1-4]Volume [1-4]')
    dirlist.sort()
    for x in range(4):
        rename(dirlist[x], new_dir_names[x])
        print '\t', dirlist[x], 'renamed to:', new_dir_names[x]
        chdir(new_dir_names[x])
        for old_file_name in glob('*.mp3'):
            # old_file_name[3:] is part of removing the numbering: '001Track ...'
            new_file_name = old_file_name[3:]
            rename(old_file_name, new_file_name)
            # (the rest of the posted script is truncated in the archive)

Re: Script Optimization

2008-05-06 Thread lev
On May 4, 10:04 pm, "Gabriel Genellina" <[EMAIL PROTECTED]>
wrote:
> En Sun, 04 May 2008 17:01:15 -0300, lev <[EMAIL PROTECTED]> escribió:
>
> >> * Change indentation from 8 spaces to 4
> > I like using tabs because of the text editor I use, the script at
> > the end is with 4 though.
>
> Can't you configure it to use 4 spaces per indent - and not use "hard" tabs?
>
> >> * Remove useless "pass" and "return" lines
> > I replaced the return nothing lines with passes, but I like
> > keeping them in case the indentation is ever lost - makes it easy to
> > go back to original indentation
>
> I can't think of a case when only indentation "is lost" - if you have a crash 
> or something, normally you lose much more than indentation... Simple backups 
> or a SCM system like cvs/svn will help. So I don't see the usefulness of 
> those "pass" statements; I think that after some time using Python you'll 
> consider them just garbage, as everyone else.
>
> >> * Temporarily change broken "chdir" line
> > removed as many instances of chdir as possible (a few useless ones
> > to accomodate the functions - changed functions to not chdir as much),
> > that line seems to work... I made it in case the script is launched
> > with say: 'python somedir\someotherdir\script.py' rather than 'python
> > script.py', because I need it to work in it's own and parent
> > directory.
>
> You can determine the directory where the script resides using
>
> import os
> basedir = os.path.dirname(os.path.abspath(__file__))
>
> This way it doesn't matter how it was launched. But execute the above code as 
> soon as possible (before any chdir)
>
> > checksums = open(checksums, 'r')
> > for fline in checksums.readlines():
>
> You can directly iterate over the file:
>
>  for fline in checksums:
>
> (readlines() reads the whole file contents in memory; I guess this is not an 
> issue here, but in other cases it may be an important difference)
> Although it's perfectly valid, I would not recommend using the same name for 
> two different things (checksums refers to the file name *and* the file itself)
>
> > changed_files_keys = changed_files.keys()
> > changed_files_keys.sort()
> > missing_files.sort()
> > print '\n'
> > if len(changed_files) != 0:
> > print 'File(s) changed:'
> > for key in changed_files_keys:
>
> You don't have to copy the keys and sort; use the sorted() builtin:
>
>  for key in sorted(changed_files.iterkeys()):
>
> Also, "if len(changed_files) != 0" is usually written as:
>
>  if changed_files:
>
> The same for missing_files.
>
> > for x in range(len(missing_files)):
> > print '\t', missing_files[x]
>
> That construct range(len(somelist)) is very rarely used. Either you don't 
> need the index, and write:
>
> for missing_file in missing_files:
>  print '\t', missing_file
>
> Or you want the index too, and write:
>
> for i, missing_file in enumerate(missing_files):
>  print '%2d: %s' % (i, missing_file)
>
> > def calculate_checksum(file_name):
> > file_to_check = open(file_name, 'rb')
> > chunk = 8196
>
> Any reason to use such a number? 8K is 8192; you could use 8*1024 if you don't 
> remember the value. I usually write 1024*1024 when I want exactly 1M.
>
> --
> Gabriel Genellina

Thank you Gabriel, I did not know about a number of the commands you
posted; the use of 8196 was an error on my part. I will change the
script to reflect your corrections later tonight. I have another
project I need to finish/comment/submit for corrections later on, so I
will be using the version of the script that I come up with tonight.
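
For reference, a rough, untested sketch of what the reading/reporting
part might look like with those changes folded in (same ' *'
checksum-file format as before):

def verify_checksum_set(checksum_path):
    changed_files = {}
    missing_files = []
    checksum_file = open(checksum_path, 'r')
    for fline in checksum_file:                      # iterate over the file directly
        parts = fline.split(' *')
        original_sum = parts[0].upper()
        file_name = parts[1].strip()
        try:
            new_sum = calculate_checksum(file_name)
            if new_sum != original_sum:
                changed_files[file_name] = (original_sum, new_sum)
        except IOError:
            missing_files.append(file_name)
    checksum_file.close()

    for key in sorted(changed_files.iterkeys()):     # no copy-and-sort needed
        print key, 'changed from:\n\t', changed_files[key][0], \
            'to\n\t', changed_files[key][1]
    for i, missing_file in enumerate(sorted(missing_files)):
        print '%2d: %s' % (i, missing_file)
    if not changed_files and not missing_files:
        print '\n\tChecksums Verified\n'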

Thank you for your invaluable advice.
The Python community is the first online community from which I have
had this much help. Thank you all.
--
http://mail.python.org/mailman/listinfo/python-list


Ironpython experience

2009-12-23 Thread Lev
I'm an on-and-off Python developer and use it as one of my tools -
never for writing "full-blown" applications, but rather for small,
"one-of-a-kind" utilities. This time I needed some sort of backup and
reporting utility, to be used by the members of our team once or twice
a day. Execution time is supposed to be negligible, so the project was
an ideal candidate to be implemented in Python. As expected, the whole
script was about 200 lines and was ready in about 2 hours (the power
of Python!). Then I downloaded IronPython and relatively painlessly
(except for the absence of zlib) converted the Python code to
IronPython. It works fine, and IronPython really is Python. But...

The CPython 2.6 script runs in 0.1 seconds, while IronPython 2.6 takes
about 10 seconds. The difference comes from start-up, when all these
numerous dlls/assemblies are loaded and JITed.

Is there any way to speed up the process?
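
(To double-check that the cost really is start-up and not the script
body, something along these lines can be used to time the two
separately; 'backup_tool.py' is a placeholder name.)

import subprocess
import time

def timed(cmd):
    start = time.clock()              # wall-clock timer on Windows
    subprocess.call(cmd)
    return time.clock() - start

print 'bare start-up:', timed(['ipy.exe', '-c', 'pass'])
print 'full script  :', timed(['ipy.exe', 'backup_tool.py'])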
-- 
http://mail.python.org/mailman/listinfo/python-list


M2Crypto for 2.4

2004-12-01 Thread Elbert Lev
People!

Has anybody built M2Crypto for 2.4 on Windows? If yes, please tell me
whether there were any problems.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to distribute a Python app together with its dependencies?

2008-11-30 Thread Lev Elbert
For Python on Windows you can use the py2exe package. It works very
well in simple cases and requires only a few tweaks to make it
recognize some dependencies.
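
A minimal setup.py for a simple console program looks like this
('myscript.py' is a placeholder for your entry script); run it with
'python setup.py py2exe' and the result lands in a dist\ directory:

from distutils.core import setup
import py2exe

setup(console=['myscript.py'])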
--
http://mail.python.org/mailman/listinfo/python-list


Re: stable algorithm with complexity O(n)

2008-12-13 Thread Lev Elbert
1. Comparison sorts have O(n*log(n)) complexity - won't do.
2. Counting sort has complexity O(n + d), where d is the size of the
domain (n^2 in our case) - won't do.
3. Radix sorts have complexity O(n*k), where k is the number of digits
(e.g. 32 for 32-bit integers). There are 2 variants:
a. most significant digit (MSD),
b. least significant digit (LSD).

The LSD radix sort is stable.
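
A small sketch of the LSD variant in Python, sorting on 8-bit digits
(keys must be non-negative integers below 2**key_bits; pick key_bits
large enough to cover the n^2 domain above):

def lsd_radix_sort(items, key=lambda x: x, key_bits=32):
    for shift in range(0, key_bits, 8):
        buckets = [[] for _ in range(256)]
        for item in items:
            # items are scanned in their current order, so equal digits
            # keep their relative order - this is what makes it stable
            buckets[(key(item) >> shift) & 0xFF].append(item)
        items = [item for bucket in buckets for item in bucket]
    return items

# e.g. lsd_radix_sort(pairs, key=lambda p: p[0]) sorts pairs stably by first element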

Good luck.
--
http://mail.python.org/mailman/listinfo/python-list


Re: Newbie Threading Question

2008-07-13 Thread Lev Elbert
On Jul 13, 8:33 am, Sparky <[EMAIL PROTECTED]> wrote:
> It seems strange, but I can't find a list of operating systems which
> support / don't support threading in Python. Can anyone point me in
> the right direction?
>
> Thanks,
> Sam

Here is the list (from Python documentation of thread module):
==
7.4 thread -- Multiple threads of control

This module provides low-level primitives for working with multiple
threads (a.k.a. light-weight processes or tasks) -- multiple threads
of control sharing their global data space. For synchronization,
simple locks (a.k.a. mutexes or binary semaphores) are provided.

The module is optional. It is supported on Windows, Linux, SGI IRIX,
Solaris 2.x, as well as on systems that have a POSIX thread (a.k.a.
``pthread'') implementation. For systems lacking the thread module,
the dummy_thread module is available. It duplicates this module's
interface and can be used as a drop-in replacement.
==
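
The docs also describe the drop-in fallback, which looks like this in
practice:

try:
    import thread
except ImportError:
    import dummy_thread as thread    # single-threaded stand-in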



--
http://mail.python.org/mailman/listinfo/python-list


email module windows and suse

2008-04-13 Thread Lev Elbert
Hi, all!

I have to make a custom email module, based on the standard one. The
custom module has to be able to work with extremely large mails (1GB+)
while having a much smaller memory "footprint".
The modified program has to work in a SUSE environment, while the
development is done under Windows. I'm not too good with Linux and do
not know whether a speedup in Windows translates one-to-one into a
speedup in SUSE. For example, if the bottleneck is IO, in Windows I can
spawn a separate thread or two to do "read-ahead".

Are threads available and as effective in SUSE as they are in Windows?

I'd appreciate any suggestions concerning the modifications of the
module and concerning cross platform development.
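
For what it's worth, a rough sketch of the read-ahead idea mentioned
above (one reader thread feeding a bounded Queue; it works the same
way on Windows and Linux, though whether it helps depends on where the
time actually goes):

import threading
from Queue import Queue

def read_ahead(path, chunk_size=1024 * 1024, depth=4):
    queue = Queue(maxsize=depth)          # bounded, so we never read too far ahead
    def reader():
        f = open(path, 'rb')
        while True:
            data = f.read(chunk_size)
            queue.put(data)
            if not data:                  # empty string marks end of file
                break
        f.close()
    threading.Thread(target=reader).start()
    while True:
        data = queue.get()
        if not data:
            break
        yield data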
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: email module windows and suse

2008-04-13 Thread Lev Elbert
On Apr 13, 3:55 pm, Tim Roberts <[EMAIL PROTECTED]> wrote:
> Lev Elbert <[EMAIL PROTECTED]> wrote:
>
> >I have to make a custom email module, based on the standard one. The
> >custom module has to be able to work with extremely large mails (1GB
> >+), having memory "footprint" much smaller.
>
> Then you have a design problem right from the start.  It is extremely rare
> to find a mail server today that will transmit email messages larger than a
> few dozen megabytes.  Even on a 100 megabit network, it's takes a minute
> and a half for a 1GB message to go from the server to the user's
> workstation.
>
> What are you really trying to do here?  In most cases, you would be better
> off storing your attachments on a web server and transmitting links in the
> email.
>
> >The modified program has to work in SUSE environment, while the
> >development is done under Windows.  I'm not too good with linux and do
> >not know if speedup in Windows translates one-to-one into speedup in
> >SUSE. For example, if the bottleneck is IO, in windows I can spawn a
> >separate thread or 2 to do "read-ahead".
>
> We would need more information on your processing to advise you on this.
> Disk I/O is slow, network I/O is slower.  You can't go any faster than your
> slowest link.
>
> >Are threads available and as effective in SUSE as they are in Windows?
>
> Threads are available in Linux.  There is considerable debate over the
> relative performace improvement.
> --
> Tim Roberts, [EMAIL PROTECTED]
> Providenza & Boekelheide, Inc.

Thank you.

I have a 100 MB mail file. I just ran a very simple experiment.

The message_from_file function boils down to this loop:
1    while True:
2        data = fp.read(block_size)
3        if not data:
4            break
5        feedparser.feed(data)
6
Total time is 21 seconds (lines 1-6), while processing the non-IO
lines 3-5 takes 20 seconds. This means that no IO optimization would
help. It also explains the following fact: changing the block_size
from 8K to 1M has almost no impact on processing time. Multithreading
wouldn't help either.

I believe I have to change the Message class (more exactly: derive
another class which would store the pieces on disk).
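
Roughly, the idea would be something like this (untested sketch; the
1 MB threshold is arbitrary, and the parser side would still need to
hand payloads over without first holding them all in memory):

import os
import tempfile
from email.message import Message

class DiskBackedMessage(Message):
    SPILL_THRESHOLD = 1024 * 1024          # spill payloads bigger than 1 MB

    def set_payload(self, payload, charset=None):
        if isinstance(payload, str) and len(payload) > self.SPILL_THRESHOLD:
            fd, self._payload_path = tempfile.mkstemp()
            os.write(fd, payload)          # keep the body on disk...
            os.close(fd)
            payload = ''                   # ...and only a stub in memory
        else:
            self._payload_path = None
        Message.set_payload(self, payload, charset)

    def get_payload(self, i=None, decode=False):
        if getattr(self, '_payload_path', None) and i is None and not decode:
            return open(self._payload_path, 'rb').read()
        return Message.get_payload(self, i, decode)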
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Server applications - avoiding sleep

2006-03-19 Thread Lev Elbert
You can make it a service, which has the advantage that it survives logouts. 
SOME PROGRAMMING IS REQUIRED. If I need something running the "fast and 
dirty" way, I run a regular Python application as a windowed application 
(start pythonw.exe).
As a way of communication (start, stop, pause) I use the tray. If you need an 
example of such a program, I can send one.
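
For the service route, the usual pywin32 skeleton looks roughly like
this (the service name and the work done inside the loop are
placeholders):

import win32event
import win32service
import win32serviceutil

class MyService(win32serviceutil.ServiceFramework):
    _svc_name_ = 'MyServer'
    _svc_display_name_ = 'My Server (example)'

    def __init__(self, args):
        win32serviceutil.ServiceFramework.__init__(self, args)
        self.stop_event = win32event.CreateEvent(None, 0, 0, None)

    def SvcStop(self):
        self.ReportServiceStatus(win32service.SERVICE_STOP_PENDING)
        win32event.SetEvent(self.stop_event)

    def SvcDoRun(self):
        # wake up every 5 seconds until the stop event is signalled
        while win32event.WaitForSingleObject(self.stop_event, 5000) \
                == win32event.WAIT_TIMEOUT:
            pass  # do the real work here

if __name__ == '__main__':
    win32serviceutil.HandleCommandLine(MyService)

Install and start it with 'python myservice.py install' and
'python myservice.py start' (handled by HandleCommandLine).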

"rodmc" <[EMAIL PROTECTED]> wrote in message 
news:[EMAIL PROTECTED]
>I have written a small server application (for Windows) which handles
> sending and receiving information from an instant messaging client and
> a database. This server needs to run 24/7, however it stops when the
> computer screen is locked.
>
> I assume there is a way to make it run in the background 24/7 but how
> do I go about doing this?
>
> At present the application runs from within a wxPython GUI, however
> this is only used to start and stop it. It could be entire faceless and
> the GUI only used to execute it.
>
> Best,
>
> rod
> 


-- 
http://mail.python.org/mailman/listinfo/python-list