Program inefficiency?

2007-09-29 Thread hall . jeff
I wrote the following simple program to loop through our help files
and fix some errors (in case you can't see the subtle RE search that's
happening, we're replacing spaces in bookmarks with _'s)

The program works great except for one thing: it's significantly
slower through the later files in the search than through the early
ones... Before anyone criticizes, I recognize that the middle section
could be simplified with a for loop... I just haven't cleaned it
up...

The problem is that the first 300 files take about 10-15 seconds and
the last 300 take about 2 minutes... If we do more than about 1500
files in one run, it just hangs up and never finishes...

Is there a solution here that I'm missing? What am I doing that is so
inefficient?

# File: masseditor.py

import re
import os
import time

def massreplace():
    editfile = open("pathname\editfile.txt")
    filestring = editfile.read()
    filelist = filestring.splitlines()
    ##errorcheck = re.compile('(a name=)+(.*)(-)+(.*)(>)+')
    for i in range(len(filelist)):
        source = open(filelist[i])
        starttext = source.read()
        interimtext = replacecycle(starttext)
        interimtext = replacecycle(interimtext)
        interimtext = replacecycle(interimtext)
        interimtext = replacecycle(interimtext)
        interimtext = replacecycle(interimtext)
        interimtext = replacecycle(interimtext)
        interimtext = replacecycle(interimtext)
        interimtext = replacecycle(interimtext)
        interimtext = replacecycle(interimtext)
        interimtext = replacecycle(interimtext)
        interimtext = replacecycle(interimtext)
        interimtext = replacecycle(interimtext)
        finaltext = replacecycle(interimtext)
        source.close()
        source = open(filelist[i],"w")
        source.write(finaltext)
        source.close()
        ##if errorcheck.findall(finaltext)!=[]:
        ##print errorcheck.findall(finaltext)
        ##print filelist[i]
        if i == 100:
            print "done 100"
            print time.clock()
        elif i == 300:
            print "done 300"
            print time.clock()
        elif i == 600:
            print "done 600"
            print time.clock()
        elif i == 1000:
            print "done 1000"
            print time.clock()
    print "done"
    print i
    print time.clock()

def replacecycle(starttext):
    p1= re.compile('(href=|HREF=)+(.*)(#)+(.*)( )+(.*)(">)+')
    p2= re.compile('(name=")+(.*)( )+(.*)(">)+')
    p3= re.compile('(href=|HREF=)+(.*)(#)+(.*)(\')+(.*)(">)+')
    p4= re.compile('(name=")+(.*)(\')+(.*)(">)+')
    p5= re.compile('(href=|HREF=)+(.*)(#)+(.*)(-)+(.*)(">)+')
    p6= re.compile('(name=")+(.*)(-)+(.*)(">)+')
    p7= re.compile('(href=|HREF=)+(.*)(#)+(.*)(<)+(.*)(">)+')
    p8= re.compile('(name=")+(.*)(<)+(.*)(">)+')
    p7= re.compile('(href=|HREF=")+(.*)(#)+(.*)(:)+(.*)(">)+')
    p8= re.compile('(name=")+(.*)(:)+(.*)(">)+')
    p9= re.compile('(href=|HREF=")+(.*)(#)+(.*)(\?)+(.*)(">)+')
    p10= re.compile('(name=")+(.*)(\?)+(.*)(">)+')
    p100= re.compile('(a name=)+(.*)(-)+(.*)(>)+')
    q1= r"\1\2\3\4_\6\7"
    q2= r"\1\2_\4\5"
    interimtext = p1.sub(q1, starttext)
    interimtext = p2.sub(q2, interimtext)
    interimtext = p3.sub(q1, interimtext)
    interimtext = p4.sub(q2, interimtext)
    interimtext = p5.sub(q1, interimtext)
    interimtext = p6.sub(q2, interimtext)
    interimtext = p7.sub(q1, interimtext)
    interimtext = p8.sub(q2, interimtext)
    interimtext = p9.sub(q1, interimtext)
    interimtext = p10.sub(q2, interimtext)
    interimtext = p100.sub(q2, interimtext)

    return interimtext

massreplace()

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Program inefficiency?

2007-09-29 Thread hall . jeff
I did try moving the re.compile's up and out of replacecycle()
but it didn't impact the time in any meaningful way (2 seconds
maybe)...

I'm not sure what a shell+sed script is... I'm fairly new to Python
and my only other coding experience is with VBA... This was my first
Python program

In case it helps... We started with only 6 loops of replacecycle() but
had to keep adding progressively more as we found more and more links
with lots of spaces in them... As we did that, the program's run time
grew, but only in proportion to the number of cycles added... This is
exactly what I would have expected and it leads me to believe that the
problem does not lie in the replacecycle() def but in the
massreplace() def... *shrug*



Re: Program inefficiency?

2007-09-29 Thread hall . jeff
XP is the OS... the files are split across a ton of subdirectories
already...

I'm actually starting to think there's a problem with certain files,
however...

We create help files for clients using RoboHelp... RoboHelp has Source
HTML and then "webhelp" html which is what actually goes to the
client... I'm trying to mass maintenance the "source" files... Right
now, my program works but you've got to delete the webhelp files
first... I figured that (based on the exponential growth in processing
time) it was the additional number of files... However, after
streamlining the code I got the following results:

done 300
4.1904767226e-006
done 600
7.97062280262
done 900
22.3963802662
done 1200
29.9211888662
done
1375
35.3465962853

with the webhelp deleted and

done 300
4.1904767226e-006
done 600
7.6259175398
done 900
13.3994678095
still processing 10 minutes later

with the webhelp intact

Since the system didn't hang sometime after 1375 (and in fact, still
hasn't made it there), I can only assume that it hit one of the
webhelp files and freaked out...

The thing that's really weird is that the files it's hanging on appear
to be some of the most basic files in the whole system (small, not
a lot going on... no hits on the RE search)... So I may just tell the
users to delete the webhelp and have RoboHelp recreate it after
they've run the program...



Re: Program inefficiency?

2007-09-29 Thread hall . jeff
no swaps... memory usage is about 14k (these are small HTML files)...
no hard drive cranking away or fan on my laptop going nutty... CPU
usage isn't even pegged... that's what makes me think it's not some
sort of bizarre memory leak... Unfortunately, it also means I'm out of
ideas...



Re: Program inefficiency?

2007-09-29 Thread hall . jeff
For anyone that cares, I figured out the "problem"... the webhelp
files that it hits the wall on are the compiled search files... They
are the only files in the system that have line lengths that are
RIDICULOUS in length... I'm looking at one right now that has 32767
characters all on one line...

I'm absolutely certain that that's the problem...
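
To see why one very long line hurts so much, here is a minimal sketch (written for current Python 3, not the Python 2 of the script above). The pattern is p1 from the original program; the sample line and the helper name time_search are made up for illustration. Each greedy (.*) can stop at many positions on the line, so when the closing "> never shows up the engine tries a large number of combinations before giving up, and that number climbs steeply with the line length.

```python
import re
import time

# p1 from the original script: three greedy (.*) groups on one line.
p1 = re.compile('(href=|HREF=)+(.*)(#)+(.*)( )+(.*)(">)+')

def time_search(line_length):
    """Time one failed search on a single line of the given length."""
    # Made-up bad case: starts like a link but never closes with ">.
    line = 'href=' + '# ' * (line_length // 2)
    start = time.perf_counter()
    match = p1.search(line)
    return match, time.perf_counter() - start

# Doubling the line length costs far more than double the time.
for n in (50, 100, 200):
    match, seconds = time_search(n)
    print(n, match, round(seconds, 5))
```

One possible way around it (an assumption, not something tested against the real help files) is to bound each group, for example [^">]* in place of .*, so that a match can never wander past the end of a tag.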

Thanks for everyone's help



Re: Program inefficiency?

2007-09-29 Thread hall . jeff
The search is trying to replace the spaces in our bookmarks (and the
links that go to those bookmarks)...

The bookmark tag looks like this:



and the link tag looks like this:



some pitfalls I've already run up against...
SOMETIMES (but not often) the a and the href (or name) is split across
a line... this led me to just drop the ")+')
and the corresponding name replace and then the one corner case we ran
into of
p100= re.compile('(a name=)+(.*)(-)+(.*)(>)+')



Re: Program inefficiency?

2007-09-29 Thread hall . jeff
I think he's saying it should look like this:

# File: masseditor.py

import re
import os
import time

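# One character class for everything being replaced (space ' ? < : -).
# Assumed intent: written as a plain group, the sequence would only match literally.
p1= re.compile('(href=|HREF=)+(.*)(#)+(.*)([ \'?<:-])+(.*)(">)+')
p2= re.compile('(name=")+(.*)([ \'?<:-])+(.*)(">)+')
p100= re.compile('(a name=)+(.*)(-)+(.*)(>)+')
q1= r"\1\2\3\4_\6\7"
q2= r"\1\2_\4\5"

def massreplace():
    editfile = open(r"C:\Program Files\Credit Risk Management\Masseditor\editfile.txt")
    filestring = editfile.read()
    editfile.close()
    filelist = filestring.splitlines()

    for i in range(len(filelist)):
        source = open(filelist[i])
        starttext = source.read()
        source.close()

        # Each pass fixes one character per tag, so feed the result back in.
        # The inner loop needs its own counter so it doesn't clobber i.
        interimtext = starttext
        for j in range(13):
            interimtext = p1.sub(q1, interimtext)
            interimtext = p2.sub(q2, interimtext)
            interimtext = p100.sub(q2, interimtext)
        finaltext = interimtext

        source = open(filelist[i],"w")
        source.write(finaltext)
        source.close()

massreplace()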

I'll try that and see how it works...
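
An alternative sketch, not something posted in this thread: rather than repeating the substitution 13 times, match each name="..." or href="...#..." value once and clean up the bookmark part in a callback. It is written for Python 3. The names BAD_CHARS, NAME_ATTR, HREF_ATTR and fix_bookmarks are made up for illustration, the set of replaced characters is taken from the patterns in the original script, and it assumes the attribute value is double-quoted and sits on one line.

```python
import re

# Characters to turn into underscores inside a bookmark (assumed set,
# based on the patterns in the original script).
BAD_CHARS = re.compile(r"[ '?<:-]")

# [^"]* cannot run past the closing quote, so a long line stays cheap.
NAME_ATTR = re.compile(r'(name=")([^"]*)(")', re.IGNORECASE)
HREF_ATTR = re.compile(r'(href="[^"#]*#)([^"]*)(")', re.IGNORECASE)

def _fix(match):
    # Keep the prefix and the closing quote, clean only the bookmark text.
    return match.group(1) + BAD_CHARS.sub("_", match.group(2)) + match.group(3)

def fix_bookmarks(text):
    """Replace the unwanted characters in every bookmark and link target."""
    text = NAME_ATTR.sub(_fix, text)
    return HREF_ATTR.sub(_fix, text)

sample = '<a name="my book-mark">x</a> <a href="page one.htm#my book-mark">y</a>'
print(fix_bookmarks(sample))
```

Only the part after the # is touched in a link, so a file name that itself contains spaces is left alone.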



Pulling data from a .aspx site

2007-11-27 Thread hall . jeff
There's a government website which shows public data for banks. We'd
like to pull the data down programmatically but the data is "hidden"
behind .aspx...

Is there any way in Python to hook in directly to a browser (Firefox or
IE) to do the following...

1) Fill the search criteria
2) Press the "Search" button
3) Press another button (the CSV button) on the resulting page
4) Then grab the data out of the notepad file that pops up

If this is a wild goose chase, let me know... (or if there's a better
way besides Python... I may have to explore writing a Firefox plug-in
or something)...


easy_install

2009-02-08 Thread hall . jeff
For the life of me I can not figure out how to get easy_install to
work. The syntax displayed on the web page does not appear to work
properly.

easy_install c:\MySQL_python-1.2.2-py2.4-win32.egg

Is there a simpler way to install a python egg? Or am I missing
something with easy_install?


Re: easy_install

2009-02-08 Thread hall . jeff
On Feb 8, 9:27 am, "Diez B. Roggisch"  wrote:
> hall.j...@gmail.com wrote:
> > For the life of me I can not figure out how to get easy_install to
> > work. The syntax displayed on the web page does not appear to work
> > properly.
>
> > easy_install c:\MySQL_python-1.2.2-py2.4-win32.egg
>
> It usually works for me - so what does "not appear to work properly"
> actually mean?
>
> Diez

http://peak.telecommunity.com/DevCenter/EasyInstall#downloading-and-installing-a-package

seems to imply that after installation I can go to a command prompt and
type

easy_install c:\MySQL_python-1.2.2-py2.4-win32.egg

I tried doing this in the python interpreter and on a straight "cmd"
command prompt (the site doesn't really specify). I also tried "import
easy_install" and then

easy_install c:\MySQL_python-1.2.2-py2.4-win32.egg
easy_install ("c:\MySQL_python-1.2.2-py2.4-win32.egg")

and a couple other permutations and never got it to run (error
messages for the first group were "invalid syntax" and were various
flavors of "module not callable" for the second group).



Re: easy_install

2009-02-09 Thread hall . jeff
I had it downloaded and sitting in the root c:\ but didn't get it to
run because I didn't think about the \scripts folder not being in the
Path. Problem solved and fixed. Thank you all for your help.
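
For anyone hitting the same wall: easy_install is a command-line script, not something to call from inside the interpreter, and on Windows it lives in the Scripts folder of the Python installation. A small sketch that prints that folder and checks whether it is on the Path. It uses the standard sysconfig module, which exists in Python 2.7 and 3.2 or later, so it is newer than the Python 2.4 the egg above targets; the variable names are just for illustration.

```python
import os
import sysconfig

# Directory where installers place command-line scripts such as easy_install.
scripts_dir = sysconfig.get_path("scripts")
print(scripts_dir)

# Check whether that directory is already listed in PATH.
path_entries = os.environ.get("PATH", "").split(os.pathsep)
normalized = [os.path.normcase(entry) for entry in path_entries]
on_path = os.path.normcase(scripts_dir) in normalized
print("on PATH:", on_path)
```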

On a side note, "easy_install MySQL-python" produced the following
messages:
Searching for MySQL-python
Reading http://pypi.python.org/simple/MySQL_python/
Reading http://sourceforge.net/projects/mysql-python
Reading http://sourceforge.net/projects/mysql-python/
Best match: MySQL-python 1.2.3b1
Downloading http://osdn.dl.sourceforge.net/sourceforge/mysql-python/MySQL-python-1.2.3b1.tar.gz
Processing MySQL-python-1.2.3b1.tar.gz
Running MySQL-python-1.2.3b1\setup.py -q bdist_egg --dist-dir c:\docume~1\jhall\locals~1\temp\easy_install-t_ph9k\MySQL-python-1.2.3b1\egg-dist-tmp-3gtuz9
error: The system cannot find the file specified

installing from the hard drive worked fine, however.


Re: socket error: connection refused?

2008-06-23 Thread hall . jeff
It's a security conflict. You should be able to run it again and have
it work. Our company's Cisco does the same thing (even after we
approve the app)


tuple.index() and tuple.count()

2008-06-23 Thread hall . jeff
Before the inevitable response comes, let me assure you I've read
through the posts from Guido about this. 7 years ago Guido clearly
expressed a displeasure with allowing these methods for tuple. Let me
lay out (in a fresh way) why I think we should reconsider.

1) It's counterintuitive to exclude them: It makes very little sense
why an indexable data structure wouldn't have .index() as a method. It
makes even less sense to not allow .count()
2) There's no technical reason (that I'm aware of) why these can't be
added
3) It does not (contrary to one of Guido's assertions) require any
relearning of anything. It's a new method that could be added without
breaking any code whatsoever (there isn't even a UserTuple.py to
break)
4) The additional documentation is relatively minute (especially since
it could be copied and pasted virtually verbatim from the list methods)
5) It's MORE Pythonic to do it this way (more intuitive, less
boilerplate)
6) It jibes with the help file better. One of Guido's many stated
reasons was that tuples are for heterogeneous sequences and lists are
for homogeneous sequences. While this may be hypothetically true, the
help file does not come close to pointing you in this direction nor
does the implementation of the language.

example: "Tuples have many uses. For example: (x, y) coordinate pairs,
employee records from a database, etc. Tuples, like strings, are
immutable: it is not possible to assign to the individual items of a
tuple (you can simulate much of the same effect with slicing and
concatenation, though). It is also possible to create tuples which
contain mutable objects, such as lists." is a quote from the help
file. Not only does it never mention homogeneous vs. heterogeneous but
mentions both immutable and mutable which draws your mind and
attention to that aspect.

While tuples and lists may have different uses based on convention,
there are really only two reasons to ever use a tuple: efficiency or
dictionary keys (or some similar immutability requirement). The
implementation contains absolutely NOTHING to reinforce the idea that
lists are for homogeneous data. The implementation of the language
contains EVERY indication that tuples are second class citizens only
to be used for those limited functions above (in fact, efficiency
isn't even talked about in the documentation... I pieced that together
from other threads). Tuples could have been implemented as frozenlist
just as easily.

The lack of .index() and .count() appears to be primarily motivated by
a subtle and silent (at least in the documentation) desire to push
towards coding "best practice" rather than for any technical reason.
While I'm certainly not a "change for change's sake" kind of guy and I
understand the "bang for your buck" thinking, I'm just not seeing the
rationale for stopping this so forcibly. I get the impression that if a
perfect working patch was submitted, Guido might still reject it, which
just seems odd to me.

Again, I'm not trying to raise a stink or open old wounds, I just ran
across it in an app, started doing some research and was thoroughly
confused (for the record, I'm using the tuples as dictionary keys and
had a desire to do k.count() for some edit checks and realized I had
to convert the thing to a list first to run count() )


Re: tuple.index() and tuple.count()

2008-06-23 Thread hall . jeff
never mind... a coworker pointed me to this

http://bugs.python.org/issue1696444

apparently they're there in py3k...
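
A quick check of what that issue added. As far as I can tell, tuple.count() and tuple.index() are available from Python 2.6 as well as in 3.x. The key tuple below is made up for illustration.

```python
# A made-up dictionary key, as in the edit-check use case described earlier.
k = ("loan", "A", "loan", 42)

print(k.count("loan"))   # number of occurrences of a value
print(k.index("A"))      # position of the first match

# On older versions the workaround was to convert to a list first.
print(list(k).count("loan"))
```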


Preferred method for "Assignment by value"

2008-04-15 Thread hall . jeff
As a relative newcomer to Python, I haven't done a heck of a lot of
hacking around with it. I had my first run-in with Python's quirky (to
me at least) tendency to assign by reference rather than by value (I'm
coming from a VBA world so that's the terminology I'm using). I was
surprised that these two cases behave so differently:

>>> test = [[1],[2]]
>>> x = test[0]
>>> x[0] = 5
>>> test
[[5], [2]]
>>> x = 1
>>> test
[[5], [2]]
>>> x
1

Now I've done a little reading and I think I understand the problem...
My issue is, "What's the 'best practise' way of assigning just the
value of something to a new name?"

i.e.
test = [[1,2],[3,4]]
I need to do some data manipulation with the first list in the above
list without changing the original.
Obviously x = test[0] will not work, as any changes I make will alter
the original...
I found that I could do this:
x = [] + test[0]

That gets me a "pure" (i.e. unconnected to test[0]) list, but it
struck me as a bit kludgy.
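
For comparison, a minimal sketch of the usual ways to get an independent copy of a list (written for Python 3; the variable names a, b, c and d are only for illustration). The first three make a shallow copy of the one inner list; copy.deepcopy() also copies everything nested inside.

```python
import copy

test = [[1, 2], [3, 4]]

a = test[0][:]              # slice copy
b = list(test[0])           # constructor copy
c = copy.copy(test[0])      # shallow copy via the copy module
d = copy.deepcopy(test)     # deep copy: the inner lists are copied too

a[0] = 99
d[1][0] = 99
print(test)                 # the original is unchanged
```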

Thanks for your time and help.


Re: Preferred method for "Assignment by value"

2008-04-15 Thread hall . jeff
Thank you both, the assigning using slicing works perfectly (as I'm
sure you knew it would)... It just didn't occur to me because it
seemed a little nonintuitive... The specific application was

def dicttolist (inputdict):
    finallist=[]
    for k, v in inputdict.iteritems():
        temp = v
        temp.insert(0,k)
        finallist.append(temp)

    return finallist

to convert a dictionary to a list. We deal with large amounts of
bank data which the dictionary is perfect for since loan number is a
perfect key... At the end, though, I have to throw it into a csv file
and the csv writer doesn't like dictionaries (since the key is an
iterable string it iterates over each value in the key)

By changing temp = v to temp = v[:] the code worked perfectly (although
changing temp.insert(0,k) to temp = [k] + temp also worked fine... I
didn't like that as I knew it was a workaround)
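
A sketch of the same conversion that never touches the lists stored in the dictionary, written for Python 3 (hence items() rather than iteritems()). The function name dict_to_rows and the loan data are made up for illustration, and the rows go to an in-memory buffer rather than a real file.

```python
import csv
import io

def dict_to_rows(inputdict):
    """Return one list per item, key first, leaving the dict's own lists alone."""
    return [[key] + list(values) for key, values in inputdict.items()]

# Made-up loan data keyed by loan number.
loans = {"1001": [250000, 0.065], "1002": [90000, 0.07]}
rows = dict_to_rows(loans)

# Written to a buffer here; a file opened with newline="" works the same way.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
print(buffer.getvalue())
```

Building each row as a new list means the function can be run twice on the same dictionary without the keys piling up at the front of the values.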

Thanks again for the help


Re: Preferred method for "Assignment by value"

2008-04-15 Thread hall . jeff
I think the fundamental "disconnect" is this issue of mutability and
immutability that people talk about (mainly regarding tuples and
whether they should be thought of as static lists or not)

Coming from VBA I have a tendency to think of everything as an
array...

So when I create the following

test=[1,2],[3,4],[5,6] I'm annoyed to find out that I can do
the following
test[1][1] = 3
but I can't do
test[1] = [3,3]
and so I throw tuples out the window and never use them again...

The mental disconnect I had (until now) was that my original tuple was
in effect "creating" 3 objects (the lists) within a 4th object (the
tuple)... Previously, I'd been thinking of the tuple as one big object
(mentally forcing them into the same brain space as multi-dimensional
arrays in VBA)
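
The same thing as a small runnable sketch (Python 3 syntax): the tuple only fixes which three list objects it holds, while each list stays mutable.

```python
test = [1, 2], [3, 4], [5, 6]   # the commas make this a tuple of three lists

test[1][1] = 3                  # fine: changes the list held inside the tuple
print(test)

try:
    test[1] = [3, 3]            # not allowed: rebinding a slot of the tuple
except TypeError as err:
    print("TypeError:", err)
```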

This was a nice "aha" moment for me...