On 22/08/12 20:28, Pete O'Connell wrote:
Hi. The next step for me to parse the file as I want to is to change
lines that look like this:
f 21/21/21 22/22/22 24/24/23 23/23/24
into lines that look like this:
f 21 22 23 24

In English, what is the rule you are applying here? My guess is:

"Given three numbers separated by slashes, ignore the first two numbers
and keep the third."

E.g. "17/25/97" => 97.

Am I close?
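
One quick way to check that guess against your sample line is the sketch below (the variable names are mine, not from your code). Keeping the last field reproduces the output you asked for, while keeping the first field would not, because the last two terms differ:

```python
sample = "f 21/21/21 22/22/22 24/24/23 23/23/24"

# Split off the leading "f", then look at each a/b/c term separately.
tag, *terms = sample.split()

last_fields = [term.split("/")[2] for term in terms]
first_fields = [term.split("/")[0] for term in terms]

keep_last = " ".join([tag] + last_fields)
keep_first = " ".join([tag] + first_fields)

print(keep_last)   # f 21 22 23 24
print(keep_first)  # f 21 22 24 23
```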


Below is my terribly slow loop for doing this. Any suggestions about
how to make this code more efficient would be greatly appreciated

What makes you say it is "terribly slow"? Perhaps it is as fast as it
could be under the circumstances. (Maybe it takes a long time because
you have a lot of data, not because it is slow.)

The first lesson of programming is not to be too concerned about speed
until your program is correct.

Like most such guidelines, this is not entirely true -- you don't want
to write code which is unnecessarily slow. But the question you should
be asking is, "is it fast enough?" rather than "is it fast?".

Also, the sad truth is that Python tends to be slower than some other
languages. (It's also faster than some others.) But the
general process is:

1) write something that works correctly;

2) if it is too slow, try to speed it up in Python;

3) if that's still too slow, try using something like Cython or PyPy;

4) if all else fails, now that you have a working prototype, re-write
it again in C, Java, Lisp or Haskell.

Once they see how much more work is involved in writing fast C code,
most people decide that "fast enough" is fast enough :)
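
If you want to know whether the loop really is too slow, measure it rather than guess. Here is a minimal sketch using the standard library's timeit module. The function name convert_faces and the test data are placeholders I made up, not your real file, so treat the timing as a rough indication only:

```python
import timeit

def convert_faces(lines):
    """Keep only the last a/b/c field of each term on lines starting with "f"."""
    result = []
    for line in lines:
        if line.startswith("f"):
            terms = line.split()
            line = " ".join(term.split("/")[-1] for term in terms)
        result.append(line)
    return result

# Made-up data: 100,000 copies of one face line, standing in for a real file.
data = ["f 21/21/21 22/22/22 24/24/23 23/23/24"] * 100000

# Run the conversion five times and keep the best time for a single run.
best = min(timeit.repeat(lambda: convert_faces(data), number=1, repeat=5))
print("best of 5: %.3f seconds for %d lines" % (best, len(data)))
```

If the number that comes out is acceptable for the size of file you actually have, you are done.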


with open(fileName) as lines:
     theGoodLines = [line.strip("\n") for line in lines
                     if "vn" not in line and "vt" not in line and line != "\n"]

I prefer to write code in chains of filters.

with open(fileName) as lines:
    # get rid of leading and trailing whitespace, including newlines
    lines = (line.strip() for line in lines)
    # ignore blanks
    lines = (line for line in lines if line)
    # ignore lines containing "vn" or "vt"
    theGoodLines = [line for line in lines if not ("vn" in line or "vt" in line)]

Note that only the last step is a list comprehension using [ ], the others
are generator expressions using ( ) instead.

Will the above be faster than your version? I have no idea. But I think it
is more readable and understandable. Some people might disagree.
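
On the difference between the two kinds of brackets, here is a small sketch that uses a made-up list of strings in place of your file. The helper strip_and_record is something I invented just to show when the work happens. The generator expressions do nothing when they are created; the lines are only pulled through the chain when the final list comprehension runs:

```python
raw = ["v 1.0 2.0 3.0\n", "\n", "vn 0.0 1.0 0.0\n", "f 1/1/1 2/2/2 3/3/3\n"]

seen = []  # records each line at the moment it is actually processed

def strip_and_record(line):
    seen.append(line)
    return line.strip()

stripped = (strip_and_record(line) for line in raw)
non_blank = (line for line in stripped if line)
print(len(seen))  # 0 -- nothing has been processed yet

good = [line for line in non_blank if not ("vn" in line or "vt" in line)]
print(len(seen))  # 4 -- the list comprehension pulled every line through
print(good)       # ['v 1.0 2.0 3.0', 'f 1/1/1 2/2/2 3/3/3']
```

So the chain makes a single pass over the data and never builds the intermediate lists.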


for i in range(len(theGoodLines)):
     if theGoodLines[i][0] == "f":
         aGoodLineAsList = theGoodLines[i].split(" ")
         theGoodLines[i] = (aGoodLineAsList[0] + " " +
                            aGoodLineAsList[1].split("/")[-1] + " " +
                            aGoodLineAsList[2].split("/")[-1] + " " +
                            aGoodLineAsList[3].split("/")[-1] + " " +
                            aGoodLineAsList[4].split("/")[-1])


Start with a helper function:

def extract_last_item(term):
    """Extract the item from a term like a/b/c"""
    return term.split("/")[-1]


for i, line in enumerate(theGoodLines):
    if line[0] == "f":
        terms = line.split()
        theGoodLines[i] = " ".join([extract_last_item(t) for t in terms])
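
Putting the pieces together, here is a sketch that runs the filters and the helper over a few made-up lines. I am using io.StringIO as a stand-in for your file, and the sample contents are invented, so you will want to try it on your real data:

```python
import io

def extract_last_item(term):
    """Extract the last item from a term like a/b/c"""
    return term.split("/")[-1]

# Made-up contents standing in for open(fileName).
fake_file = io.StringIO(
    "v 1.0 2.0 3.0\n"
    "vt 0.5 0.5\n"
    "vn 0.0 1.0 0.0\n"
    "\n"
    "f 21/21/21 22/22/22 24/24/23 23/23/24\n"
)

with fake_file as lines:
    lines = (line.strip() for line in lines)
    lines = (line for line in lines if line)
    theGoodLines = [line for line in lines if not ("vn" in line or "vt" in line)]

for i, line in enumerate(theGoodLines):
    if line[0] == "f":
        terms = line.split()
        # The leading "f" has no slash, so it passes through unchanged.
        theGoodLines[i] = " ".join([extract_last_item(t) for t in terms])

print(theGoodLines)  # ['v 1.0 2.0 3.0', 'f 21 22 23 24']
```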



See how you go with that.



--
Steven