It's ...
... my first Python program! So please be gentle (no fifty ton
weights on the head!), but tell me if it's properly "Pythonic",
or if it's a dead parrot (and if the latter, how to revive it).

I'm working from Beazley's /Python: Essential Reference/ (2nd
ed. 2001), so my first newbie question is how best to find out
what's changed from version 2.1 to version 2.5. (I've recently
installed 2.5.4 on my creaky old Win98SE system.) I expect to be
buying the 4th edition when it comes out, which will be soon, but
before then, is there a quick online way to find this out?

Having only got up to page 84 - where we can actually start to
read stuff from the hard disk - I'm emboldened to try to learn
to do something useful, such as removing all those annoying hard
tab characters from my many old text files (before I cottoned on
to using soft tabs in my text editor).

This sort of thing seems to work, in the interpreter (for an
ASCII text file, named 'h071.txt', in the current directory):

    stop = 3   # Tab stops every 3 characters
    from types import StringType   # Is this awkwardness necessary?
    detab = lambda s : StringType.expandtabs(s, stop)   # Or use def
    f = open('h071.txt')   # Do some stuff to f, perhaps, and then:
    f.seek(0)
    print ''.join(map(detab, f.xreadlines()))
    f.close()

Obviously, to turn this into a generally useful program, I need
to learn to write to a new file, and how to parcel up the Python
code, and write a script to apply the "detab" function to all the
files found by searching a Windows directory, and replace the old
files with the new ones; but, for the guts of the program, is this
a reasonable way to write the code to strip tabs from a text file?

For writing the output file, this seems to work in the interpreter:

    g = open('temp.txt', 'w')
    g.writelines(map(detab, f.xreadlines()))
    g.close()

In practice, does this avoid creating the whole string in memory
at one time, as is done by using ''.join()? (I'll have to read up
on "opaque sequence objects", which have only been mentioned once
or twice in passing - another instance perhaps being an xrange()?)
Not that that matters much in practice (in this simple case), but
it seems elegant to avoid creating the whole output file at once.

OK, I'm just getting my feet wet, and I'll try not to ask too many
silly questions!

First impressions are: (1) Python seems both elegant and practical;
and (2) Beazley seems a pleasantly unfussy introduction for someone
with at least a little programming experience in other languages.

--
Angus Rodgers
--
http://mail.python.org/mailman/listinfo/python-list
Re: It's ...
On Wed, 24 Jun 2009 20:53:49 +0100, I wrote:

>[...] my first newbie question is how best to find out
>what's changed from version 2.1 to version 2.5.
>[...] is there a quick online way to find this out?

One way seems to be:

<http://www.python.org/doc/2.3/whatsnew/>
<http://www.python.org/doc/2.4/whatsnew/>
<http://www.python.org/doc/2.5/whatsnew/>

... although there doesn't seem to be any
<http://www.python.org/doc/2.2/whatsnew/> ... ah! ...

<http://www.python.org/doc/2.2.3/whatsnew/>
"What's New in Python 2.2"

--
Angus Rodgers
Re: It's ...
On Wed, 24 Jun 2009 16:40:29 -0400, "J. Cliff Dyer" wrote:

>On Wed, 2009-06-24 at 20:53 +0100, Angus Rodgers wrote:
>> [...]
>> from types import StringType # Is this awkwardness necessary?
>
>Not anymore. You can just use str for this.
>
>> detab = lambda s : StringType.expandtabs(s, stop) # Or use def
>
>First, use def. lambda is a rarity for use when you'd rather not assign
>your function to a variable.
>
>Second, expandtabs is a method on string objects. s is a string object,
>so you can just use s.expandtabs(stop)

How exactly do I get detab, as a function from strings to strings
(for a fixed tab size)? (This is aside from the point, which you
make below, that the whole map/join idea is a bit of a no-no - in
some other context, I might want to isolate a method like this.)

>Third, I'd recommend passing your tabstops into detab with a default
>argument, rather than defining it irrevocably in a global variable
>(which is brittle and ugly)

No argument there - I was just messing about in the interpreter,
to see if the main idea worked.

>> f = open('h071.txt') # Do some stuff to f, perhaps, and then:
>> f.seek(0)
>
>f is not opened for writing, so if you do stuff to the contents of f,
>you'll have to put the new version in a different variable, so f.seek(0)
>doesn't help. If you don't do stuff to it, then you're at the beginning
>of the file anyway, so either way, you shouldn't need to f.seek(0).

I seemed to find that if I executed f.xreadlines() or f.readlines()
once, I was somehow positioned at the end of the file or something,
and had to do the f.seek(0) - but maybe I did something else silly.

>> print ''.join(map(detab, f.xreadlines()))
>
>Sometime in the history of python, files became iterable, which means
>you can do the following:
>
>for line in f:
>    print detab(line)
>
>Much prettier than running through join/map shenanigans. This is also
>the place to modify the output before passing it to detab:
>
>for line in f:
>    # do stuff to line
>    print detab(line)
>
>Also note that you can iterate over a file several times:
>
>f = open('foo.txt')
>for line in f:
>    print line[0] # prints the first character of every line
>for line in f:
>    print line[1] #prints the second character of every line
>> f.close()

This all looks very nice.

>> For writing the output file, this seems to work in the interpreter:
>>
>> g = open('temp.txt', 'w')
>> g.writelines(map(detab, f.xreadlines()))
>> g.close()
>>
>
>Doesn't help, as map returns a list.

Pity. Oh, well.

>You can use itertools.imap, or you
>can use a for loop, as above.

This is whetting my appetite!

>The terms to look for, rather than opaque sequence objects are
>"iterators" and "generators".

OK, will do.

>Glad you're enjoying Beazley. I would look for something more
>up-to-date. Python's come a long way since 2.1. I'd hate for you to
>miss out on all the iterators, booleans, codecs, subprocess, yield,
>unified int/longs, decorators, decimals, sets, context managers and
>new-style classes that have come since then.

I'll get either Beazley's 4th ed. (due next month, IIRC), or Chun,
/Core Python Programming/ (2nd ed.), or both, unless someone has
a better suggestion. (Eventually I'll migrate from Windows 98SE(!),
and will need info on Python later than 2.5, but that's all I need
for now.)

--
Angus Rodgers
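A note on the quoted claim that a file can be iterated over several times: a file object is consumed by one pass, so a second loop sees nothing until the file is rewound. That agrees with the observation above that f.seek(0) was needed after f.readlines(). The sketch below uses Python 3 syntax rather than the 2.x of the thread, and io.StringIO with made-up contents stands in for open('foo.txt').

```python
import io

# io.StringIO stands in for a file opened with open('foo.txt').
f = io.StringIO("abc\nxyz\n")

first_pass = [line[0] for line in f]    # consumes the whole stream
second_pass = [line[1] for line in f]   # nothing left to read
f.seek(0)                               # rewind to the start
third_pass = [line[1] for line in f]    # now the lines are available again

print(first_pass, second_pass, third_pass)
```

Only the first and third passes produce anything; the second list comes out empty.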
Re: It's ...
On Wed, 24 Jun 2009 22:12:33 +0100, I wrote:

>How exactly do I get detab, as a function from strings to strings
>(for a fixed tab size)?

(It's OK - this has been explained in another reply. I'm still a
little hazy about what exactly objects are in Python, but the haze
will soon clear, I'm sure, especially after I have written more
than one one-line program!)

--
Angus Rodgers
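One possible way to get "a function from strings to strings, for a fixed tab size" is a small factory function that closes over the tab size. This is only a sketch in Python 3 syntax; make_detab, detab3 and detab8 are names invented for the illustration, not anything from the thread or the standard library.

```python
def make_detab(tabsize):
    """Build a str -> str function that expands tabs to a fixed width."""
    def detab(s):
        # str.expandtabs returns a new string; s itself is unchanged.
        return s.expandtabs(tabsize)
    return detab

detab3 = make_detab(3)   # tab stops every 3 columns, as in the post
detab8 = make_detab(8)   # the conventional 8-column stops

print(list(map(detab3, ["a\tb", "\tc"])))
```

Each returned function takes only the string, so it can be handed directly to map() or used in a loop.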
Re: It's ...
On Wed, 24 Jun 2009 14:10:54 -0700, Scott David Daniels wrote:

>Angus Rodgers wrote:
>
>> from types import StringType # Is this awkwardness necessary?
>Nope

I'm starting to see some of the mental haze that was confusing me.

>Also, expandtabs is an instance method, so the roundabout is not needed.
>
>    def detab(s):
>        return s.expandtabs(stop)

I'd forgotten where Beazley had explained that "methods such as
... s.expandtabs() always return a new string as opposed to
modifying the string s." I must have been hazily thinking of it as
somehow modifying s, even though my awkward code itself depended
on a vague understanding that it didn't. No point in nailing this
polly to the perch any more!

>I'd simply use:
>    for line in f:
>        print detab(line.rstrip())
>or even:
>    for line in f:
>        print line.rstrip().expandtabs(stop)

I'll read up on iterating through files, somewhere online for the
moment, and then get a more up-to-date textbook.

And I'll try not to ask too many silly questions like this, but
I wanted to make sure I wasn't getting into any bad programming
habits right at the start - and it's a good thing I did, because
I was!

>Nope. But you could use a generator expression if you wanted:
>    g.writelines(detab(line) for line in f)

Ah, so that actually does what I was fondly hoping my code would
do. Thanks! I must learn about these "generator" thingies.

--
Angus Rodgers
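The generator-expression version quoted above can be sketched as follows, in Python 3 syntax rather than the 2.x of the thread. The io.StringIO objects and their contents are stand-ins for the real files 'h071.txt' and 'temp.txt'. The tell() call is included only to show that building the generator reads nothing by itself.

```python
import io

def detab(s, tabsize=3):
    """Expand hard tabs in s to spaces."""
    return s.expandtabs(tabsize)

src = io.StringIO("1\t2\n\t3\n")   # stands in for open('h071.txt')
dst = io.StringIO()                # stands in for open('temp.txt', 'w')

converted = (detab(line) for line in src)   # a generator: no lines read yet
position_before = src.tell()                # still at the start of the input
dst.writelines(converted)                   # lines are pulled one at a time

print(position_before, repr(dst.getvalue()))
```

Neither the full input nor the full output is assembled as a single list or string here; each line is converted and written before the next is read.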
Re: It's ...
On Wed, 24 Jun 2009 22:43:01 +0100, I wrote:

>No point in nailing this polly to the perch any more!

Indeed not, so please skip what follows (I've surely been enough
of an annoying newbie, already!), but I've just remembered why I
wrote my program in such an awkward way. I wanted to be able to
import the type name t (StringType in this case) so that I could
simply use t.m() as the name of one of its methods [if "method"
is the correct term]; but in this case, where m is expandtabs(),
an additional parameter (the tab size) is needed; so, I used the
lambda expression to get around this, entirely failing to realise
that (as was clearly shown in the replies I got), if I was going
to use "lambda" at all (not recommended!), then it would be a lot
simpler to write the function as lambda s : s.m(), with or without
any additional parameters needed. (It didn't really have anything
to do with a separate confusion as to what exactly "objects" are.)

>I wanted to make sure I wasn't getting into any bad programming
>habits right at the start

I'm just trying to make sure I really understand how I screwed up.

(In future, I'll try to work through a textbook with exercises.
But I thought I'd better try to get some quick feedback at the
start, because I knew that I was fumbling around, and that it
was unlikely to be necessary to use such circumlocutions.)

--
Angus Rodgers
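The two spellings discussed above, calling the method through the type name and calling it on the string itself, give the same result. A minimal illustration in Python 3 syntax, where str plays the role that StringType played in the original post:

```python
s = "a\tb"

# Calling the method through the type needs the string passed explicitly...
via_type = str.expandtabs(s, 3)
# ...while calling it on the instance supplies the string automatically.
via_instance = s.expandtabs(3)

# Either form can be wrapped so that only the string remains as a parameter.
detab = lambda text: text.expandtabs(3)

print(via_type == via_instance == detab(s))
```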
Re: [SPAM] It's ...
Someone has gently directed me to the Tutor mailing list:

<http://mail.python.org/mailman/listinfo/tutor>

which I hadn't known about. I've joined, and will try to confine
my initial blundering experiments to there. Sorry about the spam
spam spam spam, lovely spam, wonderful spam!

--
Angus Rodgers
Re: It's ...
On Thu, 25 Jun 2009 17:53:51 +0100, I wrote:

>On Thu, 25 Jun 2009 10:31:47 -0500, Kirk Strauser
> wrote:
>
>>At 2009-06-24T19:53:49Z, Angus Rodgers writes:
>>
>>> print ''.join(map(detab, f.xreadlines()))
>>
>>An equivalent in modern Pythons:
>>
>>>>> print ''.join(line.expandtabs(3) for line in file('h071.txt'))
>
>I guess the code below would also have worked in 2.1?
>(It does in 2.5.4.)
>
> print ''.join(line.expandtabs(3) for line in \
> file('h071.txt').xreadlines())

Possibly silly question (in for a penny ...): does the new feature,
by which a file becomes iterable, operate by some kind of coercion
of a file object to a list object, via something like x.readlines()?

--
Angus Rodgers
Re: It's ...
On Thu, 25 Jun 2009 17:56:47 +0100, I burbled incoherently:

>[...] does the new feature,
>by which a file becomes iterable, operate by some kind of coercion
>of a file object to a list object, via something like x.readlines()?

Sorry to follow up my own post yet again (amongst my weapons is
a fanatical attention to detail when it's too late!), but I had
better rephrase that question:

Scratch "list object", and replace it with something like: "some
kind of iterator object, that is at least already implicit in 2.1
(although the term 'iterator' isn't mentioned in the index to the
2nd edition of Beazley's book)". Something like that! 8-P

--
Angus Rodgers
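On the question of whether iterating over a file goes through a list: in Python 3's io module, at least, no list is involved. The stream acts as its own iterator and hands out one line per request. The sketch below uses Python 3 syntax, with io.StringIO and made-up contents standing in for an open text file. It does not claim to show how the 2.2-era implementation worked internally.

```python
import io

f = io.StringIO("one\ntwo\n")   # stands in for an open text file

it = iter(f)            # what a for loop calls behind the scenes
same_object = it is f   # the stream is its own iterator
first = next(it)        # reads one line on demand; no list is built
rest = f.readlines()    # continues from where the iterator stopped

print(same_object, repr(first), rest)
```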
Re: It's ...
On Thu, 25 Jun 2009 10:31:47 -0500, Kirk Strauser wrote:

>At 2009-06-24T19:53:49Z, Angus Rodgers writes:
>
>> print ''.join(map(detab, f.xreadlines()))
>
>An equivalent in modern Pythons:
>
>>>> print ''.join(line.expandtabs(3) for line in file('h071.txt'))

I guess the code below would also have worked in 2.1?
(It does in 2.5.4.)

    print ''.join(line.expandtabs(3) for line in \
        file('h071.txt').xreadlines())

--
Angus Rodgers
Re: It's ...
On Thu, 25 Jun 2009 17:56:47 +0100, I found a new way to disgrace myself, thus: >[...] something like x.readlines()? ^ I don't know how that full stop got in there. Please ignore it! -- Angus Rodgers -- http://mail.python.org/mailman/listinfo/python-list
Re: It's ...
On Thu, 25 Jun 2009 18:22:48 +0100, MRAB wrote:

>Angus Rodgers wrote:
>> On Thu, 25 Jun 2009 10:31:47 -0500, Kirk Strauser
>> wrote:
>>
>>> At 2009-06-24T19:53:49Z, Angus Rodgers writes:
>>>
>>>> print ''.join(map(detab, f.xreadlines()))
>>> An equivalent in modern Pythons:
>>>
>>>>>> print ''.join(line.expandtabs(3) for line in file('h071.txt'))
>>
>> I guess the code below would also have worked in 2.1?
>> (It does in 2.5.4.)
>>
>> print ''.join(line.expandtabs(3) for line in \
>> file('h071.txt').xreadlines())
>>
>That uses a generator expression, which was introduced in 2.4.

Sorry, I forgot that list comprehensions need square brackets.

The following code works in 2.1 (I installed version 2.1.3, on
a different machine, to check!):

    f = open('h071.txt')   # Can't use file('h071.txt') in 2.1
    print ''.join([line.expandtabs(3) for line in f.xreadlines()])

(Of course, in practice I'll stick to doing it the more sensible
way that's already been explained to me. I'm ordering a copy of
Wesley Chun, /Core Python Programming/ (2nd ed., 2006), to learn
about version 2.5.)

--
Angus Rodgers
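The difference the brackets make can be seen directly. This is a small sketch in Python 3 syntax, with a made-up list of lines in place of the file: square brackets build the whole list at once, while parentheses build a generator that produces items on demand.

```python
lines = ["a\tb\n", "\tc\n"]   # stands in for the lines of 'h071.txt'

as_list = [line.expandtabs(3) for line in lines]   # list comprehension
as_gen = (line.expandtabs(3) for line in lines)    # generator expression

kinds = (type(as_list).__name__, type(as_gen).__name__)
joined_same = "".join(as_list) == "".join(as_gen)  # same text either way

print(kinds, joined_same)
```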
Re: It's ...
On Sat, 27 Jun 2009 03:32:12 -0300, "Gabriel Genellina" wrote:

>Iterators were added in Python 2.2.

Just my luck. :-)

>See PEP 234 http://www.python.org/dev/peps/pep-0234/

You've got to love a language whose documentation contains
sentences beginning like this:

"Among its chief virtues are the following four -- no, five --
no, six -- points: [...]"

--
Angus Rodgers
Re: change the first character of the line to uppercase in a text file
On Fri, 26 Jun 2009 18:58:27 -0700 (PDT), powah wrote:

>On Jun 26, 4:51 pm, Chris Rebert wrote:
>> On Fri, Jun 26, 2009 at 12:43 PM, powah wrote:
>> > How to change the first character of the line to uppercase in a text
>> > file?
>> > [...]
>>
>> We're not in the business of doing homework. Some hints though:
>>
>> `s.upper()` converts the string in variable `s` to all upper case
>> (e.g. "aBcD".upper() --> "ABCD")
>> `for line in afile:` iterates over each line in a file object.
>> [...]
>>
>> And here are the docs on working with files:
>> http://docs.python.org/library/functions.html#open
>> http://docs.python.org/library/stdtypes.html#file-objects
>>
>> That should be enough to get you started.
>
>Thank you for your hint.
>This is my solution:
>f = open('test', 'r')
>for line in f:
>    print line[0].upper()+line[1:],

I know this is homework, so I didn't want to say anything (especially
as I'm a newcomer, also just starting to learn the language), but it
seems OK to mention that if you hunt around some more in the standard
library documentation, you'll find an even shorter way to write this.

--
Angus Rodgers
Re: change the first character of the line to uppercase in a text file
On Fri, 26 Jun 2009 18:58:27 -0700 (PDT), powah wrote:

>Thank you for your hint.
>This is my solution:
>f = open('test', 'r')
>for line in f:
>    print line[0].upper()+line[1:],

Will your program handle empty lines of input correctly?

--
Angus Rodgers
Re: change the first character of the line to uppercase in a text file
On Sat, 27 Jun 2009 11:39:28 +0100, I asked rhetorically:

>On Fri, 26 Jun 2009 18:58:27 -0700 (PDT), powah
> wrote:
>
>>Thank you for your hint.
>>This is my solution:
>>f = open('test', 'r')
>>for line in f:
>>    print line[0].upper()+line[1:],
>
>Will your program handle empty lines of input correctly?

Strangely enough, it seems to do so, but why?

--
Angus Rodgers
Re: change the first character of the line to uppercase in a text file
On Sat, 27 Jun 2009 13:02:47 +0200, Peter Otten <__pete...@web.de> wrote:

>Angus Rodgers wrote:
>
>> On Sat, 27 Jun 2009 11:39:28 +0100, I asked rhetorically:
>>
>>>Will your program handle empty lines of input correctly?
>>
>> Strangely enough, it seems to do so, but why?
>
>Because there aren't any. When you read lines from a file there will always
>be at least the newline character. Otherwise it would indeed fail:
>
>>>> for line in "peter\npaul\n\nmary".splitlines():
>...     print line[0].upper() + line[1:]
>...
>Peter
>Paul
>Traceback (most recent call last):
>  File "<stdin>", line 2, in <module>
>IndexError: string index out of range

Hmm ... the \r\n sequence at the end of a Win/DOS file seems to be
treated as a single character.

--
Angus Rodgers
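The IndexError in the quoted session comes from indexing an empty string with line[0]. Slicing, unlike indexing, never raises on an empty string, so one way to make the same idea safe for genuinely empty strings is sketched below in Python 3 syntax. The helper name cap_first is made up for this illustration.

```python
def cap_first(line):
    """Upper-case the first character; safe for empty strings."""
    # line[:1] is '' for an empty string, where line[0] would raise.
    return line[:1].upper() + line[1:]

lines = "peter\npaul\n\nmary".splitlines()
capitalised = [cap_first(line) for line in lines]

print(capitalised)
```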
Re: change the first character of the line to uppercase in a text file
On Sat, 27 Jun 2009 12:13:57 +0100, I wrote:

>the \r\n sequence at the end of a Win/DOS file

Of course, I meant the end of a line of text, not the end of
the file. (I promise I'll try to learn to proofread my posts.
This is getting embarrassing!)

--
Angus Rodgers
Re: change the first character of the line to uppercase in a text file
On Sat, 27 Jun 2009 12:13:57 +0100, I wrote:

>Hmm ... the \r\n sequence at the end of a Win/DOS file seems to be
>treated as a single character.

For instance, if test001A.txt is this file:

    abc xyz
    Bd ef

    gH ij

and test001E.py is this:

    f = open('test001A.txt', 'r')
    for line in f:
        print repr(line)

then the output from "python test001E.py > temp.txt" is this:

    'abc xyz\n'
    'Bd ef\n'
    '\n'
    'gH ij\n'

and indeed the output from "print repr(f.read())" is this:

    'abc xyz\nBd ef\n\ngH ij\n'

How do you actually get to see the raw bytes of a file in Windows?

OK, this seems to work:

    f = open('test001A.txt', 'rb')   # Binary mode
    print repr(f.read())

Output:

    'abc xyz\r\nBd ef\r\n\r\ngH ij\r\n'

Indeed, when a Windows file is opened for reading in binary mode,
the length of an "empty" line is returned as 2. This is starting
to make some sense to me now.

--
Angus Rodgers
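The text-mode versus binary-mode behaviour seen above can be reproduced on any platform with a current Python. The sketch uses Python 3 syntax, where text mode translates \r\n to \n by default on every OS. The Python 2.5 behaviour in the post is the Windows one, and nothing here shows how 2.5 behaves elsewhere. A temporary file takes the place of test001A.txt.

```python
import os
import tempfile

raw = b"abc xyz\r\nBd ef\r\n\r\ngH ij\r\n"   # DOS/Windows line endings

fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as out:
        out.write(raw)
    with open(path, "r", encoding="ascii") as f:   # text mode
        as_text = f.read()        # each \r\n arrives as a single '\n'
    with open(path, "rb") as f:                    # binary mode
        as_bytes = f.read()       # bytes exactly as stored on disk
finally:
    os.remove(path)

print(repr(as_text))
print(repr(as_bytes))
```

The "empty" line is one character long in text mode and two bytes long in binary mode, matching the lengths reported in the post.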
Re: change the first character of the line to uppercase in a text file
On Sat, 27 Jun 2009 13:49:57 +0200, Peter Otten <__pete...@web.de> wrote:

>Angus Rodgers wrote:
>
>> On Sat, 27 Jun 2009 13:02:47 +0200, Peter Otten
>> <__pete...@web.de> wrote:
>>
>>>Angus Rodgers wrote:
>>>
>>>> On Sat, 27 Jun 2009 11:39:28 +0100, I asked rhetorically:
>>>>
>>>>>Will your program handle empty lines of input correctly?
>>>>
>>>> Strangely enough, it seems to do so, but why?
>>>
>>>Because there aren't any. When you read lines from a file there will
>>>always be at least the newline character. Otherwise it would indeed fail:
>>>
>>>>>> for line in "peter\npaul\n\nmary".splitlines():
>>>...     print line[0].upper() + line[1:]
>>>...
>>>Peter
>>>Paul
>>>Traceback (most recent call last):
>>>  File "<stdin>", line 2, in <module>
>>>IndexError: string index out of range
>>
>> Hmm ... the \r\n sequence at the end of a Win/DOS
>
>line
>
>> seems to be treated as a single character.
>
>Yes, but "\n"[1:] will return an empty string rather than fail.

Yes, I understood that, and it's logical, but what was worrying me
was how to understand the cross-platform behaviour of Python with
regard to the different representation of text files in Windows
and Unix-like OSs. (I remember getting all in a tizzy about this
the last time I tried to do any programming. That was in C++,
about eight years ago. Since then, I've only written a couple of
short BASIC programs for numerical analysis on a TI-84+ calculator,
and I feel as if I don't understand ANYTHING any more, but I expect
it'll come back to me. Sorry about my recent flurry of confused
posts! If I have any silly questions of my own, I'll post them to
the Tutor list, but in this instance, I imagined I knew what I was
talking about, and didn't expect to get into difficulties ...) 8-P

--
Angus Rodgers
Buffer pair for lexical analysis of raw binary data
Partly as an educational exercise, and partly for its practical
benefit, I'm trying to pick up a programming project from where
I left off in 2001. It implemented in slightly generalised form
the "buffer pair" scheme for lexical analysis described on pp.
88--92 of Aho et al., /Compilers: Principles, Techniques and
Tools/ (1986). (I'm afraid I don't have a page reference for the
2007 second edition. Presumably it's also in Knuth somewhere.)

Documentation for one of the C++ header files describes it thus
(but I never quite got the hang of C++, so some of the
language-specific details may be very poorly conceived):

"An object incorporates a handle to a file, opened in read-only
mode, and a buffer containing (by default) raw binary data from
that file. The constructor also has an option to open a file in
text mode.

The buffer may, optionally, consist of several segments, linked
to one another in cyclic sequence. The number of segments is a
constant class member, nblocks (1 <= nblocks <= 32,767). A second
constant class member, block (1 <= block <= 32,767) gives the size
of each of the segments in bytes.

The purpose of creating a buffer in cyclically linked segments
is to allow reference to the history of reading the file, even
though it is being read sequentially. The bare class does not
do this itself, but is designed so that classes derived from it
may incorporate one or more pointers to parts of the buffer that
have already been read (assuming these parts have not yet been
overwritten).

If there were only one segment, the length of available history
would periodically be reduced to zero, when the buffer is
refreshed. In general, the available history occupies at least a
fraction (nblocks - 1)/nblocks of a full buffer."

Aho et al. describe the scheme thus (p. 90):

"Two pointers to the input buffer are maintained. The string of
characters between the two pointers is the current lexeme.
Initially, both pointers point to the first character of the next
lexeme to be found. One, called the forward pointer, scans ahead
until a match for a pattern is found. Once the next lexeme is
determined, the forward pointer is set to the character at its
right end. After the lexeme is processed, both pointers are set
to the character immediately past the lexeme."

[There follows a description of the use of "sentinels" to test
efficiently for pointers moving past the end of input to date.]

I seem to remember (but my memory is still very hazy) that there
was some annoying difficulty in coding the raw binary input file
reading operation in C++ in an implementation-independent way;
and I'm reluctant to go back and perhaps get bogged down again
in whatever way I got bogged down before; so I would prefer to
use Python for the whole thing, if possible (either using some
existing library, or else by recoding it all myself in Python).

Does some Python library already provide some functionality like
this? (It's enough to do it with nblocks = 2, as in Aho et al.)

If not, is this a reasonable thing to try to program in Python?
(At the same time as learning the language, and partly as a
fairly demanding exercise intended to help me to learn it.)

Or should I just get my hands dirty with some C++ compiler or
other, and get my original code working on my present machine
(possibly in ANSI C instead of C++), and call it from Python?

--
Angus Rodgers
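As a rough indication that the nblocks = 2 case is feasible in pure Python, here is a minimal sketch in Python 3 syntax. It is not the C++ design described above and not Aho et al.'s exact algorithm. It omits sentinels and treats the lexeme as the half-open range between the two pointers. The names BufferPair, peek, advance, lexeme and words are invented for the illustration. It also assumes a buffered binary stream whose read(n) returns fewer than n bytes only at end of input.

```python
import io

class BufferPair:
    """Two fixed-size blocks refilled alternately from a binary stream."""

    def __init__(self, stream, block=4096):
        self.stream = stream
        self.block = block
        self.blocks = [b"", b""]   # the two halves of the buffer
        self.loaded = 0            # how many blocks have been read so far
        self.begin = 0             # absolute offset of the lexeme start
        self.forward = 0           # absolute offset of the scanning pointer

    def peek(self):
        """Return the byte under the forward pointer, or None at end of input."""
        index, offset = divmod(self.forward, self.block)
        if index == self.loaded:   # the next block has not been read yet
            if self.begin // self.block < index - 1:
                # Refilling would overwrite the block holding the lexeme start.
                raise OverflowError("lexeme too long for the buffer pair")
            self.blocks[index % 2] = self.stream.read(self.block)
            self.loaded += 1
        data = self.blocks[index % 2]
        return data[offset] if offset < len(data) else None

    def advance(self):
        """Move the forward pointer past one byte and return that byte."""
        byte = self.peek()
        if byte is not None:
            self.forward += 1
        return byte

    def lexeme(self):
        """Return the bytes between the pointers, then start a new lexeme."""
        out = bytearray()
        for pos in range(self.begin, self.forward):
            index, offset = divmod(pos, self.block)
            out.append(self.blocks[index % 2][offset])
        self.begin = self.forward
        return bytes(out)

def words(stream, block=4):
    """Split a binary stream on spaces, using a deliberately tiny block size."""
    buf = BufferPair(stream, block)
    while buf.peek() is not None:
        if buf.peek() == ord(" "):
            buf.advance()
            buf.lexeme()           # discard the separator
            continue
        while buf.peek() not in (None, ord(" ")):
            buf.advance()
        yield buf.lexeme()

tokens = list(words(io.BytesIO(b"ab cde f  ghij")))
print(tokens)
```

With block=4 the lexemes "cde" and "ghij" both straddle a block boundary, which is the case the two-block arrangement exists to handle. A lexeme that would outlive both blocks raises an error instead of returning corrupted data.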
Re: Buffer pair for lexical analysis of raw binary data
On 28 Jun 2009 08:00:23 -0700, a...@pythoncraft.com (Aahz) wrote:

>In article <0qec45lho8lkng4n20sb1ad4eguat67...@4ax.com>,
>Angus Rodgers wrote:
>>
>>Partly as an educational exercise, and partly for its practical
>>benefit, I'm trying to pick up a programming project from where
>>I left off in 2001. It implemented in slightly generalised form
>>the "buffer pair" scheme for lexical analysis described on pp.
>>88--92 of Aho et al., /Compilers: Principles, Techniques and
>>Tools/ (1986). (I'm afraid I don't have a page reference for the
>>2007 second edition. Presumably it's also in Knuth somewhere.)
>>
>> [...]
>>
>>Does some Python library already provide some functionality like
>>this? (It's enough to do it with nblocks = 2, as in Aho et al.)
>
>Not AFAIK, but there may well be something in the recipes or PyPI; have
>you tried searching them?

Searching for "buffer" at <http://pypi.python.org/pypi> (which
I didn't know about) gives quite a few hits (including reflex 0.1,
"A lightweight regex-based lexical scanner library").

By "recipes", do you mean
<http://code.activestate.com/recipes/langs/python/> (also new to
me)? There is certainly a lot of relevant code there (e.g.
"Recipe 392150: Buffered Stream with Multiple Forward-Only
Readers"), which I can try to learn from, even if I can't use it
directly.

Thanks!

--
Angus Rodgers