Alan,

Back when the various UNIX utilities came along (they were later also included in other operating environments like Linux, the Mac OS and even Microsoft's), the paradigm was a bit different: some kinds of tasks were seen as best done with a pipeline of small, focused utilities. You mentioned SED, which at first seems like a very simple tool, but if you look again it can replace lots of other tools, mostly because you can write one-liners with lots of power. AWK, in some sense, was even more powerful and can emulate many of the others.
But it came with a cost compared to some modern languages where, by attaching a few modules, you can do much of the same in fewer passes over the data.

I am not sure if I mentioned it here, but I was once on a project that stored all kinds of billing information in primitive text files using a vertical bar as the field separator. My boss, who was not really a programmer, started out analyzing the data fairly primitively and ended up writing huge shell scripts (ksh, I think) that remotely went to our computers around the world, gathered the files and processed them through pipelines that often had 10 or more stages, as he selectively broke each line into parts, removed some and so on. He would use /bin/echo, cut, grep, sed and so on. The darn thing ran for hours, which was fine when it was running at midnight in Missouri, but not so much when it ran at the same moment in countries like Japan and Israel where the users were awake. I got lots of complaints and showed him how his entire mess could be replaced mostly by a single AWK script and complete in minutes. Of course, now, with a fast internet and modern languages that can run threads in parallel, it probably would complete in seconds. Maybe I would have translated that AWK to python after all, but these days I am studying Kotlin so maybe ...

As I see it, many languages have a trade-off. The fact that AWK decided to allow a variable to be used without any other form of declaration was a feature, but it could easily lead to errors if you spelled something wrong. But look at Python. You can use a variable to hold anything just by assigning to it. If you spell it wrong later when putting something else in it, no problem: you now have two variables. If you try to access the value of a variable that was never assigned, you get an error. But many statically typed languages would catch more of these potential errors before the program runs. If you store an int in a variable and later mistakenly put a string in the same variable name, python is happy.
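To make the Python side of that trade-off concrete, here is a minimal sketch. The names (`demo`, `total`, `totl`, `never_assigned`) are made up purely for illustration and do not come from any real script.

```python
def demo():
    """Collect the outcome of each behaviour in a dict so it can be inspected."""
    results = {}

    # Assigning to a name creates the variable; no declaration is needed.
    total = 10

    # A misspelled name on assignment silently creates a second variable.
    totl = total + 5
    results["total"] = total   # the original is untouched
    results["totl"] = totl     # the typo now lives on as its own variable

    # Reading a name that was never assigned raises NameError at run time.
    try:
        print(never_assigned)
    except NameError as exc:
        results["error"] = type(exc).__name__

    # Rebinding the same name to a different type is accepted without complaint.
    total = "ten"
    results["rebound_type"] = type(total).__name__

    return results


if __name__ == "__main__":
    print(demo())
```

Only the read of an unassigned name is caught, and only when that line actually executes. The misspelled assignment and the int-to-string rebinding both pass silently.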
That flexibility can be a GOOD feature for programmers, but it will not catch some errors.

Initializing variables to 0 really only makes sense for numeric variables. When a language allows all kinds of "objects" you might need an object-specific default initialization, and for some objects that makes no sense. As you note, the POSIX-compliant versions of AWK also initialize, if needed, to empty strings.

But I wonder how much languages like AWK are still used to write new programs, compared with the time when they were really useful. So many people more or less live within one GUI application rather than work at a textual level in a shell, where many problems can be solved rapidly with a few small tools, often in a pipeline.

Avi

-----Original Message-----
From: Python-list <python-list-bounces+avigross=verizon....@python.org> On Behalf Of Alan Gauld via Python-list
Sent: Wednesday, March 24, 2021 5:28 AM
To: python-list@python.org
Subject: Re: convert script awk in python

On 23/03/2021 14:40, Avi Gross via Python-list wrote:

> $1 == 113 {
>     if (x || y || z)
>         print "More than one type $8 atom.";
>     else {
>         x = $2; y = $3; z = $4;
>         istep++;
>     }
> }
>
> I am a tad concerned as to where any of the variables x, y or z have
> been defined at this point.

They haven't been; they are using awk's auto-initialization feature. The variables are defined in this bit of code. The first time we see $1 == 113 we define the variables. On subsequent appearances we print the warning.

> far as I know has not been called. Weird. Maybe awk is allowing an
> uninitialized variable to be tested for in your code but if so, you
> need to be cautious how you do this in python.

It's standard behaviour in any POSIX compliant awk: variables are initialised to empty strings/arrays or zero as appropriate to first use.

The original AWK book has already been mentioned, which covers nawk.
I'll add the O'Reilly book "sed & awk", which covers the POSIX version and includes several extensions not covered in the original book. (It also covers sed, but that's irrelevant here.)

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos

--
https://mail.python.org/mailman/listinfo/python-list
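For anyone translating the quoted awk rule, one possible Python rendering is sketched below. It is a rough sketch under stated assumptions, not the original poster's script. The function name `scan` is hypothetical. It assumes whitespace-separated input with at least four fields on every matching line, and it compares the first field as the string "113", whereas awk would compare numerically. It also uses None as an explicit "not yet set" marker, which differs from awk's truth test: awk treats zero and the empty string as false.

```python
def scan(lines):
    """Mimic the quoted awk rule: remember the first '113' record, warn on later ones."""
    # awk initializes these implicitly on first use; Python needs it spelled out.
    x = y = z = None
    istep = 0
    warnings = []

    for line in lines:
        fields = line.split()            # awk's default whitespace field splitting
        if not fields or fields[0] != "113":
            continue                     # the rule fires only when $1 is 113

        if x is not None or y is not None or z is not None:
            # Inside an awk string "$8" is literal text, so it is kept verbatim.
            warnings.append("More than one type $8 atom.")
        else:
            # Assumes fields 2-4 are numeric coordinates (an assumption).
            x, y, z = (float(f) for f in fields[1:4])
            istep += 1

    return (x, y, z), istep, warnings


if __name__ == "__main__":
    sample = ["113 1.0 2.0 3.0 a b c d", "7 0 0 0", "113 4.0 5.0 6.0"]
    print(scan(sample))
```

The None sentinel is a design choice: it keeps "never assigned" separate from "assigned zero", which the awk test `x || y || z` does not do.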