Peter Otten <__pete...@web.de> writes:

> On 25/03/2021 08:14, Loris Bennett wrote:
>
>> I'm not doing that, but I am trying to replace a longish bash pipeline
>> with Python code.
>>
>> Within Emacs, I often use Org mode[1] to generate data via some bash
>> commands and then visualise the data via Python.  Thus, in a single Org
>> file I run
>>
>>   /usr/bin/sacct -u $user -o jobid -X -S $start -E $end -s COMPLETED -n | \
>>   xargs -I {} seff {} | grep 'Efficiency' | sed '$!N;s/\n/ /' | awk '{print $3 " " $9}' | sed 's/%//g'
>>
>> The raw numbers are formatted by Org into a table
>>
>>   | cpu_eff | mem_eff |
>>   |---------+---------|
>>   |    96.6 |   99.11 |
>>   |   93.43 |   100.0 |
>>   |    91.3 |   100.0 |
>>   |   88.71 |   100.0 |
>>   |   89.79 |   100.0 |
>>   |   84.59 |   100.0 |
>>   |   83.42 |   100.0 |
>>   |   86.09 |   100.0 |
>>   |   92.31 |   100.0 |
>>   |   90.05 |   100.0 |
>>   |   81.98 |   100.0 |
>>   |   90.76 |   100.0 |
>>   |   75.36 |   64.03 |
>>
>> I then read this into some Python code in the Org file and do something like
>>
>>   df = pd.DataFrame(eff_tab[1:], columns=eff_tab[0])
>>   cpu_data = df.loc[: , "cpu_eff"]
>>   mem_data = df.loc[: , "mem_eff"]
>>
>>   ...
>>
>>   n, bins, patches = axis[0].hist(cpu_data, bins=range(0, 110, 5))
>>   n, bins, patches = axis[1].hist(mem_data, bins=range(0, 110, 5))
>>
>> which generates nice histograms.
>>
>> I decided to rewrite the whole thing as a stand-alone Python program so
>> that I can run it as a cron job.  However, as a novice Python programmer
>> I am finding translating the bash part slightly clunky.
>> I am in the middle of doing this and started with the following:
>>
>>   sacct = subprocess.Popen(["/usr/bin/sacct",
>>                             "-u", user,
>>                             "-S", period[0], "-E", period[1],
>>                             "-o", "jobid", "-X",
>>                             "-s", "COMPLETED", "-n"],
>>                            stdout=subprocess.PIPE,
>>                            )
>>
>>   jobids = []
>>
>>   for line in sacct.stdout:
>>       jobid = str(line.strip(), 'UTF-8')
>>       jobids.append(jobid)
>>
>>   for jobid in jobids:
>>       seff = subprocess.Popen(["/usr/bin/seff", jobid],
>>                               stdin=sacct.stdout,
>>                               stdout=subprocess.PIPE,
>>                               )
>
> The statement above looks odd. If seff can read the jobids from stdin
> there should be no need to pass them individually, like:
>
> sacct = ...
> seff = Popen(
>     ["/usr/bin/seff"], stdin=sacct.stdout, stdout=subprocess.PIPE,
>     universal_newlines=True
> )
> for line in seff.communicate()[0].splitlines():
>     ...

Indeed, seff cannot read multiple jobids.  That's why I had 'xargs' in
the original bash code.  Initially I thought of calling 'xargs' via
Popen, but this seemed very fiddly (I didn't manage to get it working)
and anyway seemed a bit weird to me, as it is really just a loop, which
I can implement perfectly well in Python.

Cheers,

Loris

>>       seff_output = []
>>       for line in seff.stdout:
>>           seff_output.append(str(line.strip(), "UTF-8"))
>>
>>   ...
>>
>> but compared to the bash pipeline, this all seems a bit laboured.
>>
>> Does anyone have a better approach?
>>
>> Cheers,
>>
>> Loris
>>
>>
>>> -----Original Message-----
>>> From: Cameron Simpson <c...@cskk.id.au>
>>> Sent: Wednesday, March 24, 2021 6:34 PM
>>> To: Avi Gross <avigr...@verizon.net>
>>> Cc: python-list@python.org
>>> Subject: Re: convert script awk in python
>>>
>>> On 24Mar2021 12:00, Avi Gross <avigr...@verizon.net> wrote:
>>>> But I wonder how much languages like AWK are still used to make new
>>>> programs as compared to a time they were really useful.
>>>
>>> You mentioned in an adjacent post that you've not used AWK since 2000.
>>> By contrast, I still use it regularly.
>>>
>>> It's great for proof of concept at the command line or in small scripts, and
>>> as the innards of quite useful scripts. I've a trite "colsum" script which
>>> does nothing but generate and run a little awk programme to sum a column,
>>> and routinely type "blah .... | colsum 2" or the like to get a tally.
>>>
>>> I totally agree that once you're processing a lot of data from places or
>>> where a shell script is making long pipelines or many command invocations,
>>> if that's a performance issue it is time to recode.
>>>
>>> Cheers,
>>> Cameron Simpson <c...@cskk.id.au>
>>
>> Footnotes:
>> [1]  https://orgmode.org/
>>
>
-- 
Dr. Loris Bennett (Hr./Mr.)
ZEDAT, Freie Universität Berlin         Email loris.benn...@fu-berlin.de
-- 
https://mail.python.org/mailman/listinfo/python-list
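
For what it's worth, the loop that stands in for 'xargs' might look something like the sketch below. It is only a rough outline under a few assumptions: that each 'Efficiency' line printed by seff carries the percentage in its third whitespace-separated field (as the awk step in the bash pipeline implies), and that Python 3.7 or later is available for capture_output. The names efficiencies and collect, and the run parameter, are made up for illustration; run is there only so the external commands can be swapped out when testing.

```python
import subprocess


def efficiencies(seff_text):
    """Return the percentages found on the 'Efficiency' lines of one seff report.

    Roughly mirrors: grep 'Efficiency' | awk '{print $3}' | sed 's/%//g'
    (assumes the percentage is the third whitespace-separated field).
    """
    return [
        float(line.split()[2].rstrip("%"))
        for line in seff_text.splitlines()
        if "Efficiency" in line
    ]


def collect(user, start, end, run=subprocess.run):
    """Call seff once per completed job ID; this loop replaces xargs.

    `run` is a hypothetical hook for testing and defaults to subprocess.run.
    """
    sacct = run(
        ["/usr/bin/sacct", "-u", user, "-S", start, "-E", end,
         "-o", "jobid", "-X", "-s", "COMPLETED", "-n"],
        capture_output=True, text=True, check=True,
    )
    rows = []
    for jobid in sacct.stdout.split():
        seff = run(["/usr/bin/seff", jobid],
                   capture_output=True, text=True, check=True)
        rows.append(efficiencies(seff.stdout))
    return rows
```

If every job yields exactly two 'Efficiency' lines, the rows returned by collect(user, period[0], period[1]) could be passed to pd.DataFrame(rows, columns=["cpu_eff", "mem_eff"]) in place of the Org table.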