On 25/03/2021 08:14, Loris Bennett wrote:
I'm not doing that, but I am trying to replace a longish bash pipeline
with Python code.
Within Emacs, I often use Org mode[1] to generate data via some bash
commands and then visualise the data via Python. Thus, in a single Org
file I run
/usr/bin/sacct -u $user -o jobid -X -S $start -E $end -s COMPLETED -n \
  | xargs -I {} seff {} \
  | grep 'Efficiency' \
  | sed '$!N;s/\n/ /' \
  | awk '{print $3 " " $9}' \
  | sed 's/%//g'
The raw numbers are formatted by Org into a table
| cpu_eff | mem_eff |
|---------+---------|
| 96.6 | 99.11 |
| 93.43 | 100.0 |
| 91.3 | 100.0 |
| 88.71 | 100.0 |
| 89.79 | 100.0 |
| 84.59 | 100.0 |
| 83.42 | 100.0 |
| 86.09 | 100.0 |
| 92.31 | 100.0 |
| 90.05 | 100.0 |
| 81.98 | 100.0 |
| 90.76 | 100.0 |
| 75.36 | 64.03 |
I then read this into some Python code in the Org file and do something like
df = pd.DataFrame(eff_tab[1:], columns=eff_tab[0])
cpu_data = df.loc[: , "cpu_eff"]
mem_data = df.loc[: , "mem_eff"]
...
n, bins, patches = axis[0].hist(cpu_data, bins=range(0, 110, 5))
n, bins, patches = axis[1].hist(mem_data, bins=range(0, 110, 5))
which generates nice histograms.
I decided to rewrite the whole thing as a stand-alone Python program so
that I can run it as a cron job. However, as a novice Python programmer
I am finding the translation of the bash part slightly clunky. I am in
the middle of doing this and started with the following:
sacct = subprocess.Popen(["/usr/bin/sacct",
                          "-u", user,
                          "-S", period[0], "-E", period[1],
                          "-o", "jobid", "-X",
                          "-s", "COMPLETED", "-n"],
                         stdout=subprocess.PIPE,
                         )

jobids = []
for line in sacct.stdout:
    jobid = str(line.strip(), 'UTF-8')
    jobids.append(jobid)

for jobid in jobids:
    seff = subprocess.Popen(["/usr/bin/seff", jobid],
                            stdin=sacct.stdout,
                            stdout=subprocess.PIPE,
                            )
The statement above looks odd: stdin=sacct.stdout is passed but never
used, since each jobid is given as a command-line argument. If seff can
read the jobids from stdin, there should be no need to pass them
individually, like:
sacct = ...
seff = Popen(
    ["/usr/bin/seff"], stdin=sacct.stdout, stdout=subprocess.PIPE,
    universal_newlines=True
)
for line in seff.communicate()[0].splitlines():
    ...
seff_output = []
for line in seff.stdout:
    seff_output.append(str(line.strip(), "UTF-8"))
...
but compared to the bash pipeline, this all seems a bit laboured.
Does anyone have a better approach?
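For comparison, here is a minimal sketch of how the whole pipeline might be collapsed using subprocess.run with text=True, so that no manual byte decoding is needed. The sacct/seff arguments are taken from the post above; the regular expression in parse_seff is an assumption about what seff's "Efficiency" lines look like (e.g. "CPU Efficiency: 96.60% of ..."), so adjust it to the actual output:

```python
import re
import subprocess

def parse_seff(text):
    """Extract (cpu_eff, mem_eff) from seff output.

    Assumes seff prints lines like 'CPU Efficiency: 96.60% of ...'
    and 'Memory Efficiency: 99.11% of ...' (an assumption about the
    output format; adapt the pattern if seff differs)."""
    effs = re.findall(r"Efficiency: ([\d.]+)%", text)
    return (float(effs[0]), float(effs[1])) if len(effs) == 2 else None

def job_efficiencies(user, start, end):
    """Run sacct once, then seff per job id; yield (cpu_eff, mem_eff)."""
    sacct = subprocess.run(
        ["/usr/bin/sacct", "-u", user, "-S", start, "-E", end,
         "-o", "jobid", "-X", "-s", "COMPLETED", "-n"],
        capture_output=True, text=True, check=True,
    )
    for jobid in sacct.stdout.split():
        seff = subprocess.run(
            ["/usr/bin/seff", jobid],
            capture_output=True, text=True, check=True,
        )
        effs = parse_seff(seff.stdout)
        if effs is not None:
            yield effs
```

The resulting pairs can be fed straight into a DataFrame with pd.DataFrame(job_efficiencies(...), columns=["cpu_eff", "mem_eff"]), skipping the text-table round trip entirely.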
Cheers,
Loris
-----Original Message-----
From: Cameron Simpson <c...@cskk.id.au>
Sent: Wednesday, March 24, 2021 6:34 PM
To: Avi Gross <avigr...@verizon.net>
Cc: python-list@python.org
Subject: Re: convert script awk in python
On 24Mar2021 12:00, Avi Gross <avigr...@verizon.net> wrote:
But I wonder how much languages like AWK are still used to make new
programs as compared to a time they were really useful.
You mentioned in an adjacent post that you've not used AWK since 2000.
By contrast, I still use it regularly.
It's great for proof of concept at the command line or in small scripts, and
as the innards of quite useful scripts. I've a trite "colsum" script which
does nothing but generate and run a little awk programme to sum a column,
and routinely type "blah .... | colsum 2" or the like to get a tally.
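(To make that concrete, a hypothetical Python stand-in for such a colsum helper might look like the following; the name and the 1-based column argument are guesses at the interface of the awk wrapper described above:)

```python
#!/usr/bin/env python3
# Hypothetical equivalent of the "colsum" awk wrapper described above:
# sum the Nth whitespace-separated column of stdin, used as
#   blah ... | colsum 2
import sys

def colsum(lines, col):
    """Sum column `col` (1-based) over an iterable of text lines,
    skipping blank lines."""
    return sum(float(line.split()[col - 1]) for line in lines if line.strip())

if __name__ == "__main__":
    print(colsum(sys.stdin, int(sys.argv[1])))
```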
I totally agree that once you're processing a lot of data, or once a
shell script is making long pipelines or many command invocations and
that becomes a performance issue, it is time to recode.
Cheers,
Cameron Simpson <c...@cskk.id.au>
Footnotes:
[1] https://orgmode.org/
--
https://mail.python.org/mailman/listinfo/python-list