Dear GNU Coreutils maintainers,

I am writing to propose a backward-compatible enhancement that could
improve modern scripting environments while maintaining complete
compatibility with existing workflows and adding no performance overhead.

PROBLEM:

Although the output of the coreutils is meant primarily for human
consumption, many scripts today pipe it to other commands for various
kinds of automation. This leads to brittle solutions involving
complex awk/sed/grep gymnastics that break when the output format
changes even slightly. While the "everything is text" philosophy has
served GNU/Unix/Linux well, structured data processing has become
important in modern computing.

Even Microsoft recognized this nearly two decades ago: PowerShell has
shipped with built-in structured output from day one, eliminating text
parsing entirely. Cloud tools such as Docker, kubectl, the GitHub CLI,
Google's gcloud, and a growing number of other CLI tools provide JSON
output flags, and shells like Nushell have reimplemented most of the
coreutils to output structured data. This is not unprecedented in the
industry.

PROPOSAL: stdoutm and stderrm

I would like to propose the addition of two new optional machine
readable output streams (in addition to already present human readable
streams):

    - stdout (fd 1): human readable output
    - stderr (fd 2): human readable errors
    - stdoutm (fd 3): machine readable output (NEW)
    - stderrm (fd 4): machine readable errors (NEW)

The machine-readable output format and conventions would need to be
established. JSON is the most obvious choice, with battle-tested parsers
and tools immediately available to the scripting ecosystem. This could
be implemented incrementally, starting with high-usage commands
(ls, ps, df, du) and gradually expanding coverage.
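As a purely illustrative sketch (the field names here are my
assumptions, not a proposed schema), a single ls entry on the
machine-readable stream might look like:

```json
[
  {
    "name": "notes.txt",
    "size": 1432,
    "type": "file",
    "mode": "rw-r--r--",
    "mtime": "2024-01-15T10:32:00Z"
  }
]
```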

If the structured output is generated only when fd 3/4 are open, there
should be no performance penalty, and all existing behavior remains
identical. It also requires no new flags or arguments.

EXAMPLES:

    # Traditional usage - UNCHANGED
    ls -l

    # Structured output
    ls 3>metadata.json 1>/dev/null

    # Structured output scripting
    ls 3>&1 1>/dev/null | fx 'this.filter(x => x.size > 1048576)'
    ls 3>&1 1>/dev/null | jq '.[] | select(.size > 1048576)'

    # Traditional brittle approach (unreadable)
    ls -la | grep -v '^d' | awk '$5 > 1048576 {print $9}'

    # Structured error handling
    find / -name "*.txt" 4>&1 1>/dev/null | jq '.[] | select(.error == "EACCES")'

This eliminates fragile regex-based approaches, provides structured
error handling, and integrates with already-present tools like fx, jq,
and Python scripts, while ensuring existing scripts are not affected at
all (allowing a gradual transition to structured output).

Would the maintainers be interested in discussing this further?

Thank you for your time and consideration.


    Annada
