Aloha all,
FWIW, as a user actively pursuing reproducible research with Org
and a contributor to documentation about Org and Babel intended
for other users (rather than Org mode elisp coders) I'd be pleased
if Org's code custodians look favorably on this proposal.
+1
All the best,
Tom
Matt <m...@excalamus.com> writes:
Apologies for the book. I've been sitting on this stuff for two
months and am wondering how to proceed.
IANAL but AFAIK/CT, my contract contains an exception for making
contributions to projects like Org. I've gotten confirmation
from my manager and by HR. However, until the CEO signs the FSF
disclaimer, I can't officially contribute. I'm confident that I
can publish changes (e.g. to a personal website); the FSF just
can't accept my changes (yet).
I could start working on ob-shell.el separately now and publish
that myself. It's not clear how I would then contribute those
changes back once I'm cleared by the FSF. I'm inclined towards
some refactoring and I'm not sure how I could break that down in
such a way that if it took two more months to get the copyright
stuff worked out that anyone could make sense of the changes. I
would much prefer to gradually submit patches, discuss them, and
then eventually have them merged in turn. If I have a heap of
changes elsewhere, I feel like it would be harder to have a
conversion about them.
Regardless, I think I should define test cases first. If those
are considered valid, then any refactoring would be moot if they
pass, right? With (agreed upon) test cases, it shouldn't matter
if I refactor as long as functionality remains the same?
Overall, what advice do you have?
It looks to me like ob-shell.el could use some love in other
respects besides async evaluation. I've detailed where below,
partly for my own organization and partly for posterity, but
mainly because this isn't my house, so to speak, and I don't
want to barge in and start rearranging the furniture and eating
whatever's in the fridge. Or, is it like Worg in that once I
have the keys I can drive where I like, so long as there're no
crashes?
I'm interested in people's thoughts on these notes on
ob-shell.el:
* Tests
There are several code paths, like shebang, cmdline, and basic
execution, which don't always have something to do with one
another in the code. Having tests would be really helpful to
make sure everything jives. Before doing anything with the code
base, I intend to create tests for all the functionality.
* 2D-array
I documented two options for the =:var= header[fn:1]. The
ob-shell.el code actually defines a third option for 2D-arrays.
I couldn't get it to work. It was one of several things not
documented anywhere, at least as far as I could find, and which
I had to figure out straight from the code. Between not being
great at shell scripting and having a hard time deciphering that
ob-shell.el code, I'm not sure 2D-arrays are actually fully or
correctly implemented.
* M-up behavior <<M-up>>
The =org-babel-load-session:shell= code path only works when
M-up is used on a code block[fn:2]. Is M-up actually documented
anywhere? Furthermore, the =org-babel-load-session:shell= only
works for the "shell" language, which is not actually a "proper"
shell (i.e. it's not listed in =org-babel-shell-names=). The
M-up defaults to using $ESHELL or shell-file-name through the
=shell= command.
For example, try calling M-up on these:
#+comment: (opaquely) calls the system shell
#+begin_src shell :session my-shell
echo "hello, world!" #+end_src
#+comment: fails
#+begin_src sh :session my-sh
echo "hello, world!" #+end_src
#+comment: fails
#+begin_src bash :session my-bash
echo "hello, world!" #+end_src
To fix this, there needs to be an
=org-babel-load-session:<lang>= for each language in
=org-babel-shell-names=. This would probably make the most
sense in =org-babel-shell-initialize=. However, that function
[[org-babel-shell-initialize][needs attention]].
* Refactoring <<refactoring>>
The ob-shell.el code appears inconsistent to me. Clearly, this
is somewhat subjective. I've tried to give a rationale for each
point to make it less so. My goal is to be maintainer of
ob-shell.el, but that's not forever. These things were an
obstacle for me and my aim is to remove them for the next
person.
** =org-babel-shell-initialize= <<org-babel-shell-initialize>>
As alluded to elsewhere, =org-babel-shell-initialize= doesn't
appear to adhere to the (elisp) Coding Conventions,
#+begin_quote
• Constructs that define a function or variable should be
macros, not functions, and their names should start with
‘define-’. The macro should receive the name to be defined
as the first argument. That will help various tools find the
definition automatically. Avoid constructing the names in
the macro itself, since that would confuse these tools.
#+end_quote
The =org-babel-shell-initialize= function defines
=org-babel-execute:<lang>=,
=org-babel-variable-assignments:<lang>=, and
=org-babel-default-header-args:<lang>= for the "languages" in
=org-babel-shell-names=. As it stands, that
=org-babel-shell-initialize= is a function does no harm (aside
from being confusing by way of straying from convention).
However, if the [[M-up][M-up]] issue is to be resolved, it seems
to me that =org-babel-shell-initialize= should be updated to
match the convention (i.e. be a macro).
** Organization
Definitions are introduced in different orders. For example,
the =org-babel-shell-initialize= function whose sole purpose is
to work with =org-babel-shell-names= is defined before the
reader knows what =org-babel-shell-names= is. Later, this
pattern of defining the component pieces after they're used is
reversed. For example, =org-babel-load-session:shell= relies on
=org-babel-prep-session:shell= which is defined first. I find
defining terms before they're used makes a document more easy to
comprehend than otherwise. I want to reorder the definitions.
Similarly, some functions could use breaking out. The most
important is probably =org-babel-sh-evaluate= which handles the
various header arguments. There are various paths of execution
each requiring separate logic, yet all live in the one large
function. Breaking these out would allow them to have separate
docstrings and, I expect, be easier to understand since the
logic of one would be (lexically) separated from the rest.
Other functionality might be better served by consolidating
functions. There's a lot of fiddly code dedicated to variable
assignment. Actually, much of the ob-shell.el file is related to
variable assignment. The assignments are done using separate
functions, yet all apply to the same task. They'll never be used
for anything else. Do they need to be split out? Is there a
technical reason? I don't see one. Does it help comprehension?
When split out as they are, I found it hard to make sense of the
context they're used in since they're defined apart from the
logic that uses them (i.e. what header uses them, what form does
the header take, etc.). I think it's worth seeing if there's a
better way to present that part of the code base.
** Naming
The following apply to all shells, not just "sh" and should be
updated to be "shell". This looks like cruft from when
ob-shell.el was called ob-sh.el AFAICT.
- =org-babel-sh-evaluate=
- =org-babel-sh-eoe-indicator=
- =org-babel-sh-eoe-output=
- =org-babel--variable-assignments:sh-generic=
- =org-babel-sh-var-to-sh=
- =org-babel-sh-var-to-string=
- =org-babel-sh-initiate-session=
- =org-babel-sh-evaluate=
- =org-babel-sh-strip-weird-long-prompt=
Generally speaking, I find the Org Babel code base tricky to
read (especially at first). I spent a good deal of time
untangling what lived where and who did what. I can play along
fine now that I'm familiar. However, since understanding took
longer than I think was necessary, I want to detail the pain
points as they have made contributing to Babel harder.
Overall, Babel somewhat breaks the (elisp) Coding Conventions
for naming,
#+begin_quote
You should choose a short word to distinguish your program from
other Lisp programs. #+end_quote
I understand the variable/function name prefix to be the file
name, typically. The file name is often the package name, or
more precisely the feature provided by the file[fn:3]. For Org
Babel, there's not a solid file-to-prefix relation. We say "Org
Babel", but the main functionality is in ob-core and the various
"ob-" files either extend or implement implied behavior (e.g.
=org-babel-<lang>-execute=). Is the "program" ob-core, ob-lang,
or the whole suite of files? This is a subjective question
which the Org Babel "program" answers with, "the whole suite of
files". All components, across all "ob-" files, bear the name
"org-babel-". This is still something that trips me up: is the
current symbol core or not? Who is responsible for what?
I would expect the core API to have its own prefix. The
extensions would then define their code and have a different
prefix, "ob-<lang>-". This way, readers/contributors could open
the pertinent ob-* file, see the expected symbol prefix (e.g.
"ob-shell-") and another prefix (e.g. "org-babel-") and be able
to distinguish which is which. As it stands, ob-core.el could
be renamed to org-babel.el or the "org-babel-" prefix could be
changed to "ob-core-".
Another possible solution, or a stopgap, would be to have a
document detailing the Org Babel API[fn:4].
* Process interaction
Emacs has several different ways of interacting with processes.
The ob-shell.el code uses a few of them. Since async is another
way to interact with a process, a single process pattern could
be used. The goal would be to make each of the different
functionalities provided by ob-shell.el have a similar
implementation. The expectation is that this would benefit
maintenance.
* Footnotes
[fn:1]
https://orgmode.org/worg/org-contrib/babel/languages/ob-doc-shell.html#orgfa6b7c5
[fn:2] M-up is bound to =org-metaup-hook= and
=ob-core:org-babel-load-in-session= by default.
[fn:3] It's not clear to me if there's a technical definition
for an
Emacs package.
[fn:4] I may extend my personal notes into a document detailing
the
Org API. http://excalamus.com/2021-11-03-org_babel.html
--
Thomas S. Dye
https://tsdye.online/tsdye