Hi!

----

CC'ing the gatekeepers since AFAIK it's their decision to accept or
reject this (and I am praying to whatever Discworld god currently
cares about programmers (Bel-Shamharoth?) that this proposal gets
a "thumbs up", since parts of the ksh93-integration prototype004
Makefiles heavily depend on SHELL=/bin/ksh (which is meant as an
intermediate step towards the use of SHELL=/bin/ksh93, i.e. this
RFE)).

My basic proposal is to allow the usage of either SHELL=/bin/sh
or SHELL=/bin/ksh93 in OS/Net Makefiles, as a _choice_ for all OS/Net
projects.
Allowing SHELL=/bin/ksh93 in OS/Net Makefiles focuses on using POSIX
shell syntax constructs and some straightforward extensions to this
standard, including floating-point math, typed variables, functions
with local variables, associative arrays (i.e. strings as index values
in arrays) and extended pattern matching via ~(modifier)pattern.
For example, ~(E) means that the pattern in [ ... ] and [[ ... ]]
tests is an extended regular expression instead of a shell pattern.
The modifiers are: 'E' for extended regular expressions, 'G' for
regular expressions, 'F' for fgrep-like matching and 'S' for shell
patterns (the default); things like 'i' select case-insensitive
matching, 'l' a left anchor, 'r' a right anchor, etc.
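
To make the feature list above a bit more concrete, here is a minimal
sketch of typed variables, a function with local variables and an
associative array. The object names and the names "tally", "total" and
"per_suffix" are made up for illustration. It is deliberately restricted
to constructs which ksh93 shares with bash (assuming bash 4 or later) so
that it can be tried without ksh93 installed; floating-point math and the
~(modifier) patterns are ksh93-only and are therefore left out.

```shell
# Hypothetical object list; the names are made up for illustration.
objs="a.o b.o c.so d.o"

typeset -i total=0          # typed (integer) variable
typeset -A per_suffix       # associative array: strings as index values

# "function name { ... }" syntax; typeset inside it declares local variables.
function tally {
    typeset f suffix
    for f in "$@" ; do
        suffix="${f##*.}"   # strip everything up to the last dot
        per_suffix[$suffix]=$(( ${per_suffix[$suffix]:-0} + 1 ))
        (( total += 1 ))
    done
}

tally $objs                 # unquoted on purpose: split the list on $IFS
printf '%d objects: %d .o, %d .so\n' \
    "$total" "${per_suffix[o]}" "${per_suffix[so]}"
```

With plain /bin/sh the same counting would typically need a loop around
expr or a small awk script.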


I've collected some arguments "pro" and "contra" for this discussion:
* Pro SHELL=/usr/bin/ksh93 + SHELL=/usr/bin/ksh:
- Simplify shell script(-lets) in Makefiles, i.e. shorter, cleaner,
more compact and more readable shell code which doesn't require
brain-melting stunts and the usage of lots of external filters like
awk/sed/etc. to do some simple tasks. E.g. no more
$ (x="hello world" ; echo "$x" | awk "{ print \$2 }") # to get the
2nd word in a string;
$ (x="hello world" ; print ${x//*[$IFS]/}) # would be one of
multiple ways to get the same effect in ksh93 (in this case saving
three fork()s and a pipe chain).

- No more line/string/argument list length limitations for
shell scripts (in the case of builtin commands even the command
argument lists no longer have any limit; we've hit several
of these limits in usr/src/lib/libast/Makefile.com since the
object list is quite large).

- Much faster execution thanks to many builtin commands and less
|fork()|'ing.
(I don't have exact benchmarks right now... doing any benchmarks on
multiuser machines or on machines which are under heavy stress from
other jobs is more or less for /dev/null. I've seen improvements in
the range of 15-18 minutes on my Ultra5, which needs more than a day to
crawl over OS/Net, but I would not call this a reliable source because
this machine is heavily paging during such a build (and I shouldn't do
profiling work or provide numbers (at least not after burning my claws
with the "... save 10.1MB in the kernel via -xstrconst"-thing)).
Based on the dtrace fork-o-meter I know that we're saving around 1600
forks just by replacing SHELL=/bin/sh with SHELL=/bin/ksh93 (not
counting any builtin usage), but that doesn't give any timing values or
results from tweaking the whole tree, including the usage of ksh93
builtins.)

- POSIX shell operators, e.g. $( ... ) and (( ... )); even nesting is
allowed (note that this doesn't trigger the creation of a sub-process,
i.e. ksh93 avoids |fork()| if possible).

- POSIX operators for "test"/"[ ... ]"

- Pipelines etc. properly create process groups (the Bourne shell doesn't
do that, which quickly leads to problems with monitoring (parallel) builds).

- Associative arrays (quite useful to replace _some_ common
mini-sed/awk scripts). In general the size of these arrays and
strings has no length/size limitation (unlike Solaris sed/awk)
and can therefore avoid existing limits (note: I am not suggesting to
replace all sed/awk usage but things like $ (echo "foo bar bark bark" |
awk "{ print \$2 }") # are AFAIK overkill (i.e. the size of the r/w
data of sed/awk is far larger than the amount of data being processed)).

- ksh93 is multibyte-locale aware and properly operates on character
boundaries (the old Bourne shell treats character==byte). This would
provide the option to use something like "en_US.UTF-8" (or
other multibyte locales) as the (default) build locale in the future,
or at least the ability to process such data during the OS/Net
build (or use multibyte locales for testing).

- More powerful pattern matching (e.g. things like
$ [ "`echo $foo | egrep x=.*`" != "" ] && ... # could be replaced with $
[[ $foo = ~(E)x=.* ]] && ... # (saving at least three |fork()|s)).

- Various string operators like "+=" can replace the Bourne shell
usage (e.g. i="${i}hello" could be replaced with i+="hello").

- Following Stephen Hahn's "invitation" to get "rid" of the old ksh
usage
(http://mail.opensolaris.org/pipermail/opensolaris-code/2007-January/003811.html)
... question is whether this can be extended to /usr/bin/sh, too (erm,
note: I am not suggesting to do a tree-wide removal of /usr/bin/sh -
this proposal is only to allow projects to choose between "/usr/bin/sh"
and "/usr/bin/ksh93" in OS/Net Makefiles) ...

- ksh93/ksh is upwards-compatible with /usr/bin/sh and therefore switching
over any existing Makefile does no harm.

- ksh93 will be used in any case during the OS/Net build, at least to
generate the localisation (=l10n) files for libshell, libcmd, libdll,
libpp, libast and likely for any future ksh93 script which uses
ksh93/bash's builtin localisation features (i.e. $"..." strings
instead of /usr/bin/gettext (see ksh93(1)), where "ksh93 -D" (or "shcomp
-D") is used to extract and compile the matching message catalogs), and
therefore I think it does no harm to use it in the Makefiles, too.

- ksh93 has extended debugging facilities which may be very useful to
debug Makefile problems.

- ksh93 will get DTrace support soon (which would allow users to monitor
build activities via DTrace)
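
A few of the "pro" items above (no awk for word extraction, pattern tests
without egrep, "+=") can be sketched as follows. This is only a rough
illustration and not taken from any OS/Net Makefile; the variable names
"second", "words" and "has_x" are made up. As an assumption for
portability it only uses constructs which ksh93 shares with bash, so it
uses printf instead of ksh93's print and a plain shell pattern instead
of the ~(E) form.

```shell
x="hello world"

# Last word (here: the 2nd one) without awk:
# strip the longest prefix that ends in a blank.
second="${x##* }"

# Alternative: split on $IFS into an indexed array and pick element 1.
words=( $x )

# Pattern test without egrep: a shell pattern inside [[ ... ]].
foo="a=1 x=42"
if [[ $foo == *x=* ]] ; then
    has_x=yes
else
    has_x=no
fi

# String append with += instead of i="${i}hello".
i="say:"
i+="hello"

printf '%s %s %s %s\n' "$second" "${words[1]}" "$has_x" "$i"
```

None of these steps starts an external process, which is the point of
the fork() savings mentioned above.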


* Contra:
- Adds a possible "flavor of the day"-shell to OS/Net.
(Note: I know that Sun+OS/Net has a long history of things which have
fallen into disrepair (like the smartcard framework) but I hope we can
do better in this case: Remember that I'd like to provide all
necessary bits (scripts (like "buildksh93.ksh"), documentation etc.)
over time so that other people can maintain and update ksh93 in
OS/Net, that ksh93 is "guarded" by its own testsuite (which is accessible
within OS/Net and even shipped as part of the demo package) and that
AFAIK the "commands" group that April is part of is accepting
responsibility for bug fixing).

- Potentially requires learning POSIX shell syntax, plus some ksh93
extensions (to get some benefits from the change; note that ksh93 is
backwards-compatible with /bin/sh (Bourne shell), which means existing
constructs do not require any changes).

- May introduce build instabilities if there are bugs in the shell

- <Insert more horrible things here>

Comments/suggestions/etc. welcome...

----

Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) [EMAIL PROTECTED]
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 7950090
 (;O/ \/ \O;)