2014-11-08 20:19 GMT-08:00 Jim Meyering <j...@meyering.net>:
> On Sat, Nov 8, 2014 at 7:52 PM, Paul Eggert <egg...@cs.ucla.edu> wrote:
>> hex_printf_()
>> {
>> hex_printf_format=$(printf '%s\n' "$1" | sed '
>> s/^/_/
>> s/$/_/
>> s/\([^\\]\(\\\\\)*\\x\)\([0-9aAbBcCdDeEfF][^0-9aAbBcCdDeEfF]\)/\10\3/g
>> s/\([^\\]\(\\\\\)*\\x\)\([0-3]\)/\10\3/g
>> s/\([^\\]\(\\\\\)*\\x\)\([4-7]\)/\11\3/g
>> s/\([^\\]\(\\\\\)*\\x\)\([89aAbB]\)/\12\3/g
>> s/\([^\\]\(\\\\\)*\\x\)\([cCdDeEfF]\)/\13\3/g
>> s/\([^\\]\(\\\\\)*\\x[0-3]\)[048cC]\([0-7]\)/\1,0\3/g
>> s/\([^\\]\(\\\\\)*\\x[0-3]\)[048cC]\([89aAbBcCdDeEfF]\)/\1,1\3/g
>> s/\([^\\]\(\\\\\)*\\x[0-3]\)[159dD]\([0-7]\)/\1,2\3/g
>> s/\([^\\]\(\\\\\)*\\x[0-3]\)[159dD]\([89abcdef]\)/\1,3\3/g
>> s/\([^\\]\(\\\\\)*\\x[0-3]\)[26aAeE]\([0-7]\)/\1,4\3/g
>> s/\([^\\]\(\\\\\)*\\x[0-3]\)[26aAeE]\([89aAbBcCdDeEfF]\)/\1,5\3/g
>> s/\([^\\]\(\\\\\)*\\x[0-3]\)[37bBfF]\([0-7]\)/\1,6\3/g
>> s/\([^\\]\(\\\\\)*\\x[0-3]\)[37bBfF]\([89aAbBcCdDeEfF]\)/\1,7\3/g
>> s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[08]/\1\3\40/g
>> s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[19]/\1\3\41/g
>> s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[2aA]/\1\3\42/g
>> s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[3bB]/\1\3\43/g
>> s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[4cC]/\1\3\44/g
>> s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[5dD]/\1\3\45/g
>> s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[6eE]/\1\3\46/g
>> s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[7fF]/\1\3\47/g
>> s/^_//
>> s/_$//
>> ')
>> shift
>> printf "$hex_printf_format" "$@"
>> }
>
> How elegantly twisted ;-)
> I like it.
>
> Do you have time to write the complete patch?
> I'd like to make a pre-release snapshot tomorrow.
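For reference, the sed pipeline quoted above rewrites each \xHH escape into the equivalent three-digit octal escape \OOO, because POSIX printf formats support \ddd octal escapes but do not specify \x. The sketch below performs the same mapping for a single escape using printf's own integer conversion instead of sed. The helper name hex_to_octal_escape is hypothetical and is not part of the proposed patch; it relies on POSIX printf evaluating integer arguments as C constants, so that 0x28 is read as 40.

```shell
#!/bin/sh
# Hypothetical helper (not from the patch): turn the two hex digits of one
# \xHH escape into the three-digit octal escape that printf formats accept.
hex_to_octal_escape() {
  # "0x$1" is parsed as a C integer constant; '\\' in the format prints one
  # literal backslash, and %03o pads the octal value to three digits.
  printf '\\%03o' "0x$1"
}

# \x28 is "(", i.e. decimal 40, i.e. octal 050.
esc=$(hex_to_octal_escape 28)
printf '%s\n' "$esc"   # prints \050
# Using the result as a format is deliberate, as in hex_printf_ itself.
printf "$esc\n"        # prints (
```

This only illustrates the arithmetic; the sed version has to find the escapes inside an arbitrary format string, which is where the difficulty below comes from.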
I tried it, and found that this new function makes the
multibyte-white-space test fail with GNU sed. Here's a simplified
example showing where it goes wrong. It shows that only the first
\x285 is transformed into \x2,05:

  $ printf '%s\n' '_\x285\x285\n_' \
      | sed 's/\([^\\]\(\\\\\)*\\x[0-3]\)[048cC]\([0-7]\)/\1,0\3/g'
  _\x2,05\x285\n_

The intent, of course, was that it transform both. The trouble arises
when the regexp consumes all 3 hex digits: the match then ends right
before the next \x, so no unconsumed non-backslash character remains
for the leading [^\\] to match. Because s///g never rescans text that
an earlier match consumed, the second and subsequent escapes in such a
run are left untouched.

There is also a portability problem: Solaris 5.10's /bin/sed seems
unable to handle some of that code. For example, when the same example
is run with its /bin/sed, neither \x285 string is transformed.
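To make the overlap problem concrete, here is the simplified example again as a script, together with one possible workaround. The workaround is only a sketch and was not proposed in this thread: drop /g and loop with a label and t, so that each pass rescans the pattern space from the start and the "5" left behind by the previous match can anchor the next one. It assumes a sed that accepts separate -e ':a' ... -e 'ta' arguments (GNU sed does), and it does nothing about the Solaris /bin/sed problem.

```shell
#!/bin/sh
# Input and regexp from the simplified example (single quotes keep every
# backslash literal).
input='_\x285\x285\n_'
re='\([^\\]\(\\\\\)*\\x[0-3]\)[048cC]\([0-7]\)'

# One pass with /g: matches cannot overlap, so the "5" consumed by the
# first match is unavailable as the [^\\] anchor for the second \x.
# Inside double quotes, \\1 and \\3 reach sed as \1 and \3.
printf '%s\n' "$input" | sed "s/$re/\\1,0\\3/g"
# prints _\x2,05\x285\n_

# Hypothetical workaround: substitute once per pass and branch back to
# label a while t reports a successful substitution. Each rewritten
# escape gains a comma, so it no longer matches and the loop ends.
printf '%s\n' "$input" | sed -e ':a' -e "s/$re/\\1,0\\3/" -e 'ta'
# prints _\x2,05\x2,05\n_
```

The loop costs one extra pass over the line per escape, which should not matter for short printf formats, but it would have to be applied to every one of the overlapping s commands in hex_printf_.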