Re: Patch for unicode in varnames...

2017-06-15 Thread Eduardo Bustamante
I'll leave my progress on the Unicode identifiers patch here (or the PR in Github, if you fancy that: https://github.com/dualbus/bash/pull/2/files). I won't have much time to work on this for a few weeks, so it's up to you all to complete it :-) It has markers on the places where it needs work (ma

Re: Patch for unicode in varnames...

2017-06-13 Thread Chet Ramey
On 6/13/17 5:55 PM, tetsu...@scope-eye.net wrote: > > ...Though Apple now sticks to Bash 3.2 to avoid GPL v3 right? Makes 'em > kind of an odd use-case, and maybe a bit irrelevant with respect to the > future direction of Bash. Maybe, but that wasn't the question. Mac OS X arguably has the large

Re: Patch for unicode in varnames...

2017-06-13 Thread tetsujin
...Though Apple now sticks to Bash 3.2 to avoid GPL v3 right? Makes 'em kind of an odd use-case, and maybe a bit irrelevant with respect to the future direction of Bash. - Original Message - From: chet.ra...@case.edu To: "L A Walsh", "Greg Wooledge" Cc: "bug-bash" Sent: Tue, 13 Jun 2017

Re: Patch for unicode in varnames...

2017-06-13 Thread Chet Ramey
On 6/6/17 4:08 PM, L A Walsh wrote: >Thanks for the history lesson (for those of us who were too lazy to > goog it ;-)). > Still, on what OS has it shown the most growth or popularity? The entire set of Linux machines is up there, but Mac OS X (and its descendents like iOS and tvOS) has many

Re: Patch for unicode in varnames...

2017-06-13 Thread Chet Ramey
On 6/6/17 10:40 AM, PePa wrote: > On 06/06/2560 21:20, Greg Wooledge wrote: >> Scripts that can only *run* in a UTF-8 encoding-locale are a bad idea. > > Even currently, when functions in a bash script are beyond ASCII, they > can still be run anywhere. I would imagine it would be the same when >

Re: Patch for unicode in varnames...

2017-06-13 Thread Chet Ramey
On 6/6/17 4:08 AM, Peter & Kelly Passchier wrote: > > > On 06/06/2560 14:37, George wrote: >> Broadly speaking I think the approach taken in Eduardo's patch >> (interpreting the byte sequence according to the rules of its character >> encoding) is better than the approach taken in current version

Re: Patch for unicode in varnames...

2017-06-13 Thread Chet Ramey
On 6/6/17 3:37 AM, George wrote: > that approach only works if you know the > correct character encoding to use when processing the script. The information > has to be provided in the script somehow. It can be provided in the usual way: by specifying the appropriate locale via assignments to the

Re: Patch for unicode in varnames...

2017-06-13 Thread Chet Ramey
On 6/5/17 8:40 PM, Peter & Kelly Passchier wrote: > On 06/06/2560 05:39, George wrote: >> So if you had "Pokémon" as an identifier in a Latin-1-encoded script (byte >> value 0xE9 between the "k" and "m") and then tried running that script in a >> UTF-8 locale, that byte sequence (0xE9 0x6D) would

Re: Patch for unicode in varnames...

2017-06-13 Thread Chet Ramey
On 6/5/17 6:39 PM, George wrote: > If Bash did go the route of using the locale to set the character encoding of > a script, I think it would be best to have a mechanism a script can use > to define the character encoding for the whole script file up front, rather > than setting LC_CTYPE to proc

Re: Patch for unicode in varnames...

2017-06-13 Thread Chet Ramey
On 6/5/17 4:52 AM, George wrote: > the character type for the character conversion is derived from the user's > locale > (which means there's not a reliable mechanism in place to run a script in a > locale whose character encoding doesn't match that of the script.) There isn't today. The burden

Re: Patch for unicode in varnames...

2017-06-13 Thread Chet Ramey
On 6/4/17 2:47 PM, L A Walsh wrote: > >Clarification, please, but it looks like with your > patch below, Unicode in variable names might be fairly close > to being achieved? Seeing how it was done for functions, > gave you insight into how variables could be done, yes? No. It's not `don

Re: Patch for unicode in varnames...

2017-06-09 Thread L A Walsh
Dethrophes wrote: This would be a bad idea in the same way that having control characters in filenames is a bad idea, just because you can do something doesn't mean you should. Not true. While POSIX has *discussed* disallowing control characters in filenames, using Unicode in filenames is

Re: Patch for unicode in varnames...

2017-06-07 Thread Peter & Kelly Passchier
On 08/06/2560 04:33, Dethrophes wrote: > This would be a bad idea in the same way that having control characters in > filenames is a bad idea, just because you can do something doesn't mean you > should. I would personally advocate NOT to use it in code. But then, I am biassed by upbringing towa

Re: Patch for unicode in varnames...

2017-06-07 Thread Dethrophes
Instead of talking in terms of seriousness, it may be more useful to think in terms of formality. Even in grammatically strong and formal languages variable and function names are restricted in the characters they may use. This is not just because it makes the parsing simpler but because it simpli

Re: Patch for unicode in varnames...

2017-06-07 Thread George
On Tue, 2017-06-06 at 10:20 -0400, Greg Wooledge wrote: > (OK, in reality, I am not taking any of this seriously.  This entire > proposal and discussion are like some bizarre fantasy land to me.  Bash > is a SHELL, for god's sake.  Not a serious programming language.  Even > serious programming lan

Re: Patch for unicode in varnames...

2017-06-06 Thread L A Walsh
Greg Wooledge wrote: On Tue, Jun 06, 2017 at 12:02:41PM -0700, L A Walsh wrote: Bash *is* the linux shell. It's being adopted elsewhere, but it seems to have first grown in use in the linux community. Bash predates Linux. Bash was first released in 1989. Linux wasn't released until

Re: Patch for unicode in varnames...

2017-06-06 Thread Greg Wooledge
On Tue, Jun 06, 2017 at 12:02:41PM -0700, L A Walsh wrote: > Bash *is* the linux > shell. It's being adopted elsewhere, but it seems to have first grown > in use in the linux community. Bash predates Linux. Bash was first released in 1989. Linux wasn't released until 1991. Bash is the GNU shel

Re: Patch for unicode in varnames...

2017-06-06 Thread L A Walsh
Greg Wooledge wrote: On Tue, Jun 06, 2017 at 07:01:23AM -0700, L A Walsh wrote: George wrote: On Mon, 2017-06-05 at 16:16 -0700, L A Walsh wrote: George wrote: On Mon, 2017-06-05 at 15:59 +0700, Peter & Kelly Passchier wrote: On 05/06/2560 15:52, George wro

Re: Patch for unicode in varnames...

2017-06-06 Thread dualbus
On Tue, Jun 06, 2017 at 10:20:03AM -0400, Greg Wooledge wrote: [...] > (OK, in reality, I am not taking any of this seriously. This entire > proposal and discussion are like some bizarre fantasy land to me. Bash > is a SHELL, for god's sake. Not a serious programming language. Even > serious pr

Re: Patch for unicode in varnames...

2017-06-06 Thread PePa
On 06/06/2560 21:20, Greg Wooledge wrote: > Scripts that can only *run* in a UTF-8 encoding-locale are a bad idea. Even currently, when functions in a bash script are beyond ASCII, they can still be run anywhere. I would imagine it would be the same when variable names are also allowed to be in so

Re: Patch for unicode in varnames...

2017-06-06 Thread Greg Wooledge
On Tue, Jun 06, 2017 at 07:01:23AM -0700, L A Walsh wrote: > George wrote: > >On Mon, 2017-06-05 at 16:16 -0700, L A Walsh wrote: > >>George wrote: > >>>On Mon, 2017-06-05 at 15:59 +0700, Peter & Kelly Passchier wrote: > On 05/06/2560 15:52, George wrote: > >there's not a reliable mechanism

Re: Patch for unicode in varnames...

2017-06-06 Thread L A Walsh
George wrote: On Mon, 2017-06-05 at 16:16 -0700, L A Walsh wrote: George wrote: On Mon, 2017-06-05 at 15:59 +0700, Peter & Kelly Passchier wrote: On 05/06/2560 15:52, George wrote: there's not a reliable mechanism in place to run a script in a locale whose character encoding doesn't match t

Re: Patch for unicode in varnames...

2017-06-06 Thread Peter & Kelly Passchier
On 06/06/2560 14:37, George wrote: > Broadly speaking I think the approach taken in Eduardo's patch > (interpreting the byte sequence according to the rules of its character > encoding) is better than the approach taken in current versions of Bash > (letting 0x80-0xFF slide through the parser) -

Re: Patch for unicode in varnames...

2017-06-06 Thread Peter & Kelly Passchier
On 06/06/2560 14:37, George wrote: > As it stands, it's possible in Bash to use bytes in the 0x80-0xFF range as > part of function names, for instance, because the Bash parser treats all of > these byte values as valid "word" characters. This makes the Bash parser > fairly "encoding neutral", whi

Re: Patch for unicode in varnames...

2017-06-06 Thread George
On Mon, 2017-06-05 at 16:16 -0700, L A Walsh wrote: > George wrote: > > On Mon, 2017-06-05 at 15:59 +0700, Peter & Kelly Passchier wrote: > > > On 05/06/2560 15:52, George wrote: > > > > there's not a reliable mechanism in place to run a script in a locale > >

Re: Patch for unicode in varnames...

2017-06-05 Thread Peter & Kelly Passchier
On 06/06/2560 05:39, George wrote: > So if you had "Pokémon" as an identifier in a Latin-1-encoded script (byte > value 0xE9 between the "k" and "m") and then tried running that script in a > UTF-8 locale, that byte sequence (0xE9 0x6D) would actually be invalid in > UTF-8, so Eduardo's patch wou
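
The byte-level claim quoted above (0xE9 followed by 0x6D is not valid UTF-8) can be checked independently of Bash. The following is a minimal sketch in Python, not taken from the thread or from the patch; the helper name `decodes_as_utf8` is hypothetical and exists only for illustration.

```python
# Illustration only: checks the byte-level claim from the quoted message.
# Nothing here reflects how Bash or the proposed patch is implemented.

# "Pokémon" written with an escape so the example does not depend on the
# encoding of this file.
NAME = "Pok\u00e9mon"

# The identifier as it would be stored in a Latin-1 encoded script.
latin1_bytes = NAME.encode("latin-1")   # contains the single byte 0xE9

# The same identifier as it would be stored in a UTF-8 encoded script.
utf8_bytes = NAME.encode("utf-8")       # contains the two bytes 0xC3 0xA9


def decodes_as_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if `data` is a valid UTF-8 byte sequence."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    # In UTF-8, 0xE9 starts a three-byte sequence and must be followed by
    # continuation bytes in the range 0x80-0xBF; 0x6D ("m") is not one.
    print(latin1_bytes[3:5].hex(), decodes_as_utf8(latin1_bytes))
    print(utf8_bytes[3:5].hex(), decodes_as_utf8(utf8_bytes))
```

Decoding the same Latin-1 bytes with the `latin-1` codec recovers the original name, which is why a reader has to know the script's encoding before interpreting bytes in the 0x80-0xFF range.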

Re: Patch for unicode in varnames...

2017-06-05 Thread L A Walsh
George wrote: On Mon, 2017-06-05 at 15:59 +0700, Peter & Kelly Passchier wrote: On 05/06/2560 15:52, George wrote: there's not a reliable mechanism in place to run a script in a locale whose character encoding doesn't match that of the script From my experience running such scri

Re: Patch for unicode in varnames...

2017-06-05 Thread George
On Mon, 2017-06-05 at 15:59 +0700, Peter & Kelly Passchier wrote: > On 05/06/2560 15:52, George wrote: > > > > there's not a reliable mechanism in place to run a script in a locale > > whose character encoding doesn't match that of the script > From my experience running such scripts is no problem

Re: Patch for unicode in varnames...

2017-06-05 Thread dualbus
On Mon, Jun 05, 2017 at 04:52:19AM -0400, George wrote: [...] > To hazard a guess: Each call to legal_identifier() and assignment() in > the patched code requires copying the parameter and translating it to > a wide-character string (with no provision for skipping the added work > as a build option

Re: Patch for unicode in varnames...

2017-06-05 Thread Peter & Kelly Passchier
On 05/06/2560 15:52, George wrote: > there's not a reliable mechanism in place to run a script in a locale > whose character encoding doesn't match that of the script From my experience running such scripts is no problem, but correct rendering it might depend on the client/editor.

Re: Patch for unicode in varnames...

2017-06-05 Thread George
On Sun, 2017-06-04 at 11:47 -0700, L A Walsh wrote: > dualbus wrote: > > > > I hadn't realized that bash already supports Unicode in function names! > > FWIW: > > > >   bash-4.4$  > >   Lēv=? > >   Φ=0.618033988749894848 > >    > > > > With this terrible patch: > > > > dualbus@debian:~/src/gnu/

Re: Patch for unicode in varnames...

2017-06-04 Thread L A Walsh
dualbus wrote: I hadn't realized that bash already supports Unicode in function names! FWIW: bash-4.4$ Lēv=? Φ=0.618033988749894848 With this terrible patch: dualbus@debian:~/src/gnu/bash$ PAGER= git diff Clarification, please, but it looks like with your patch below, U