Re: [PATCH] CJK ambiguous width for non-Unicode charsets
Documentation change to go with the newlib patch at
http://www.cygwin.com/ml/newlib/2010/msg00604.html:

	* setup2.sgml (setup-locale-ov): Document CJK ambiguous width change
	for non-Unicode charsets.
	* new-features.sgml (ov-new1.7.8): Mention CJK ambiguous width change.

(Btw, "Drop support for Windows NT4 prior to Service Pack 4" appears twice in new-features.sgml.)

Andy

ambiwidth2-doc.patch
Description: Binary data
locale initialization issue
Hi,

I stumbled across an issue with locale initialization when the "C" locale is specified in the environment.

$ cat test.c
#include <langinfo.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
  char cs[8];
  puts(nl_langinfo(CODESET));
  printf("%i\n", wctomb(cs, 0x80));
  return 0;
}

The program doesn't call setlocale, so it should be using the "C" locale with its ASCII charset, which means the wctomb() call with a codepoint outside the ASCII range should fail. And that's exactly what happens as long as the locale set in the environment is something other than "C", e.g.:

$ LC_ALL=C.UTF-8 ./test
ANSI_X3.4-1968
-1
$ LC_ALL=en_GB.ISO-8859-15 ./test
ANSI_X3.4-1968
-1

However, if the environment locale is "C", the charset is still reported as ASCII (aka ANSI_X3.4-1968), but the wctomb call suddenly succeeds:

$ LC_ALL=C ./test
ANSI_X3.4-1968
2

That's due to a combination of three things: Cygwin newlib starts with the __wctomb and __mbtowc function pointers set to the UTF-8 variants (for conversions during early Cygwin initialization), yet the LC_CTYPE locale is set to "C", and setlocale() does nothing if the requested locale is the same as the previous one.

Hence, with the locale set to "C" in the environment, both the setlocale call from initial_setlocale(), which asks for the environment locale for filename conversion, and the setlocale() just before main() that sets the "C" locale, end up doing nothing. Thus the conversion functions remain set to the UTF-8 variants instead of being set to the ASCII ones as intended for the "C" locale.

The attached small patch addresses this by starting with the LC_CTYPE locale set to "C.UTF-8" and lc_ctype_charset set accordingly too. This means that setting the "C" locale is recognised as a change and that the conversion function pointers are updated accordingly. It also has the happy side effect that the setlocale call from initial_setlocale() will be short-circuited if the default "C.UTF-8" locale has not been overridden in the environment.
Additionally, I think it's time to drop the "temporarily" #if 0'd code for making UTF-8 the charset for the "C" locale.

It's a newlib patch, but it's entirely Cygwin-specific, so it seemed more appropriate to send it here.

	* libc/locale/locale.c [__CYGWIN__] (current_categories,
	lc_ctype_charset): Start with the LC_CTYPE locale set to "C.UTF-8",
	to match initial __wctomb and __mbtowc settings.
	(lc_message_charset, loadlocale): Settle on ASCII as the "C" charset.

Andy

lc_ctype.patch
Description: Binary data
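The interaction described in this message can be modelled in a few lines. The following is only an illustrative sketch, not the newlib code: `locale_model`, `model_setlocale`, `init_buggy`, `init_patched` and the two converter stand-ins are hypothetical names, and any charset that is not UTF-8 is lumped together with ASCII for simplicity.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the ASCII and UTF-8 wctomb variants:
   they return the encoded length, or -1 if the character is not encodable. */
static int ascii_wctomb(unsigned wc) { return wc < 0x80 ? 1 : -1; }
static int utf8_wctomb(unsigned wc)
{
    return wc < 0x80 ? 1 : wc < 0x800 ? 2 : wc < 0x10000 ? 3 : 4;
}

struct locale_model {
    char ctype_name[32];          /* recorded LC_CTYPE locale name */
    int (*wctomb)(unsigned wc);   /* active conversion function */
};

/* Initial state as described in the report: name "C", UTF-8 converter. */
static void init_buggy(struct locale_model *m)
{
    strcpy(m->ctype_name, "C");
    m->wctomb = utf8_wctomb;
}

/* Initial state with the proposed fix: name and converter agree. */
static void init_patched(struct locale_model *m)
{
    strcpy(m->ctype_name, "C.UTF-8");
    m->wctomb = utf8_wctomb;
}

/* Like setlocale(): a request for the current locale is a no-op. */
static void model_setlocale(struct locale_model *m, const char *name)
{
    assert(m != NULL && name != NULL);
    if (strcmp(m->ctype_name, name) == 0)
        return;                   /* short-circuit: pointers stay as they are */
    strncpy(m->ctype_name, name, sizeof m->ctype_name - 1);
    m->ctype_name[sizeof m->ctype_name - 1] = '\0';
    /* Simplification: everything that is not UTF-8 is treated as ASCII. */
    m->wctomb = strstr(name, "UTF-8") ? utf8_wctomb : ascii_wctomb;
}

/* The two startup calls, followed by the wctomb() call from test.c. */
static int startup_then_wctomb(struct locale_model *m, const char *env_locale,
                               unsigned wc)
{
    model_setlocale(m, env_locale);  /* initial_setlocale(): environment locale */
    model_setlocale(m, "C");         /* the setlocale() just before main() */
    return m->wctomb(wc);
}
```

With `init_buggy` and an environment locale of "C", both calls short-circuit and the model returns 2 for 0x80, as in the report; with `init_patched`, the first call is seen as a change and the result is -1.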
Add locale.exe option for querying Windows UI languages
The attached patch adds a --interface/-i option to locale.exe that makes the --system/-s and --user/-u options print the respective default UI language instead of the default locale.

	* locale.cc: Add --interface option for printing Windows default UI
	languages.

For background, here's what Windows' various default locales and languages do:

- LOCALE_USER_DEFAULT: This reflects the setting on the Formats tab of the (Windows 7) Region&Language control panel, which affects the format of times, dates, numbers, and currency.

- LOCALE_SYSTEM_DEFAULT: This reflects the "Language for non-Unicode programs" on the Administrative tab of Region&Language control panel, which also determines the ANSI and OEM codepages.

- GetUserDefaultUILanguage(): This is the current user's Windows UI language, also called display language. On Windows installs with multiple UI languages, a setting for this appears on the "Keyboards and Languages" tab of the Region&Language control panel.

- GetSystemDefaultUILanguage(): This is the system-wide UI language used for things that aren't user-specific, e.g. the login screen. As far as I know it's determined at Windows install time and can't be changed.

(The latter two APIs are available from Windows 2000 onwards.)

Looking at those, and if we wanted to base the Cygwin locale settings on the Windows ones, I think LC_NUMERIC, LC_TIME, and LC_MONETARY should be determined by LOCALE_USER_DEFAULT, but LC_MESSAGES should be determined by GetUserDefaultUILanguage(). Not sure about LC_CTYPE and LC_COLLATE, but I suppose it would make sense for character classification and sorting to match the UI language.

See also this blog post by MS's "Dr International" Michael Kaplan:
http://blogs.msdn.com/b/michkap/archive/2006/05/13/596971.aspx

Andy

ui_lang.patch
Description: Binary data
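The category mapping suggested in this message can be written down as a small lookup. This is only a sketch of the proposal, not code from the patch: the enum `win_source` and the function `source_for_category` are hypothetical names, and the LC_CTYPE/LC_COLLATE choice is the tentative one from the message.

```c
#include <assert.h>
#include <locale.h>

/* Hypothetical labels for the Windows settings discussed above. */
enum win_source {
    WIN_NOT_MAPPED,    /* category not covered by the proposal */
    WIN_USER_FORMAT,   /* LOCALE_USER_DEFAULT (Formats tab) */
    WIN_USER_UI_LANG   /* GetUserDefaultUILanguage() (display language) */
};

/* Which Windows setting would determine a given POSIX locale category. */
static enum win_source source_for_category(int category)
{
    switch (category) {
    case LC_NUMERIC:
    case LC_TIME:
    case LC_MONETARY:
        return WIN_USER_FORMAT;
    case LC_MESSAGES:
        return WIN_USER_UI_LANG;
    case LC_CTYPE:
    case LC_COLLATE:
        return WIN_USER_UI_LANG;  /* tentative: "not sure" in the message */
    default:
        return WIN_NOT_MAPPED;
    }
}
```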
Re: Add locale.exe option for querying Windows UI languages
On 8 October 2011 16:03, Corinna Vinschen wrote:
> On Oct 8 10:24, Andy Koppe wrote:
>> The attached patch adds a --interface/-i option to locale.exe that
>> makes the --system/-s and --user/-u options print the respective
>> default UI language instead of the default locale.
>>
>> * locale.cc: Add --interface option for printing Windows default UI
>> languages.
>>
>> For background, here's what Windows' various default locales and languages
>> do:
>>
>> - LOCALE_USER_DEFAULT: This reflects the setting on the Formats tab of
>> the (Windows 7) Region&Language control panel, which affects the
>> format of times, dates, numbers, and currency.
>>
>> - LOCALE_SYSTEM_DEFAULT: This reflects the "Language for non-Unicode
>> programs" on the Administrative tab of Region&Language control panel,
>> which also determines the ANSI and OEM codepages.
>>
>> - GetUserDefaultUILanguage(): This is the current user's Windows UI
>> language, also called display language. On Windows installs with
>> multiple UI languages, a setting for this appears on the "Keyboards
>> and Languages" tab of the Region&Language control panel.
>>
>> - GetSystemDefaultUILanguage(): This is the system-wide UI language
>> used for things that aren't user-specific, e.g. the login screen. As
>> far as I know it's determined at Windows install time and can't be
>> changed.
>
> I like the idea of the patch, but I'm wondering if this is the right
> approach. I wasn't aware of the difference between the LOCALE_FOO_DEFAULT
> values and what the GetFooDefaultUILanguage functions return, otherwise
> I would have probably used the GetFooDefaultUILanguage functions right from
> the start.
>
> What I mean is this. The locale -u/-s functionality was supposed to be
> used to set the $LANG value preferably. Since LANG means language in
> the first place, the UI language is a much more natural choice for the
> default -s/-u functionality, isn't it?

Makes sense.
> Therefore, afaics, it would be better if we change locale to use the
> GetFooDefaultUILanguage functions by default, and we add a modifier
> (-r/--region?) to switch to LOCALE_FOO_DEFAULT.
>
> Either way, the usage output will have to be improved. Maybe we should
> explicitly state that the values printed refer to the Windows values,
> and that one of them is the UI locale and the other is the... hmm...
> how to say it..., maybe the "region settings locale" or so.

How about having one option for each of the Windows settings, and dividing the help output into groups, like so:

POSIX locale options:

  -a, --all-locales    List all available supported locales
  -c, --category-name  List information about given category NAME
  -k, --keyword-name   Print information about given keyword NAME
  -m, --charmaps       List all available character maps

Windows locale options:

  -u, --user-lang      Print user default UI language
  -s, --system-lang    Print system default UI language
  -f, --format         Print user format setting for times, numbers & currency
  -n, --non-unicode    Print system locale for non-Unicode programs
  -U, --utf            Attach ".UTF-8" to the result

Other options:

  -v, --verbose        More verbose output
  -h, --help           This text

>> Looking at those, and if we wanted to base the Cygwin locale settings
>> on the Windows ones, I think LC_NUMERIC, LC_TIME, and LC_MONETARY
>> should be determined by LOCALE_USER_DEFAULT, but LC_MESSAGES should be
>> determined by GetUserDefaultUILanguage(). Not sure about LC_CTYPE and
>> LC_COLLATE, but I suppose it would make sense for character
>> classification and sorting to match the UI language.
>
> The system should not set the LC_xxx values at all. From my POV the
> system should only default to some $LANG, while setting the LC_xxx
> values is the job of the user if the $LANG value doesn't suffice.

Not sure about that, but it's not really worth discussing unless we do decide to follow the Windows setting(s) by default.

Andy
Re: Add support for creating native windows symlinks
On 4 December 2011 07:07, Russell Davis wrote:
> This was discussed before here:
> http://cygwin.com/ml/cygwin/2008-03/msg00277.html
>
> These were the reasons given for not using native symlinks to create
> cygwin symlinks, along with my responses:
>
> - By default, only administrators have the right to create native
> symlinks. Admins running with restricted permissions under UAC don't
> have this right.
>
> This is true, however the feature can be made optional through the
> CYGWIN environment variable (just like winsymlinks). For users that
> can add the permission or disable UAC, the use of native symlinks is a
> huge step towards making cygwin more unified with the rest of Windows.
>
> - When creating a native symlink, you have to define if this symlink
> points to a file or a directory. This makes no sense given that
> symlinks often are created before the target they point to.
>
> Also true. However, the type only matters for Windows' usage of the
> symlink -- cygwin already treats both the types the same. For example,
> if a native symlink of type `file` actually points to a directory, it
> will still work fine inside cygwin. It won't work for Win32 programs
> that try to access it, but that's still no worse than the status quo
> -- Win32 programs already can't use cygwin symlinks.
>
> Since cygwin already supports reading of native symlinks, I was able
> to add support for this with a fairly small change. Some edge cases
> probably still need to be handled (disabling for older versions of
> windows and unsupported file systems), but I wanted to get this out
> there for review. The patch is attached.

Those aren't all the issues with using native symlinks as Cygwin symlinks. POSIX symlinks of course are supposed to point to POSIX paths, whereas native links point to Windows paths, with the following consequences:

- Native links can't point to special Cygwin paths such as /proc and /dev, although I guess that could be fudged.
- If the meaning of the POSIX path changes due to Cygwin mount point changes, native symlinks won't reflect that and will point to the wrong thing.

- Native relative links can't cross drive boundaries, whereas relative POSIX paths can reach the whole filesystem.

I think the better approach here is to have an ln-like utility that creates Windows symlinks, as proposed by Daniel Colascione at http://cygwin.com/ml/cygwin/2011-04/msg00059.html. Perhaps it could be added to cygutils if it was knocked into appropriate shape. (The main advantage over using Windows facilities, in particular cmd.exe's mklink builtin, would be an ln-like interface and Cygwin charset support.)

Andy
Re: console enhancements: mouse events
2009/11/7 Corinna Vinschen:
>> Mintty roughly does the following for Ctrl(+Shift)+symbol combinations:
>> - obtain the keymap using GetKeyboardState()
>> - set the state of the Ctrl key to released
>> - invoke ToUnicode() to get the character code according to the keyboard
>> layout
>> - if the character code is one of [\]_^? send the corresponding control code
>> - otherwise, set the state of both Ctrl and Alt to pressed (this is
>> equivalent to AltGr), and try ToUnicode() again
>>
>> The last step means that e.g. Ctrl+9 on a German keyboard will send
>> ^]. The proper combination would be Ctrl+AltGr+9, but since
>> AltGr==Ctrl+Alt, that can't be distinguished from AltGr+9 without
>> Ctrl. (Well, not without somewhat dodgy trickery anyway.)
>
> How does that work for ^^? The ^ key is a deadkey on the German keyboard
> layout, so the actual char value is only generated after pressing the key
> twice. Just curious.

ToUnicode actually delivers the ^ character right away when pressing the key, but with return value -1 to signify that it's a dead key and that the next key will be modified accordingly. So for Ctrl+^, mintty sends ^^ right away and then clears the dead key state using a trick picked up from http://blogs.msdn.com/michkap/archive/2006/04/06/569632.aspx: feed VK_DECIMALs into ToUnicode until it stops returning dead-key characters.

(Yep, it's a terrible API for an unintuitive feature. See also http://blogs.msdn.com/michkap/archive/2005/01/19/355870.aspx)

Andy
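The symbol-to-control-code step and the AltGr fallback quoted above can be sketched portably. This is not mintty's code: `ctrl_code_for_symbol` and `resolve_ctrl_key` are hypothetical names, and the two character arguments stand in for what ToUnicode() would return without modifiers and with Ctrl+Alt (AltGr) pressed.

```c
#include <assert.h>

/* Control code for Ctrl+symbol, or -1 if the symbol has none.
   Covers only the symbols listed in the message: [ \ ] _ ^ ? */
static int ctrl_code_for_symbol(char c)
{
    switch (c) {
    case '[': case '\\': case ']': case '^': case '_':
        return c & 0x1F;   /* e.g. '[' (0x5B) -> ESC (0x1B) */
    case '?':
        return 0x7F;       /* ^? is conventionally DEL */
    default:
        return -1;
    }
}

/* First try the character produced with Ctrl treated as released; if that
   is not one of the symbols, retry with the character the key produces
   under AltGr. Both characters are supplied by the caller in this sketch. */
static int resolve_ctrl_key(char plain, char altgr)
{
    int code = ctrl_code_for_symbol(plain);
    if (code < 0)
        code = ctrl_code_for_symbol(altgr);
    return code;
}
```

For the German-layout example in the message, the 9 key yields '9' on its own and ']' with AltGr, so `resolve_ctrl_key('9', ']')` gives 0x1D, i.e. ^].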
Re: console enhancements: mouse events
Thomas Wolff:
>>> Note: This works on my home PC (Windows XP Home) but it's not effective
>>> on my work PC (Windows XP Professional) where the mouse wheel scrolls the
>>> Windows console (which it doesn't on the other machine); I don't know how
>>> to disable or configure this.

I've come across a similar issue in mintty: on some machines, if the scrollbar is enabled, mousewheel events never reach the window's event loop. I never really got to the bottom of it, but I think it's to do with mouse drivers: some appear to send mousewheel events straight to a window's scrollbar if there is one, no matter where in the window the mouse is positioned.

>> [Ctrl+AltGr+key stuff]
> Thanks Andy for pointing to the part of mintty code handling this. However,
> the whole function there looks too complex for a quick copy-paste-patch.

Nested functions, big switches, and even a couple of gotos: that function has it all. ;)

> Maybe later... or Andy might like to factor out the mapping part in a way
> directly reusable for the cygwin console?

Erm, no. The crucial subfunction there is undead_keycode().

> It
> does not work however, even for ASCII characters, for characters produced
> with AltGr, e.g. Alt-AltGr-Q where AltGr-Q is @ (German keyboard). Andy got
> this to work in mintty (I think with some other subtle trick after I
> challenged him for it IIRC); it does not work in xterm either.

You can tell whether both Alt and AltGr are down by checking VK_LMENU and VK_RMENU.

Andy
making scanf byte-clean(er)
Attached is a patch for making the scanf format string (more) byte-transparent. It actually couldn't deal with non-ASCII chars at all, even valid ones, due to comparing an 'unsigned char' with a (signed) 'char'. And when encountering an invalid byte, it would go backwards in the format string. Finally, it wrongly reset the multibyte conversion state for every character and used the same state object for the format string and %ls arguments. I thought I'd send the patch here for review first before sending it upstream. Hope that makes sense. Andy scanf.patch Description: Binary data
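The first problem mentioned, comparing a signed char with an unsigned one, can be reproduced in isolation. This is a minimal sketch rather than the scanf code itself: `match_buggy` and `match_fixed` are hypothetical names, and `signed char` is used explicitly so that the behaviour does not depend on whether plain `char` is signed on a given platform.

```c
#include <assert.h>

/* Buggy comparison: both operands are promoted to int, so a format byte
   such as 0xE9 stored in a signed char becomes negative and can never
   equal the same byte held in an unsigned char (233). */
static int match_buggy(signed char fmt_byte, unsigned char input_byte)
{
    return fmt_byte == input_byte;
}

/* Fixed comparison: convert the format byte to unsigned char first. */
static int match_fixed(signed char fmt_byte, unsigned char input_byte)
{
    return (unsigned char)fmt_byte == input_byte;
}
```

ASCII bytes compare equal either way, which is presumably why the problem only shows up with non-ASCII characters in the format string.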
minor doc corrections
Attached is a patch with some minor locale-related doc corrections. Mostly just typos and removing stuff that no longer applies. Andy doc.patch Description: Binary data
[PATCH] internal_setlocale tweak
winsup/cygwin/ChangeLog: * nlsfuncs.cc (internal_setlocale, initial_setlocale): Move check whether charset has changed to internal_setlocale, to avoid unnecessary work when invoked via CW_INT_SETLOCALE. Sufficiently trivial, I hope. Andy int_setlocale.patch Description: Binary data
Re: console enhancements: mouse events etc
> How can I enforce printing garbage so I > can test the reset command? echo $'\e(0'
[PATCH] Cygwin: Correct /proc/*/stat for processes without ctty
Hi,

I had noticed that selecting or excluding processes without a controlling terminal doesn't work in procps on Cygwin. For example, the mintty process shouldn't appear in the following, as the 'f' (for forest) argument triggers procps's "BSD personality", where processes without a controlling terminal are supposed to be excluded by default:

$ procps f
  PID TTY      STAT STIME COMMAND
 1809 ?        Ss   19:49 /usr/bin/mintty -
 1810 pty0     Ss   19:49  \_ -zsh
 2075 pty0     R    21:14      \_ procps f

Similarly, this should list the processes without a terminal, but comes up empty:

$ procps -t -

I tracked this down to a difference in the tty field of /proc/*/stat (which is the 7th field). On Linux, processes without a terminal have value 0 there, and that's what procps expects. Cygwin 3.3 has -1 instead, whereas on master the bits of the tty field were rearranged in commit 437d0a8f88, which turns the -1 into 268435455 (i.e. 0xFFFFFFF). Either way, procps treats such processes as having terminals.

(The ? in the TTY column output is generated by a different code path in procps that uses /proc/*/ctty on Cygwin.)

Patches for the 3.3 branch and master attached.

Regards,
Andy

0001-Cygwin-proc-stat-ctty.patch
Description: Binary data

0001-Cygwin-proc-stat-ctty-3.3.patch
Description: Binary data
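To illustrate why the value of the 7th field matters, here is a small sketch of a consumer that reads it and applies the Linux convention of 0 meaning "no controlling terminal". It is not procps code: `stat_tty_nr` and `has_ctty` are hypothetical helpers, and the sample stat lines used with them would be made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Extract field 7 (tty_nr) from a /proc/PID/stat line.
   Returns 0 on success, -1 if the line cannot be parsed. */
static int stat_tty_nr(const char *line, long *tty_nr)
{
    assert(line != NULL && tty_nr != NULL);

    /* Field 2 (comm) is parenthesised and may itself contain spaces or
       parentheses, so continue parsing after the last ')'. */
    const char *p = strrchr(line, ')');
    if (p == NULL)
        return -1;

    char state;                 /* field 3 */
    long ppid, pgrp, session;   /* fields 4 to 6 */
    if (sscanf(p + 1, " %c %ld %ld %ld %ld",
               &state, &ppid, &pgrp, &session, tty_nr) != 5)
        return -1;
    return 0;
}

/* Linux convention: tty_nr == 0 means no controlling terminal.
   Any other value, including -1 or 0xFFFFFFF, counts as "has one". */
static int has_ctty(long tty_nr)
{
    return tty_nr != 0;
}
```

Under this convention both -1 and 268435455 are read as a terminal being present, which matches the behaviour described in the message.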