Re: New modes for cygpath that terminate path with null byte, nothing
On Aug 14 22:11, Daniel Colascione wrote: > On 8/14/12 10:02 PM, Christopher Faylor wrote: > > On Tue, Aug 14, 2012 at 02:49:17PM -0700, Daniel Colascione wrote: > >> On 7/27/2012 2:32 AM, Corinna Vinschen wrote: > >>> There's just the problem of the copyright assignment. If you want to > >>> provide a non-obvious patch, or if the patch adds new functionality, we > >>> need a copyright assignment from you. Please see the section "Before > >>> you get started" on http://cygwin.com/contrib.html and the assignment > >>> form http://cygwin.com/assign.txt > >>> > >>> As soon as my manager has the assignment, I'll apply your patch. > >> > >> Do you guys still want the patch (and papers that go with it)? If so, > >> could I > >> discuss the particulars of the assignment off-list somewhere? > > > > As I mentioned, it seems like you can easily get the functionality > > you're looking for by using standard Cygwin tools so, IMO, we don't need > > your patch. > > I didn't know what the final decision was; that's why I asked. Corinna > said she wanted the patch. As for the feature itself: piping through > tr is fine, but having cygpath output paths in the desired way in the > first place keeps error information around (without pipefail, $? is > tr's exit status) and involves fewer forks. There's precedent in "echo > -n" too. What's your problem with the assignment form? You can send me PM to my address as mentioned in the ChangeLog files. Corinna -- Corinna Vinschen Please, send mails regarding Cygwin to Cygwin Project Co-Leader cygwin AT cygwin DOT com Red Hat
Re: /dev/clipboard pasting with small read() buffer
Hi Thomas, thanks for the patch. I have a few minor nits: On Aug 14 22:56, Thomas Wolff wrote: > --- sav/fhandler_clipboard.cc 2012-07-08 02:36:47.0 +0200 > +++ ./fhandler_clipboard.cc 2012-08-14 18:25:14.903255600 +0200 > @@ -222,6 +222,7 @@ fhandler_dev_clipboard::read (void *ptr, >UINT formatlist[2]; >int format; >LPVOID cb_data; > + int rach; > >if (!OpenClipboard (NULL)) > { > @@ -243,12 +244,18 @@ fhandler_dev_clipboard::read (void *ptr, >cygcb_t *clipbuf = (cygcb_t *) cb_data; > >if (pos < clipbuf->len) > - { > + { > ret = ((len > (clipbuf->len - pos)) ? (clipbuf->len - pos) : len); > memcpy (ptr, clipbuf->data + pos , ret); > pos += ret; > } > } > + else if ((rach = get_readahead ()) >= 0) > +{ > + /* Deliver from read-ahead buffer. */ > + * (char *) ptr = rach; > + ret = 1; See (*) below. > +} >else > { >wchar_t *buf = (wchar_t *) cb_data; > @@ -256,25 +263,46 @@ fhandler_dev_clipboard::read (void *ptr, >size_t glen = GlobalSize (hglb) / sizeof (WCHAR) - 1; >if (pos < glen) > { > + /* If caller's buffer is too small to hold at least one > + max-size character, redirect algorithm to local > + read-ahead buffer, finally fill class read-ahead buffer > + with result and feed caller from there. */ > + char * _ptr = (char *) ptr; > + size_t _len = len; I would prefer to have local variable names here which don't just differ by a leading underscore. It's a bit confusing. What about, say, tmp_ptr/tmp_len, or use_ptr/use_len or something like that? > + char cprabuf [8 + 1]; /* need this length for surrogates */ > + if (len < 8) > + { > + _ptr = cprabuf; > + _len = 8; > + } 8? Why 8? The size appears to be rather artificial. The code should use MB_CUR_MAX instead. > + > /* Comparing apples and oranges here, but the below loop could become >extremly slow otherwise. We rather return a few bytes less than >possible instead of being even more slow than usual... */ > - if (glen > pos + len) > - glen = pos + len; > + if (glen > pos + _len) > + glen = pos + _len; > /* This loop is necessary because the number of bytes returned by >sys_wcstombs does not indicate the number of wide chars used for >it, so we could potentially drop wide chars. */ > while ((ret = sys_wcstombs (NULL, 0, buf + pos, glen - pos)) > != (size_t) -1 > - && ret > len) > + && ret > _len) >--glen; > if (ret == (size_t) -1) > ret = 0; > else > { > - ret = sys_wcstombs ((char *) ptr, (size_t) -1, > + ret = sys_wcstombs ((char *) _ptr, (size_t) -1, > buf + pos, glen - pos); > pos = glen; > + /* If using read-ahead buffer, copy to class read-ahead buffer > + and deliver first byte. */ > + if (_ptr == cprabuf) > + { > + puts_readahead (cprabuf, ret); > + * (char *) ptr = get_readahead (); > + ret = 1; (*) Ok, that works, but wouldn't it be more efficient to do that in a tiny loop along the lines of int x; ret = 0; while (ret < len && (x = get_readahead ()) >= 0) ptr++ = x; ret++; ? Corinna -- Corinna Vinschen Please, send mails regarding Cygwin to Cygwin Project Co-Leader cygwin AT cygwin DOT com Red Hat
Re: /dev/clipboard pasting with small read() buffer
Hi Corinna, On 16.08.2012 11:33, Corinna Vinschen wrote: Hi Thomas, thanks for the patch. I have a few minor nits: On Aug 14 22:56, Thomas Wolff wrote: --- sav/fhandler_clipboard.cc 2012-07-08 02:36:47.0 +0200 +++ ./fhandler_clipboard.cc 2012-08-14 18:25:14.903255600 +0200 ... See (*) below. ... + char * _ptr = (char *) ptr; + size_t _len = len; I would prefer to have local variable names here which don't just differ by a leading underscore. It's a bit confusing. What about, say, tmp_ptr/tmp_len, or use_ptr/use_len or something like that? tmp_OK + char cprabuf [8 + 1]; /* need this length for surrogates */ + if (len < 8) + { + _ptr = cprabuf; + _len = 8; + } 8? Why 8? The size appears to be rather artificial. The code should use MB_CUR_MAX instead. MB_CUR_MAX does not work because its value is 1 at this point (after adding #include anyway). I guess that's because it changes its value after setlocale(). To avoid interference with application invocation of setlocale(), however, it must not be called here (with which parameters anyway...). So the maximum length of all encodings needs to be used here which would be 4 (UTF-8, EUC-TW, GB 18030) if there were not the surrogate pairs of UTF-16. Since a single surrogate can be encoded with 3 UTF-8 bytes, I tried 6 to host two of them but it didn't work. So I increased to 8 which seemed to work; I may have been mislead, however, since another test case now shows that the patch does not always work with surrogates. It only works if the non-BMP character (the surrogate pair) does not extend over the border of two blocks of multiple size of these internal small buffers..., meaning e.g. if the buffer size is 8 and a non-BMP character starts at byte 7 of the clipboard, it will fail (only the second surrogate will be pasted). I don't know what to do about this right now; maybe some tweak of sys_wcstombs would help, it should perhaps convert to non-surrogate UTF-8 first before any other buffering considerations. --- Actually, having written all this, I just notice that surrogate pairs don't work with the "old" code either (using a larger caller buffer) whenever the character is not block-aligned as described above. So maybe for the current patch, 4 would indeed be good, while the surrogate problem should be fixed in sys_wcstombs. Update attached. ... + /* If using read-ahead buffer, copy to class read-ahead buffer +and deliver first byte. */ + if (_ptr == cprabuf) + { + puts_readahead (cprabuf, ret); + * (char *) ptr = get_readahead (); + ret = 1; (*) Ok, that works, but wouldn't it be more efficient to do that in a tiny loop along the lines of int x; ret = 0; while (ret < len && (x = get_readahead ()) >= 0) ptr++ = x; ret++; ? I can add it if you prefer; I just didn't think it's worth the effort and concerning efficiency, after that prior trial-and-error count-down-loop... -- Thomas
Re: /dev/clipboard pasting with small read() buffer
On Aug 16 14:11, Thomas Wolff wrote: > Hi Corinna, > > On 16.08.2012 11:33, Corinna Vinschen wrote: > >Hi Thomas, > > > >thanks for the patch. I have a few minor nits: > > > >On Aug 14 22:56, Thomas Wolff wrote: > >>--- sav/fhandler_clipboard.cc 2012-07-08 02:36:47.0 +0200 > >>+++ ./fhandler_clipboard.cc 2012-08-14 18:25:14.903255600 +0200 > >>... > >See (*) below. > > > >>... > >>+ char * _ptr = (char *) ptr; > >>+ size_t _len = len; > >I would prefer to have local variable names here which don't just > >differ by a leading underscore. It's a bit confusing. What about, > >say, tmp_ptr/tmp_len, or use_ptr/use_len or something like that? > tmp_OK > > >>+ char cprabuf [8 + 1]; /* need this length for surrogates */ > >>+ if (len < 8) > >>+ { > >>+ _ptr = cprabuf; > >>+ _len = 8; > >>+ } > >8? Why 8? The size appears to be rather artificial. The code should > >use MB_CUR_MAX instead. > MB_CUR_MAX does not work because its value is 1 at this point So what about MB_LEN_MAX then? There's no problem using a multiplier, but a symbolic constant is always better than a numerical constant. > >>+ /* If using read-ahead buffer, copy to class read-ahead buffer > >>+and deliver first byte. */ > >>+ if (_ptr == cprabuf) > >>+ { > >>+ puts_readahead (cprabuf, ret); > >>+ * (char *) ptr = get_readahead (); > >>+ ret = 1; > >(*) Ok, that works, but wouldn't it be more efficient to do that in > >a tiny loop along the lines of > > > > int x; > > ret = 0; > > while (ret < len && (x = get_readahead ()) >= 0) > > ptr++ = x; > > ret++; > > > >? > I can add it if you prefer; I just didn't think it's worth the > effort and concerning efficiency, after that prior trial-and-error > count-down-loop... Yeah, that's a valid point. But maybe we shouldn't make it slower than necessary? If you have a good idea how to avoid the other loop, don't hesitate to submit a patch. Corinna -- Corinna Vinschen Please, send mails regarding Cygwin to Cygwin Project Co-Leader cygwin AT cygwin DOT com Red Hat
Re: /dev/clipboard pasting with small read() buffer
On 16.08.2012 14:30, Corinna Vinschen wrote: On Aug 16 14:11, Thomas Wolff wrote: Hi Corinna, On 16.08.2012 11:33, Corinna Vinschen wrote: Hi Thomas, thanks for the patch. I have a few minor nits: On Aug 14 22:56, Thomas Wolff wrote: ... + char cprabuf [8 + 1]; /* need this length for surrogates */ + if (len < 8) + { + _ptr = cprabuf; + _len = 8; + } 8? Why 8? The size appears to be rather artificial. The code should use MB_CUR_MAX instead. MB_CUR_MAX does not work because its value is 1 at this point So what about MB_LEN_MAX then? There's no problem using a multiplier, but a symbolic constant is always better than a numerical constant. I've now used _MB_LEN_MAX from newlib.h, rather than MB_LEN_MAX from limits.h (note the "_" distinction :) ), because the latter, by its preceding comment, reserves the option to be changed into a dynamic function in the future, which could then possibly have the same problems as MB_CUR_MAX. About the surrogates problem, I think I've found a solution: I've added an explicit test to avoid processing of split surrogate pairs (to that loop...); this seems to work now. + /* If using read-ahead buffer, copy to class read-ahead buffer +and deliver first byte. */ + if (_ptr == cprabuf) + { + puts_readahead (cprabuf, ret); + * (char *) ptr = get_readahead (); + ret = 1; (*) Ok, that works, but wouldn't it be more efficient to do that in a tiny loop along the lines of int x; ret = 0; while (ret < len && (x = get_readahead ()) >= 0) ptr++ = x; ret++; ? I can add it if you prefer; I just didn't think it's worth the effort and concerning efficiency, after that prior trial-and-error count-down-loop... Yeah, that's a valid point. But maybe we shouldn't make it slower than necessary? If you have a good idea how to avoid the other loop, don't hesitate to submit a patch. Added the loop to use up the caller's buffer. About avoiding the trial-and-error loop, I think that would require digging into sys_mbstowcs (which doesn't even seem to behave as documented). -- Thomas --- sav/fhandler_clipboard.cc 2012-07-08 02:36:47.0 +0200 +++ ./fhandler_clipboard.cc 2012-08-16 16:08:23.782692300 +0200 @@ -222,6 +222,7 @@ fhandler_dev_clipboard::read (void *ptr, UINT formatlist[2]; int format; LPVOID cb_data; + int rach; if (!OpenClipboard (NULL)) { @@ -243,12 +244,24 @@ fhandler_dev_clipboard::read (void *ptr, cygcb_t *clipbuf = (cygcb_t *) cb_data; if (pos < clipbuf->len) - { + { ret = ((len > (clipbuf->len - pos)) ? (clipbuf->len - pos) : len); memcpy (ptr, clipbuf->data + pos , ret); pos += ret; } } + else if ((rach = get_readahead ()) >= 0) +{ + /* Deliver from read-ahead buffer. */ + char * out_ptr = (char *) ptr; + * out_ptr++ = rach; + ret = 1; + while (ret < len && (rach = get_readahead ()) >= 0) + { + * out_ptr++ = rach; + ret++; + } +} else { wchar_t *buf = (wchar_t *) cb_data; @@ -256,25 +269,54 @@ fhandler_dev_clipboard::read (void *ptr, size_t glen = GlobalSize (hglb) / sizeof (WCHAR) - 1; if (pos < glen) { + /* If caller's buffer is too small to hold at least one +max-size character, redirect algorithm to local +read-ahead buffer, finally fill class read-ahead buffer +with result and feed caller from there. */ + char * conv_ptr = (char *) ptr; + size_t conv_len = len; +#define cprabuf_len _MB_LEN_MAX/* newlib's max MB_CUR_MAX of all encodings */ + char cprabuf [cprabuf_len]; + if (len < cprabuf_len) + { + conv_ptr = cprabuf; + conv_len = cprabuf_len; + } + /* Comparing apples and oranges here, but the below loop could become extremly slow otherwise. We rather return a few bytes less than possible instead of being even more slow than usual... */ - if (glen > pos + len) - glen = pos + len; + if (glen > pos + conv_len) + glen = pos + conv_len; /* This loop is necessary because the number of bytes returned by sys_wcstombs does not indicate the number of wide chars used for it, so we could potentially drop wide chars. */ while ((ret = sys_wcstombs (NULL, 0, buf + pos, glen - pos)) != (size_t) -1 -&& ret > len) +&& (ret > conv_len + /* Skip separated high surrogate: */ +|| ((buf [pos + glen - 1] & 0xFC00) == 0xD800 && glen - pos > 1))) --glen;
Re: /dev/clipboard pasting with small read() buffer
On 08/16/2012 08:20 AM, Thomas Wolff wrote: >>> MB_CUR_MAX does not work because its value is 1 at this point >> So what about MB_LEN_MAX then? There's no problem using a multiplier, >> but a symbolic constant is always better than a numerical constant. > I've now used _MB_LEN_MAX from newlib.h, rather than MB_LEN_MAX from > limits.h (note the "_" distinction :) ), > because the latter, by its preceding comment, reserves the option to be > changed into a dynamic function in the future, which could then possibly > have the same problems as MB_CUR_MAX. POSIX requires MB_LEN_MAX to be a constant, only MB_CUR_MAX can be dynamic. We cannot change MB_LEN_MAX to be dynamic in the future. -- Eric Blake ebl...@redhat.com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: /dev/clipboard pasting with small read() buffer
On Aug 16 09:24, Eric Blake wrote: > On 08/16/2012 08:20 AM, Thomas Wolff wrote: > > >>> MB_CUR_MAX does not work because its value is 1 at this point > >> So what about MB_LEN_MAX then? There's no problem using a multiplier, > >> but a symbolic constant is always better than a numerical constant. > > I've now used _MB_LEN_MAX from newlib.h, rather than MB_LEN_MAX from > > limits.h (note the "_" distinction :) ), > > because the latter, by its preceding comment, reserves the option to be > > changed into a dynamic function in the future, which could then possibly > > have the same problems as MB_CUR_MAX. > > POSIX requires MB_LEN_MAX to be a constant, only MB_CUR_MAX can be > dynamic. We cannot change MB_LEN_MAX to be dynamic in the future. ...also, Cygwin's include/limits.h doesn't mention to convert to a function. Corinna -- Corinna Vinschen Please, send mails regarding Cygwin to Cygwin Project Co-Leader cygwin AT cygwin DOT com Red Hat