Perfect.  I obviously didn't know that.  My assumption that only the first
byte gets checked was wrong.

Thanks gents.

On Sun, Jan 19, 2020 at 12:12 AM yary <not....@gmail.com> wrote:

> In UTF-16 every code unit is 16 bits, so a single all-zero byte tells you
> only that it's possibly the first byte of a big-endian character below
> U+0100 (such as ASCII) or the first byte of a little-endian non-ASCII
> character whose code point is a multiple of 256. Only an all-zero code
> unit, U+0000, is Unicode NULL, which the Windows UTF-16 C convention uses
> to terminate the string.
> -y
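To make the code-unit point concrete, here is a minimal Python sketch (Python and the helper names `utf16le_length` and `first_zero_byte` are illustrative choices, not anything from the thread). It finds the terminator the way a UTF-16 reader does, two bytes at a time, stopping only when a whole 16-bit code unit is zero. A byte-wise scan stops inside the very first character, and even a search for "two zero bytes in a row" goes wrong if it is not aligned to even offsets.

```python
def utf16le_length(buf: bytes) -> int:
    """Return the number of 16-bit code units before the first U+0000.

    Hypothetical helper: scans little-endian code units at even offsets,
    the way a wide-string strlen (wcslen) treats a Windows WCHAR string.
    """
    for i in range(0, len(buf) - 1, 2):
        if buf[i] == 0 and buf[i + 1] == 0:  # whole code unit is zero -> U+0000
            return i // 2
    raise ValueError("no U+0000 terminator found")


def first_zero_byte(buf: bytes) -> int:
    """Byte-wise scan, like C strlen(): index of the first 0x00 byte."""
    return buf.index(0)


# "T" (U+0054), then U+0100 (A WITH MACRON, code point a multiple of 256),
# then the two-byte U+0000 terminator.
data = "T\u0100".encode("utf-16-le") + b"\x00\x00"
print(list(data))                # [84, 0, 0, 1, 0, 0]
print(first_zero_byte(data))     # 1: the high byte of "T", not the end of the string
print(data.find(b"\x00\x00"))    # 1: an unaligned "double nul" search is fooled too
print(utf16le_length(data))      # 2: two code units before U+0000
```

Note that U+0100 in little-endian starts with a 0x00 byte, which is exactly the "non-ASCII character whose code point is a multiple of 256" case described above.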
>
>
> On Sat, Jan 18, 2020 at 9:04 PM ToddAndMargo via perl6-users <
> perl6-us...@perl.org> wrote:
>
>> On 2020-01-18 20:05, Paul Procacci wrote:
>> >  >> I also found out the
>> >  >> hard way that UTF16 strings need to be terminated with
>> >  >> a double nul (0x0000).
>> >
>> > Not to doubt you (I don't do anything in UTF-16), but can you show an
>> > example of this?
>> > I would have thought a single NULL character is enough.
>> >
>> > The 1st byte of a Unicode character determines whether or not it's
>> > ASCII, and I wouldn't think that, on encountering the first null, any
>> > reasonable UTF-16 interpretation would consume more than just that
>> > 1st byte.
>>
>> Hi Paul,
>>
>> My dealings with UTF16 are Win API
>> calls to the registry.
>>
>> This is from my work-in-progress doc on NativeCall
>> and WinAPI:
>>
>>      Note: a Windows UTF16 C string is “little-endian”,
>>            meaning “ABC” is stored as the bytes
>>            0x41 0x00 (A), 0x42 0x00 (B), 0x43 0x00 (C), 0x00 0x00 (nul)
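A quick way to check byte layouts like the one above is to let a UTF-16 encoder produce them. This Python sketch (an illustration only; `to_wide_c_string` is a hypothetical helper name, not part of the NativeCall doc) builds the little-endian bytes for "ABC" plus the U+0000 terminator and prints each code unit as it sits in memory, low byte first.

```python
def to_wide_c_string(text: str) -> bytes:
    """Encode text as a NUL-terminated UTF-16LE buffer, like a Windows WCHAR string.

    Hypothetical helper: the terminator is one 16-bit code unit of zero,
    i.e. two 0x00 bytes.
    """
    return text.encode("utf-16-le") + b"\x00\x00"


buf = to_wide_c_string("ABC")
# Show each 16-bit code unit as its two bytes in memory order (low byte first).
pairs = [f"0x{buf[i]:02X} 0x{buf[i + 1]:02X}" for i in range(0, len(buf), 2)]
print(pairs)  # ['0x41 0x00', '0x42 0x00', '0x43 0x00', '0x00 0x00']
```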
>>
>> The following is the declaration of:
>>
>>
>> https://docs.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-formatmessagew
>>
>>       DWORD FormatMessageW(
>>           DWORD   dwFlags,      // bitwise OR: FORMAT_MESSAGE_ALLOCATE_BUFFER | FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS
>>           LPCVOID lpSource,     // NULL. The location of the message definition; its type depends on the dwFlags settings.
>>           DWORD   dwMessageId,  // the error message number ($ErrorNumber)
>>           DWORD   dwLanguageId, // 0 for the system's language
>>           LPWSTR  lpBuffer,     // the return string, give it 1024
>>           DWORD   nSize,        // 0; buffer sizes here are counted in TCHARs (WCHARs for the W version), not bytes
>>           va_list *Arguments    // NULL
>>       );
>>
>>
>> I have uncommented the line in the call that prints out
>> the raw returned data.  It looks like this:
>>
>> <test start>
>> K:\Windows\NtUtil>perl6 -I. -e "use lib '.'; use WinErr :WinFormatMessage; say WinFormatMessage( 0x789, True );"
>>
>> 84 0 104 0 101 0 32 0 103 0 114 0 111 0 117 0 112 0 32 0 101 0 108 0 101
>> 0 109 0 101 0 110 0 116 0 32 0 99 0 111 0 117 0 108 0 100 0 32 0 110 0
>> 111 0 116 0 32 0 98 0 101 0 32 0 114 0 101 0 109 0 111 0 118 0 101 0 100
>> 0 46 0 13 0 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>
>> WinFormatMessage: Debug:
>>     WinGetLastError          0
>>     Error Number             1929
>>     nSize                    1024
>>     RtnCode                  41
>>     Error String Characters  39
>>     ErrorString              <The group element could not be removed.>
>>
>> The group element could not be removed.
>> </test end>
>>
>>
>> Note that the following UTF16 data is little-endian and
>>
>> 84 0 104 0 101 0 32 0 103 0 114 0 111 0 117 0 112 0 32 0 101 0 108 0 101
>> 0 109 0 101 0 110 0 116 0 32 0 99 0 111 0 117 0 108 0 100 0 32 0 110 0
>> 111 0 116 0 32 0 98 0 101 0 32 0 114 0 101 0 109 0 111 0 118 0 101 0 100
>> 0 46 0 13 0 10 0 0 0
>>
>> corresponds to:
>>
>>      "The group element could not be removed.", which
>> is error 0x789 (1929 decimal).
>>
>> And you can see why you need the double nul: every character
>> takes two bytes, so the terminator is the two-byte code unit 0 0.
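The dump can be decoded mechanically. The sketch below is a Python illustration, not Todd's Raku module; `decode_wide_c_string` is a hypothetical helper, and the byte list is copied from the output above with all but a couple of the trailing padding zeros dropped. It reads little-endian code units until the first 0x0000, which yields 41 code units (the same count as RtnCode), and trimming the trailing CR LF leaves the 39 characters reported as "Error String Characters".

```python
# Leading bytes of the raw buffer from the test run; the rest of the buffer is zeros.
raw = [
    84, 0, 104, 0, 101, 0, 32, 0, 103, 0, 114, 0, 111, 0, 117, 0, 112, 0, 32, 0,
    101, 0, 108, 0, 101, 0, 109, 0, 101, 0, 110, 0, 116, 0, 32, 0, 99, 0, 111, 0,
    117, 0, 108, 0, 100, 0, 32, 0, 110, 0, 111, 0, 116, 0, 32, 0, 98, 0, 101, 0,
    32, 0, 114, 0, 101, 0, 109, 0, 111, 0, 118, 0, 101, 0, 100, 0, 46, 0, 13, 0,
    10, 0, 0, 0, 0, 0,
]


def decode_wide_c_string(buf: bytes) -> str:
    """Decode a NUL-terminated UTF-16LE buffer (a Windows WCHAR string).

    Steps through whole 16-bit code units at even offsets and stops at the
    first U+0000, so the zero high byte of each ASCII character is never
    mistaken for the end of the string.
    """
    for i in range(0, len(buf) - 1, 2):
        if buf[i] == 0 and buf[i + 1] == 0:
            return buf[:i].decode("utf-16-le")
    raise ValueError("no U+0000 terminator found")


message = decode_wide_c_string(bytes(raw))
print(repr(message))                 # 'The group element could not be removed.\r\n'
print(len(message))                  # 41, matching RtnCode
print(len(message.rstrip("\r\n")))   # 39, matching "Error String Characters"
```

All of these characters are in the Basic Multilingual Plane, so Python's character count equals the UTF-16 code-unit count that FormatMessageW reports.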
>>
>> The carriage return and line feed (13 0 10 0) were
>> fun to deal with.
>>
>> The code itself is rather long-winded.  If you
>> would like to run it yourself, I can post
>> it to vpaste.net along with its companion module(s).
>>
>> -T
>>
>

-- 
__________________

:(){ :|:& };:
