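In UTF-16 every code unit is 16 bits (characters outside the Basic
Multilingual Plane take two code units), so a single zero byte only tells
you that it is possibly the high byte of a big-endian ASCII character or
the low byte of a little-endian code unit whose value is a multiple of 256
(U+0100, U+0200, ...). Sixteen zero bits, U+0000, is Unicode NULL, which
the Windows UTF-16 C convention uses to terminate the string. That
terminator is a single character; it only looks like a "double nul" when
the buffer is printed one byte at a time.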
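
To make the difference between a zero byte and a zero code unit concrete,
here is a small sketch. It is in Python only because that is easy to run
anywhere; find_utf16_terminator is a made-up helper name, not part of any
library, and the sketch assumes the buffer starts on a code-unit boundary.

```python
def find_utf16_terminator(buf: bytes) -> int:
    """Return the byte offset of the first 16-bit zero code unit, or -1."""
    # Step two bytes at a time so that a lone zero byte inside a character
    # is never mistaken for the terminator.
    for offset in range(0, len(buf) - 1, 2):
        if buf[offset] == 0 and buf[offset + 1] == 0:
            return offset
    return -1


# A zero byte on its own is ambiguous:
be_ascii = "A".encode("utf-16-be")        # b"\x00A": zero byte first
le_u0100 = "\u0100".encode("utf-16-le")   # b"\x00\x01": zero byte first

# "ABC" plus the 16-bit terminator, little-endian as on Windows.
data = "ABC".encode("utf-16-le") + b"\x00\x00"

first_zero_byte = data.index(0)               # lands inside 'A'
terminator_at = find_utf16_terminator(data)   # lands on the real terminator

print(first_zero_byte, terminator_at)
```

Scanning by bytes stops in the middle of the first character; scanning by
16-bit units stops at the end of the string.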
-y


On Sat, Jan 18, 2020 at 9:04 PM ToddAndMargo via perl6-users <
perl6-us...@perl.org> wrote:

> On 2020-01-18 20:05, Paul Procacci wrote:
> >  >> I also found out the
> >  >> hard way the UTF16 strings need to be terminated with
> >  >> a double nul (0x0000).
> >
> > Not to doubt you (I don't do anything in UTF-16), but can you show an
> > example of this?
> > I would have thought a single NULL character is enough.
> >
> > The 1st byte of a Unicode character determines whether or not it's ascii
> > or not and I wouldn't think when encountering the first null, any
> > reasonable utf-16 interpretation would consume more than just that 1st
> > byte.
>
> Hi Paul,
>
> My dealings with UTF16 are dealing with Win API
> calls to the registry.
>
> This is from my work in progress doc on NativeCall
> and WinAPI:
>
>      Note: a UTF16 C string is “little-endian”,
>            meaning “ABC” is stored as the byte pairs
>            0x41 0x00 (A), 0x42 0x00 (B), 0x43 0x00 (C), 0x00 0x00 (nul)
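
The byte layout of "ABC" is easy to check by hand. The following is a
minimal sketch in Python (chosen only for convenience; the variable names
are illustrative) that encodes the string with its terminator in both byte
orders and then reads the little-endian bytes back as 16-bit units.

```python
import struct

text = "ABC\0"                      # three letters plus U+0000

le = text.encode("utf-16-le")       # Windows byte order
be = text.encode("utf-16-be")       # shown only for comparison

le_hex = " ".join(f"{b:02X}" for b in le)
be_hex = " ".join(f"{b:02X}" for b in be)

# Read the little-endian buffer as four unsigned 16-bit code units.
units = struct.unpack("<4H", le)

print("LE bytes:", le_hex)
print("BE bytes:", be_hex)
print("units:   ", [hex(u) for u in units])
```

Seen as code units there are four values, and the last one is a single
zero; seen as bytes, that same zero occupies two positions.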
>
> The following is a call to:
>
>
> https://docs.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-formatmessagew
>
>       DWORD FormatMessageW(
>           DWORD   dwFlags,      # bitwise OR of
>                                 #   FORMAT_MESSAGE_ALLOCATE_BUFFER |
>                                 #   FORMAT_MESSAGE_FROM_SYSTEM |
>                                 #   FORMAT_MESSAGE_IGNORE_INSERTS
>           LPCVOID lpSource,     # NULL.  The location of the message
>                                 #   definition. The type of this parameter
>                                 #   depends upon the settings in the
>                                 #   dwFlags parameter.
>           DWORD   dwMessageId,  # the error message number ($ErrorNumber)
>           DWORD   dwLanguageId, # 0 for system's language
>           LPWSTR  lpBuffer,     # the return string, give it 1024
>           DWORD   nSize,        # 0  size of the return buffer, in TCHARs
>                                 #   (16-bit units here, not bytes)
>           va_list *Arguments    # NULL
>       );
>
>
> I have uncommented the call that prints out
> the raw returned data.  It looks like this:
>
> <test start>
> K:\Windows\NtUtil>perl6 -I. -e "use lib '.'; use WinErr
> :WinFormatMessage; say WinFormatMessage( 0x789, True );"
>
> 84 0 104 0 101 0 32 0 103 0 114 0 111 0 117 0 112 0 32 0 101 0 108 0 101
> 0 109 0 101 0 110 0 116 0 32 0 99 0 111 0 117 0 108 0 100 0 32 0 110 0
> 111 0 116 0 32 0 98 0 101 0 32 0 114 0 101 0 109 0 111 0 118 0 101 0 100
> 0 46 0 13 0 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>
> WinFormatMessage: Debug:
>     WinGetLastError          0
>     Error Number             1929
>     nSize                    1024
>     RtnCode                  41
>     Error String Characters  39
>     ErrorString              <The group element could not be removed.>
>
> The group element could not be removed.
> </test end>
>
>
> Note that the following UTF16 code is little endian and
>
> 84 0 104 0 101 0 32 0 103 0 114 0 111 0 117 0 112 0 32 0 101 0 108 0 101
> 0 109 0 101 0 110 0 116 0 32 0 99 0 111 0 117 0 108 0 100 0 32 0 110 0
> 111 0 116 0 32 0 98 0 101 0 32 0 114 0 101 0 109 0 111 0 118 0 101 0 100
> 0 46 0 13 0 10 0 0 0
>
> corresponds to:
>
>      "The group element could not be removed." plus CR LF,
> which is the message for error 0x789.
>
> And you can see why you need the double nul.
>
> The carriage return and line feed (13 0 10 0) were
> fun to deal with.
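
The dump quoted above can be decoded directly. This is a rough sketch in
Python (not the WinErr module from the thread, whose code was not posted):
it stops at the first 16-bit zero and then strips the trailing CR LF.
decode_utf16le_cstring is a hypothetical helper name.

```python
# Bytes copied from the dump, up to and including the terminator.
DUMP = """
84 0 104 0 101 0 32 0 103 0 114 0 111 0 117 0 112 0 32 0 101 0 108 0 101
0 109 0 101 0 110 0 116 0 32 0 99 0 111 0 117 0 108 0 100 0 32 0 110 0
111 0 116 0 32 0 98 0 101 0 32 0 114 0 101 0 109 0 111 0 118 0 101 0 100
0 46 0 13 0 10 0 0 0
"""
raw = bytes(int(n) for n in DUMP.split())


def decode_utf16le_cstring(buf: bytes) -> str:
    """Decode little-endian UTF-16 up to the first zero code unit."""
    end = len(buf) - len(buf) % 2          # ignore a dangling odd byte
    for offset in range(0, end, 2):
        if buf[offset:offset + 2] == b"\x00\x00":
            end = offset                   # terminator found; stop here
            break
    return buf[:end].decode("utf-16-le")


text = decode_utf16le_cstring(raw)   # still ends with CR LF
message = text.rstrip("\r\n")        # drop the trailing line ending

print(len(text), len(message))
print(message)
```

The two lengths line up with the debug output: 41 code units returned
(RtnCode), 39 characters once the CR LF is removed.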
>
> The code itself is rather long winded.  If you
> would like to run the code yourself, I can post
> it to vpaste.net along with its companion module(s).
>
> -T
>
