On 2020-01-18 20:05, Paul Procacci wrote:
 >> I also found out the
 >> hard wasy the UTF16 strings need to be terminated with
 >> a double nul (0x0000).

Not to doubt you (I don't do anything in UTF-16), but can you show an example of this?
I would have thought a single NULL character is enough.

The 1st byte of a Unicode character determines whether or not it's ascii or not and I wouldn't think when encountering the first null, any reasonable utf-16 interpretation would consume more than just that 1st byte.

Hi Paul,

My dealings with UTF16 are dealing with Win API
calls to the registry.

This is from my work in progress doc on NativeCall
and WinAPI:

    Note: a UTF16 C string is “little-endian”
          meaning “ABC” is represented as
          0x4200 (A), 0X4300 (B), 0X4400 (C), 0x0000 (nul)

The following is a call to:

https://docs.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-formatmessagew

     DWORD FormatMessageW(
DWORD dwFlags, # bitwise OR FORMAT_MESSAGE_ALLOCATE_BUFFER | FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS LPCVOID lpSource, # NULL. The location of the message definition. The type of this parameter depends upon the settings in the dwFlags parameter.
         DWORD   dwMessageId,  # the error message number ($ErrorNumber)
         DWORD   dwLanguageId, # 0 for system's language
         LPTSTR  lpBuffer,     # the return string, give it 1024
         DWORD   nSize,        # 0  nubmer of bytes in the return
         va_list *Arguments    # NULL
     );


I have removed the comment from the call that prints out
the raw returned data.  It looks like this:

<test start>
K:\Windows\NtUtil>perl6 -I. -e "use lib '.'; use WinErr :WinFormatMessage; say WinFormatMessage( 0x789, True );"

84 0 104 0 101 0 32 0 103 0 114 0 111 0 117 0 112 0 32 0 101 0 108 0 101 0 109 0 101 0 110 0 116 0 32 0 99 0 111 0 117 0 108 0 100 0 32 0 110 0 111 0 116 0 32 0 98 0 101 0 32 0 114 0 101 0 109 0 111 0 118 0 101 0 100 0 46 0 13 0 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

WinFormatMessage: Debug:
   WinGetLastError          0
   Error Number             1929
   nSize                    1024
   RtnCode                  41
   Error String Characters  39
   ErrorString              <The group element could not be removed.>

The group element could not be removed.
</test end>


Note that the following UTF16 code is little endian and

84 0 104 0 101 0 32 0 103 0 114 0 111 0 117 0 112 0 32 0 101 0 108 0 101 0 109 0 101 0 110 0 116 0 32 0 99 0 111 0 117 0 108 0 100 0 32 0 110 0 111 0 116 0 32 0 98 0 101 0 32 0 114 0 101 0 109 0 111 0 118 0 101 0 100 0 46 0 13 0 10 0 0 0

corresponds to:

    "The group element could not be removed", which
is error 0x789.

And you can see why you need the double nul.

The carriage return and line feed (13 0 10 0) were
fun to deal with.

The code yourself is rather long winded.  If you
would like to run the code yourself, I can post
it to vpaste.net along with its companion module(s).

-T

Reply via email to