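UTF-16 works in 16-bit code units, so a single byte of zeros terminates nothing. All it tells you is that you may be looking at the high byte of an ASCII character (the first byte in big-endian order, the second in little-endian) or the low byte of a non-ASCII character whose code point is a multiple of 256. Only a full 16-bit unit of zeros is U+0000, the Unicode NULL, which the Windows UTF-16 C convention uses to terminate the string. The "double nul" is therefore one NUL character that happens to be two bytes wide.

-y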
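
To make that concrete, here is a minimal Python sketch (not code from this thread) that looks for the terminator the way a UTF-16 C string reader would: it steps through the buffer two bytes at a time and stops only at a 16-bit unit that is entirely zero. The function name `utf16le_cstring` and the sample buffer are made up for illustration, and little-endian byte order is assumed.

```python
def utf16le_cstring(buf: bytes) -> str:
    """Decode a NUL-terminated UTF-16LE string from buf (hypothetical helper)."""
    # Walk the buffer one 16-bit code unit (two bytes) at a time.
    for i in range(0, len(buf) - 1, 2):
        # Only a whole unit of zeros is U+0000; a lone zero byte is not.
        if buf[i] == 0 and buf[i + 1] == 0:
            return buf[:i].decode("utf-16-le")
    # No terminator found: decode every complete unit in the buffer.
    return buf[: len(buf) - len(buf) % 2].decode("utf-16-le")


# "A", U+0100 (its low byte is zero), "B", the 16-bit terminator, then junk.
sample = b"A\x00" + b"\x00\x01" + b"B\x00" + b"\x00\x00" + b"Z\x00"
print(ascii(utf16le_cstring(sample)))  # 'A\u0100B'
```

Note that bytes 1 and 2 of `sample` are both zero, yet they belong to two different code units. A scan that looked for any two consecutive zero bytes, ignoring alignment, would cut the string short after "A".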
On Sat, Jan 18, 2020 at 9:04 PM ToddAndMargo via perl6-users <
perl6-us...@perl.org> wrote:

> On 2020-01-18 20:05, Paul Procacci wrote:
> > >> I also found out the
> > >> hard way the UTF16 strings need to be terminated with
> > >> a double nul (0x0000).
> >
> > Not to doubt you (I don't do anything in UTF-16), but can you show an
> > example of this?
> > I would have thought a single NULL character is enough.
> >
> > The 1st byte of a Unicode character determines whether or not it's ascii
> > or not and I wouldn't think when encountering the first null, any
> > reasonable utf-16 interpretation would consume more than just that 1st
> > byte.
>
> Hi Paul,
>
> My dealings with UTF16 are dealing with Win API
> calls to the registry.
>
> This is from my work in progress doc on NativeCall
> and WinAPI:
>
>      Note: a UTF16 C string is “little-endian”,
>      meaning “ABC” is stored as the byte pairs
>      0x41 0x00 (A), 0x42 0x00 (B), 0x43 0x00 (C), 0x00 0x00 (nul)
>
> The following is a call to:
>
> https://docs.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-formatmessagew
>
> DWORD FormatMessageW(
>     DWORD   dwFlags,       # bitwise OR of FORMAT_MESSAGE_ALLOCATE_BUFFER |
>                            # FORMAT_MESSAGE_FROM_SYSTEM |
>                            # FORMAT_MESSAGE_IGNORE_INSERTS
>     LPCVOID lpSource,      # NULL. The location of the message definition.
>                            # The type of this parameter depends upon the
>                            # settings in the dwFlags parameter.
>     DWORD   dwMessageId,   # the error message number ($ErrorNumber)
>     DWORD   dwLanguageId,  # 0 for system's language
>     LPTSTR  lpBuffer,      # the return string, give it 1024
>     DWORD   nSize,         # 0 number of bytes in the return
>     va_list *Arguments     # NULL
> );
>
> I have removed the comment from the call that prints out
> the raw returned data. It looks like this:
>
> <test start>
> K:\Windows\NtUtil>perl6 -I. -e "use lib '.'; use WinErr
> :WinFormatMessage; say WinFormatMessage( 0x789, True );"
>
> 84 0 104 0 101 0 32 0 103 0 114 0 111 0 117 0 112 0 32 0 101 0 108 0 101
> 0 109 0 101 0 110 0 116 0 32 0 99 0 111 0 117 0 108 0 100 0 32 0 110 0
> 111 0 116 0 32 0 98 0 101 0 32 0 114 0 101 0 109 0 111 0 118 0 101 0 100
> 0 46 0 13 0 10 0 0 0
> [remaining bytes of the buffer omitted; all are 0]
>
> WinFormatMessage: Debug:
>     WinGetLastError 0
>     Error Number 1929
>     nSize 1024
>     RtnCode 41
>     Error String Characters 39
>     ErrorString <The group element could not be removed.>
>
> The group element could not be removed.
> </test end>
>
> Note that the following UTF16 code is little endian and
>
> 84 0 104 0 101 0 32 0 103 0 114 0 111 0 117 0 112 0 32 0 101 0 108 0 101
> 0 109 0 101 0 110 0 116 0 32 0 99 0 111 0 117 0 108 0 100 0 32 0 110 0
> 111 0 116 0 32 0 98 0 101 0 32 0 114 0 101 0 109 0 111 0 118 0 101 0 100
> 0 46 0 13 0 10 0 0 0
>
> corresponds to:
>
> "The group element could not be removed", which
> is error 0x789.
>
> And you can see why you need the double nul.
>
> The carriage return and line feed (13 0 10 0) were
> fun to deal with.
>
> The code itself is rather long-winded. If you
> would like to run the code yourself, I can post
> it to vpaste.net along with its companion module(s).
>
> -T
>
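
The byte dump quoted above can be checked without Windows. The sketch below is Python rather than the Raku used in the thread, and it copies only the first 84 byte values of the dump (every second byte there is 0, so only the low bytes are listed). The names `low_bytes`, `raw`, `units` and `text` are illustrative, not taken from the WinErr module.

```python
# Low bytes of the first 42 code units in the quoted dump; every high byte is 0.
low_bytes = [84, 104, 101, 32, 103, 114, 111, 117, 112, 32,
             101, 108, 101, 109, 101, 110, 116, 32,
             99, 111, 117, 108, 100, 32, 110, 111, 116, 32,
             98, 101, 32, 114, 101, 109, 111, 118, 101, 100, 46,
             13, 10, 0]

# Rebuild the little-endian byte stream: low byte first, then the zero high byte.
raw = bytes(b for low in low_bytes for b in (low, 0))

# Read 16-bit units and stop at the first unit that is entirely zero.
units = [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]
end = units.index(0)
text = raw[: 2 * end].decode("utf-16-le")

print(len(text))                     # 41, matching RtnCode in the debug output
print(repr(text.rstrip("\r\n")))     # 'The group element could not be removed.'
```

Dropping the trailing carriage return and line feed leaves the 39 characters reported as "Error String Characters" in the debug output.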