Perfect. I obviously didn't know that. My assumption that only the first byte gets checked was clearly wrong.
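For the archives, here is roughly where my assumption went wrong, as a small C sketch. It only illustrates the point made in this thread: the terminator is a whole 16-bit code unit, so a scan has to test byte pairs. The helper name `utf16le_len` is made up for this example, and the buffer is assumed to be little-endian; it is not how the Windows API or NativeCall does this internally.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (name invented for this example): count the 16-bit
 * code units that precede the 0x0000 terminator in a little-endian UTF-16
 * byte buffer. Returns (size_t)-1 if no terminator is found in nbytes. */
static size_t utf16le_len(const uint8_t *buf, size_t nbytes)
{
    for (size_t i = 0; i + 1 < nbytes; i += 2) {
        /* Both bytes of the unit must be zero. A lone zero byte is just
         * the high half of an ASCII-range character such as 84 0 ('T'). */
        if (buf[i] == 0 && buf[i + 1] == 0)
            return i / 2;
    }
    return (size_t)-1;
}
```

Given the bytes 84 0 104 0 101 0 0 0 ("The" plus the terminator), this returns 3. A scan that stopped at the first zero byte would have stopped right after the 84.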
Thanks gents.

On Sun, Jan 19, 2020 at 12:12 AM yary <not....@gmail.com> wrote:

> In UTF-16 every code unit is 16 bits, so all that 8 bits of zeros tells
> you is that it's possibly the high byte of a big-endian ASCII character
> or the low byte of a little-endian non-ASCII character whose code point
> is divisible by 256. All zeros, U+0000, is Unicode NULL, which the
> Windows UTF-16 C convention uses to terminate the string.
>
> -y
>
>
> On Sat, Jan 18, 2020 at 9:04 PM ToddAndMargo via perl6-users <
> perl6-us...@perl.org> wrote:
>
>> On 2020-01-18 20:05, Paul Procacci wrote:
>> > >> I also found out the
>> > >> hard way that UTF16 strings need to be terminated with
>> > >> a double nul (0x0000).
>> >
>> > Not to doubt you (I don't do anything in UTF-16), but can you show an
>> > example of this?
>> > I would have thought a single NULL character is enough.
>> >
>> > The 1st byte of a Unicode character determines whether or not it's
>> > ASCII, and I wouldn't think when encountering the first null, any
>> > reasonable UTF-16 interpretation would consume more than just that
>> > 1st byte.
>>
>> Hi Paul,
>>
>> My dealings with UTF16 are with Win API
>> calls to the registry.
>>
>> This is from my work in progress doc on NativeCall
>> and WinAPI:
>>
>> Note: a UTF16 C string is “little-endian”
>> meaning “ABC” is represented as
>> 0x4100 (A), 0x4200 (B), 0x4300 (C), 0x0000 (nul)
>>
>> The following is a call to:
>>
>> https://docs.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-formatmessagew
>>
>> DWORD FormatMessageW(
>>    DWORD   dwFlags,       # bitwise OR
>>                           # FORMAT_MESSAGE_ALLOCATE_BUFFER |
>>                           # FORMAT_MESSAGE_FROM_SYSTEM |
>>                           # FORMAT_MESSAGE_IGNORE_INSERTS
>>    LPCVOID lpSource,      # NULL. The location of the message
>>                           # definition. The type of this parameter
>>                           # depends upon the settings in the
>>                           # dwFlags parameter.
>>    DWORD   dwMessageId,   # the error message number ($ErrorNumber)
>>    DWORD   dwLanguageId,  # 0 for system's language
>>    LPTSTR  lpBuffer,      # the return string, give it 1024
>>    DWORD   nSize,         # 0 number of bytes in the return
>>    va_list *Arguments     # NULL
>> );
>>
>>
>> I have removed the comment from the call that prints out
>> the raw returned data. It looks like this:
>>
>> <test start>
>> K:\Windows\NtUtil>perl6 -I. -e "use lib '.'; use WinErr
>> :WinFormatMessage; say WinFormatMessage( 0x789, True );"
>>
>> 84 0 104 0 101 0 32 0 103 0 114 0 111 0 117 0 112 0 32 0 101 0 108 0 101
>> 0 109 0 101 0 110 0 116 0 32 0 99 0 111 0 117 0 108 0 100 0 32 0 110 0
>> 111 0 116 0 32 0 98 0 101 0 32 0 114 0 101 0 109 0 111 0 118 0 101 0 100
>> 0 46 0 13 0 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>
>> WinFormatMessage: Debug:
>>    WinGetLastError          0
>>    Error Number             1929
>>    nSize                    1024
>>    RtnCode                  41
>>    Error String Characters  39
>>    ErrorString              <The group element could not be removed.>
>>
>> The group element could not be removed.
>> </test end>
>>
>>
>> Note that the following UTF16 code is little endian and
>>
>> 84 0 104 0 101 0 32 0 103 0 114 0 111 0 117 0 112 0 32 0 101 0 108 0 101
>> 0 109 0 101 0 110 0 116 0 32 0 99 0 111 0 117 0 108 0 100 0 32 0 110 0
>> 111 0 116 0 32 0 98 0 101 0 32 0 114 0 101 0 109 0 111 0 118 0 101 0 100
>> 0 46 0 13 0 10 0 0 0
>>
>> corresponds to:
>>
>> "The group element could not be removed", which
>> is error 0x789.
>>
>> And you can see why you need the double nul.
>>
>> The carriage return and line feed (13 0 10 0) were
>> fun to deal with.
>>
>> The code itself is rather long-winded. If you
>> would like to run the code yourself, I can post
>> it to vpaste.net along with its companion module(s).
>>
>> -T
>>
>

-- 
__________________

:(){ :|:& };: