Thanks, I will try this escape mechanism for the returned C string.

On Mon, Mar 28, 2016 at 1:04 PM, Greg Clayton <gclay...@apple.com> wrote:
>
> > On Mar 28, 2016, at 11:38 AM, Jeffrey Tan <jeffrey.fu...@gmail.com> wrote:
> >
> > Thanks Greg for the detailed explanation, very helpful.
> >
> > 1. Just to confirm, the weird string displayed is because 'data_' points to some random memory?
>
> Yes.
>
> > So what gdb displays is also some random memory content, not something more meaningful than what we show? I thought we (lldb) did not display std::string content well but gdb displayed it correctly.
>
> The "size_" variable is zero, so anything that GDB is displaying is sheer luck: it is whatever happens to be in the memory that "data_" points to. You can't rely on any contents of "data_" since it is clearly bogus. What you really want to see is just the string that "std::string" points to:
>
>     (std::string) my_string = "Hello"
>
> Or for a std::string that contains 0, 1, and 2 as characters:
>
>     (std::string) my_string = "\x00\x01\x02"
>
> > 2. I guess the std::string formatter did not kick in because our company may link against a special STL implementation. Let me share our binary with you to confirm.
>
> You can get some help from Enrico to see why things are not displaying correctly. My guess is that this C++ standard library is different from the ones we added support for.
>
> > 3. I dumped the content of the object we pass to json.dumps(); here is the content:
> >
> >     response: {'id': 57, 'result': {'result': [
> >         {'name': 'data_', 'value': {'type': 'object',
> >             'description': '(char *) "\xc9\xc3UH\\x89\xe5H\x89}\xf8H\\x8bE\xf8]\xc3\x90UH\\x89\xe5H\x83\xec\x10H\\x89}\xf8H\\x8bE\xf8H\\x89\xc7\xe8~\\xb4\xca\xff\\x90\xc9\xc3UH\\x89\xe5SH\\x83\xec\x18H\\x89}\xe8H\x89u\xe0H\x8bE\xe0H\x89\xc7\xe8\\x9e\xff\xff\xffH\\x8b\\x18H\\x8bE\xe8H\x89\xc7\xe8O\\xb4\xca\xffH\\x89\xc6\xbf\\b"',
> >             'objectId': 'RemoteObjectManager.118'}},
> >         {'name': 'size_', 'value': {'type': 'object', 'description': '(std::size_t) 0'}},
> >         {'name': 'capacity_', 'value': {'type': 'object', 'description': '(std::size_t) 1441151880758558720'}}]}}
> >
> > So it seems that the problem is that json.dumps() tries to treat the raw byte array as UTF-8, which fails.
> >
> > So we need to figure out how to escape the raw byte array into a string so that we can json.dumps() it. The key question is how we know the correct encoding of the byte array.
>
> It doesn't really matter. Just know that any of the strings from:
>
>     const char *SBValue::GetName();
>     const char *SBValue::GetTypeName();
>     const char *SBValue::GetDisplayTypeName();
>     const char *SBValue::GetValue();
>     const char *SBValue::GetSummary();
>     const char *SBValue::GetObjectDescription();
>     const char *SBValue::GetLocation();
>
> will need to be escaped.
>
> > Is my understanding correct that only the formatter has the knowledge to decode the byte array correctly?
>
> We dump the values as strings. You won't get bytes out. You might get UTF-8 bytes or other things that JSON might interpret as special characters, and any C strings that you get from the above calls will just need to be escaped if needed.
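A minimal Python sketch of that escaping step, assuming the C string reaches the Python side as raw bytes and that Python 3.5 or later is used (from 3.5 on, the 'backslashreplace' error handler also works when decoding). The helper name escape_c_string is hypothetical, not part of the lldb module. It keeps valid UTF-8 text unchanged and turns every byte that is not part of a valid UTF-8 sequence into a literal \xNN escape, so json.dumps() never sees undecodable data:

```python
import json


def escape_c_string(raw):
    """Hypothetical helper: make a C string from the SB API safe for json.dumps().

    Assumes `raw` is bytes (the C string's raw contents), an already-decoded
    str, or None (the SB getters can return NULL). Valid UTF-8 is kept as
    text; every byte that is not part of a valid UTF-8 sequence becomes the
    four characters \\xNN instead of raising UnicodeDecodeError.
    """
    if raw is None:
        return None
    if isinstance(raw, str):
        # Already text: json.dumps() escapes quotes and control chars itself.
        return raw
    # 'backslashreplace' is accepted for decoding since Python 3.5.
    return raw.decode("utf-8", errors="backslashreplace")


if __name__ == "__main__":
    # First few bytes of the bogus 'data_' buffer from the dump above.
    garbage = b"\xc9\xc3UH\x89\xe5"
    try:
        garbage.decode("utf-8")  # a strict decode rejects these bytes
    except UnicodeDecodeError as err:
        print("strict decode failed:", err.reason)
    safe = escape_c_string(garbage)
    print(json.dumps({"description": "(char *) " + safe}))
```

Valid UTF-8 is left as text so ordinary strings such as "Hello" pass through untouched; only undecodable bytes change form, and json.dumps() then takes care of quotes, backslashes and control characters such as the 0, 1, 2 example (emitted as \u0000, \u0001, \u0002).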
>
> > If we fail to find a type formatter (which is the case here) and get a raw field with a byte array, we have no knowledge of the encoding, so either we have to guess a default encoding and try it, or just display the raw byte array content instead of decoding it?
>
> Again, this is all C strings. I don't think anything else matters.
>
> Our JSON.cpp has the following:
>
>     int
>     JSONParser::GetEscapedChar(bool &was_escaped)
>     {
>         was_escaped = false;
>         const char ch = GetChar();
>         if (ch == '\\')
>         {
>             was_escaped = true;
>             const char ch2 = GetChar();
>             switch (ch2)
>             {
>                 case '"':
>                 case '\\':
>                 case '/':
>                 default:
>                     break;
>
>                 case 'b': return '\b';
>                 case 'f': return '\f';
>                 case 'n': return '\n';
>                 case 'r': return '\r';
>                 case 't': return '\t';
>                 case 'u':
>                 {
>                     const int hi_byte = DecodeHexU8();
>                     const int lo_byte = DecodeHexU8();
>                     if (hi_byte >= 0 && lo_byte >= 0)
>                         return hi_byte << 8 | lo_byte;
>                     return -1;
>                 }
>                 break;
>             }
>             return ch2;
>         }
>         return ch;
>     }
>
> You can see how it is used when the JSON parser is parsing in JSONParser::GetToken() in the '"' case.
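To check which escapes a consumer built on that parser accepts, here is a rough Python transliteration of the quoted GetEscapedChar() logic. It is a sketch, not LLDB code: the function names, the (text, pos) cursor standing in for the parser's GetChar(), and how many characters a malformed \u escape or a lone trailing backslash consumes are all assumptions. Like the C++ version, it turns \uXXXX into a single 16-bit value and does not join surrogate pairs:

```python
import json
import string

_SIMPLE_ESCAPES = {"b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t"}


def _decode_hex_u8(text, pos):
    """Hypothetical stand-in for DecodeHexU8(): two hex digits -> 0..255, else -1."""
    pair = text[pos:pos + 2]
    if len(pair) == 2 and all(c in string.hexdigits for c in pair):
        return int(pair, 16)
    return -1


def get_escaped_char(text, pos):
    """Python sketch of JSONParser::GetEscapedChar().

    Returns (code, was_escaped, next_pos); code is -1 for a bad \\u escape.
    """
    ch = text[pos]
    pos += 1
    if ch != "\\":
        return ord(ch), False, pos
    if pos >= len(text):
        return -1, True, pos  # assumption: lone trailing backslash is an error
    ch2 = text[pos]
    pos += 1
    if ch2 in _SIMPLE_ESCAPES:
        return ord(_SIMPLE_ESCAPES[ch2]), True, pos
    if ch2 == "u":
        # High byte, then low byte, combined into one 16-bit value.
        hi_byte = _decode_hex_u8(text, pos)
        lo_byte = _decode_hex_u8(text, pos + 2)
        if hi_byte >= 0 and lo_byte >= 0:
            return (hi_byte << 8) | lo_byte, True, pos + 4
        return -1, True, pos + 4
    # '"', '\\', '/' and any other character decode to themselves.
    return ord(ch2), True, pos


def unescape(body):
    """Decode the inside of a JSON string literal (without the quotes)."""
    out, pos = [], 0
    while pos < len(body):
        code, _, pos = get_escaped_char(body, pos)
        if code < 0:
            raise ValueError("malformed escape sequence")
        out.append(chr(code))
    return "".join(out)


if __name__ == "__main__":
    original = 'say "hi"\t\\ \x01 caf\u00e9 \\xc9'
    encoded = json.dumps(original)  # ASCII-only JSON string literal
    assert unescape(encoded[1:-1]) == original
    print(encoded)
```

The round trip in the main block suggests that the escapes Python's json.dumps() emits for BMP text (\", \\, \t, \uXXXX) are all ones this decoding logic understands.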
_______________________________________________
lldb-dev mailing list
lldb-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev