On Sat, 2010-03-20 at 00:40 -0500, John Arbash Meinel wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Guilherme Salgado wrote:
> > Hi John,
> >
> > I've used meliae to get a memory dump from Launchpad, but when I tried
> > to load that dump I got http://paste.ubuntu.com/397273/ (the first line
> > there shows the line that causes simplejson.loads() to choke).
> >
> > From my understanding of [1], this seems to be expected, but I wonder
> > how these unpaired surrogates ended up in the dump. Any ideas?
> >
> > BTW, I did some hacks in my local copy of meliae to replace the
> > problematic bits on that line, and after that I was able to load the
> > dump. Maybe with that I could try and find out where the unpaired
> > surrogates are coming from?
> >
> > [1] <http://en.wikipedia.org/wiki/Mapping_of_Unicode_characters#Surrogates>
> >
> > Cheers,
> >
>
> I'm mostly offline on vacation right now, but I'll try to help out when
> I get back. I can think of 2 causes:

Thanks for the help, John, but it turned out that a memory dump from
staging loaded just fine, so I'm not worrying about this for now and am
hoping the same will happen with a production dump the next time we see
a memory leak. If for some reason I can't load the production dump, I'll
check whether it could be caused by one of the two reasons below.

>
> 1) I trim most output to 100 characters. (So if you have a 1,000 byte
> string, I only output 100 bytes.) It is possible that a Unicode
> surrogate was at bytes 100 and 101 and just got truncated.
>
> 2) I use a pretty stupid method for encoding 8-bit strings, just mapping
> them all to the unicode code point '\xff' => U+00FF. Some of that may be
> invalid.
>
> 3) Other bugs I don't even know about... :)
>
> I'm happy to debug this with you sometimes soon. (If you're getting
> this, it probably means I'm back home, rather than offline in an airport.)
>
> John
> =:->
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.9 (Cygwin)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
>
> iEYEARECAAYFAkukX9sACgkQJdeBCYSNAANWfwCgw2CBP2rdIwUEGwNK9yE70sIY
> LqoAn2J14Q84GDZEBLPDlqBZjol6iVzn
> =MvTl
> -----END PGP SIGNATURE-----
>

-- 
Guilherme Salgado <[email protected]>
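Here is a minimal Python sketch of cause 1. It assumes that strings are handled as UTF-16 code units, as on the narrow Python 2 builds of the time, and that output is cut at 100 units. Python 3 str holds whole code points, so the sketch simulates the code units explicitly. The sketch shows how a character outside the BMP can straddle the cut and leave a lone high surrogate, which is then written out as a bare \uXXXX escape that a strict decoder may reject. It also shows two possible fixes: backing off one unit before cutting, or replacing the orphan with U+FFFD. All function names are hypothetical and are not meliae's API. The replacement function is only a guess at the kind of local hack mentioned above, not the actual patch.

```python
# Hypothetical sketch, not meliae code: assumes UTF-16 code units and a
# 100-unit output limit, mirroring "I trim most output to 100 characters".
MAX_UNITS = 100


def utf16_units(text):
    """Split text into UTF-16 code units (ints), using surrogate pairs above U+FFFF."""
    units = []
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            cp -= 0x10000
            units.append(0xD800 | (cp >> 10))    # high (lead) surrogate
            units.append(0xDC00 | (cp & 0x3FF))  # low (trail) surrogate
        else:
            units.append(cp)
    return units


def is_high(u):
    return 0xD800 <= u <= 0xDBFF


def is_low(u):
    return 0xDC00 <= u <= 0xDFFF


def naive_truncate(units, limit=MAX_UNITS):
    """Cut at exactly `limit` code units, even in the middle of a surrogate pair."""
    return units[:limit]


def safe_truncate(units, limit=MAX_UNITS):
    """Cut at `limit` code units, but drop a trailing high surrogate rather than split a pair."""
    cut = units[:limit]
    if cut and is_high(cut[-1]):
        cut = cut[:-1]
    return cut


def unpaired_surrogates(units):
    """Return the indices of surrogate code units that are not part of a valid pair."""
    bad = []
    i = 0
    while i < len(units):
        if is_high(units[i]) and i + 1 < len(units) and is_low(units[i + 1]):
            i += 2  # well-formed pair, skip both units
            continue
        if is_high(units[i]) or is_low(units[i]):
            bad.append(i)
        i += 1
    return bad


def replace_unpaired(units):
    """Return a copy with every unpaired surrogate replaced by U+FFFD."""
    fixed = list(units)
    for i in unpaired_surrogates(units):
        fixed[i] = 0xFFFD
    return fixed


def to_json_string(units):
    """Render code units as a JSON string literal, escaping non-printable/non-ASCII as \\uXXXX."""
    out = []
    for u in units:
        if 0x20 <= u < 0x7F and u not in (0x22, 0x5C):  # printable ASCII except " and \
            out.append(chr(u))
        else:
            out.append("\\u%04x" % u)
    return '"' + "".join(out) + '"'


if __name__ == "__main__":
    # 99 ASCII chars, then U+1F600, whose pair lands on code units 99 and 100.
    units = utf16_units("x" * 99 + "\U0001F600")
    naive = naive_truncate(units)
    print(unpaired_surrogates(naive))             # [99]
    print(to_json_string(naive)[-8:])             # \ud83d"
    print(unpaired_surrogates(safe_truncate(units)))  # []
```

In this example, naive_truncate keeps only the high surrogate (0xD83D) at index 99, so the dumped string ends in a lone \ud83d escape. safe_truncate returns 99 units with no orphan. Backing off at write time avoids the problem at the source. replace_unpaired is the fallback when you only have the already-written dump to work with.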
_______________________________________________
Mailing list: https://launchpad.net/~launchpad-dev
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~launchpad-dev
More help   : https://help.launchpad.net/ListHelp

