Hi,

I've got some objects that were originally loaded into nested maps using the JSON loader from elephantbird, and subsequently stored using LZOPigStorage after various stages of processing.

When I later load these nested maps and pass them into a UDF for modification, pretty much any attempt to reference nested objects produces a ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to whatever it is that I'm trying to get at.

If I instead tear the map apart in Pig Latin and pass all its bits and pieces as arguments to the UDF for reassembly, these exceptions don't happen. That is what I've been doing, but it gets pretty ugly and complicated at times, especially when the layers of nesting involve arrays that I have to FLATTEN out and then group back together.

Looking at the stored data, I've noticed that nested maps are serialized differently than the top-level map. So I've developed a suspicion that there's some extra bit of magic that Pig uses to parse/cast these sub-maps, which I just don't know about and so can't call from my UDFs. If such a beast exists and someone could provide me with a pointer, I'd appreciate it.

Also, the cluster I'm working on right now is using version 0.8.1-cdh3u4 of Pig. I poked around JIRA to see if this issue has been observed before (and if I should bug the relevant ops-folk that much harder to upgrade our Pig), but found nothing. If this has been fixed in a later version, or if someone can recommend a storage class that doesn't cause this problem in the first place, that pointer would also be very much appreciated.

Thanks,
Kris

-- 
Kris Coward http://unripe.melon.org/
GPG Fingerprint: 2BF3 957D 310A FEEC 4733 830E 21A4 05C7 1FEB 12B3
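
P.S. In case it helps to see the symptom in isolation, here is a minimal sketch that does not use Pig at all. FakeByteArray is a hypothetical stand-in for org.apache.pig.data.DataByteArray, and the record contents are made up. It only illustrates my guess at the failure mode: a nested value that arrives as unparsed bytes breaks a blind cast to Map, while a runtime type check at least reveals what the value really is.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class NestedMapCastSketch {

    // Hypothetical stand-in for org.apache.pig.data.DataByteArray:
    // an opaque wrapper around bytes that were never parsed into a typed value.
    static final class FakeByteArray {
        private final byte[] bytes;

        FakeByteArray(String text) {
            this.bytes = text.getBytes(StandardCharsets.UTF_8);
        }

        @Override
        public String toString() {
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    // Builds a made-up top-level map: one nested value is a real Map,
    // the other is still an unparsed byte wrapper.
    static Map<String, Object> sampleRecord() {
        Map<String, Object> inner = new HashMap<String, Object>();
        inner.put("k", "v");

        Map<String, Object> record = new HashMap<String, Object>();
        record.put("parsed", inner);
        record.put("nested", new FakeByteArray("serialized nested map"));
        return record;
    }

    // Casts blindly, the way a UDF might, and reports whether the cast failed.
    static boolean blindCastFails(Object value) {
        try {
            @SuppressWarnings("unchecked")
            Map<String, Object> m = (Map<String, Object>) value;
            m.size(); // use the result so the cast is clearly exercised
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    // Checks the runtime type first instead of casting blindly.
    static String describe(Object value) {
        if (value instanceof Map) {
            return "map";
        }
        if (value instanceof FakeByteArray) {
            return "unparsed bytes: " + value;
        }
        return value == null ? "null" : value.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        Map<String, Object> record = sampleRecord();
        for (String key : new String[] {"parsed", "nested"}) {
            Object value = record.get(key);
            System.out.println(key + ": blind cast fails = " + blindCastFails(value)
                    + ", actual = " + describe(value));
        }
    }
}
```

A type check like this only diagnoses the problem. Turning the bytes back into a map would still need whatever parsing step Pig applies to the top-level map, which is the piece I am asking about.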
