Nope, sorry, I wish I could open source this. I did make some patches to the loader (e.g. it did not like empty tags); those are submitted as pull requests.
Some more hints:
1) I've found a Pig-style concat function to be very useful. Mine could take any input, skip nulls, and flatten bags and tuples.
2) I had to introduce a custom type. Pig does not like top-level custom types, but works OK with tuples of custom types.

24 Dec 2012 10:13, "Russell Jurney" <[email protected]> wrote:
> Thanks - any chance of contributing some of that code? :)
>
> I have thought of a similar approach: starting with an XMLToPig
> EvalFunc that takes the output of the existing XMLLoader and converts
> it to tuple/bag/map form. Easier to baby step that, just a matter of
> plugging that code in to the xml slice trimmed by XMLLoader, and much
> easier once the EvalFunc works.
>
> Russell Jurney http://datasyndrome.com
>
> On Dec 24, 2012, at 12:10 AM, Vitalii Tymchyshyn <[email protected]> wrote:
>
> > I was doing such a thing in my previous project, but I did parse on demand.
> > What I mean is that I've created a set of xml-processing functions, each can
> > take a string or Dom on input plus explicit parse function.
> > I did this because I was usually using concatenation/grouping on parsed
> > input files and processing was done only after that. Or processing can be
> > done in another MR step and serialization of string is much easier than of
> > Dom.
> > 24 Dec 2012 09:24, "Russell Jurney" <[email protected]> wrote:
> >
> >> I want to extend the existing XMLLoader to go beyond capturing the text
> >> inside a tag and to actually create a Pig mapping of the Document Object
> >> Model the XML represents. This would be similar to elephant-bird's
> >> JsonLoader.
> >>
> >> For instance, check this example: https://gist.github.com/4368194
> >>
> >> Semi-structured data can vary, so this behavior can be risky but... I want
> >> people to be able to load JSON and XML data easily their first session with
> >> Pig.
> >>
> >> Russell Jurney http://datasyndrome.com
> >>
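
To illustrate the concat behaviour described in hint 1 above (take any input, skip nulls, flatten bags and tuples), here is a minimal sketch. It is not the original code, which was not released. It is plain Python rather than a Pig UDF, with lists and tuples standing in for Pig bags and tuples; the name `concat` and the `sep` parameter are assumptions made for illustration.

```python
def concat(*values, sep=""):
    """Join values into one string, skipping None and flattening nesting.

    Hypothetical stand-in for the concat helper described in hint 1:
    Python lists and tuples play the role of Pig bags and tuples.
    """
    parts = []

    def walk(value):
        if value is None:
            return  # skip nulls instead of turning the whole result null
        if isinstance(value, (list, tuple)):
            for item in value:  # flatten bags/tuples recursively
                walk(item)
            return
        parts.append(str(value))  # any other input is stringified

    for value in values:
        walk(value)
    return sep.join(parts)
```

For example, `concat("a", None, ("b", ["c", 1]))` ignores the null and walks into the nested tuple and list before joining the pieces.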
