Hi, I'm capturing data of the form A (1:n) B, which is a fairly standard item-subitem pattern. In a standard DB, I'd have A and B tables with a foreign key from B to A.
But Hive is different: there's no natural primary key in my data, and joins seem much more expensive. So I'm considering using an Array of Structs instead. Some questions:

1. Does this make sense?
2. How's performance? Say B has an attribute 'num', and I want to find the average of the nums or something similar (which a separate B table would lend itself to).
3. Is there an example of how to format the files?

Thanks,
Ranjan
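
To make the file-format question concrete, here is a minimal Python sketch of what one text line per A might look like. It assumes Hive's default text delimiters (Ctrl-A between columns, Ctrl-B between array elements, Ctrl-C between struct fields inside an element). The table shape (an id column plus an array of structs with fields name and num) and all function names are hypothetical, made up for illustration only.

```python
# Assumed: Hive's default text-format delimiters, one per nesting level.
FIELD_SEP = "\x01"   # Ctrl-A: between top-level columns
ITEM_SEP = "\x02"    # Ctrl-B: between elements of an array
STRUCT_SEP = "\x03"  # Ctrl-C: between fields of a struct nested in an array

# Hypothetical table shape: a(id STRING, bs ARRAY<STRUCT<name:STRING, num:INT>>)


def format_row(a_id, subitems):
    """Serialize one A record and its B subitems as a single text line."""
    items = ITEM_SEP.join(
        f"{name}{STRUCT_SEP}{num}" for name, num in subitems
    )
    return f"{a_id}{FIELD_SEP}{items}"


def parse_row(line):
    """Split a line back into the A id and a list of (name, num) tuples."""
    a_id, items = line.split(FIELD_SEP)
    subitems = []
    if items:  # an A with no B subitems leaves the array column empty
        for item in items.split(ITEM_SEP):
            name, num = item.split(STRUCT_SEP)
            subitems.append((name, int(num)))
    return a_id, subitems


def average_num(lines):
    """Average 'num' over every B subitem in every A row."""
    nums = [num for line in lines for _, num in parse_row(line)[1]]
    return sum(nums) / len(nums) if nums else None


if __name__ == "__main__":
    rows = [
        format_row("a1", [("x", 2), ("y", 4)]),
        format_row("a2", [("z", 6)]),
    ]
    for row in rows:
        print(repr(row))
    print(average_num(rows))
```

Here average_num only mimics the aggregation on the raw lines. In Hive itself, the equivalent would presumably flatten the array with LATERAL VIEW explode and then apply avg to the num field, so the cost relative to a join against a separate B table is still the open question.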