We have a data type modeled after a typical object-oriented data model:
simple fields, plus collections of other objects. We're trying to join a
second data set into one of those nested collections, as described below.
Here's our example input:
-------------------------------------
data1 = { ( 'a1', { ('a2-thing1'), ('a2-thing2') } ) }
data2 = { ( 'a2-thing1', 'x-value1' ), ( 'a2-thing1', 'x-value2' ) }
Here's what we want to get:
--------------------------------------
( 'a1', { ( 'a2-thing1', { ('x-value1'), ('x-value2') } ) } )
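Notice that we are trying to join each element of the a2 collection in the
1st data set against the first field of the 2nd data set, and nest the
matching x values under that element. As the expected output shows, elements
with no match (here 'a2-thing2') drop out of the result.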
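To make the intended semantics concrete, here is a minimal sketch of the same
transformation in plain Python rather than Pig. The function name nest_join
and the use of lists as stand-ins for bags are purely illustrative. It assumes
inner-join behaviour, so a2 entries with no match in data2 are dropped, as in
the expected output above.

```python
from collections import defaultdict

# Sample input, mirroring data1 and data2 above (lists stand in for bags).
data1 = [("a1", ["a2-thing1", "a2-thing2"])]
data2 = [("a2-thing1", "x-value1"), ("a2-thing1", "x-value2")]


def nest_join(objects, pairs):
    """Nest under each a2 entry the x values from pairs that share its key."""
    # Index the second data set by its first field.
    xs_by_key = defaultdict(list)
    for key, x in pairs:
        xs_by_key[key].append(x)

    result = []
    for a1, a2_items in objects:
        # Inner-join semantics (an assumption): unmatched a2 entries are dropped.
        nested = [(a2t, xs_by_key[a2t]) for a2t in a2_items if a2t in xs_by_key]
        result.append((a1, nested))
    return result


print(nest_join(data1, data2))
```

This only pins down what result we are after; it says nothing about how to
express it in Pig.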
We tried this:
--------------------
A = load 'data1' as ( a:tuple(a1:chararray, a2:bag{(a2t:chararray)}) );
B = load 'data2' as ( a2t:chararray, x:chararray );
X = join A by a2.a2t, B by a2t;
We get this error:
---------------------------
ERROR 1128: Cannot find field a2t in
a1:chararray,a2:bag{:tuple(a2t:chararray)}
Try as we might, we cannot find the right way to do this complex join.
Questions:
1) Should we be simplifying our data format into a more SQL table-like
structure and doing more joins to reduce the complexity?
2) How can we accomplish joining data2's data into the data1 "objects"?
--
Ho Duc Ha