Hi All,

I am looking into ways to combine array data with a UDAF. The schema of the
raw data table is:

CREATE TABLE IF NOT EXISTS array_data (
  session_id string,
  properties array<struct<name : string, value : string>>
);

I would like to run the following query against it with a UDAF named "array_combine":

SELECT session_id, array_combine(properties) as combined_properties
FROM array_data
GROUP BY session_id;

For example, suppose the array_data table has two records:

session_id1, [{"name":"aaa","value":"111"}, {"name":"bbb","value":"222"}]
session_id1, [{"name":"ccc","value":"333"}, {"name":"ddd","value":"444"}]

After combining, the result should be a single record:

session_id1, [{"name":"aaa","value":"111"}, {"name":"bbb","value":"222"}, 
{"name":"ccc","value":"333"}, {"name":"ddd","value":"444"}]
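
To make the intended behaviour concrete, here is a minimal plain-Java sketch of
the combining logic I have in mind. It deliberately avoids the Hive classes:
Property and CombineBuffer are hypothetical stand-ins for the struct elements
and the AggregationBuffer, and the inputs are assumed to be ordinary
java.util.List values rather than LazyArray objects, which is exactly the part
I do not know how to achieve inside the real UDAF.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ArrayCombineSketch {

    // Hypothetical stand-in for one struct<name : string, value : string> element.
    static final class Property {
        final String name;
        final String value;

        Property(String name, String value) {
            this.name = name;
            this.value = value;
        }

        @Override
        public String toString() {
            return "{" + name + "=" + value + "}";
        }
    }

    // Hypothetical stand-in for the AggregationBuffer: one per session_id group.
    static final class CombineBuffer {
        final List<Property> combined = new ArrayList<>();
    }

    // Plays the role of iterate(): append every element of one input row's array.
    static void iterate(CombineBuffer buffer, List<Property> rowProperties) {
        if (rowProperties != null) {
            buffer.combined.addAll(rowProperties);
        }
    }

    // Plays the role of merge(): append a partial result from another buffer.
    static void merge(CombineBuffer buffer, List<Property> partial) {
        if (partial != null) {
            buffer.combined.addAll(partial);
        }
    }

    // Plays the role of terminate(): return a copy of the combined list.
    static List<Property> terminate(CombineBuffer buffer) {
        return new ArrayList<>(buffer.combined);
    }

    public static void main(String[] args) {
        CombineBuffer buffer = new CombineBuffer();
        // The two example records for session_id1.
        iterate(buffer, Arrays.asList(new Property("aaa", "111"), new Property("bbb", "222")));
        iterate(buffer, Arrays.asList(new Property("ccc", "333"), new Property("ddd", "444")));
        System.out.println(terminate(buffer));
    }
}
```

In this sketch the order of the combined elements simply follows the order in
which rows and partial results arrive; I am not assuming any particular
ordering guarantee from the real aggregation.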


But when I debug the UDAF, the "iterate" and "merge" functions receive a
LazyArray object as their parameter:

public void iterate(AggregationBuffer agg, Object[] parameters)
public void merge(AggregationBuffer agg, Object partial)


I have two questions:


(1)    Why is the object not an ArrayList? I checked the input ObjectInspector
in the "init" function, and it is a StandardListObjectInspector:

public ObjectInspector init(Mode m, ObjectInspector[] parameters)


(2)    Is there an easy way to combine two LazyArray objects into one in the
"iterate" and "merge" functions? It seems that I have to create a new LazyArray
object, but I don't know the separator, nullSequence, and escapeChar values of
the original LazyArray object, and I don't know enough to build a LazyArray
for a complex type such as array<struct<name : string, value : string>>.

Could anyone help me with this? Thanks in advance.


Best Regards,
Eric
