Hi, I have a question about using lateral views with multi table insert. I have a table of data that represents raw log data, the structure of which makes it onerous to query directly largely because it requires UNIONTYPE columns. So, I transform that raw table into 3 new tables, a primary table and 2 1-to-many tables.
The raw table is similar to this: CREATE TABLE IF NOT EXISTS events_raw ( event_id STRING, event_data_0 INT, event_data_1 BIGINT, packed_event_data_2 UNIONTYPE < INT, STRUCT < event_data_2:INT, event_data_2_sub_0:BOOLEAN, event_data_2_sub_1:BOOLEAN>>, packed_event_data_3 UNIONTYPE < BIGINT, ARRAY < STRUCT < event_data_3_metadata_key:STRING, event_data_3_metadata_value:STRING>>>, packed_event_data_4 UNIONTYPE < BOOLEAN, STRUCT < event_data_4:BOOLEAN, event_data_4_metadata:ARRAY < STRUCT < event_data_4_metadata_key:STRING, event_data_4_metadata_value:STRING>>>>); This is to be transformed into these tables: CREATE TABLE IF NOT EXISTS events ( event_id STRING, event_data_0 INT, event_data_1 BIGINT, event_data_2 INT, event_data_2_sub_0 BOOLEAN, event_data_2_sub_1 BOOLEAN, event_data_3 BIGINT, event_data_4 BOOLEAN); CREATE TABLE IF NOT EXISTS event_data_3_metadata ( event_id STRING, metadata_key STRING, metadata_value STRING); CREATE TABLE IF NOT EXISTS event_data_4_metadata ( event_id STRING, metadata_key STRING, metadata_value STRING); The only way I know how to unpack and/or explode the UNIONTYPEs is to create custom UDTFs for each UNIONTYPE column. For example, I created an unpack_packed_event_data_2 function which maps an single UnionObject to a STRUCT<event_data_2:INT, event_data_2_sub_0:BOOLEAN, event_data_2_sub_1:BOOLEAN>. Similarly, I created UDTFs to explode the ARRAY elements contained in the UNIONTYPE columns. Using those UDTFs I devised these queries to build the transformed tables: FROM events_raw LATERAL VIEW unpack_event_data_2 (packed_event_data_2) event_data_2_struct AS event_data_2, event_data_2_sub_0, event_data_2_sub_1 LATERAL VIEW unpack_event_data_3 (packed_event_data_3) event_data_3_struct AS event_data_3 LATERAL VIEW unpack_event_data_4 (packed_event_data_4) event_data_4_struct AS event_data_4 INSERT INTO TABLE events SELECT event_id, event_data_0, event_data_1, event_data_2_struct.event_data_2, event_data_2_struct.event_data_2_sub_0, event_data_2_struct.event_data_2_sub_1, event_data_3_struct.event_data_3, event_data_4_struct.event_data_4); FROM events_raw LATERAL VIEW explode_event_data_3 (packed_event_data_3) event_data_3_array_element AS metadata_key, metadata_value INSERT INTO TABLE event_data_3_metadata SELECT event_id, event_data_3_array_element.metadata_key, event_data_3_array_element.metadata_value); FROM events_raw LATERAL VIEW explode_event_data_4 (packed_event_data_4) event_data_4_array_element AS metadata_key, metadata_value INSERT INTO TABLE event_data_3_metadata SELECT event_id, event_data_4_array_element.metadata_key, event_data_4_array_element.metadata_value); This works correctly, the tables are filled with the appropriate number of rows. However, the raw table is scanned 3 times to accomplish this and that is very costly given the amount of data. When I combine those 3 statements into one Multi Table Insert: FROM events_raw LATERAL VIEW unpack_event_data_2 (packed_event_data_2) event_data_2_struct AS event_data_2, event_data_2_sub_0, event_data_2_sub_1 LATERAL VIEW unpack_event_data_3 (packed_event_data_3) event_data_3_struct AS event_data_3 LATERAL VIEW unpack_event_data_4 (packed_event_data_4) event_data_4_struct AS event_data_4 LATERAL VIEW explode_event_data_3 (packed_event_data_3) event_data_3_array_element AS metadata_key, metadata_value LATERAL VIEW explode_event_data_4 (packed_event_data_4) event_data_4_array_element AS metadata_key, metadata_value INSERT INTO TABLE events SELECT event_id, event_data_0, event_data_1, event_data_2_struct.event_data_2, event_data_2_struct.event_data_2_sub_0, event_data_2_struct.event_data_2_sub_1, event_data_3_struct.event_data_3, event_data_4_struct.event_data_4 INSERT INTO TABLE event_data_3_metadata SELECT event_id, event_data_3_array_element.metadata_key, event_data_3_array_element.metadata_value INSERT INTO TABLE event_data_4_metadata SELECT event_id, event_data_4_array_element.metadata_key, event_data_4_array_element.metadata_value; The query fails with: [Hive Error]: Query returned non-zero code: 10, cause: FAILED: Error in semantic analysis: Column packed_event_data_3 Found in more than One Tables/Subqueries. I don't know how to get around having separate unpack_event_data_3 and explode_event_data_3 functions. Combining them would seem to marry the functions' output signatures and in instances when the BIGINT is type of the UNIONTYPE there shouldn't be a row of NULL values in the event_data_4_metadata table and vice versa. Is there a better way to do this? Thanks, Jim Krehl