Thanks, Manish.

It's a good article, but it's still not clear to me how you define the
delimiters when the column is of a nested type (like array of maps, map
of arrays, etc.).

Just a clarification on item 2 below.

2. What would be the separator for map elements?

For map elements, the separator is “=”.


'=' is the MAP key separator. What I mean is the item separator when the
map contains multiple key/value pairs, like:

   (Key1=Value1; Key2=Value2; Key3=Value3....)


Here '=' is the key separator and ';' is the item separator.
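
To make the distinction concrete, here is a minimal Python sketch (an
illustration only, not Hive's own parsing code) that splits a flat map
string on the two separators. The function name `parse_map` and its
parameters are made up for this example, and the whitespace trimming is a
convenience of the sketch, not a claim about how Hive treats spaces.

```python
def parse_map(text, item_sep=";", key_sep="="):
    """Split a flat map string into a dict using two separators.

    item_sep separates the key/value pairs from each other;
    key_sep separates each key from its value.
    """
    result = {}
    for item in text.split(item_sep):
        item = item.strip()  # trimming is for readability of the example only
        if not item:
            continue
        key, _, value = item.partition(key_sep)
        result[key.strip()] = value.strip()
    return result


print(parse_map("Key1=Value1; Key2=Value2; Key3=Value3"))
# → {'Key1': 'Value1', 'Key2': 'Value2', 'Key3': 'Value3'}
```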


I can handle the above example with COLLECTION ITEMS TERMINATED BY ';'
and MAP KEYS TERMINATED BY '=' if the element is of type MAP. The
COLLECTION ITEMS TERMINATED BY ',' clause works on all three data types
(maps, arrays, structs) when they are by themselves. The problem is
defining the separators for nested structures, because we need multiple
separators: one separator for array items and a different separator for
map items defined within that array, etc.


The default Hive delimiters work just fine. In that case, level 1 uses
'^A', level 2 '^B', level 3 '^C', etc. What I am trying to do is define
them explicitly. The COLLECTION ITEMS TERMINATED BY ',' statement
addresses the first level (^A), but I don't know how to define the
separators for the other levels (to use instead of ^B, ^C, etc.).
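
The idea of one explicit separator per nesting level can be sketched as
follows. This Python snippet is only an illustration under assumptions: the
function name `serialize`, the separator list (',', ':', ';', '=') and the
sample values are hypothetical, and the rule that a map consumes two levels
(one for its entries, one for key/value) is a choice made for this sketch.
It shows what a parser would have to emit for an f3-like column if
per-level separators were available; it does not show how to declare them
in Hive DDL, which is the open question here.

```python
def serialize(value, separators, level=0):
    """Flatten a nested value, using separators[level] at each nesting depth.

    Lists and tuples (standing in for arrays and structs) consume one level.
    Dicts (standing in for maps) consume two levels: one separator between
    entries and the next one between each key and its value.
    """
    if isinstance(value, dict):
        item_sep, key_sep = separators[level], separators[level + 1]
        return item_sep.join(
            serialize(k, separators, level + 2)
            + key_sep
            + serialize(v, separators, level + 2)
            for k, v in value.items()
        )
    if isinstance(value, (list, tuple)):
        return separators[level].join(
            serialize(v, separators, level + 1) for v in value
        )
    return str(value)


# Hypothetical separators, outermost level first:
# array items, struct fields, map entries, map key/value.
seps = [",", ":", ";", "="]

# Hypothetical f3 value: an array of structs (a, b, c), where c is a map.
f3 = [
    ("x", 1, {"k1": "v1", "k2": "v2"}),
    ("y", 2, {"k3": "v3"}),
]

print(serialize(f3, seps))
# → x:1:k1=v1;k2=v2,y:2:k3=v3
```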


Thanks,

Sadu



On Fri, Sep 28, 2012 at 1:28 AM, Manish.Bhoge <manish.bh...@target.com> wrote:

> Hi Sadu,
>
> See my answer below.
>
> Also, this will help you understand collections, MAP and Array in
> detail:
>
> http://datumengineering.wordpress.com/2012/09/27/agility-in-hive-map-array-score-for-hive/
>
> *From:* Sadananda Hegde [mailto:saduhe...@gmail.com]
> *Sent:* Friday, September 28, 2012 10:31 AM
> *To:* user@hive.apache.org
> *Subject:* Defining collection items terminated by for a nested data type
>
> How does "collection items terminated by" work on a nested structure? Say
> the table is created with the DDL:
>
> CREATE TABLE table_1(f1 int, f2 string, f3 array<struct<a string, b
> int, c map<string, string>>>)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '|'
> COLLECTION ITEMS TERMINATED BY ','
> MAP KEYS TERMINATED BY '='
> LINES TERMINATED BY '\n'
> STORED AS TEXTFILE;
>
> I guess the comma separator will be used for the items in the outermost
> structure (i.e. the array). Is that true?
>
> Yes. Right, comma is a separator for array.
>
> 1. What would be the separator character between a, b and c
> (struct elements)?
>
> I think it is \n. Not very sure about this.
>
> 2. What would be the separator for map elements?
>
> For map elements, the separator is “=”.
>
> 3. Is there a way to explicitly specify those ITEMS separators rather
> than using the default ones like ^B, ^C, etc. (like multiple collection
> items)?
>
> You can define a custom separator. But multiple collection separators
> seem infeasible.
>
> The original data is in XML format (a complex one with many nested levels)
> and we are planning to parse that XML using a Java parser into a delimited
> text file which can be used to load the Hive table. My question is:
>
>      "How should we be representing the f3-like structures in the data
> file?"
>
> The actual file has many more fields with quite a few complex types like
> f3 above, but I guess the logic would be the same.
>
> --- For this, either you need to write a custom input reader in MAP-REDUCE
> or use a custom SerDe.
>
> Thanks for your help.
>
> Regards,
>
> Sadu
>