Hi Mahsa,

It is possible to store unstructured data in Hive as long as the records follow a consistent pattern, as log files do. You need a SerDe (serializer/deserializer) for that. A good approach is to parse your text line by line with regular expressions, and RegexSerDe does exactly that. In the SerDe properties, define:

input.regex - the regular expression; each capture group supplies one column, in column order
output.format.string - a format string listing the captured groups (%1$s, %2$s, ...) in column order, used when rows are written back out
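To make the line-by-line idea concrete, here is a minimal Java sketch of the regex-to-columns mapping using plain java.util.regex, not Hive itself. It assumes Apache combined-log-format input; the class RegexRowSketch, its parse method and the sample log line are hypothetical, and the pattern is an illustrative Apache log regex with one capture group per column. As far as I know RegexSerDe behaves similarly: the whole line has to match, group 1 fills the first column, group 2 the second, and so on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class, not part of Hive: mimics how a regex-based
// SerDe could turn one text line into column values.
class RegexRowSketch {
    // Table columns in declaration order; capture group i fills column i.
    static final List<String> COLUMNS = Arrays.asList(
            "host", "identity", "user", "time", "request",
            "status", "size", "referer", "agent");

    // Assumed input: Apache combined log format. One capture group per column;
    // the quotes around request, referer and agent stay in the captured text.
    static final Pattern LOG_LINE = Pattern.compile(
            "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") "
            + "(-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\".*\") ([^ \"]*|\".*\"))?");

    // Returns the column values for one line, or null if the whole line does not match.
    static List<String> parse(String line) {
        Matcher m = LOG_LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        List<String> row = new ArrayList<>();
        for (int i = 1; i <= COLUMNS.size(); i++) {
            // Optional groups that took no part in the match yield null.
            row.add(m.group(i));
        }
        return row;
    }

    public static void main(String[] args) {
        String line = "127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "
                + "\"GET /apache_pb.gif HTTP/1.0\" 200 2326 "
                + "\"http://www.example.com/start.html\" \"Mozilla/4.08\"";
        List<String> row = parse(line);
        for (int i = 0; i < COLUMNS.size(); i++) {
            System.out.println(COLUMNS.get(i) + " = " + row.get(i));
        }
        // Output side: a positional format string such as "%1$s %2$s %3$s"
        // writes column values back out in order.
        System.out.println(String.format("%1$s %2$s %3$s", row.get(0), row.get(1), row.get(2)));
    }
}
```

Because the last two groups are optional, a common-format line without referer and agent still parses, with those two columns left null; a line that does not fit the pattern at all comes back as null in this sketch.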
An example of Apache web log analytics is given in the Hive wiki:
https://cwiki.apache.org/Hive/gettingstarted.html#GettingStarted-ApacheWeblogData

add jar ../build/contrib/hive_contrib.jar;

CREATE TABLE apachelog (
  host STRING,
  identity STRING,
  user STRING,
  time STRING,
  request STRING,
  status STRING,
  size STRING,
  referer STRING,
  agent STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\".*\") ([^ \"]*|\".*\"))?",
  "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s"
)
STORED AS TEXTFILE;

Regards
Bejoy KS

________________________________
From: mahsa mofidpoor <mofidp...@gmail.com>
To: user@hive.apache.org
Sent: Thursday, March 1, 2012 2:25 AM
Subject: Hive and unstructured data

Hello,

I am curious to know how Hive maps real-world unstructured data (like Facebook logs) onto its own structures (tables). In other words, if there is a concept of building a table over unstructured data in Hive, how exactly is the structure defined?

Thank you in advance for your response.
Mahsa