Hi, I would like some help with loading log files into Hive tables.
I learnt from the "Getting Started" page that we can create a Hive table as follows and import an Apache log into it:

------------------------------------------------------------------
CREATE TABLE apachelog (
  host STRING,
  identity STRING,
  user STRING,
  time STRING,
  request STRING,
  status STRING,
  size STRING,
  referer STRING,
  agent STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\".*\") ([^ \"]*|\".*\"))?",
  "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s"
)
STORED AS TEXTFILE;
------------------------------------------------------------------

I tried to do the same thing with a different output.format.string, because I only need host, user and request:

------------------------------------------------------------------
CREATE TABLE apachelog (
  host STRING,
  user STRING,
  request STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\".*\") ([^ \"]*|\".*\"))?",
  "output.format.string" = "%1$s %3$s %5$s"
)
STORED AS TEXTFILE;
------------------------------------------------------------------

My questions are:

(1) I specified only the %1, %3 and %5 variables for my table columns. However, Hive appears to load the first three variables (%1, %2, %3) instead. Is there a way to make Hive load only the fields I want?

(2) How can I skip lines that do not match the input.regex pattern?

Thank you,
lai
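The following is a minimal Python sketch, not Hive code, that models how the contrib RegexSerDe appears to read each line. It rests on three assumptions that the post does not confirm. First, on read the SerDe ignores output.format.string. Second, the whole line must match input.regex. Third, column i is filled from capture group i. The names deserialize, load, FULL_REGEX and SUBSET_REGEX are hypothetical. Under these assumptions the sketch reproduces the behaviour described in (1). It also shows one way to get host, user and request: wrap the unwanted fields in non-capturing groups (?:...), so that the wanted fields become capture groups 1, 2 and 3. Finally, it drops lines that fail to match, which is what (2) asks for.

```python
import re

# Apache access-log pattern from the Hive "Getting Started" example, written as a
# Python raw string (the HiveQL string literal doubles every backslash instead).
FULL_REGEX = (
    r'([^ ]*) ([^ ]*) ([^ ]*) (-|\[[^\]]*\]) ([^ "]*|"[^"]*") '
    r'(-|[0-9]*) (-|[0-9]*)(?: ([^ "]*|".*") ([^ "]*|".*"))?'
)

# Same pattern with identity, time, status, size, referer and agent turned into
# non-capturing groups (?:...), so host, user and request become groups 1, 2, 3.
SUBSET_REGEX = (
    r'([^ ]*) (?:[^ ]*) ([^ ]*) (?:-|\[[^\]]*\]) ([^ "]*|"[^"]*") '
    r'(?:-|[0-9]*) (?:-|[0-9]*)(?: (?:[^ "]*|".*") (?:[^ "]*|".*"))?'
)


def deserialize(line, pattern, columns):
    """Hypothetical model of the SerDe read path (an assumption, not Hive's code).

    The whole line must match; column i is filled from capture group i + 1.
    Returns None when the line does not match.
    """
    m = re.fullmatch(pattern, line)
    if m is None:
        return None
    return {col: m.group(i + 1) for i, col in enumerate(columns)}


def load(lines, pattern, columns):
    """Deserialize every line and skip the ones that did not match."""
    rows = []
    for line in lines:
        row = deserialize(line, pattern, columns)
        if row is not None:
            rows.append(row)
    return rows


LOG_LINES = [
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326',
    'this line is not an access-log entry',
]
COLUMNS = ["host", "user", "request"]

# Positional mapping: groups 1-3 of FULL_REGEX are host, identity, user.
print(load(LOG_LINES, FULL_REGEX, COLUMNS))
# [{'host': '127.0.0.1', 'user': '-', 'request': 'frank'}]

# With non-capturing groups the three columns receive the intended fields.
print(load(LOG_LINES, SUBSET_REGEX, COLUMNS))
# [{'host': '127.0.0.1', 'user': 'frank', 'request': '"GET /apache_pb.gif HTTP/1.0"'}]
```

In the HiveQL DDL, the same SUBSET_REGEX would need every backslash doubled inside the string literal, as in the table definitions above, and output.format.string would then refer to %1$s %2$s %3$s. The sketch also assumes that Hive returns a non-matching line as a row of NULLs. If that holds, a filter such as WHERE host IS NOT NULL in the query would be the Hive-side counterpart of the skip in load.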