Hi,

I would like some help loading log files into Hive tables.

I learnt from the "Getting Started" page that we can create a Hive
table as follows to import Apache logs into it.
------------------------------------------------------------------
CREATE TABLE apachelog (
  host STRING,
  identity STRING,
  user STRING,
  time STRING,
  request STRING,
  status STRING,
  size STRING,
  referer STRING,
  agent STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\".*\") ([^ \"]*|\".*\"))?",
  "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s"
)
STORED AS TEXTFILE;
------------------------------------------------------------------

I tried to do the same thing, but changed the value of
output.format.string. Let's say I only need host, user, and request:

CREATE TABLE apachelog (
  host STRING,
  user STRING,
  request STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\".*\") ([^ \"]*|\".*\"))?",
  "output.format.string" = "%1$s %3$s %5$s"
)
STORED AS TEXTFILE;

My questions are:
(1) I specified only the %1, %3 and %5 variables to be loaded into my
table columns, but it looks like Hive loads the first 3 capture groups
(%1, %2, %3) instead.
Is there a way to make Hive load only the columns I want?

(2) How can I skip lines which do not match the input.regex pattern?
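
A minimal sketch of one possible approach to both questions, not a
confirmed answer. It assumes that the contrib RegexSerDe assigns capture
groups to columns by position, that output.format.string is only used
when rows are written, and that a line which does not match the regex
comes back with every column NULL. The table name apachelog_subset is
hypothetical. Unwanted fields are matched without a capturing group, so
only host, user and request are captured:
------------------------------------------------------------------
-- Hypothetical table: exactly three capture groups, one per column.
-- identity and time are matched but not captured; the trailing
-- "(?: .*)?" consumes the rest of the line (status, size, referer, agent).
CREATE TABLE apachelog_subset (
  host STRING,
  user STRING,
  request STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^ ]*) [^ ]* ([^ ]*) (?:-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\")(?: .*)?",
  "output.format.string" = "%1$s %2$s %3$s"
)
STORED AS TEXTFILE;

-- Assuming non-matching lines are returned as all-NULL rows,
-- they can be filtered out at query time.
SELECT host, user, request
FROM apachelog_subset
WHERE host IS NOT NULL;
------------------------------------------------------------------
Under these assumptions the non-matching lines stay in the underlying
file and are only hidden by the WHERE clause.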

Thank you.

lai