Thanks, Edward. I am just using this simple example to test the regex SerDe. Ultimately, I would like to use regexes to parse various types of log files.
On Fri, Jul 1, 2011 at 2:34 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>
>
> On Fri, Jul 1, 2011 at 2:16 PM, Yichuan (William) Hu <huyich...@gmail.com>
> wrote:
>>
>> Hi,
>>
>> I am doing some simple tests to create table, load data using Hive. I
>> am working on the VM provided by cloudera
>> (https://ccp.cloudera.com/display/SUPPORT/Cloudera%27s+Hadoop+Demo+VM).
>>
>> I have a text file with each line containing an IP address and a name,
>> e.g.,
>>
>> 123.45.67.89 tom
>> 123.45.67.92 mark
>>
>> I create a table using following command:
>>
>> CREATE TABLE ip_name(
>> ip STRING,
>> name STRING
>> )
>> ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
>> WITH SERDEPROPERTIES(
>> "input.regex" = "^([\d.]+) ([a-z]+)",
>> "output.format.string" = "%1$s %2$s"
>> )
>> STORED AS TEXTFILE;
>>
>> Then, I use the following command to load data into the table:
>>
>> LOAD DATA LOCAL INPATH '/home/cloudera/test.txt' OVERWRITE INTO TABLE
>> ip_name;
>>
>> Table was successfully created and file was also loaded, but all are
>> NULL (the number of rows in the table is the same as the number of
>> rows in the file). What could be the problem?
>>
>> Thanks a lot!
>>
>> William
>
> You do not need the regex serde for this. Specify the table normally and use
> space as the delimiter.
>
> CREATE EXTERNAL TABLE logdata(
> xxx STRING,
> yyy STRING,
> ...
>
> z_t)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\040'
> STORED AS TEXTFILE;
>
> http://www.mail-archive.com/common-user@hadoop.apache.org/msg11178.html
>
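The thread never pins down why every column came back NULL. One explanation often given for this symptom, which is an assumption here and not confirmed by anyone in the thread, has two parts. First, Hive unescapes string literals in DDL, so the single backslash in `\d` is consumed and the SerDe actually receives `[d.]+`. Second, RegexSerDe requires the regex to match the entire line (like Java's `Matcher.matches()`) and returns NULL for every column when it does not. The Python sketch that follows models both assumptions with the hypothetical helpers `hive_unescape_assumed` and `parse_row`, using `re.fullmatch` as a stand-in for Java's full-line match; it is not Hive code.

```python
import re


# Hypothetical helper: a simplified model of how a HiveQL string literal is
# ASSUMED to be unescaped before the SerDe sees it. A backslash makes the next
# character literal, so "\d" becomes "d" and "\\d" becomes "\d".
# (Real unescaping also handles sequences such as \n and \t; they are ignored
# here because the regexes in this sketch do not use them.)
def hive_unescape_assumed(literal: str) -> str:
    out = []
    i = 0
    while i < len(literal):
        if literal[i] == "\\" and i + 1 < len(literal):
            out.append(literal[i + 1])  # keep the escaped char, drop the backslash
            i += 2
        else:
            out.append(literal[i])
            i += 1
    return "".join(out)


def parse_row(pattern: str, line: str):
    """Hypothetical model of the ASSUMED RegexSerDe behaviour: the whole line
    must match; otherwise every column is NULL (None here)."""
    m = re.fullmatch(pattern, line)
    if m is None:
        return [None] * re.compile(pattern).groups
    return list(m.groups())


lines = ["123.45.67.89 tom", "123.45.67.92 mark"]

# The regex as typed in the original DDL (single backslash in the literal).
as_typed = r"^([\d.]+) ([a-z]+)"
# The same regex with the backslash doubled, so "\d" survives unescaping.
doubled = r"^([\\d.]+) ([a-z]+)"

for literal in (as_typed, doubled):
    pattern = hive_unescape_assumed(literal)
    print(pattern, [parse_row(pattern, line) for line in lines])
```

Under these assumptions the regex as typed turns into `^([d.]+) ([a-z]+)` and matches neither line, while the doubled-backslash version captures both fields; writing `[0-9.]+` would sidestep the escaping question entirely. Because of the full-line match, a line with trailing spaces would also come back all NULL.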