Thanks, Edward, I am just using this simple example to test regex
serde. Ultimately I would like to use regex to parse various types of
log files.

On Fri, Jul 1, 2011 at 2:34 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>
>
> On Fri, Jul 1, 2011 at 2:16 PM, Yichuan (William) Hu <huyich...@gmail.com>
> wrote:
>>
>> Hi,
>>
>> I am doing some simple tests to create table, load data using Hive. I
>> am working on the VM provided by cloudera
>> (https://ccp.cloudera.com/display/SUPPORT/Cloudera%27s+Hadoop+Demo+VM).
>>
>> I have a text file with each line containing an IP address and a name,
>> e.g.,
>>
>> 123.45.67.89 tom
>> 123.45.67.92 mark
>>
>> I create a table using following command:
>>
>> CREATE TABLE ip_name(
>> ip STRING,
>> name STRING
>> )
>> ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
>> WITH SERDEPROPERTIES(
>> "input.regex" = "^([\d.]+) ([a-z]+)",
>> "output.format.string" = "%1$s %2$s"
>> )
>> STORED AS TEXTFILE;
>>
>> Then, I use the following command to load data into the table:
>>
>> LOAD DATA LOCAL INPATH '/home/cloudera/test.txt' OVERWRITE INTO TABLE
>> ip_name;
>>
>> Table was successfully created and file was also loaded, but all are
>> NULL (the number of rows in the table is the same as the number of
>> rows in the file). What could be the problem?
>>
>> Thanks a lot!
>>
>> William
>
> You do not need the regex serde for this. Specify the table normally and use
> space as the delimiter.
>
> CREATE EXTERNAL TABLE logdata(
>      xxx STRING,
>      yyy STRING,
>      ...
>
>      z_t)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\040'
> STORED AS TEXTFILE;
>
> http://www.mail-archive.com/common-user@hadoop.apache.org/msg11178.html
>

Reply via email to