Re: SerDe question

Vijay Tue, 27 Sep 2011 21:19:02 -0700

There are a couple of problems. First, input.regex needs to be
"(\\w+)". Note the case: lowercase \w matches word characters, while
uppercase \W matches non-word characters. The parentheses matter too,
because RegexSerDe fills each column from a capturing group.
The bigger problem, though, is that with this (and most) SerDes you
only get one row per line of input, so multiple words within a line
cannot generate multiple rows. The best option is probably to parse
the text file, generate a different file with each word on a separate
line, and then load that file into Hive.
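
For what it's worth, here is a minimal sketch of that preprocessing
step, assuming Python 3 is available on the machine where the text
file lives. The function name split_words and the file names in the
usage line are made up for illustration, and "\w+" is just one
reasonable definition of a word (the same one as in the regex above).

```python
import re
import sys

# One "word" is a run of word characters, same as \w+ in input.regex.
WORD = re.compile(r"\w+")


def split_words(lines):
    """Yield every word found in the given lines, in order of appearance."""
    for line in lines:
        for word in WORD.findall(line):
            yield word


def main(src_path, dst_path):
    # Hypothetical helper: read the source text and write one word per line.
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for word in split_words(src):
            dst.write(word + "\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Saved as, say, split_words.py, it could be run as
"python split_words.py moby-dick.txt words.txt", and words.txt could
then be copied to HDFS and loaded into a plain single-column table, so
the regex SerDe would not be needed for this case at all.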


Hope that helps,
Vijay

On Tue, Sep 27, 2011 at 6:45 PM, Mark Kerzner <mark.kerz...@shmsoft.com> wrote:
> Hi, Hive experts,
>
> Would you see what I am doing wrong? For a simple test of breaking a text
> into words and putting these words into a table, I am doing this
>
> CREATE EXTERNAL TABLE books1
> (
>   words string
> )
> ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
> WITH SERDEPROPERTIES ("input.regex" = "\\W")
> STORED AS TextFile;
>
> LOAD DATA INPATH '/test-data/ch1/moby-dick.txt'  OVERWRITE INTO TABLE
> books1;
>
> This SerDe works in Java code, but in Hive I am getting all nulls in the
> books1 table.
>
> Thank you,
> Mark
>
