You can use your own custom MapReduce code. Just check the record type, and if
it is XML, preprocess it to remove the newlines.
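
As a rough illustration of that preprocessing step, here is a minimal Python sketch. It assumes every record has exactly three |-separated fields (id, data, date, as in the sample from the original question) and that the data field never contains a | itself. The function name join_multiline_records and its parameters are made up for this example. It keeps appending physical lines until it has seen the expected number of delimiters, so ordinary single-line records pass through untouched and only the multi-line XML records get merged.

```python
def join_multiline_records(lines, expected_fields=3, delimiter="|", replacement=" "):
    """Yield one logical record per item, merging physical lines that
    belong to the same record.

    Assumption: a record is complete once it contains
    expected_fields - 1 delimiters, and no field contains the delimiter.
    """
    needed = expected_fields - 1
    buffer = []   # physical lines collected for the current record
    seen = 0      # delimiters seen so far in the current record
    for line in lines:
        text = line.rstrip("\r\n")
        # Continuation lines are stripped so indentation does not leak
        # into the merged column value.
        buffer.append(text.strip() if buffer else text)
        seen += text.count(delimiter)
        if seen >= needed:
            yield replacement.join(buffer)
            buffer = []
            seen = 0
    if buffer:
        # Incomplete trailing record: emit it rather than drop data silently.
        yield replacement.join(buffer)
```

You could feed it sys.stdin and write each yielded record followed by a newline, either as a one-off filter before loading the file or inside a streaming job. Note that in a MapReduce job a multi-line record can straddle an input split boundary, so that case would need extra handling which this sketch does not cover.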

Regards
Bejoy KS

Sent from handheld, please excuse typos.

-----Original Message-----
From: iwannaplay games <funnlearnfork...@gmail.com>
Date: Tue, 20 Nov 2012 14:29:18 
To: <user@hive.apache.org>
Reply-To: user@hive.apache.org
Subject: Re: populating xml data in hive

How do I preprocess the data when there are millions of records, of
which only a few thousand contain XML data?


On 11/20/12, Nitin Pawar <nitinpawar...@gmail.com> wrote:
> Hive currently supports only newline as the record separator. If you have
> newlines in column values, then you will need to preprocess your data and
> remove the newlines from the column values.
> On Nov 20, 2012 1:30 PM, "iwannaplay games" <funnlearnfork...@gmail.com>
> wrote:
>
>> Hi All,
>>
>> I have a CSV file (separated by |) where the data looks like this:
>>
>> id   data                                      date
>> 1    apple                                     24-nov-2011
>> 2    mango                                     26-nov-2011
>> 3    <?xml version="1.0" encoding="utf-8"?>
>>      <a>fruits</a>                             28-nov-2011
>> 4    papaya                                    30-nov-2011
>>
>>
>> Since id=3 has a newline in the data field, Hive takes only the first
>> line and treats the second line as a different row. I want the full XML
>> field to be loaded into the data column of the Hive table.
>>
>> It seems Hive doesn't support lines terminated by '|'.
>>
>> How should I handle XML data in Hive?
>>
>> Thanks & Regards
>> Prabhjot
>>
>
