Re: CSV file reading in hive

Furcy Pin Fri, 13 Feb 2015 02:11:28 -0800

Hi Sreeman,

Unfortunately, I don't think Hive's built-in formats can currently read
csv files with fields enclosed in double quotes.
More generally, having ingested quite a lot of messy csv files myself,
I would recommend writing a MapReduce (or Spark) job
to clean your csv before giving it to Hive. This is what I did.
The other kinds of issues I've run into include:

   - File not encoded in utf-8, making special characters unreadable for
   Hive
   - Some lines with missing or too many columns, which could shift your
   columns and ruin your stats.
   - Some lines with unreadable characters (probably data corruption)
   - I even got some lines with java stack traces in them
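
To make these checks concrete, here is a minimal Python sketch (not my
actual job, just an illustration) that sorts lines into usable rows and
rejects. The function name split_clean_rows and the expected_columns
parameter are hypothetical, and the sketch assumes one record per line,
so quoted fields that contain line breaks are not handled.

```python
import csv

def split_clean_rows(raw_bytes, expected_columns, encoding="utf-8"):
    """Return (good_rows, bad_lines) for a csv file given as bytes.

    Assumption: one record per physical line.
    """
    good, bad = [], []
    for line_no, raw_line in enumerate(raw_bytes.splitlines(), start=1):
        # Reject lines that are not valid in the expected encoding.
        try:
            text = raw_line.decode(encoding)
        except UnicodeDecodeError:
            bad.append((line_no, "not valid " + encoding))
            continue
        # Parse with the csv module so quoted commas stay inside one field.
        try:
            row = next(csv.reader([text]), None)
        except csv.Error:
            bad.append((line_no, "unparseable"))
            continue
        # Reject lines with missing or extra columns.
        if row is None or len(row) != expected_columns:
            bad.append((line_no, "wrong column count"))
            continue
        good.append(row)
    return good, bad
```

Keeping the rejected line numbers and reasons, rather than silently
dropping them, makes it easier to see how dirty the file really is.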

I hope your csv is cleaner than that. If you have control over how it is
generated, I would recommend replacing your current separator with tab
(and replacing inline tabs with \t) or something like that.
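
If you cannot change the generator, the same conversion can be done
afterwards. A rough Python sketch, assuming one record per line; the
function name csv_line_to_tsv is hypothetical, and escaping backslashes
as well is my own addition so that a literal \t in the data stays
distinguishable from an escaped tab:

```python
import csv

def csv_line_to_tsv(line):
    """Convert one quoted-csv line into a tab-separated line."""
    # The csv module handles double-quoted fields that contain commas.
    fields = next(csv.reader([line]))
    escaped = []
    for field in fields:
        # Escape backslashes first (assumption), then inline tabs.
        field = field.replace("\\", "\\\\").replace("\t", "\\t")
        escaped.append(field)
    return "\t".join(escaped)
```

With the line from the original question, the third field comes out as a
single column, comma included, and the surrounding quotes are gone.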

There might be some open source tools for data cleaning already out there.
I plan to release mine one day, once I've migrated it to Spark maybe, and
if my company agrees.

If you're lazy, I heard that Dataiku Studio (which has a free version) can
do this kind of thing, though I have never used it myself.

Hope this helps,

Furcy

2015-02-13 7:30 GMT+01:00 Slava Markeyev <[email protected]>:

> You can use lazy simple serde with ROW FORMAT DELIMITED FIELDS TERMINATED
> BY ',' ESCAPED BY '\'. Check the DDL for details
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
>
>
>
> On Thu, Feb 12, 2015 at 8:19 PM, Sreeman <[email protected]> wrote:
>
>>  Hi All,
>>
>> How are all of you creating Hive/Impala tables when the CSV file has
>> some values with a comma inside them? For example:
>>
>> sree,12345,"payment made,but it is not successful"
>>
>>
>>
>>
>>
>> I know the OpenCSV SerDe exists, but it is not available in Hive
>> versions lower than 0.14
>>
>>
>>
>
>
>
> --
>
> Slava Markeyev | Engineering | Upsight
> Find me on LinkedIn <http://www.linkedin.com/in/slavamarkeyev>
>
