Hi Kiwon Lee

There isn't anything specific you need to do in Hive DDL or DML to parse gz 
files. You need to ensure that 'org.apache.hadoop.io.compress.GzipCodec' is 
available in the 'io.compression.codecs' property within core-site.xml.
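
As a quick check, you can print the property from the Hive CLI and then read 
the gz files through an ordinary text table. This is only a sketch: the table 
name raw_log and the HDFS path /data/logs are made up for illustration, and it 
assumes the files are plain text compressed with gzip.

```sql
-- Print the codec list Hive picked up from the Hadoop configuration;
-- GzipCodec should appear in the output.
SET io.compression.codecs;

-- Hypothetical table over a hypothetical directory of *.gz files.
-- Nothing gzip-specific is needed in the DDL itself.
CREATE EXTERNAL TABLE raw_log (line STRING)
STORED AS TEXTFILE
LOCATION '/data/logs';

-- Each row is one decompressed log line.
SELECT line FROM raw_log LIMIT 10;
```

Keep in mind that a gzip file cannot be split, so each .gz file is read by a 
single mapper.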

To parse log files you can use RegexSerDe. A sample DDL for loading Apache log 
files can be found at
https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-ApacheWeblogData
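
For the log format in your mail, a RegexSerDe table could look roughly like the 
sketch below. The jar path, table name, column names and HDFS location are 
assumptions, and the regex is written only against the single sample line you 
posted. The contrib RegexSerDe handles STRING columns only, so the key/value 
part is kept as one string and turned into a map at query time with 
str_to_map.

```sql
-- The contrib jar location depends on your installation (assumed path).
ADD JAR /usr/lib/hive/lib/hive-contrib.jar;

-- Hypothetical table; all columns must be STRING for this SerDe.
CREATE EXTERNAL TABLE log_raw (ip STRING, dt STRING, kv_raw STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  -- Three capture groups: the ip, the text inside [ ], the text inside " ".
  "input.regex" = "([^ ]*) \\[([^\\]]*)\\] \"([^\"]*)\""
)
STORED AS TEXTFILE
LOCATION '/data/logs';

-- Split "a=abc&b=adf&c=aadfad" into a map and pick the value for key 'b'.
SELECT str_to_map(kv_raw, '&', '=')['b']
FROM log_raw
LIMIT 10;
```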


You can create a partitioned table by using the 'PARTITIONED BY' clause while 
creating a table. A sample DDL is below:

CREATE TABLE page_view(viewTime INT, userid BIGINT,
                    page_url STRING, referrer_url STRING,
                    ip STRING COMMENT 'IP Address of the User')
    COMMENT 'This is the page view table'
    PARTITIONED BY(dt STRING, country STRING)
    ROW FORMAT DELIMITED
            FIELDS TERMINATED BY '\001'
    STORED AS SEQUENCEFILE;

If your data is already partitioned in HDFS then you can create a partitioned 
table and add partitions to the table by specifying the directory corresponding 
to each partition using the 'ALTER TABLE ADD PARTITION' statement.
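
A minimal sketch of that statement against the page_view table above. The 
partition values and the HDFS directories are hypothetical, and the files in 
each directory are assumed to already match the table's storage format:

```sql
-- Register one existing directory per partition (paths are made up).
ALTER TABLE page_view ADD PARTITION (dt='2012-08-17', country='US')
LOCATION '/data/page_view/2012-08-17/US';

ALTER TABLE page_view ADD PARTITION (dt='2012-08-17', country='KR')
LOCATION '/data/page_view/2012-08-17/KR';

-- Confirm what Hive now knows about.
SHOW PARTITIONS page_view;
```

When the data already lives in HDFS it is common to declare the table as 
EXTERNAL, so that dropping the table does not delete the files.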

If the data is not partitioned in HDFS but you would like it to be partitioned 
in Hive, then you can take a look at Dynamic Partition Insert.
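
A rough sketch of a dynamic partition insert into the page_view table above. 
The source table page_view_staging is hypothetical; it is assumed to be an 
unpartitioned table that holds the same columns plus dt and country as 
ordinary columns:

```sql
-- Enable dynamic partitioning; nonstrict mode lets every partition
-- column be determined from the data.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- The partition columns must come last in the SELECT list,
-- in the same order as in the PARTITION clause.
INSERT OVERWRITE TABLE page_view PARTITION (dt, country)
SELECT viewTime, userid, page_url, referrer_url, ip, dt, country
FROM page_view_staging;
```

Hive creates one partition for each distinct (dt, country) pair found in the 
selected rows.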


Regards
Bejoy KS


-----Original Message-----
From: Kiwon Lee <kiwoni....@gmail.com>
Date: Sat, 18 Aug 2012 00:29:20 
To: <user@hive.apache.org>
Reply-To: user@hive.apache.org
Subject: how to handling complex log file(compressed, 200G)

Hi,

I have complex log files (compressed ".gz", 200G) on HDFS.

+ log file format :
127.0.0.1 [2012Avg08] "a=abc&b=adf&c=aadfad"

I am thinking of a DDL like this:
CREATE TABLE log_tb (ip STRING, dt STRING, kv Map<STRING, STRING>)
ROW FORMAT SERDE "??"
STORED AS SEQUENCEFILE;

I want the results below.
SELECT kv['b']
FROM log_tb
LIMIT 10;


1) How do I parse a complex log file (compressed ".gz", 200G)?

2) If I have to use a SerDe, which SerDe should I use?

3) Is there an existing SerDe (input/output), or do I need a user-defined
class?

4) If I partition the log file, how should I write the DDL and DML? Please
give sample SQL (DDL, DML).


Thanks.
