Thanks, Bejoy. I have a zip file, so I don't think there is any sense in converting it into gzip again. Chuck, I got what you are trying to say. So I need to process it outside HDFS and bring the text file into HDFS.
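
For anyone following along, here is a minimal sketch of that preprocessing step, assuming Python is available on the machine that holds the zip file. The function name `merge_zip_to_text`, the `compress` flag and the file names are hypothetical and only for illustration; nothing here is part of Hive. It pulls every .txt member out of the archive and merges them into one newline-delimited text file (optionally gzip-compressed), which is the kind of file a TEXTFILE table expects.

```python
import gzip
import zipfile


def merge_zip_to_text(zip_path, out_path, compress=False, suffix=".txt"):
    """Merge every *.txt member of a ZIP archive into one text file.

    If compress is True the output is written as gzip instead of plain text.
    Returns the number of archive members that were merged.
    """
    opener = gzip.open if compress else open
    merged = 0
    with zipfile.ZipFile(zip_path) as archive, opener(out_path, "wb") as out:
        for name in sorted(archive.namelist()):
            # Skip directory entries and anything that is not a text member.
            if name.endswith("/") or not name.lower().endswith(suffix):
                continue
            data = archive.read(name)
            out.write(data)
            # Hive's TEXTFILE format treats newline as the record separator,
            # so keep the last record of one file from running into the next.
            if data and not data.endswith(b"\n"):
                out.write(b"\n")
            merged += 1
    return merged
```

Calling `merge_zip_to_text("11sep12.zip", "11sep12.txt")` on the local machine would produce the merged text file, which could then be copied into HDFS with something like `hadoop fs -put 11sep12.txt /user/manish/input/zip/` (the target directory is taken from the table LOCATION quoted below; adjust as needed). Passing `compress=True` writes a .gz instead, which should also work as long as the Gzip codec is configured, as Bejoy describes.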

On Sun, 2012-09-30 at 18:21 +0530, Bejoy KS wrote:
> Hi Manish
>
> Gzip works well if you have the compression codec available in
> 'io.compression.codecs'. The Gzip codec is present by default.
>
> I don't think untarring would be done by MapReduce jobs, so tar files
> may not work with Hive. You need to untar the files outside of
> Hadoop/Hive as a prerequisite.
>
> Regards
> Bejoy KS
>
> ______________________________________________________________________
> To: user@hive.apache.org; keshav.c.sav...@fisglobal.com
> Subject: Re: zip file or tar file cosumption
> From: manishbh...@rocketmail.com
> Date: Sun, 30 Sep 2012 12:32:15 +0000
>
> What about a .gz or tar file? Does it need to be unzipped in HDFS
> before loading into Hive? How do you resolve it?
>
> Sent from my BlackBerry, pls excuse typo
>
> ______________________________________________________________________
> From: "Connell, Chuck" <chuck.conn...@nuance.com>
> Date: Sun, 30 Sep 2012 12:24:37 +0000
> To: user@hive.apache.org<user@hive.apache.org>; Savant, Keshav<keshav.c.sav...@fisglobal.com>
> ReplyTo: user@hive.apache.org
> Subject: RE: zip file or tar file cosumption
>
> I have seen that error when I try to overwrite an existing file.
>
> But, more importantly, Hive cannot understand ZIP files. There was a
> long thread about this just a few days ago. Your table def says
> "stored as textfile" but you are not giving it a text file.
>
> Chuck
>
> ______________________________________________________________________
> From: Manish [manishbh...@rocketmail.com]
> Sent: Sunday, September 30, 2012 7:38 AM
> To: Savant, Keshav
> Cc: user@hive.apache.org
> Subject: RE: zip file or tar file cosumption
>
> I am getting the error below when loading a zip file:
>
> Driver returned: 9.
> Errors: Hive history file=/tmp/hue/hive_job_log_hue_201209300434_1768401171.txt
> Loading data to table default.pageview_zip
> Failed with exception Error moving: hdfs://localhost:54310/user/manish/input/zip/11sep12.zip into: /user/manish/input/zip
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
>
> My load statement is:
> LOAD DATA INPATH '/user/manish/input/11sep12.zip' OVERWRITE INTO TABLE `pageview_zip`
>
> Table definition:
> CREATE external TABLE pageview_zip
> (
>   C_0 STRING,
>   C_1 STRING,
>   C_7 MAP<STRING,STRING>,
>   C_8 STRING,
>   C_13 MAP<STRING,STRING>,
>   C_21 STRING
> )
> COMMENT 'Page View'
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' COLLECTION ITEMS TERMINATED BY ';' MAP KEYS TERMINATED BY '='
> STORED AS TEXTFILE LOCATION '/user/manish/input/zip'
>
> Thank You,
> Manish
>
> On Thu, 2012-09-27 at 11:11 +0000, Savant, Keshav wrote:
> > True Manish.
> >
> > Keshav C Savant
> >
> > From: Manish.Bhoge [mailto:manish.bh...@target.com]
> > Sent: Thursday, September 27, 2012 4:26 PM
> > To: user@hive.apache.org; manishbh...@rocketmail.com
> > Subject: RE: zip file or tar file cosumption
> >
> > Thanks Savant. I believe this will hold good for .zip files also.
> >
> > Thank You,
> > Manish.
> >
> > From: Savant, Keshav [mailto:keshav.c.sav...@fisglobal.com]
> > Sent: Thursday, September 27, 2012 10:19 AM
> > To: user@hive.apache.org; manishbh...@rocketmail.com
> > Subject: RE: zip file or tar file cosumption
> >
> > Manish, the table that has been created for zipped text files
> > should be defined as a sequence file, for example:
> >
> > CREATE TABLE my_table_zip(col1 STRING,col2 STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' stored as sequencefile;
> >
> > After this you can use the regular load command to load these
> > files, for example:
> >
> > load data local inpath 'path-to-csv-file.gz' into table my_table_zip;
> >
> > Hope this helps.
> >
> > Keshav C Savant
> >
> > From: Manish Bhoge [mailto:manishbh...@rocketmail.com]
> > Sent: Wednesday, September 26, 2012 9:43 PM
> > To: user@hive.apache.org
> > Subject: Re: zip file or tar file cosumption
> >
> > Hi Richin,
> >
> > Thanks! Yes, this is what I wanted to understand: how to load a
> > zip file into a Hive table. Now I'll try this option.
> >
> > Thank You,
> > Manish.
> >
> > Sent from my BlackBerry, pls excuse typo
> >
> > ______________________________________________________________
> > From: <richin.j...@nokia.com>
> > Date: Wed, 26 Sep 2012 14:51:39 +0000
> > To: <user@hive.apache.org>
> > ReplyTo: user@hive.apache.org
> > Subject: RE: zip file or tar file cosumption
> >
> > You are right, Chuck. I thought his question was how to use zip
> > files or any compressed files in Hive tables.
> >
> > Yeah, seems like you can't do that; see:
> > http://mail-archives.apache.org/mod_mbox/hive-user/201203.mbox/%3CCAENxBwxkF--3PzCkpz1HX21=gb9yvasr2jl0u3yul2tfgu0...@mail.gmail.com%3E
> >
> > But you can always compress your files in gzip format and they
> > should be good to go.
> >
> > Richin
> >
> > From: ext Connell, Chuck [mailto:chuck.conn...@nuance.com]
> > Sent: Wednesday, September 26, 2012 10:44 AM
> > To: user@hive.apache.org
> > Subject: RE: zip file or tar file cosumption
> >
> > But TEXTFILE in Hive always has newline as the record
> > delimiter. How could this possibly work with a zip/tar file
> > that can contain ASCII 10 characters at random locations, and
> > certainly does not have ASCII 10 at the end of each data
> > record?
> >
> > Chuck Connell
> > Nuance R&D Data Team
> > Burlington, MA
> >
> > From: richin.j...@nokia.com [mailto:richin.j...@nokia.com]
> > Sent: Wednesday, September 26, 2012 10:14 AM
> > To: user@hive.apache.org; manishbh...@rocketmail.com
> > Subject: RE: zip file or tar file cosumption
> >
> > Hi Manish,
> >
> > If you have your zip file at location /home/manish/zipfile,
> > you can just point your external table to that location, like
> >
> > CREATE EXTERNAL TABLE manish_test (field1 string, field2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY <your_column_delimiter> STORED AS TEXTFILE LOCATION '/home/manish/zipfile';
> >
> > OR
> >
> > If you already have an external table pointing to a certain
> > location, you can load this zip file into your table as
> >
> > LOAD DATA INPATH '/home/manish/zipfile' INTO TABLE manish_test;
> >
> > Hope this helps.
> >
> > Richin
> >
> > From: ext Manish Bhoge [mailto:manishbh...@rocketmail.com]
> > Sent: Wednesday, September 26, 2012 9:13 AM
> > To: user@hive.apache.org
> > Subject: Re: zip file or tar file cosumption
> >
> > Hi Savant,
> >
> > Got it. But I still need to understand how to load a zip file.
> > Can I directly use a zip file in an external table? Can you
> > please help with the load statement?
> >
> > Sent from my BlackBerry, pls excuse typo
> >
> > ______________________________________________________________
> > From: "Savant, Keshav" <keshav.c.sav...@fisglobal.com>
> > Date: Wed, 26 Sep 2012 12:25:38 +0000
> > To: user@hive.apache.org<user@hive.apache.org>
> > ReplyTo: user@hive.apache.org
> > Cc: manish.bh...@target.com<manish.bh...@target.com>; chuck.conn...@nuance.com<chuck.conn...@nuance.com>
> > Subject: RE: zip file or tar file cosumption
> >
> > Another solution would be:
> >
> > Using a shell script, do the following:
> > 1. unzip the txt files,
> > 2. one by one, merge those 50 (or N) text files into one text file,
> > 3. then zip/tar that bigger text file,
> > 4. then that big zip/tar file can be uploaded into Hive.
> >
> > Keshav C Savant
> >
> > From: Connell, Chuck [mailto:chuck.conn...@nuance.com]
> > Sent: Wednesday, September 26, 2012 4:04 PM
> > To: user@hive.apache.org
> > Subject: RE: zip file or tar file cosumption
> >
> > This could be a problem. Hive uses newline as the record
> > separator. A ZIP file will certainly contain newline characters.
> > So I doubt this is possible.
> >
> > BUT, I would like to hear from anyone who has solved the
> > "newline is always a record separator" problem, because we ran
> > into it for another type of compressed file.
> >
> > Chuck
> >
> > ______________________________________________________________
> > From: Manish.Bhoge [manish.bh...@target.com]
> > Sent: Wednesday, September 26, 2012 3:17 AM
> > To: user@hive.apache.org
> > Subject: zip file or tar file cosumption
> >
> > Hivers,
> >
> > I want to understand whether it would be possible to use
> > zip/tar files directly in Hive. All the files have a similar
> > schema (structure). Say 50 *.txt files are zipped into a
> > single zip file: can we load data directly from this zip file,
> > or do we need to unzip first?
> >
> > Thanks & Regards
> > Manish Bhoge | Technical Architect | Target DW/BI | +919379850010 (M) Ext: 5691 VOIP: 22165
> > “Excellence is not a skill, It is an attitude.”
> > MySite
> >
> > _____________
> > The information contained in this message is proprietary
> > and/or confidential. If you are not the intended recipient,
> > please: (i) delete the message and all copies; (ii) do not
> > disclose, distribute or use the message in any manner; and
> > (iii) notify the sender immediately. In addition, please be
> > aware that any message addressed to our domain is subject to
> > archiving and review by persons other than the intended
> > recipient. Thank you.