Re: dfs storage full on all slave machines of 6 machine hive cluster

2013-03-19 Thread Chunky Gupta
> hive tables or add more disk ( dropping a EXTERNAL hive table doesn't clear > the data from HDFS) > > Thanks, > > > On Mon, Mar 18, 2013 at 9:28 PM, Chunky Gupta wrote: > >> Hi Zhiwen, >> >> /mnt/hadoop-fs/mapred/local/taskTracker/ >> >>

Re: dfs storage full on all slave machines of 6 machine hive cluster

2013-03-18 Thread Chunky Gupta
>> Sent from HTC via Rocket! excuse typo. >> >> -- >> * From: * Chunky Gupta ; >> * To: * ; >> * Subject: * dfs storage full on all slave machines of 6 machine hive >> cluster >> * Sent: * Mon, Mar 18, 2013 10:37:39 AM >>

dfs storage full on all slave machines of 6 machine hive cluster

2013-03-18 Thread Chunky Gupta
Hi, We have a 6-machine Hive cluster. Queries are failing with errors while they run. I found that storage is nearly full on all 5 slaves (96%, 98%, 100%, 97%, 98% used). On the slave machines, the folder "/mnt/hadoop-fs/dfs/data/current/" accounts for 95% of the storage used.

Re: Adding comment to a table for columns

2013-02-21 Thread Chunky Gupta
ct syntax. > > Regards > Bejoy KS > > Sent from remote device, Please excuse typos > > From: Chunky Gupta > Date: Thu, 21 Feb 2013 17:15:37 +0530 > To: ; ; < snehalata_bhas...@syntelinc.com> > ReplyTo: user@hive.apache.org > Subj

Re: Adding comment to a table for columns

2013-02-21 Thread Chunky Gupta
----- > *From: * Chunky Gupta > *Date: *Thu, 21 Feb 2013 16:46:30 +0530 > *To: * > *ReplyTo: * user@hive.apache.org > *Subject: *Adding comment to a table for columns > > Hi, > > I am using this syntax to add comments for all columns :- > > C

Adding comment to a table for columns

2013-02-21 Thread Chunky Gupta
Hi, I am using this syntax to add comments for all columns :- CREATE EXTERNAL TABLE test ( c STRING COMMENT 'Common class', time STRING COMMENT 'Common time', url STRING COMMENT 'Site URL' ) PARTITIONED BY (dt STRING ) LOCATION 's3://BucketName/' Output of Describe Extended table is like :- (O

Re: Need tab separated output file and put limit on number of lines in a output file

2013-02-20 Thread Chunky Gupta
. > > Mark > > On Tue, Feb 19, 2013 at 10:53 PM, Chunky Gupta > wrote: > > Hi, > > > > Currently the output file columns of my query is separate by "^A", I > need my > > output to be separated by tab. Can anybody help me in setting this ?

Need tab separated output file and put limit on number of lines in a output file

2013-02-19 Thread Chunky Gupta
Hi, Currently the columns in my query's output files are separated by "^A"; I need them to be separated by tabs. Can anybody help me set this? One more question: I want to limit the number of lines in each output file. For example, I do not want any of my output files to be more than 1000 lines, can
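One way to get both results is to post-process the files Hive has already written. The following is a minimal sketch, not a Hive setting: it assumes the output was copied to a local text file, that fields are separated by Hive's default Ctrl-A (`\x01`) delimiter, and that the field values contain no tabs themselves. The function names and the `part-NNNNN.tsv` naming are hypothetical.

```python
from itertools import islice
from pathlib import Path

CTRL_A = "\x01"  # Hive's default field delimiter, shown as ^A in most editors


def to_tsv_line(line: str) -> str:
    """Replace Ctrl-A separators with tabs and drop the trailing newline."""
    return line.rstrip("\n").replace(CTRL_A, "\t")


def split_to_tsv(src, out_dir, max_lines=1000):
    """Rewrite `src` as tab-separated files of at most `max_lines` lines each.

    Returns the list of files written, in order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with open(src, encoding="utf-8") as source:
        part = 0
        while True:
            # Read the next block of up to max_lines lines without
            # loading the whole file into memory.
            chunk = list(islice(source, max_lines))
            if not chunk:
                break
            target = out_dir / f"part-{part:05d}.tsv"
            with open(target, "w", encoding="utf-8") as out:
                out.writelines(to_tsv_line(line) + "\n" for line in chunk)
            written.append(target)
            part += 1
    return written
```

Calling `split_to_tsv("query_output.txt", "tsv_parts")` would write `tsv_parts/part-00000.tsv`, `part-00001.tsv`, and so on, each with at most 1000 lines; both paths here are placeholders.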

Re: Loading json files into hive table is giving NULL as output(data is in s3 bucket)

2013-02-18 Thread Chunky Gupta
18, 2013 at 8:47 PM, Chunky Gupta wrote: > Hi Dean, > > I tried with removing underscore too, and getting the same output which > means problem is not with underscore. Yes, it was an example. > > Actual json file is like :- > > > {"colnamec":"ColNam

Re: Loading json files into hive table is giving NULL as output(data is in s3 bucket)

2013-02-18 Thread Chunky Gupta
extracting one column only as I mentioned in last mail. There are values not in double quotes, some are null and some keys are having multiple values. Dean, is this json file correct for HIVE to handle it ? Thanks, Chunky. On Mon, Feb 18, 2013 at 6:23 PM, Dean Wampler < dean.wamp...@thinkbiganaly

Loading json files into hive table is giving NULL as output(data is in s3 bucket)

2013-02-18 Thread Chunky Gupta
Hi, I have data in an S3 bucket; it is in JSON format and packaged as a zip file. I have added this jar file in the Hive console :- http://code.google.com/p/hive-json-serde/downloads/detail?name=hive-json-serde-0.2.jar&can=2&q= I tried the following steps to create the table and load data :- 1. CREATE EXTERNAL T
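A common cause of all-NULL columns with a JSON SerDe is input that is not laid out as one JSON object per line. The sketch below is a local sanity check under that assumption; it does not reproduce the SerDe's own parsing, and `check_json_lines` is a hypothetical helper. It expects lines from an already unzipped copy of the data.

```python
import json


def check_json_lines(lines):
    """Return (line_number, problem) pairs for lines that are not JSON objects.

    Blank lines are skipped; line numbers start at 1.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue
        try:
            record = json.loads(text)
        except json.JSONDecodeError as exc:
            problems.append((number, f"not valid JSON: {exc.msg}"))
            continue
        # A row is expected to be an object (key/value pairs), not a list or scalar.
        if not isinstance(record, dict):
            problems.append((number, "top-level value is not a JSON object"))
    return problems
```

Passing an open file handle (for example `check_json_lines(open("sample.json"))`, where the file name is a placeholder) lists the lines worth inspecting before blaming the table definition.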

Change timestamp format in hive

2013-02-13 Thread Chunky Gupta
Hi, I have a log file with timestamps in the format "YYYY-MM-DD-HH:MM:SS", but the TIMESTAMP datatype format in Hive is "YYYY-MM-DD HH:MM:SS". I created a table with that column's datatype as TIMESTAMP, but when I load the data it throws an error. I think it is because of the difference in fo
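If the log can be rewritten before loading, the timestamps can be normalized outside Hive. This is a minimal sketch that assumes the log values look like `2013-02-13-10:37:39` (four-digit year, with a hyphen where Hive expects a space); the helper name `to_hive_timestamp` is hypothetical.

```python
from datetime import datetime

SOURCE_FORMAT = "%Y-%m-%d-%H:%M:%S"  # assumed layout of the log timestamps
HIVE_FORMAT = "%Y-%m-%d %H:%M:%S"    # layout Hive's TIMESTAMP type accepts


def to_hive_timestamp(raw: str) -> str:
    """Convert one log timestamp to Hive's layout.

    Raises ValueError if `raw` does not match SOURCE_FORMAT, so malformed
    values are caught here instead of becoming load errors later.
    """
    parsed = datetime.strptime(raw.strip(), SOURCE_FORMAT)
    return parsed.strftime(HIVE_FORMAT)
```

Parsing and re-formatting, rather than replacing the third hyphen with a space, also validates each value (month 13 or hour 25 is rejected).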

Re: Getting Error while executing "show partitions TABLE_NAME"

2013-02-07 Thread Chunky Gupta
Hi Venkatesh, I checked and found that /tmp had very little space left. I moved my db to another location with enough space, and it is working fine now. Thanks, Chunky. On Thu, Feb 7, 2013 at 12:41 AM, Venkatesh Kavuluri wrote: > Looks like it's memory/ disk space issue with your database server used to

Getting Error while executing "show partitions TABLE_NAME"

2013-02-06 Thread Chunky Gupta
Hi All, I ran this :- hive> show partitions tab_name; and got this error :- FAILED: Error in metadata: javax.jdo.JDODataStoreException: Error executing JDOQL query "SELECT `THIS`.`PART_NAME` AS NUCORDER0 FROM `PARTITIONS` `THIS` LEFT OUTER JOIN `TBLS` `THIS_TABLE_DATABASE` ON `THIS`.`TBL_ID` = `

Does Hue (Hadoop User Experience) works with Apache HIVE/HADOOP

2012-12-28 Thread Chunky Gupta
Hi, I have Apache Hive and Apache Hadoop on Amazon EC2 machines. Can anyone tell me whether HUE can be used with this setup instead of a CDH Hadoop cluster? If not, is there any alternative UI similar to HUE? Please help. Thanks, Chunky.

Re: Alter table is giving error

2012-11-27 Thread Chunky Gupta
OAD DATA INPATH 's3://location/someidexcel.csv' INTO TABLE someidtable; It gives this error:- "Error in semantic analysis: Line 1:17 Invalid path ''s3n://location/someidexcel.csv'': only "file" or "hdfs" file systems accepted" Please help

Re: hive query not running in cron job

2012-11-23 Thread Chunky Gupta
Thanks, it's working after adding this line :) Chunky. On Fri, Nov 23, 2012 at 11:24 AM, wd wrote: > Add the following line before your crontab config > > source ~/.bashrc > > > > On Thu, Nov 22, 2012 at 5:59 PM, Chunky Gupta wrote: > >>

hive query not running in cron job

2012-11-22 Thread Chunky Gupta
Hi, I have a python script :- ---cron_script.py--- import os import sys from subprocess import call print 'starting' call(['hive', '-f', '/mnt/user/test_query'],stderr=open('/mnt/user/tmp/error','w'), stdout=open('/mnt/user/tmp/output','w')) --
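Cron starts jobs with a minimal environment, so `hive` and the variables it relies on may not be set. The reply in this thread fixes that by sourcing `~/.bashrc` in the crontab. The sketch below applies the same idea inside the script itself; it is an illustration in Python 3, not the poster's script, and it assumes `bash` is installed and that `~/.bashrc` does not exit early for non-interactive shells. The helper names are hypothetical.

```python
import os
import shlex
import subprocess


def build_hive_command(query_file, rc_file="~/.bashrc"):
    """Build a bash command that loads the user's shell setup, then runs hive."""
    rc_path = os.path.expanduser(rc_file)
    # shlex.quote protects paths that contain spaces or shell metacharacters.
    shell_line = f"source {shlex.quote(rc_path)} && hive -f {shlex.quote(query_file)}"
    return ["bash", "-c", shell_line]


def run_query(query_file, out_path, err_path):
    """Run the query, sending stdout and stderr to files; return the exit code."""
    with open(out_path, "w") as out, open(err_path, "w") as err:
        return subprocess.call(build_hive_command(query_file), stdout=out, stderr=err)
```

With the paths from the original script, the call would be `run_query('/mnt/user/test_query', '/mnt/user/tmp/output', '/mnt/user/tmp/error')`; a non-zero return value means the query failed and the error file is worth reading.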

Re: Alter table is giving error

2012-11-07 Thread Chunky Gupta
ory but before you populate your directory with data. > > Mark > > > On Tue, Nov 6, 2012 at 10:33 PM, Chunky Gupta wrote: > >> Hi Mark, >> Sorry, I forgot to mention. I have also tried >> msck repair table ; >> and same output I got which I

Re: Alter table is giving error

2012-11-06 Thread Chunky Gupta
al+DDL#LanguageManualDDL-Recoverpartitions > > Mark > > > On Tue, Nov 6, 2012 at 9:55 PM, Chunky Gupta wrote: > >> Hi Mark, >> I didn't get any error. >> I ran this on hive console:- >> "msck table Table_Name;" >> It says Ok and s

Re: Alter table is giving error

2012-11-06 Thread Chunky Gupta
ions. Thanks, Chunky. On Tue, Nov 6, 2012 at 10:38 PM, Mark Grover wrote: > Glad to hear, Chunky. > > Out of curiosity, what errors did you get when using msck? > > > On Tue, Nov 6, 2012 at 5:14 AM, Chunky Gupta wrote: > >> Hi Mark, >> I tried msck, but it is not wor

Re: Alter table is giving error

2012-11-06 Thread Chunky Gupta
; Recover partitions should work the same way for different file systems. >> >> Edward >> >> On Mon, Nov 5, 2012 at 9:33 AM, Dean Wampler >> wrote: >> > Writing a script to add the external partitions individually is the >> only way >> > I know of.

Re: Alter table is giving error

2012-11-05 Thread Chunky Gupta
scusses this feature and other aspects > of using Hive in EMR. > > > dean > > > On Mon, Nov 5, 2012 at 5:34 AM, Chunky Gupta wrote: > >> Hi, >> >> I am having a cluster setup on EC2 with Hadoop version 0.20.2 and Hive >> version 0.8.1 (I configured

Alter table is giving error

2012-11-05 Thread Chunky Gupta
Hi, I am having a cluster setup on EC2 with Hadoop version 0.20.2 and Hive version 0.8.1 (I configured everything) . I have created a table using :- CREATE EXTERNAL TABLE XXX ( YYY )PARTITIONED BY ( ZZZ )ROW FORMAT DELIMITED FIELDS TERMINATED BY 'WWW' LOCATION 's3://my-location/data/'; Now I am

Re: Enabling fair scheduler using Bootstrap is failing

2012-10-29 Thread Chunky Gupta
cker ip-10-116-159-127.ec2.internal:9001 Please suggest any solution for this. Thanks, Chunky. On Mon, Oct 29, 2012 at 7:10 PM, Chunky Gupta wrote: > Hi, > > I tried this also in optional arguments "--site-config-file > s3://viz-emr-hive/config/mapred-site.xml -m > m

Re: Enabling fair scheduler using Bootstrap is failing

2012-10-29 Thread Chunky Gupta
n do to make it work. Thanks, Chunky. On Mon, Oct 29, 2012 at 6:37 PM, Chunky Gupta wrote: > Hi, > > I am trying to enable fair scheduler on my emr cluster at bootstrap. The > steps I am doing are : > > 1. Creating Job instance from AWS console as "Create New Job Flow"

Enabling fair scheduler using Bootstrap is failing

2012-10-29 Thread Chunky Gupta
Hi, I am trying to enable fair scheduler on my emr cluster at bootstrap. The steps I am doing are : 1. Creating Job instance from AWS console as "Create New Job Flow" with Job Type as Hive program. 2. Selecting "Start an Interactive Hive Session". 3. Selecting Master and core instance group and A

Executing queries after setting hive.exec.parallel in hive-site.xml

2012-10-25 Thread Chunky Gupta
Hi , I have 2 questions regarding the parameter 'hive.exec.parallel' in hive-site.xml in ~/.versions/hive-0.8.1/conf/ 1. how do I verify from query log files or from any other way, that a particular query is executing with this parameter set to true. 2. Is it advisable to set this parameter for r

Re: How to run multiple Hive queries in parallel

2012-10-22 Thread Chunky Gupta
hausted and you need parallelism here, then >> you may need to look at some approaches of using fair scheduler and >> different user accounts for each user so that each user gets his fair share >> of task slots. >> >> >> Regards >> Bejoy KS >> >>

How to run multiple Hive queries in parallel

2012-10-22 Thread Chunky Gupta
Hi, I have one name node machine and under which there are 4 slaves machines to run the job. The way users run queries is - They ssh into the name node machine - They initiate hive and submit their queries Currently multiple users log in with the same credentials and submit queries Whenever 2 o