Re: Skewed Join

2013-06-06 Thread Nitin Pawar
Can you share your query and your table definition? On Fri, Jun 7, 2013 at 12:12 PM, sumit ghosh wrote: > Hi, > > I am hit by skewed Join, my last reducer is getting same number of Reduce > input groups/records. > Reduce input groups 432,446,942 > Reduce shuffle bytes

Skewed Join

2013-06-06 Thread sumit ghosh
Hi, I am hit by a skewed join: my last reducer is getting the same number of Reduce input groups and records. Reduce input groups: 432,446,942. Reduce shuffle bytes: 13,012,613,275. Reduce input records: 432,446,942. Why is this happening? I have tur

RE: What is HIVE_PLAN?

2013-06-06 Thread Li jianwei
Hi FangKun: Thanks for your reply! I ran the "select count(*)" again, checked the JobConf, and found the properties you mentioned. They were as follows: hive.exec.plan hdfs://192.168.1.112:9100/tmp/hive-cyg_server/hive_2013-06-07_12-56-10_656_195237350266205704/-mr-10003/e1438d71-2497-4834-a89e-8b2e

Re: Extremely slow throughput with dynamic partitions using Hive 0.8.1 in Amazon Elastic Mapreduce

2013-06-06 Thread Shaun Clowes
Hi Ted, All, Unfortunately profiling turns out to be extremely slow, so it's not very fruitful for determining what's going on here. On the other hand I seem to have traced this problem down to the "hive.task.progress" configuration variable. When this is set to true (as it is automatically when

Re: Hive Header line in Select query help?

2013-06-06 Thread FangKun Cao
There's an issue open for this: https://issues.apache.org/jira/browse/HIVE-4346 2013/6/7 Nitin Pawar > if you do not have hiverc file then you can create one and use it with > --hiveconf along side your hive execution > you also set the same value just before your select query starts > > > On

Re: What is HIVE_PLAN?

2013-06-06 Thread FangKun Cao
It's kept in JobConf as part of the plan file name. Check the link below http://hdfs-namenode:50030/jobconf.jsp?jobid=job_201306070901_0001 and find *hive.exec.plan* and *hive.exec.scratchdir*. Do you have proper read and write permissions? 2013/6/7 Li jianwei > Hi, everyone: > I have

What is HIVE_PLAN?

2013-06-06 Thread Li jianwei
Hi, everyone: I have set up a Hadoop cluster on three Windows 7 machines with Cygwin and ran several tests with hadoop-test-1.1.2.jar and hadoop-examples-1.1.2.jar, all of which passed. Then I tried to run Hive 0.10.0 on my cluster (also in Cygwin). I could create tables, show them, load d

Properly escaping newlines for Hive / HBase

2013-06-06 Thread Rob Roland
I have an HBase table I've defined as an external table in Hive, and I'm having trouble determining the proper escaping of newlines in the byte arrays. The primary use-case of this table is writing via the HBase client API, then reading via HiveQL select queries against HiveServer2. I've found

Re: Textfile compression using Gzip codec

2013-06-06 Thread Stephen Sprague
aha! All's well that ends well then! :) On Thu, Jun 6, 2013 at 9:49 AM, Sachin Sudarshana wrote: > Hi Stephen, > > Thank you for your reply. > > But, its the silliest error from my side. Its a typo! > > The codec is : org.apache.hadoop.io.compress.*GzipCodec* and not > org.apache.hadoop.io.com

Re: Hive Header line in Select query help?

2013-06-06 Thread Nitin Pawar
If you do not have a hiverc file, you can create one and use it with --hiveconf alongside your Hive execution. You can also set the same value just before your select query starts. On Thu, Jun 6, 2013 at 10:13 PM, Matouk IFTISSEN wrote: > I use Hortonworks on windows (HDInsight) and I dont’ find

Re: Textfile compression using Gzip codec

2013-06-06 Thread Sachin Sudarshana
Hi Stephen, Thank you for your reply. But it's the silliest error on my side. It's a typo! The codec is org.apache.hadoop.io.compress.*GzipCodec* and not org.apache.hadoop.io.compress.*GZipCodec*. I regret making that mistake. Thank you, Sachin On Thu, Jun 6, 2013 at 10:07 PM, Stephen
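
A minimal sketch of the kind of settings this thread concerns, assuming the goal is gzip-compressed text output from a Hive query. The two property names are standard Hive and Hadoop 1.x settings; the target table `facts520_gzip_text` is hypothetical, and only the codec class spelling and the source table name come from the thread.

```sql
-- Ask Hive to compress the output files of the query (assumed goal).
SET hive.exec.compress.output=true;
-- Mind the capitalization: GzipCodec, not GZipCodec.
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;

-- Hypothetical target table with the same columns as the source table.
INSERT OVERWRITE TABLE facts520_gzip_text
SELECT * FROM facts520_normal_text;
```

Java class names are case-sensitive, so a misspelled codec name cannot be loaded.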

RE: Hive Header line in Select query help?

2013-06-06 Thread Matouk IFTISSEN
I use Hortonworks on Windows (HDInsight) and I can't find the hiverc file. Where can I find it? And where do I add set hive.cli.print.header=true; ? Thanks. *From:* Nitin Pawar [mailto:nitinpawar...@gmail.com] *Sent:* Thursday, 6 June 2013 18:31 *To:* user@hive.
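
A sketch of what such a file could contain, assuming the Hive CLI is the interface in use. The CLI runs the statements in a `.hiverc` file in the user's home directory at startup, and an initialization file can also be passed explicitly with `hive -i <file>`. Where the home directory lives on an HDInsight installation is not stated in the thread, and the table name `alerts` in the comment is hypothetical.

```sql
-- Contents of an init file such as .hiverc:
-- print column names as the first line of CLI query output.
SET hive.cli.print.header=true;

-- With the setting active, a query run in the CLI, for example
--   SELECT * FROM alerts LIMIT 10;
-- prints a header line before the rows.
```

The setting controls what the CLI prints. As its `hive.cli` prefix suggests, it does not add a header to files written by INSERT OVERWRITE DIRECTORY.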

Re: Textfile compression using Gzip codec

2013-06-06 Thread Stephen Sprague
Hi Sachin, Like you say, it looks like something to do with the GZipCodec all right, and that would make sense given your original problem. Yeah, one would think it'd be in there by default, but for whatever reason it's not finding it. At least the problem is now identified. Now _my guess_ is that m

Re: Hive Header line in Select query help?

2013-06-06 Thread Nitin Pawar
Why don't you set the variable in your script or in the hiverc file? Then it will work all the time. On Thu, Jun 6, 2013 at 9:43 PM, Matouk IFTISSEN wrote: > I use a script to query Hive and store results in Local file, but no > headers after execution of the select query how to do this? thanks

RE: Hive Header line in Select query help?

2013-06-06 Thread Matouk IFTISSEN
I use a script to query Hive and store the results in a local file, but there are no headers after the select query runs. How can I do this? Thanks. *From:* Nitin Pawar [mailto:nitinpawar...@gmail.com] *Sent:* Thursday, 6 June 2013 18:02 *To:* user@hive.apache.org *Subject:* Re: Hive Header line in Select quer

Re: Hive Header line in Select query help?

2013-06-06 Thread Nitin Pawar
If you look at the setting, it says "hive.cli", which means it is supposed to work on the CLI only. Which interface do you want to use it with? On Thu, Jun 6, 2013 at 9:12 PM, Matouk IFTISSEN wrote: > Hello Ivry bady, > > > > I want to know if is there a way to export headers line in select query > to store

Hive Header line in Select query help?

2013-06-06 Thread Matouk IFTISSEN
Hello everybody, I want to know if there is a way to export the header line of a select query when storing the result in a file in a local or HDFS directory. For example, with this query: set hive.cli.print.header=true; INSERT OVERWRITE LOCAL DIRECTORY 'C:\resultats\alerts_http_500\par_heure' SELECT

Re: Textfile compression using Gzip codec

2013-06-06 Thread Sachin Sudarshana
Hi Stephen,
hive> show create table facts520_normal_text;
OK
CREATE TABLE facts520_normal_text(
  fact_key bigint,
  products_key int,
  retailers_key int,
  suppliers_key int,
  time_key int,
  units int)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  LINES TERMINATED B

Re: Extremely slow throughput with dynamic partitions using Hive 0.8.1 in Amazon Elastic Mapreduce

2013-06-06 Thread Ted Xu
Hi Shaun, This is weird. I'm not sure whether some other cause (e.g., a very complex UDF?) is behind this issue, but it would be best if you could do some profiling to see if there is a hot spot. On Thu, Jun 6, 2013 at 4:38 PM, Sh

Re: Extremely slow throughput with dynamic partitions using Hive 0.8.1 in Amazon Elastic Mapreduce

2013-06-06 Thread Shaun Clowes
Hi Ted, It's actually just one partition being created which is what makes it so weird. Thanks, Shaun On 6 June 2013 18:36, Ted Xu wrote: > Hi Shaun, > > Too many partitions in dynamic partitioning may slow down the mapreduce > job. Can you estimate how many partitions will be generated after

Issue with creating HIVE metadata for a HBASE table with 2000 + columns

2013-06-06 Thread shouvanik.haldar
Hi, I have an HBase table with 2000+ columns, and I have to create Hive metadata for it. But I am facing an issue while creating the Hive table: FAILED: Error in metadata: MetaException(message:javax.jdo.JDODataStoreException: Put request failed : INSERT INTO "SERDE_PARAMS" ("PARAM_VALUE","SERDE_ID","PARAM_K

Re: Extremely slow throughput with dynamic partitions using Hive 0.8.1 in Amazon Elastic Mapreduce

2013-06-06 Thread Ted Xu
Hi Shaun, Too many partitions in dynamic partitioning may slow down the mapreduce job. Can you estimate how many partitions will be generated after insert? On Thu, Jun 6, 2013 at 4:24 PM, Shaun Clowes wrote: > Hi All, > > Does anyone know the performance impact the dynamic partitions should be

Extremely slow throughput with dynamic partitions using Hive 0.8.1 in Amazon Elastic Mapreduce

2013-06-06 Thread Shaun Clowes
Hi All, Does anyone know what performance impact dynamic partitions should be expected to have? I have a table that is partitioned by a string in the form '-MM'. When I insert into this table (from an external table that is just an S3 bucket containing gzipped logs) using dynamic partitio