Basic question regarding calculating number of reducers

2014-06-27 Thread KayVajj
Hi, I have a basic question regarding the calculation of the number of reducers in Hive. I know that it is computed as (total input size) / (bytes per reducer). In case of compressed files it is not clear whether total input size is calculated when compressed or decompressed. Doesn't it make a significant difference if calculated when co
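
As a rough illustration of the ratio this message asks about, here is a minimal Python sketch. It assumes the estimate is the input size divided by a bytes-per-reducer setting, rounded up, kept to at least one reducer, and capped at a maximum. The function name and parameter names are hypothetical; in Hive the two settings correspond to hive.exec.reducers.bytes.per.reducer and hive.exec.reducers.max. The sketch does not settle whether Hive measures compressed or decompressed bytes; it only shows how much the answer changes with the size that is fed in.

```python
def estimate_reducers(total_input_bytes, bytes_per_reducer, max_reducers):
    """Hypothetical helper: estimate a reducer count from an input size.

    total_input_bytes  -- size of the job input (compressed or not, whichever is measured)
    bytes_per_reducer  -- assumed to mirror hive.exec.reducers.bytes.per.reducer
    max_reducers       -- assumed to mirror hive.exec.reducers.max
    """
    if bytes_per_reducer <= 0 or max_reducers <= 0:
        raise ValueError("bytes_per_reducer and max_reducers must be positive")
    # Integer ceiling division, so a partial chunk still gets its own reducer.
    estimate = (total_input_bytes + bytes_per_reducer - 1) // bytes_per_reducer
    # Always at least one reducer, never more than the configured cap.
    return min(max_reducers, max(1, estimate))


# Same data, measured compressed (2 GB) versus decompressed (10 GB).
compressed = estimate_reducers(2 * 10**9, 10**9, 999)
decompressed = estimate_reducers(10 * 10**9, 10**9, 999)
print(compressed, decompressed)
```

With a fivefold compression ratio the two measurements give reducer counts that differ by the same factor, which is why the distinction raised in the question matters.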

Determining the number of buckets in hive

2014-06-26 Thread KayVajj
Hi, I know this question might have been beaten to death, but I could not find an answer to a particular question. I'm using Hive 0.10.x. I have a table partitioned on day and I would like to bucket the table on a different column to avail of the SMB join optimization. I have seen an earlier threa

Question regarding cluster by multiple columns

2014-01-26 Thread KayVajj
Hi, I'm studying the bucketed tables as an option for my storage. What would be a use case where it is useful to cluster by multiple columns? I'm trying to solve a problem of optimizing a join between two tables with filtering. Let's say Table A has columns (id, country, .) and table has colu

Re: Using Cluster by to improve Group by Performance

2013-10-31 Thread KayVajj
Any response or pointers to understand how Cluster By in sub queries can affect the performance/speed of outer queries is helpful. Thanks Kay On Mon, Oct 28, 2013 at 1:17 PM, KayVajj wrote: > Hi, > > I have a question if I could use the cluster by clause in a sub query to >

Using Cluster by to improve Group by Performance

2013-10-28 Thread KayVajj
Hi, I have a question if I could use the cluster by clause in a sub query to improve the performance of a group by query in Hive. Let's say I have a Table A with columns (all strings) col1..col5 and the table is not "Clustered" now. I'm trying to run the below query select > col1, > col2, > col3, > c

What is the use of compressed field in the hive storage descriptor

2013-09-26 Thread KayVajj
A desc extended command results in the following Table( tableName:table_namet, dbName:cybs_test, owner:x...@abc.com, createTime:1380232668, lastAccessTime:0, retention:0, sd:StorageDescriptor( cols:[ FieldSchema(name:a, type:string, comment:null), FieldSchema(name:b, type:string, comment:nul

non-string partition column types is it discouraged

2013-06-13 Thread KayVajj
I have a question regarding the partition column types in a Hive table. We run Hive 0.9.0 in a Cloudera distribution and we're having an issue trying to connect to Hive using the Cloudera Tableau ODBC Connector. I'm unable to use a partition column of type int. Trying to find a solution I chanced upo