Re: Compaction in hive

2016-12-07 Thread Nishant Aggarwal
Hi Allan, good morning, and thanks for your reply. We have lots of external tables in Parquet and gzip format. Data will be pushed to those tables at regular intervals, with a volume close to 10-20 GB per day. Our concern is that this process will generate lots of small files in the tables. We are searching…
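A common way to keep small files under control in Parquet-backed tables (not discussed in the truncated message above; the table, column, and partition names below are illustrative) is to periodically rewrite a partition with INSERT OVERWRITE while Hive's small-file merge settings are enabled:

  -- Sketch: compacting one day's partition of a hypothetical table "events"
  SET hive.merge.mapfiles=true;                 -- merge small files from map-only jobs
  SET hive.merge.mapredfiles=true;              -- merge small files from map-reduce jobs
  SET hive.merge.smallfiles.avgsize=128000000;  -- run a merge pass if the average file size is below this
  SET hive.merge.size.per.task=256000000;       -- target size of the merged files

  INSERT OVERWRITE TABLE events PARTITION (dt='2016-12-07')
  SELECT col1, col2, col3                       -- list the non-partition columns explicitly
  FROM events
  WHERE dt='2016-12-07';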

group by across multiple partitions of clustered table.

2016-12-07 Thread Jan Morlock
Hi, in our company we are using a Hive table that is both partitioned and clustered, similar to the following snippet: PARTITIONED BY (year INT, month INT, day INT, feed STRING) CLUSTERED BY (key) INTO 1024 BUCKETS. Using this input table we regularly perform queries where we group by key across…
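A self-contained sketch of the layout described above (table and column names are illustrative, not taken from the thread):

  -- A table partitioned by date/feed and bucketed on the grouping key
  CREATE TABLE events (
    key   STRING,
    value BIGINT
  )
  PARTITIONED BY (year INT, month INT, day INT, feed STRING)
  CLUSTERED BY (key) INTO 1024 BUCKETS;

  -- The kind of query the thread describes: grouping by the bucketing key
  -- across many partitions
  SELECT key, SUM(value) AS total
  FROM events
  WHERE year = 2016 AND month = 12
  GROUP BY key;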

Schema evolution in hive tables

2016-12-07 Thread Rajat Khandelwal
So far, my understanding has been that in Hive tables each partition has its own schema, and whenever you add a partition to a Hive table, the current table schema is copied into the partition schema. This should allow seamless evolution of the schema. Recently I came across something that contradicts…
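For context, Hive's ALTER TABLE syntax lets you choose whether a schema change is also pushed to existing partitions; a minimal sketch, assuming a hypothetical partitioned table t:

  -- Add a column to the table schema only; metadata of existing partitions is unchanged
  ALTER TABLE t ADD COLUMNS (new_col STRING);

  -- Add a column and cascade the change to the metadata of all existing partitions
  ALTER TABLE t ADD COLUMNS (new_col STRING) CASCADE;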

RE: When Hive on Spark will support Spark 2.0?

2016-12-07 Thread Joaquin Alzola
The version that will support Spark 2.0 is Hive 2.2. It is not yet known when this will be released. -----Original Message----- From: baipeng [mailto:b...@meitu.com] Sent: 07 December 2016 08:04 To: user@hive.apache.org Subject: When Hive on Spark will support Spark 2.0? Does anyone know when Hiv…
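For reference, once a compatible Spark build is available on the cluster, switching a Hive session to the Spark execution engine looks roughly like this (values are illustrative and depend on the deployment):

  -- Run queries through Hive on Spark instead of MapReduce
  SET hive.execution.engine=spark;
  -- Point Hive at the Spark cluster and size the executors
  SET spark.master=yarn;
  SET spark.executor.memory=4g;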

When Hive on Spark will support Spark 2.0?

2016-12-07 Thread baipeng
Does anyone know when Hive will release a version that supports Spark 2.0? Currently Hive 2.1.0 only supports Spark 1.6.