Re: Skewed Tables

2014-04-27 Thread Lefty Leverenz
Prasanth, Hive's user docs are wiki-only at this point so there's no version control. We just add notes about which release introduced or changed something. For an example see the beginning of the Skewed Tables

Re: Skewed Tables

2014-04-27 Thread Prasanth Jayachandran
@Mayur.. I don’t think the initial design considered CTAS for skewed tables. So it might not be supported at all. @Lefty.. I am not sure where/how the docs are maintained. Are they version controlled? Or are they only maintained in the Confluence wiki? If it is the latter, can you please provide me access

Re: Hive 0.12 ORC Heap Issues on Write

2014-04-27 Thread John Omernik
So one more follow-up: The 16-.25-Success turns to a fail if I throw more data (and hence more partitions) at the problem. Could there be some sort of issue that rears its head based on the number of output dynamic partitions? Thanks all! On Sun, Apr 27, 2014 at 3:33 PM, John Omernik wrote:

Re: Executing Hive Queries in Parallel

2014-04-27 Thread Manish Malhotra
What Sanjay and Swagatika replied are perfect. Plus fundamentally if you see, if you are able to run the hive query from CLI or some internal API like HiveDriver, the flow will be this: >> Compile the query >> Get the info from Hive Metastore using Thrift or JDBC, Optimize it ( if required and ca

Re: Executing Hive Queries in Parallel

2014-04-27 Thread Swagatika Tripathy
Hi, You can also use Oozie's fork feature, which acts as a workflow scheduler to run jobs in parallel. You just need to define all your HQLs inside the workflow.xml to make them run in parallel. On Apr 22, 2014 3:14 AM, "Subramanian, Sanjay (HQP)" < sanjay.subraman...@roberthalf.com> wrote: > Hey

Re: Hive 0.12 ORC Heap Issues on Write

2014-04-27 Thread John Omernik
Here is some testing. I focused on two variables (not really understanding what they do): orc.compress.size (256k by default) and hive.exec.orc.memory.pool (0.50 by default). The job I am running is an admittedly complex job running through a Python Transform script. However, as noted above, RCFile wri

Hive 0.12 ORC Heap Issues on Write

2014-04-27 Thread John Omernik
Hello all, I am working with Hive 0.12 right now on YARN. When I am writing a table that is admittedly quite "wide" (there are lots of columns, near 60, including one binary field that can get quite large), some tasks will fail on ORC file write with Java heap space issues. I have confirmed th

Re: What is the minimal required version of Hadoop for Hive 0.13.0?

2014-04-27 Thread Lefty Leverenz
This needs to be documented somewhere. -- Lefty On Wed, Apr 23, 2014 at 11:45 PM, Edward Capriolo wrote: > LOL. I thought I was the last 0.20.2 hold out. > > > On Wed, Apr 23, 2014 at 4:01 PM, Thejas Nair > wrote: > > > There is a jira for the hadoop 1.0.x compatibility issue. > > https://issu