Re: Optimizing hive queries

2013-03-28 Thread Owen O'Malley
On Thu, Mar 28, 2013 at 11:08 PM, Jagat Singh wrote: > Hello Owen, > > Thanks for your reply. > > I am seeing it provides the advantage which Avro provided, of adding > and removing fields. > ORC files, like Avro files, are self-describing. They include the type structure of the records in the
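
A minimal sketch of declaring an ORC-backed table, assuming a Hive build that ships the ORC format (Hive 0.11+ or the HDP build documented elsewhere in this thread); the table and column names are illustrative only:

    -- The ORC files written for this table carry their own type structure,
    -- so readers do not need an external schema to interpret them.
    CREATE TABLE events_orc (
      user_id BIGINT,
      url     STRING,
      ts      TIMESTAMP
    )
    STORED AS ORC;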

Re: Optimizing hive queries

2013-03-28 Thread Jagat Singh
Hello Owen, Thanks for your reply. I am seeing it provides the advantage which Avro provided, of adding and removing fields. Can you please write some sample code for a Hive table which is partitioned and where each partition has a different schema? I tried searching but could not find any example.

RE: A GenericUDF Function to Extract a Field From an Array of Structs

2013-03-28 Thread Peter Chu
Sorry, the test should be the following (changed extract_shas to extract_product_category):
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF.DeferredObject;
import org.apache.hadoop

Re: Optimizing hive queries

2013-03-28 Thread Owen O'Malley
Actually, Hive already has the ability to have different schemas for different partitions. (Although of course it would be nice to have the alter table be more flexible!) The "versioned metadata" means that the ORC file's metadata is stored in ProtoBufs so that we can add (or remove) fields to the
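
A short sketch of the per-partition schema behaviour described above, with hypothetical table, column, and staging-table names; how missing columns surface for old partitions (typically as NULL) depends on the SerDe and Hive version:

    CREATE TABLE events (user_id BIGINT, url STRING)
    PARTITIONED BY (dt STRING);

    -- The first partition is written with the original two-column layout.
    INSERT OVERWRITE TABLE events PARTITION (dt='2013-03-27')
    SELECT user_id, url FROM staging_old;

    -- Widen the table-level schema; partitions already written keep their layout.
    ALTER TABLE events ADD COLUMNS (referrer STRING);

    -- Partitions created from now on carry the wider schema.
    INSERT OVERWRITE TABLE events PARTITION (dt='2013-03-28')
    SELECT user_id, url, referrer FROM staging_new;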

Re: Optimizing hive queries

2013-03-28 Thread Nitin Pawar
I could only find this link: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.0.2/ds_Hive/orcfile.html According to this, the metadata is handled by protobuf, which allows adding/removing fields. On Fri, Mar 29, 2013 at 10:55 AM, Jagat Singh wrote: > Hello Nitin, > > Thanks for sharing.

Re: Optimizing hive queries

2013-03-28 Thread Jagat Singh
Hello Nitin, Thanks for sharing. Do we have more details on the versioned metadata feature of ORC? Is it like handling varying schemas in Hive? Regards, Jagat Singh On Fri, Mar 29, 2013 at 4:16 PM, Nitin Pawar wrote: > > Hi, > > Here is a nice presentation from Owen from Hortonworks on "O

Optimizing hive queries

2013-03-28 Thread Nitin Pawar
Hi, Here is a nice presentation from Owen of Hortonworks on "Optimizing hive queries": http://www.slideshare.net/oom65/optimize-hivequeriespptx Thanks, Nitin Pawar

Re: different outer join plan between hive 0.9 and hive 0.10

2013-03-28 Thread Navis류승우
The problem is a mixture of issues (HIVE-3411, HIVE-4209, HIVE-4212, HIVE-3464) and is still not completely fixed even in trunk. It will be fixed shortly. 2013/3/29 wzc : > The bug remains even if I apply the patch in HIVE-4206 :( The explain > result hasn't changed. > > > 2013/3/28 Navis류승우 >> >> It's

A GenericUDF Function to Extract a Field From an Array of Structs

2013-03-28 Thread Peter Chu
I am trying to write a GenericUDF function to collect all values of a specific struct field within an array for each record, and return them in an array as well. I wrote the UDF (as below), and it seems to work, but: 1) It does not work when I am performing this on an external table; it works fine on
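
For context, a sketch of how a UDF like this is typically registered and invoked from the Hive side; the jar path, class name, and table layout below are hypothetical, not taken from the attached code:

    -- Assumes the compiled UDF has been packaged into this jar.
    ADD JAR /tmp/extract-product-category-udf.jar;
    CREATE TEMPORARY FUNCTION extract_product_category
      AS 'com.example.hive.udf.ExtractProductCategory';

    -- products is assumed to be array<struct<name:string,category:string>>;
    -- the UDF returns the category field of every element as array<string>.
    SELECT extract_product_category(products)
    FROM orders;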

Re: External table for hourly log files

2013-03-28 Thread Ian
Thanks, but it seems it relates only to inserts. How can I use dynamic partitions on the query? So if I changed the log path to use Hive's directory naming convention (e.g., dt=2013-03-08/hr=01), I would still need to ADD PARTITION multiple times. How can dynamic partitions help in this case?
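
If the paths are renamed to the dt=/hr= convention, one commonly used alternative to issuing ADD PARTITION once per hour is to have Hive scan the table location for partition directories; a sketch, assuming a hypothetical external table named logs rooted at the renamed path:

    -- Registers every dt=.../hr=... directory found under the table's LOCATION.
    MSCK REPAIR TABLE logs;

Dynamic partitioning itself, as noted above, applies when data is loaded through INSERT, not when existing directories are being attached to a table.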

Re: External table for hourly log files

2013-03-28 Thread Sanjay Subramanian
Hi, You may want to look at dynamic partitions: https://cwiki.apache.org/Hive/dynamicpartitions.html Thanks, Sanjay
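
A minimal sketch of the dynamic partition insert described on that page, with hypothetical table names; the partition column values are taken from the last columns of the SELECT:

    SET hive.exec.dynamic.partition=true;
    SET hive.exec.dynamic.partition.mode=nonstrict;

    -- Each distinct (dt, hr) pair produced by the SELECT becomes its own partition.
    INSERT OVERWRITE TABLE logs PARTITION (dt, hr)
    SELECT line, dt, hr
    FROM raw_logs;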

External table for hourly log files

2013-03-28 Thread Ian
Hi, We use Hive "Insert Overwrite Directory" to copy the hourly logs to HDFS, so there are lots of directories like these:
/my/logs/2013-03-08/01/00_0
/my/logs/2013-03-08/02/00_0
/my/logs/2013-03-08/03/00_0
...
Now we want to create an external table to query the log da
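
Since these paths do not follow the key=value naming Hive expects for automatic partition discovery, one way to map the layout is an external partitioned table whose partitions each point at an hourly directory; a sketch with hypothetical table and column definitions:

    CREATE EXTERNAL TABLE logs (line STRING)
    PARTITIONED BY (dt STRING, hr STRING)
    LOCATION '/my/logs';

    -- Each hourly directory has to be registered explicitly because the layout
    -- is /my/logs/2013-03-08/01 rather than /my/logs/dt=2013-03-08/hr=01.
    ALTER TABLE logs ADD PARTITION (dt='2013-03-08', hr='01')
    LOCATION '/my/logs/2013-03-08/01';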

Re: different outer join plan between hive 0.9 and hive 0.10

2013-03-28 Thread wzc
The bug remains even if I apply the patch in HIVE-4206 :( The explain result hasn't changed. 2013/3/28 Navis류승우 > It's a bug (https://issues.apache.org/jira/browse/HIVE-4206). > > Thanks for reporting it. > > 2013/3/24 wzc : > > Recently we tried to upgrade our Hive from 0.9 to 0.10, but found

Problem with Custom InputFormat

2013-03-28 Thread Peter Marron
Hi, I seem to have a problem getting Hive to use a custom InputFormat. I am using Hive version 0.10.0 with Hadoop 1.0.4 on CentOS 6.3, currently in standalone mode. At this stage I am just experimenting. I have a file with 10 records which I am using for testing. I've created a table called zownv
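
For reference, a sketch of how a custom InputFormat is usually attached to a table on this Hive version; the jar path and class names here are placeholders rather than the ones from this report:

    -- Make the class visible to the Hive client and to the launched MapReduce jobs.
    ADD JAR /tmp/my-custom-inputformat.jar;

    CREATE TABLE custom_fmt_test (line STRING)
    STORED AS
      INPUTFORMAT 'com.example.MyInputFormat'
      OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';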