Which approach for handling heavily nested JSONs ?

2013-04-05 Thread Himanshu Vijay
Hi, I have been dealing with some heavily nested and complex JSON data. It has all sorts of combinations like: Struct I wanted to know which approach you find better: using the SerDe or using the UDFs. In my opinion the two approaches can be compared in the following way. P
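
For the UDF route, the usual pattern is to keep the raw JSON as a string column and pull individual fields out at query time. Hive's built-in get_json_object takes a JSONPath-style expression for this. The Python sketch below only illustrates that kind of path lookup and is not Hive code. The function name extract_path is hypothetical, and the sketch supports only `.field` and `[index]` steps.

```python
import json
import re

# One path step: ".field" or "[index]"
_STEP = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def extract_path(json_text, path):
    """Return the value at a simple '$.a.b[0]' style path, or None if it is absent."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    node = json.loads(json_text)
    pos = 1
    while pos < len(path):
        m = _STEP.match(path, pos)
        if m is None:
            raise ValueError(f"unsupported path syntax at {path[pos:]!r}")
        field, index = m.group(1), m.group(2)
        if field is not None:
            # Object member access
            if not isinstance(node, dict) or field not in node:
                return None
            node = node[field]
        else:
            # Array element access
            i = int(index)
            if not isinstance(node, list) or i >= len(node):
                return None
            node = node[i]
        pos = m.end()
    return node


doc = '{"user": {"name": "a", "tags": [{"k": 1}, {"k": 2}]}}'
print(extract_path(doc, "$.user.tags[1].k"))  # 2
```

With a SerDe, by contrast, the nested structure is declared once in the table schema and parsed on read, so queries address struct and array members directly instead of repeating path strings.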

Re: builtins submodule - is it still needed?

2013-04-05 Thread Travis Crawford
Thanks for the background – I've filed https://issues.apache.org/jira/browse/HIVE-4304 and will remove them. --travis On Fri, Apr 5, 2013 at 4:45 PM, Owen O'Malley wrote: > +1 to removing them. > > We have a Rot13 example in > ql/src/test/org/apache/hadoop/hive/ql/io/udf/Rot13{In,Out}putFormat.j

bz2 compressed table usage?

2013-04-05 Thread Sushanth Sowmyan
Hi folks, Anyone have any experience using bz2 based compressed tables? I have the following .q file: == SET hive.exec.dynamic.partition.mode=nonstrict; SET hive.exec.dynamic.partition=true; SET hive.exec.max.dynamic.partitions=500; SET hive.exec.max.dynamic.partitions.pernode=500; SET hive.exec.

Re: builtins submodule - is it still needed?

2013-04-05 Thread Owen O'Malley
+1 to removing them. We have a Rot13 example in ql/src/test/org/apache/hadoop/hive/ql/io/udf/Rot13{In,Out}putFormat.java anyways. *smile* -- Owen On Fri, Apr 5, 2013 at 3:11 PM, Gunther Hagleitner < ghagleit...@hortonworks.com> wrote: > +1 > > I would actually go a step further and propose to

Re: builtins submodule - is it still needed?

2013-04-05 Thread Gunther Hagleitner
+1 I would actually go a step further and propose to remove both PDK and builtins. I went through the code for both and here is what I found: Builtins: - BuiltInUtils.java: Empty file - UDAFUnionMap: Merges maps. Doesn't seem to be useful by itself, but was intended as a building block for PDK
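
UDAFUnionMap is described above only as an aggregate that merges maps. As a rough illustration of what such an aggregate involves, the Python sketch below follows the per-row, partial-result, merge, and final phases that Hive aggregates go through. The class and method names are hypothetical, and the rule that a later value overwrites an earlier one for a duplicate key is an assumption. The original code may resolve conflicts differently.

```python
class UnionMapAggregator:
    """Hypothetical union-of-maps aggregate. On duplicate keys the later value wins (assumption)."""

    def __init__(self):
        self.state = {}

    def iterate(self, row_map):
        # Called once per input row on the map side.
        if row_map is not None:
            self.state.update(row_map)

    def terminate_partial(self):
        # Partial result shipped from a mapper to the reducer.
        return dict(self.state)

    def merge(self, partial):
        # Fold in a partial result produced by another mapper.
        if partial is not None:
            self.state.update(partial)

    def terminate(self):
        # Final aggregated map.
        return dict(self.state)


mapper_a = UnionMapAggregator()
mapper_a.iterate({"x": 1})
mapper_a.iterate({"y": 2})
mapper_b = UnionMapAggregator()
mapper_b.iterate({"y": 3, "z": 4})
reducer = UnionMapAggregator()
reducer.merge(mapper_a.terminate_partial())
reducer.merge(mapper_b.terminate_partial())
print(reducer.terminate())  # {'x': 1, 'y': 3, 'z': 4}
```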

Re: builtins submodule - is it still needed?

2013-04-05 Thread Ashutosh Chauhan
I haven't used it myself so far, nor have I met anyone who has used it or plans to use it. Ashutosh On Thu, Apr 4, 2013 at 2:01 PM, Travis Crawford wrote: > Hey hive gurus - > > Is the "builtins" hive submodule in use? The submodule was added in > HIVE-2523 as a location for builtin-UDFs

Re: Bucketing external tables

2013-04-05 Thread Sadananda Hegde
Thanks, Mark. I found the problem. For some reason, Hive is not able to write an Avro output file when the schema has a complex field with a NULL option. It reads without any problem but cannot write with that structure. For example, Insert was failing on this array of structure field. { "name": "Pa
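
In Avro, a complex field "with a NULL option" is declared as a union whose first branch is "null". A union's default value must match its first branch, which is why "null" usually comes first. Whatever writes the data has to pick a branch for each value. The Python sketch below shows such a declaration for a nullable array of records, plus a simplified branch selection. The field and record names are hypothetical because the real field name is cut off above. This is not how Hive's Avro SerDe is implemented and it does not reproduce the failure.

```python
import json

# Hypothetical nullable array-of-records field (names are placeholders).
FIELD = json.loads("""
{
  "name": "example_list",
  "type": ["null", {"type": "array",
                    "items": {"type": "record", "name": "Entry",
                              "fields": [{"name": "id", "type": "string"}]}}],
  "default": null
}
""")


def branch_for(union, value):
    """Return the index of the union branch a Python value would be written as (simplified)."""
    for i, branch in enumerate(union):
        kind = branch if isinstance(branch, str) else branch["type"]
        if kind == "null" and value is None:
            return i
        if kind == "array" and isinstance(value, list):
            return i
        if kind == "record" and isinstance(value, dict):
            return i
        if kind == "string" and isinstance(value, str):
            return i
    raise TypeError(f"no union branch matches {value!r}")


print(branch_for(FIELD["type"], None))           # 0
print(branch_for(FIELD["type"], [{"id": "a"}]))  # 1
```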

Re: Partition performance

2013-04-05 Thread Ramki Palle
Can you tell how many map tasks there are in each scenario? If my assumption is correct, you should have 336 in the first case and 14 in the second case. It looks like it is combining all small files in a folder and running one map task for all 24 files in a folder, whereas it is running a separate
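
The arithmetic behind this estimate: 336 map tasks is 14 partition folders × 24 files each with one task per file, versus 14 tasks when the small files in each folder are combined into a single split. The Python sketch below only encodes those two assumptions from the message. Real split counts also depend on file sizes and the configured input format, and the function name is hypothetical.

```python
def map_task_count(files_per_partition, combine_per_partition):
    """Rough map task count: one per file, or one per non-empty partition when small files are combined."""
    if combine_per_partition:
        return sum(1 for n in files_per_partition if n > 0)
    return sum(files_per_partition)


partitions = [24] * 14  # 14 partition folders, 24 small files each
print(map_task_count(partitions, combine_per_partition=False))  # 336
print(map_task_count(partitions, combine_per_partition=True))   # 14
```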

Re: Partition performance

2013-04-05 Thread Ian
Thanks. This is just a test from my local box. So each file is only 1kb. I shared the query plans of these two tests at: http://codetidy.com/paste/raw/5198 http://codetidy.com/paste/raw/5199 Also in the Hadoop log, there is this line for each partition: org.apache.hadoop.hive.ql.exec.MapOperator

Re: Syntax for filters on timstamp data type

2013-04-05 Thread Mark Grover
Steffen, One thing that may be different is that equals can cast operands to make the comparison work, but that may not be true for IN. FWIW, this is me just speculating; I haven't looked at the code just yet. Perhaps you could use explicit casting to get around this? On Fri, Apr 5, 2013 at 7:36 AM, LUTTER, S
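
The casting suggestion amounts to making both sides of the IN comparison the same type before comparing. UNIX_TIMESTAMP returns a bigint (seconds since the epoch), while the column holds timestamps. The Python sketch below is only an analogy for that idea, not HiveQL. The helper names are hypothetical, and it assumes UTC.

```python
from datetime import datetime, timezone


def to_timestamp(value):
    """Hypothetical explicit cast: epoch seconds (int) or datetime -> timezone-aware datetime."""
    if isinstance(value, datetime):
        return value
    if isinstance(value, int):
        # Assumes UTC; Hive may interpret times in the server's local time zone.
        return datetime.fromtimestamp(value, tz=timezone.utc)
    raise TypeError(f"cannot cast {value!r} to a timestamp")


def in_list(column_value, candidates):
    """IN-style membership test that casts every operand to a timestamp first."""
    target = to_timestamp(column_value)
    return any(to_timestamp(c) == target for c in candidates)


ts = datetime(2013, 4, 5, 7, 36, tzinfo=timezone.utc)
epoch = int(ts.timestamp())
print(ts in [epoch])         # False: a datetime and an int never compare equal
print(in_list(ts, [epoch]))  # True once both sides are timestamps
```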

RE: Syntax for filters on timstamp data type

2013-04-05 Thread LUTTER, Steffen
Equal, not equal, less than, less or equal, greater than, greater or equal all work. Also the function execution in the IN clause seems to work, as the error message states that the result type is bigint. Following the error message, it expects the input as timestamp, but I couldn't find a synta

Re: Syntax for filters on timstamp data type

2013-04-05 Thread Nitin Pawar
I am not sure the IN clause supports executing functions in the query. Did it fail when you tried less than / greater than comparisons? On Fri, Apr 5, 2013 at 7:36 PM, LUTTER, Steffen wrote: > Hi, > > I have a question regarding filters on timestamps. The syntax seems to be > UNIX_TIMESTAMP('y

Syntax for filters on timstamp data type

2013-04-05 Thread LUTTER, Steffen
Hi, I have a question regarding filters on timestamps. The syntax seems to be UNIX_TIMESTAMP('yyyy-MM-dd hh:mm:ss'); is there another way to express a datetime type? The problem is that I get an exception when using the IN syntax, while the equal comparison works without problems. Example: SE
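
UNIX_TIMESTAMP(string, pattern) converts a datetime string into seconds since the epoch using a Java SimpleDateFormat-style pattern. One detail worth checking in the pattern above: in that syntax 'hh' is the 12-hour clock, so afternoon times usually need 'HH'. The Python sketch below mirrors the conversion with the equivalent strptime format. It assumes UTC, whereas Hive may use the server's time zone, and the function name is hypothetical.

```python
from datetime import datetime, timezone

# strptime equivalent of the Java-style pattern 'yyyy-MM-dd HH:mm:ss'.
# In Java patterns 'HH' is the 24-hour clock; 'hh' is the 12-hour clock (01-12).
PATTERN = "%Y-%m-%d %H:%M:%S"


def to_epoch_seconds(text, tz=timezone.utc):
    """Parse 'yyyy-MM-dd HH:mm:ss' text and return seconds since the epoch (UTC by default)."""
    parsed = datetime.strptime(text, PATTERN)
    return int(parsed.replace(tzinfo=tz).timestamp())


print(to_epoch_seconds("1970-01-01 13:00:00"))  # 46800
```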

RE: Error Creating External Table

2013-04-05 Thread Ranjitha Chandrashekar
Hi Piyush That was a problem with the path. There were other incompatible files in that directory. Thanks anyway.. :) From: Piyush Srivastava [mailto:piyush.srivast...@wizecommerce.com] Sent: 05 April 2013 15:23 To: user@hive.apache.org Subject: RE: Error Creating External Table When you givin

RE: Error Creating External Table

2013-04-05 Thread Piyush Srivastava
When you give the location, give it as '/user/myfolder/items'; Hive knows it needs to be stored on HDFS, which is defined in $HADOOP_CONF_DIR/hdfs-site.xml. Thanks, ./Piyush From: Ranjitha Chandrashekar [ranjitha...@hcl.com] Sent: Friday, April 05, 2013 3:16 PM To: user
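
A location without a scheme, such as '/user/myfolder/items', is resolved against the cluster's configured default filesystem, so the host and port do not need to be spelled out. The Python sketch below imitates that qualification step. It is not Hadoop's implementation, and DEFAULT_FS is a hypothetical value because the real host is missing from the thread.

```python
from urllib.parse import urlparse, urlunparse

# Hypothetical default filesystem; the actual host is not given in the thread.
DEFAULT_FS = "hdfs://namenode.example:54310"


def qualify(location, default_fs=DEFAULT_FS):
    """Prefix a scheme-less absolute path with the default filesystem's scheme and authority."""
    loc = urlparse(location)
    if loc.scheme:
        return location  # already fully qualified, leave as-is
    if not loc.path.startswith("/"):
        raise ValueError("expected an absolute path such as /user/...")
    fs = urlparse(default_fs)
    return urlunparse((fs.scheme, fs.netloc, loc.path, "", "", ""))


print(qualify("/user/myfolder/items"))
# hdfs://namenode.example:54310/user/myfolder/items
```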

RE: External Table to Sequence File on HDFS

2013-04-05 Thread Ranjitha Chandrashekar
Hi Sanjay Thank you for the quick response. I got the input format part from the link that you sent. But in order to read that table in Hive, I need to specify the SerDe; where exactly do I specify this class file? Is it something like, create table seq10 (key STRING, value STRING) ROW FORMAT S

Error Creating External Table

2013-04-05 Thread Ranjitha Chandrashekar
Hi When I try creating an external table to a text file on HDFS I get the following error. Could someone please let me know where I am going wrong. hive> create external table seq4 (item STRING) row format delimited fields terminated by '' STORED as TEXTFILE location 'hdfs://:54310/user/myfold