Re: Using json_tuple for Nested json Arrays

2015-10-27 Thread Nishant Aggarwal
Hello Sam, Please find the attached Pig script for the same. You may find the necessary jars below. http://mvnrepository.com/artifact/com.twitter.elephantbird/elephant-bird-pig Note: The same functionality can be achieved in Hive as well. Thanks and Regards Nishant Aggarwal, PMP Cell No:- +91 99588

Re: Using json_tuple for Nested json Arrays

2015-10-27 Thread Gopal Vijayaraghavan
Hi, > If you have any tutorial for extracting data from complex nested json >arrays (as the example given in my previous email), please send it. 90% of working with the real world is cleansing bad data. People under-sell hive's flexibility in situations like this. This is what I do hive> comp

RE: Using json_tuple for Nested json Arrays

2015-10-27 Thread Ryan Harris
I mean that I don't know what the return Hive datatype of json_tuple is. If it is returning a JSON array as a Hive string, explode() isn't going to be happy with that and it will throw the error that you are getting. If you do a CREATE TABLE AS statement, you can then check the

row_number() over skew partition by columns

2015-10-27 Thread Loudongfeng
The following SQL runs very slowly due to skewed values in the skew_col column: select row_number() over (partition by skew_col) from some_table; Is there any way to optimize it? Thanks

Re: Using json_tuple for Nested json Arrays

2015-10-27 Thread Sam Joe
Thanks Nishant! Will try using Pig json loader too to achieve this requirement. If you have any tutorial for extracting data from complex nested json arrays (as the example given in my previous email), please send it. Appreciate your help! Thanks, Joel On Tue, Oct 27, 2015 at 10:20 PM, Nishant A

Re: Using json_tuple for Nested json Arrays

2015-10-27 Thread Nishant Aggarwal
Hello Sam, You can easily achieve this by using the elephant-bird jars in Pig. We are also capturing tweets via Flume and filtering them using Pig and the elephant-bird jars. You can find the related jars on the internet. Cheers, Nishant Aggarwal On 28 Oct 2015 00:50, "Sam Joe" wrote: > Hi, > > Is it possible to u

Re: Using json_tuple for Nested json Arrays

2015-10-27 Thread Sam Joe
Hi Ryan, I think tr3.media is a complex JSON array having nested JSON tuple objects. For example, sizes is a JSON tuple object present inside the array, which I think the EXPLODE function is not expecting. Maybe the explode function is expecting a closing brace '}' corresponding to the first brace '

Re: Locking when using the Metastore/HCatalog APIs.

2015-10-27 Thread Eugene Koifman
Wouldn't it make more sense for the API to acquire required locks automatically? That seems like a simpler user model. From: Alan Gates <alanfga...@gmail.com> Reply-To: "user@hive.apache.org" <user@hive.apache.org> Date: Tuesday, October 27, 2015 at 1

RE: Using json_tuple for Nested json Arrays

2015-10-27 Thread Ryan Harris
hmmm...I'm not sure what the return value type of json_tuple is... I'd probably try creating a temporary table from your working query below and then work on getting the lateral view explode to work against the temp table. FAILED: UDFArgumentException explode() takes an array or a map as a parame

Re: Using json_tuple for Nested json Arrays

2015-10-27 Thread Sam Joe
Hi Ryan, The statement returns null for media as shown below: hive> SELECT tr2.id, tr2.possibly_sensitive, tr2.media > FROM tweets_raw tr1 > LATERAL VIEW json_tuple(tr1.text_col, 'id', 'extended_entities', 'possibly_sensitive', 'extended_entities.media') tr2 as id, extended_entities,

RE: Using json_tuple for Nested json Arrays

2015-10-27 Thread Ryan Harris
I see where you are going with this now. Not sure if you might be bumping into this bug: https://issues.apache.org/jira/browse/HIVE-1575 since this line LATERAL VIEW json_tuple(tr2.extended_entities, 'media') tr3 as media pulls the JSON array as a "top-level" object... does this not work?

Re: Using json_tuple for Nested json Arrays

2015-10-27 Thread Sam Joe
Hi Ryan, The simple query is running fine as shown below: hive> SELECT tr2.id, tr2.possibly_sensitive > FROM tweets_raw tr1 > LATERAL VIEW json_tuple(tr1.text_col, 'id', 'extended_entities', 'possibly_sensitive') tr2 as id, extended_entities, possibly_sensitive > where tr2.id=

RE: Using json_tuple for Nested json Arrays

2015-10-27 Thread Ryan Harris
Looking at your sample data, you shouldn't need to use lateral view explode unless you are trying to get 1 entry per row for your media sizes (thumb, small, large, medium, etc) ... Try starting with something simple like: SELECT get_json_object(text_col, '$.id') as id FROM tweets_raw limit 10
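
For readers who want to see what a path such as '$.id' selects before running it in Hive, here is a minimal Python sketch. It is not Hive code and not how get_json_object is implemented. The helper name json_path_get and the sample record values are hypothetical; only the field names (id, possibly_sensitive, extended_entities, media, sizes) come from the thread. It handles plain dotted keys only, with no array indexes or wildcards.

```python
import json


def json_path_get(raw, path):
    """Hypothetical helper: resolve a simple '$.a.b' path against a JSON string.

    Returns None for malformed input or a missing key, which is roughly
    how a NULL would surface in a query result.
    """
    try:
        node = json.loads(raw)
    except (ValueError, TypeError):
        # Malformed or non-string input: give up quietly instead of failing.
        return None
    # Drop the leading '$' and split the rest on dots, ignoring empty parts.
    keys = [k for k in path.lstrip("$").split(".") if k]
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


# Made-up record that only mimics the shape discussed in the thread.
record = (
    '{"id": 1, "possibly_sensitive": false, '
    '"extended_entities": {"media": [{"sizes": {"thumb": {"w": 150}}}]}}'
)

tweet_id = json_path_get(record, "$.id")
media = json_path_get(record, "$.extended_entities.media")
broken = json_path_get("{not valid json", "$.id")
```

The point of returning None on bad input is the same one raised in the thread: malformed user data should produce an empty value for that row rather than abort the whole job.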

Maximum Size of String data type in hive

2015-10-27 Thread Kashif Hussain
Hi, Is there any limit on the size of a value in a column of string data type? Also, is there any performance impact of large values in string columns while querying? Regards, Kashif

Re: Using json_tuple for Nested json Arrays

2015-10-27 Thread Sam Joe
Hi Ryan, Thanks for your reply! I didn't try using get_json_object() UDF. I will try using that and let you know the results. I tried using the following script which failed : SELECT tr2.id, tr2.possibly_sensitive, tr3.media, media_object.source_user_id FROM tweets_raw tr1 LATERAL VIEW js

RE: Using json_tuple for Nested json Arrays

2015-10-27 Thread Ryan Harris
Do you have an example of the query that you tried (which failed)? In short, you probably want to use the get_json_object() UDF: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-get_json_object If you need the JSON array broken into individual records, you migh

Re: Using json_tuple for Nested json Arrays

2015-10-27 Thread Sam Joe
I tried using the EXPLODE function on the nested JSON array but it doesn't work and throws the following error: FAILED: UDFArgumentException explode() takes an array or a map as a parameter Thanks, Joel On Tue, Oct 27, 2015 at 3:20 PM, Sam Joe wrote: > Hi, > > Is it possible to use json_tuple functio
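
The error above is what one would expect when a nested array is still just text: a JSON extraction step that hands back strings gives you the array as a string, and a string is neither an array nor a map. The Python sketch below illustrates that distinction only. It is not Hive code; top_level_fields and explode_rows are hypothetical names, and the sample data is made up.

```python
import json


def top_level_fields(raw, *keys):
    """Hypothetical stand-in for a json_tuple-style call.

    Every requested top-level key comes back as a string (or None),
    so nested arrays and objects are returned as JSON text.
    """
    obj = json.loads(raw)
    out = []
    for key in keys:
        value = obj.get(key)
        if value is None:
            out.append(None)
        elif isinstance(value, str):
            out.append(value)
        else:
            # Nested structures are serialized back to text.
            out.append(json.dumps(value))
    return tuple(out)


def explode_rows(array_text):
    """Parse JSON array text into a real list, one element per output row."""
    parsed = json.loads(array_text)
    if not isinstance(parsed, list):
        raise TypeError("expected a JSON array")
    return parsed


# Made-up data shaped like the extended_entities object in the thread.
entities = '{"media": [{"type": "photo"}, {"type": "video"}]}'

(media_text,) = top_level_fields(entities, "media")
rows = explode_rows(media_text)
```

Here media_text is a str, so there is nothing to iterate over row by row until explode_rows turns it into a list. The Hive analogue would be converting the string into an actual array type before passing it to explode().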

Using json_tuple for Nested json Arrays

2015-10-27 Thread Sam Joe
Hi, Is it possible to use the json_tuple function to extract data from JSON arrays (nested ones too)? I am trying to process JSON data as strings and avoid using SerDes, since user data may be malformed. Please see the sample JSON data given below: { "filter_level": "low", "retweeted": false, "in_reply_t

Re: Locking when using the Metastore/HCatalog APIs.

2015-10-27 Thread Alan Gates
Answers inlined. Elliot West October 22, 2015 at 6:40 I notice from the Hive locking wiki page that locks may be acquired for a range of HQL DDL operations. I wanted to know how the locking scheme mapped mapp

Re: Hi, Hive People urgent question about [Distribute By] function

2015-10-27 Thread Gopal Vijayaraghavan
> I want to override partitionByHash function on Flink like the same way >of DBY on Hive. > I am working on implementing some benchmark system for these two systems, >which could be a contribution to Hive as well. I would be very disappointed if Flink fails to outperform Hive with a Distribute BY,

Re: Re: HiveServer2 load data inpath fails

2015-10-27 Thread Takahiko Saito
Hi Vineet, Were you able to find anything in the HS2 log? I was just able to run 'load data inpath' with Hive ver. 0.14 without any issue. My env is hive.execution.engine=tez though. Also, based on your error message, it may be worth checking what value is set for datanucleus.connectionPoolingType. I

Re: insert timestamp values in Hive

2015-10-27 Thread Alan Gates
Actually, for INSERT VALUES you don't have to have a transactional table (you do to use UPDATE or DELETE). So I would expect this to work as is. What happens if you do: create table foo (x int); insert into foo values (5); select * from foo; Do you get 5 or null? This will tell whether the

Re: How to use grouping__id in a query

2015-10-27 Thread Michal Krawczyk
Thanks Jesus, sorry for delayed response. Looking forward to the fix ;). On Wed, Oct 21, 2015 at 6:03 PM, Jesus Camacho Rodriguez < jcamachorodrig...@hortonworks.com> wrote: > I created HIVE-12223 to track this issue. > > Thanks, > Jesús > > > From: Jesus Camachorodriguez > Reply-To: "user@hive.a

Re: Hi, Hive People urgent question about [Distribute By] function

2015-10-27 Thread Philip Lee
Hello, the same question about DISTRIBUTE BY on Hive. According to you, you do not use the hashCode of the Object class for DBY (Distribute By). I tried to understand how ObjectInspectorUtils works for distribution, but it seems to involve a lot of the Hive API, and I could not understand much of it. I want to override pa

Re: insert timestamp values in Hive

2015-10-27 Thread AnandaVelMurugan Chandra Mohan
Hi, Thanks for the suggestions. I was planning to use the Parquet format. I will read about transactions fully before proceeding. Regards, Anand On Tue, Oct 27, 2015 at 12:19 PM, Srinivas Thunga wrote: > Hi, > > If you want those properties to be executed, then you need to create table in > ORC format an