Fwd: Future date getting converted to epoch date with windowing function

2014-03-25 Thread Akansha Jain
Hi, I am trying to use Hive windowing functions for a business use case. The Hive version is Apache Hive 0.11. I have a table with a column end_date whose value is 2999-12-31. When I use a Hive windowing function with this value, Hive converts it to a 1970s date. *Query used is:* SELECT account_id

Writing to Hive tables programmatically

2014-03-25 Thread Jiahua Wang
Hello, I've been looking for good ways to create and write to Hive tables from Java code. So far, I've considered the following options: 1. Create the Hive table using the JDBC client, write data to HDFS using bare HDFS operations, and load that data into the Hive table using the JDBC client. I didn'
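Option 1 above can be sketched roughly as follows. This is a minimal illustration, not the poster's code: it assumes a HiveServer2 endpoint reachable over JDBC with the hive-jdbc driver on the classpath, and the table name, (id, name) schema, staging path, and "hive" user are all hypothetical. The HDFS write in the middle is assumed to have happened already and is not shown.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class HiveJdbcLoadSketch {

    // DDL for a tab-delimited text table; the (id, name) schema is hypothetical.
    static String createTableSql(String table) {
        return "CREATE TABLE IF NOT EXISTS " + table
                + " (id INT, name STRING)"
                + " ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'";
    }

    // Moves a file already written to HDFS into the table's storage location.
    static String loadDataSql(String hdfsPath, String table) {
        return "LOAD DATA INPATH '" + hdfsPath + "' INTO TABLE " + table;
    }

    // Runs both statements over a JDBC connection (assumes a HiveServer2
    // endpoint and the hive-jdbc driver on the classpath).
    static void createAndLoad(String jdbcUrl, String table, String hdfsPath) throws SQLException {
        try (Connection conn = DriverManager.getConnection(jdbcUrl, "hive", "");
             Statement stmt = conn.createStatement()) {
            stmt.execute(createTableSql(table));
            // The middle step, writing hdfsPath with the HDFS FileSystem API,
            // is assumed to have been done already and is not shown here.
            stmt.execute(loadDataSql(hdfsPath, table));
        }
    }

    public static void main(String[] args) throws SQLException {
        String table = "events";                      // hypothetical table name
        String hdfsPath = "/tmp/staging/events.tsv";  // hypothetical staging file
        System.out.println(createTableSql(table));
        System.out.println(loadDataSql(hdfsPath, table));
        if (args.length > 0) {
            // e.g. jdbc:hive2://localhost:10000/default
            createAndLoad(args[0], table, hdfsPath);
        }
    }
}
```

Keeping the SQL in small builder methods makes the statements easy to inspect (main prints them) before anything is sent to a live server.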

Re: Does hive instantiate new udf object for each record

2014-03-25 Thread Edward Capriolo
Actually it should be more like: public Text evaluate(String s) { if (t == null) { t = new Text(); } t.set(s); return t; } You're trying to avoid calling new if possible. On Tue, Mar 25, 2014 at 9:09 PM, sky88088 wrote:
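The object-reuse idea in this reply can be shown in a self-contained form. This sketch is an assumption-laden stand-in, not Hive's API: StringBuilder replaces Hadoop's org.apache.hadoop.io.Text so it runs without Hadoop on the classpath, and the class name is hypothetical. It allocates one buffer per UDF instance and overwrites it on every call.

```java
public class ReusingUdfSketch {
    // One mutable buffer per UDF instance, reused across rows instead of
    // allocating a new object per call. StringBuilder stands in for
    // Hadoop's org.apache.hadoop.io.Text so the sketch runs without Hadoop.
    private StringBuilder result;

    public StringBuilder evaluate(String s) {
        if (s == null) {
            return null; // a common UDF convention: null in, null out
        }
        if (result == null) {
            result = new StringBuilder(); // allocated once, on the first row
        }
        result.setLength(0); // clear the previous row's value
        result.append(s);    // store the current row's value
        return result;
    }

    public static void main(String[] args) {
        ReusingUdfSketch udf = new ReusingUdfSketch();
        StringBuilder first = udf.evaluate("row-1");
        StringBuilder second = udf.evaluate("row-2");
        System.out.println(second);          // row-2
        System.out.println(first == second); // true: the same object is reused
    }
}
```

Because every call returns the same object, the pattern is only safe if the caller consumes each returned value before the next call overwrites it; that is the assumption under which this kind of reuse works inside a UDF.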

RE: Does hive instantiate new udf object for each record

2014-03-25 Thread sky88088
It works! I really appreciate your help! Best Regards, ypg From: java8...@hotmail.com To: user@hive.apache.org Subject: RE: Does hive instantiate new udf object for each record Date: Tue, 25 Mar 2014 09:57:25 -0400 The reason you saw that is because when you provide evaluate() method, you di

Delta or incremental loading for Hbase table

2014-03-25 Thread Manjula mohapatra
We have an HBase table. Each time we aggregate the table based on some columns, we do a full scan of the entire table. What are some ideas for extracting just the delta, or increments, since the last load? Right now I'm following this approach, but I want some better ideas. - Mount the hbase into
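One common way to extract only the increments, not necessarily what this thread settled on, is a high-water mark on write timestamps: remember where the last run stopped and read only rows written since then. The sketch below models the table as an in-memory list with hypothetical Row records. In HBase itself the window would correspond to a Scan restricted with setTimeRange(minStamp, maxStamp), which we understand to be min-inclusive and max-exclusive; the sketch uses the same half-open window.

```java
import java.util.ArrayList;
import java.util.List;

public class IncrementalExtractSketch {

    // Hypothetical row shape: a row key plus the timestamp of its latest write.
    static final class Row {
        final String key;
        final long timestamp;
        Row(String key, long timestamp) { this.key = key; this.timestamp = timestamp; }
        @Override public String toString() { return key + "@" + timestamp; }
    }

    // High-water mark; in practice it would be persisted between runs.
    private long watermark;

    IncrementalExtractSketch(long initialWatermark) {
        this.watermark = initialWatermark;
    }

    // Returns rows written in [watermark, runStart) and advances the watermark,
    // so consecutive windows neither overlap nor leave gaps.
    List<Row> extractDelta(List<Row> table, long runStart) {
        List<Row> delta = new ArrayList<>();
        for (Row r : table) {
            if (r.timestamp >= watermark && r.timestamp < runStart) {
                delta.add(r);
            }
        }
        watermark = runStart;
        return delta;
    }

    public static void main(String[] args) {
        List<Row> table = List.of(new Row("a", 5), new Row("b", 10), new Row("c", 15));
        IncrementalExtractSketch job = new IncrementalExtractSketch(0);
        System.out.println(job.extractDelta(table, 10)); // [a@5]
        System.out.println(job.extractDelta(table, 20)); // [b@10, c@15]
    }
}
```

This only captures inserts and updates whose timestamps move forward; deletes, and writes made with explicitly older timestamps, would need separate handling.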

Re: Handling hierarchical data in Hive

2014-03-25 Thread Nitin Pawar
Bucketing is certainly helpful when, within a partition, a different column has a finite number of values. However, bucketing means that when you load data into the table, it can't be a straightforward LOAD DATA INPATH; you will need to run it via Hive queries (which does not seem to
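Why a plain LOAD DATA INPATH isn't enough for a bucketed table: each row's destination file depends on a hash of the bucketing column, so rows have to pass through a query that computes it. The sketch below assumes the bucket index is the hash of the value, with the sign bit masked, modulo the number of buckets; String.hashCode() is a stand-in and Hive's own per-type hash may differ. The sample generated_by values are taken from this thread.

```java
import java.util.ArrayList;
import java.util.List;

public class BucketingSketch {

    // Maps a bucketing-column value to one of numBuckets buckets.
    // Assumption: String.hashCode() stands in for Hive's own per-type hash.
    static int bucketFor(String value, int numBuckets) {
        // Mask the sign bit so negative hash codes still give a valid index.
        return (value.hashCode() & Integer.MAX_VALUE) % numBuckets;
    }

    // Groups values by bucket; in a bucketed table each list would be one file.
    static List<List<String>> assign(List<String> values, int numBuckets) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numBuckets; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String v : values) {
            buckets.get(bucketFor(v, numBuckets)).add(v);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> generatedBy = List.of("111", "222", "333", "111");
        System.out.println(assign(generatedBy, 4)); // [[], [111, 111], [222], [333]]
    }
}
```

Equal values always land in the same bucket, which is what lets queries on one value read a single file, but it also means pre-existing files are only valid bucket files if they were written with the same hash rule.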

Re: Buildfile: build.xml does not exist!

2014-03-25 Thread Chinna Rao Lalam
Hi, Hive is now built with Maven, so please follow this link to build it: https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-MakingChanges Hope it helps you. On Tue, Mar 25, 2014 at 9:12 PM, Nagarjuna Vissarapu <nagarjuna.v...@gmail.com> wrote: > Hi, > Can you please

Re: Handling hierarchical data in Hive

2014-03-25 Thread Saumitra Shahapure (Vizury)
Hi Nitin/Prasan, Thanks for your replies, I appreciate your help :) Clustering looks quite close to what we want. However, one main gap is that we need to fire a Hive query to populate the clusters. In our case, the clustered data is already there, so the computation in a Hive query would be redundant.

Re: Permission denied creating external table

2014-03-25 Thread Abdelrahman Shettia
Hi Oliver, Try setting these properties in core-site.xml. Using * will allow everyone to impersonate Hive, and the cluster needs to be restarted. Property hadoop.proxyuser.hive.groups, value users, description: "Allow the superuser hive to impersonate any members of the group users. Required only when installing Hive."

Re: Handling hierarchical data in Hive

2014-03-25 Thread Prasan Samtani
Hi Saumitra, You might want to look into clustering within the partition. That is, partition by "day", but cluster by "generated by" (within those partitions), and see if that improves performance. Refer to the CLUSTER BY command in the Hive Language Manual. -Prasan On Mar 25, 2014, at 4:26

RE: Does hive instantiate new udf object for each record

2014-03-25 Thread java8964
The reason you saw that is that when you provided the evaluate() method, you didn't specify the type of column it can be used on. So Hive will just create a test instance again and again for every new row, as it doesn't know how, or to which column, to apply your UDF. I changed your code as below: public

Re: Handling hierarchical data in Hive

2014-03-25 Thread Nitin Pawar
In general, when you have a large number of partitions, your Hive query performance drops. This has been significantly addressed in current releases, but you may still see performance issues. Sadly, I currently do not have a dataset large enough that I would need to create a large number of partitions. This issue la

Re: Handling hierarchical data in Hive

2014-03-25 Thread Saumitra Shahapure (Vizury)
Hi Nitin, We are not facing the small-files problem, since the data is in S3. Also, we do not want to merge files. Merging files to create a large analyze table for, say, one day would slow down queries fired on a specific day and *generated_by*. Let me explain my problem in other words. Right now we are over

Re: Handling hierarchical data in Hive

2014-03-25 Thread Nitin Pawar
See if this is what you are looking for: https://github.com/sskaje/hive_merge On Tue, Mar 25, 2014 at 4:21 PM, Saumitra Shahapure (Vizury) <saumitra.shahap...@vizury.com> wrote: > Hello, > We are using Hive to query S3 data. For one of our tables named analyze, > we generate data hierarchica

Handling hierarchical data in Hive

2014-03-25 Thread Saumitra Shahapure (Vizury)
Hello, We are using Hive to query S3 data. For one of our tables, named analyze, we generate data hierarchically. The first level of the hierarchy is the date and the second level is a field named *generated_by*. For example, for 20 March we may have S3 directories such as s3://analyze/20140320/111/ s3://analyze/20140320/222/
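Since the data already sits in a date/generated_by directory layout, one possible approach (not one proposed in this thread) is to register each existing directory as a two-level partition with ALTER TABLE ... ADD PARTITION ... LOCATION, so no Hive query has to rewrite the data. The sketch only generates the statements. The partition column names dt and generated_by are hypothetical, and depending on the Hive version a table literally named analyze may need backquoting.

```java
import java.util.List;

public class RegisterExistingPartitionsSketch {

    // Builds an ALTER TABLE ... ADD PARTITION statement that points Hive at an
    // existing directory laid out as <prefix>/<yyyymmdd>/<generated_by>/.
    // The partition column names dt and generated_by are hypothetical.
    static String addPartitionSql(String table, String dir) {
        String trimmed = dir.endsWith("/") ? dir.substring(0, dir.length() - 1) : dir;
        String[] parts = trimmed.split("/");
        String generatedBy = parts[parts.length - 1]; // last path component, e.g. 111
        String dt = parts[parts.length - 2];          // second-to-last, e.g. 20140320
        return "ALTER TABLE " + table
                + " ADD IF NOT EXISTS PARTITION (dt='" + dt + "', generated_by='" + generatedBy + "')"
                + " LOCATION '" + dir + "'";
    }

    public static void main(String[] args) {
        List<String> dirs = List.of("s3://analyze/20140320/111/", "s3://analyze/20140320/222/");
        for (String dir : dirs) {
            System.out.println(addPartitionSql("analyze", dir));
        }
    }
}
```

With partitions registered this way, a query filtering on both dt and generated_by should only need to read the matching directory.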