Hi,
I am trying to use Hive windowing functions for a business use case. The Hive
version is Apache Hive 0.11.
I have a table with a column end_date whose value is 2999-12-31. When I use a
Hive windowing function on this value, Hive converts it to a 1970s date.
The query used is:
SELECT account_id
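For what it's worth, one plausible cause (an assumption on my side, not confirmed in the thread) is 32-bit truncation of the epoch value: the seconds-since-epoch for 2999-12-31 do not fit in a signed 32-bit int, so a narrowing cast wraps the value and the date lands back near the 1970 epoch. A quick check:

```java
import java.time.LocalDate;

public class EpochOverflowCheck {
    // Seconds since 1970-01-01T00:00 UTC for a given date (at midnight).
    public static long epochSeconds(LocalDate d) {
        return d.toEpochDay() * 86_400L;
    }

    public static void main(String[] args) {
        long secs = epochSeconds(LocalDate.of(2999, 12, 31));
        System.out.println("epoch seconds:    " + secs);
        System.out.println("fits in int:      " + (secs <= Integer.MAX_VALUE));
        // A narrowing cast silently wraps to an unrelated value:
        System.out.println("after (int) cast: " + (int) secs);
    }
}
```

If any step in the query path narrows the timestamp to 32 bits, this wraparound would explain the symptom.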
Hello,
I've been looking for good ways to create and write to Hive tables from
Java code. So far, I've considered the following options:
1. Create Hive table using the JDBC client, write data to HDFS using bare
HDFS operations, and load that data into the Hive table using the JDBC
client. I didn'
Actually it should be more like:
public Text evaluate(String s) {
  if (t == null) {
    t = new Text("initialization");
  } else {
    t.set(s.getBytes());
  }
  return t;
}

You're trying to avoid new where possible.
On Tue, Mar 25, 2014 at 9:09 PM, sky88088 wrote:
It works!
I really appreciate your help!
Best Regards,
ypg
From: java8...@hotmail.com
To: user@hive.apache.org
Subject: RE: Does hive instantiate new udf object for each record
Date: Tue, 25 Mar 2014 09:57:25 -0400
The reason you saw that is because when you provide evaluate() method, you
di
We have an HBase table.
Each time we aggregate the table based on some columns, we are doing a full
scan of the entire table.
What are some ideas for extracting just the delta, or increments, since the
last load?
Right now I'm following this approach, but I want some better ideas:
- Mount the HBase into
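As an alternative to re-scanning everything, one idea is to keep a watermark timestamp from the last load and pull only rows written after it (HBase can do this server-side via Scan.setTimeRange). The plain-Java sketch below only illustrates the bookkeeping; the record and field names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

public class DeltaExtractor {
    // A stand-in for an HBase row together with its write timestamp.
    public record Row(String key, long writeTs) {}

    // Keep only rows written after the last successful load (the watermark).
    public static List<Row> delta(List<Row> rows, long lastLoadTs) {
        List<Row> out = new ArrayList<>();
        for (Row r : rows) {
            if (r.writeTs() > lastLoadTs) {
                out.add(r);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
            new Row("a", 100), new Row("b", 250), new Row("c", 300));
        // Only rows written after the last load at ts=200 are returned.
        System.out.println(delta(rows, 200));
    }
}
```

After each load, persist the highest timestamp seen; the next run filters on it instead of scanning the full table.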
Bucketing is certainly helpful when you have a finite number of values in a
column other than the partition column.
Bucketing would mean, though, that when you load data into the table it can't
be a straightforward LOAD DATA INPATH; you will need to run it via Hive
queries (which does not seem to
Hi,
Hive is mavenized, so please follow this link to build
https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-MakingChanges
Hope it helps,
On Tue, Mar 25, 2014 at 9:12 PM, Nagarjuna Vissarapu <
nagarjuna.v...@gmail.com> wrote:
>
> Hi,
>
> Can you please
Hi Nitin/Prasan,
Thanks for your replies, I appreciate your help :)
Clustering looks to be quite close to what we want. However, one main gap is
that we need to fire a Hive query to populate the clusters. In our case the
clustered data is already there, so the computation in the Hive query would be
redundant.
Hi Oliver,
Try setting these properties in core-site.xml (the cluster needs to be
restarted afterwards). Using * as the value will let the hive superuser
impersonate everyone:

<property>
  <name>hadoop.proxyuser.hive.groups</name>
  <value>users</value>
  <description>Allow the superuser hive to impersonate any members of
  the group users. Required only when installing Hive.</description>
</property>
Hi Saumitra,
You might want to look into clustering within the partition. That is, partition
by "day", but cluster by "generated_by" (within those partitions), and see if
that improves performance. Refer to the CLUSTER BY command in the Hive Language
Manual.
-Prasan
On Mar 25, 2014, at 4:26
The reason you saw that is that when you provide the evaluate() method, you
didn't specify the type of column it can be used on. So Hive will just create a
Text instance again and again for every new row, as it doesn't know how, or to
which column, to apply your UDF.
I changed your code as below:
public
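For anyone following along, the reuse pattern being discussed looks like this in isolation. A plain class stands in for Hive's UDF base class and the Text type, so nothing below is the actual Hive API:

```java
public class ReuseUdf {
    // A tiny stand-in for org.apache.hadoop.io.Text: a mutable holder.
    public static class Holder {
        private String value;
        public void set(String v) { value = v; }
        public String get() { return value; }
    }

    private Holder t; // reused across calls instead of allocating per row

    public Holder evaluate(String s) {
        if (t == null) {
            t = new Holder(); // allocated once, on the first row
        }
        t.set(s);             // subsequent rows only mutate the same object
        return t;
    }
}
```

Once Hive can bind the UDF to a column and keeps a single UDF instance for the query, every row gets the same holder back, avoiding a per-row allocation.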
In general, when you have a large number of partitions, your Hive query
performance drops. This has been significantly addressed in current
releases, but we still see performance issues. Sadly, I currently do not
have a dataset large enough to need that many partitions.
This issue la
Hi Nitin,
We are not facing the small-files problem, since the data is in S3. Also, we
do not want to merge files: merging files to create one large analyze table
for, say, a day would slow down queries fired on a specific day and
*generated_by*.
Let me explain my problem in other words.
Right now we are over
see if this is what you are looking for https://github.com/sskaje/hive_merge
On Tue, Mar 25, 2014 at 4:21 PM, Saumitra Shahapure (Vizury) <
saumitra.shahap...@vizury.com> wrote:
> Hello,
>
> We are using Hive to query S3 data. For one of our tables named analyze,
> we generate data hierarchica
Hello,
We are using Hive to query S3 data. For one of our tables named analyze, we
generate data hierarchically. First level of hierarchy is date and second
level is a field named *generated_by*. E.g. for 20 March we may have S3
directories as
s3://analyze/20140320/111/
s3://analyze/20140320/222/