Hi,
I have a Hive table which uses the jar files provided by Elephant-Bird, a
framework that integrates LZO and Google Protobuf data with Hadoop/Hive.
If I start the Hive command like this:
hive --auxpath path_to_jars, querying my table works fine,
but if I use the add jar aft
This is in HIVE-0.9.0
hive> list jars;
/nfs_home/common/userlibs/google-collections-1.0.jar
/nfs_home/common/userlibs/elephant-bird-hive-3.0.7.jar
/nfs_home/common/userlibs/protobuf-java-2.3.0.jar
/nfs_home/common/userlibs/elephant-bird-core-3.0.7.jar
file:/usr/lib/hive/lib/hive-builtins-0.9.0-cdh4.1.
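For reference, a minimal sketch of adding the same jars inside a Hive session instead
of via --auxpath (the paths are the ones from the LIST JARS output above):
ADD JAR /nfs_home/common/userlibs/google-collections-1.0.jar;
ADD JAR /nfs_home/common/userlibs/elephant-bird-hive-3.0.7.jar;
ADD JAR /nfs_home/common/userlibs/protobuf-java-2.3.0.jar;
ADD JAR /nfs_home/common/userlibs/elephant-bird-core-3.0.7.jar;
LIST JARS;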
Hi,
Hive 0.9.0 + Elephant-Bird 3.0.7
I ran into a problem using Elephant-Bird with Hive. I know what may be causing
this problem, but I don't know which side this bug belongs to. Let me
explain what the problem is.
If we define a Google Protobuf file with a field name like 'dateString' (the
conf while the hive client is running.
# hive -hiveconf hive.root.logger=ALL,console -e "DDL statement;"
# hive -hiveconf hive.root.logger=ALL,console -f ddl.sql
Hope this helps
Thanks
On Mar 20, 2013, at 1:45 PM, java8964 java8964 wrote: Hi,
I have the hadoop running in pseudo-distributed mode on
I have a requirement I am trying to support in Hive, and I am not sure whether it is doable.
I have Hadoop 1.1.1 with Hive 0.9.0 (using Derby as the metastore).
I partition my data by a dt column, so my table 'foo' has some
partitions like 'dt=2013-07-01' to 'dt=2013-07-30'.
Now the user wants to query al
Hi,
I have a question about the behavior of the class
org.apache.hadoop.hive.contrib.serde2.RegexSerDe. Here is the example I tested
using the Cloudera hive-0.7.1-cdh3u3 release. The above class did NOT do what I
expected; does anyone know the reason?
user:~/tmp> more Test.java
import java.io.*;
impor
Hi, I am using the Cloudera release cdh3u3, which has Hive 0.7.1.
I am trying to write a Hive UDF to calculate a moving sum. Right
now, I am having trouble getting the constant value passed in at the
initialization stage.
For example, let's assume the function is like the fo
2012 at 4:17 AM, java8964 java8964 wrote:
Hi, I am using the Cloudera release cdh3u3, which has Hive 0.7.1.
I am trying to write a Hive UDF to calculate a moving sum. Right
now, I am having trouble getting the constant value passed in at the
initialization stage.
For exampl
Hi,
I am trying to implement a UDAF for Kurtosis
(http://en.wikipedia.org/wiki/Kurtosis)
in Hive.
I already found a library to do it, from Apache Commons Math
(http://commons.apache.org/math/apidocs/org/apache/commons/math/stat
If you don't need to join current_web_page and previous_web_page, assuming you
can just trust the time stamp, as Phil points out, a custom collect_list()
UDF is the way to go.
You need to implement the collect_list() UDF yourself; Hive doesn't have one by
default. But it should be straight fo
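As a rough sketch of how such a custom UDF would be registered and used (the jar
path, class name, and table/column names below are made up for illustration):
ADD JAR /path/to/my-udfs.jar;
CREATE TEMPORARY FUNCTION collect_list AS 'com.example.hive.udaf.CollectList';
SELECT session_id, collect_list(web_page) AS pages
FROM page_views
GROUP BY session_id;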
This is not a Hive question but a SQL question.
You need to be clearer about your data, and try to think of a way to solve your
problem. Without details about your data, there is no easy way to answer your
question.
For example, just based on the example data you provided, do 'abc' and
'cde' only happen
Hi,
Our company is currently using the CDH3 release, which comes with Hive 0.7.1.
Right now, I have data coming from another team, which also provides a
custom InputFormat and RecordReader, but using the new mapreduce API.
I am trying to build a Hive table on this data, and hope I can reuse t
Hi, in our project using Hive on the CDH3U4 release (Hive 0.7.1), I have a
hive table like the following:
Table foo (
  search_results array<...>,
  search_clicks array<...>)
As you can see, the 2nd column, which represents the list of search results
clicked, contains the index location of which result
OK.
I followed the hive source code of
org.apache.hadoop.hive.ql.udf.generic.GenericUDFArrayContains and wrote the
UDF. It is quite simple.
It works fine, as I expected, for simple cases, but when I try to run it in
some complex queries, the Hive MR jobs fail with some strange errors. What I
Hi,
I played with my query further, and found it very puzzling to explain the
following behaviors:
1) The following query works:
select c_poi.provider_str, c_poi.name
from (select darray(search_results, c.rank) as c_poi
      from nulf_search lateral view explode(search_clicks) clickTable as c) a
I g
> set hive.optimize.cp=false;
> set hive.optimize.ppd=false;
>
> 2012/12/13 java8964 java8964 :
> > Hi,
> >
> > I played with my query further, and found it very puzzling to explain the
> > following behaviors:
> >
> > 1) The following query works:
> >
> > select
Hi, I have a question related to the XPATH UDF currently in HIVE.
From the original Jira story about this UDF,
https://issues.apache.org/jira/browse/HIVE-1027, it looks like the UDF won't
support namespaces in the XML. Is that true?
Does any later Hive version support namespaces? If so, what i
You are
> welcome to do so by creating a JIRA and posting a patch. UDFs are an
> easy and excellent way to contribute back to the Hive community.
>
> Thanks!
>
> Mark
>
> On Wed, Dec 19, 2012 at 8:52 AM, java8964 java8964
> wrote:
> > Hi, I have a question related to
Actually, I am backing up this question. In addition to that, I wonder if it
is possible to access the table properties from the UDF too.
I also have XML data, but with namespaces in it. The XPATH UDF that comes with
Hive doesn't support namespaces. Supporting namespaces in XML is simple, j
can access them just by their name in your code.
>
> About #2, that doesn't sound normal to me. Did you figure it out, or are you
> still running into it?
>
> Mark
>
> On Thu, Dec 20, 2012 at 5:01 PM, java8964 java8964
> wrote:
> > Hi, I have 2 questions related to the h
Hi, I am using Hive 0.9.0, and I am not sure why from_utc_timestamp gives me an
error for the following value but works for others.
The following example shows 2 bigints, both epoch values at the millisecond level.
They are only 11 seconds apart. One works fine in Hive 0.9.0 with the
from_utc_timestamp UD
The best mailing list for this question is the Hive list, but I will try to
give my guess here anyway.
If you only see the 'default' database, most likely you are using the Hive
'LocalMetaStore'. To help yourself find the problem, try to find out the
following information:
1) What kind of Hive metastore you a
Hi,
I know this has been asked before. I googled around this topic and tried to
understand as much as possible, but I got different answers from
different places. So I would like to describe what I have faced and see if
someone can help me again on this topic.
I created one table with one colu
What is your stack trace? Can you paste it here?
It may be a different bug.
What if you put e.f3 <> null in an outer query? Does that work?
Or maybe you have to enhance your UDTF to push that filter into the UDTF. It
is not perfect, but it may be a solution for you for now.
You can create a new Jira if i
ss(MapOperator.java:658)
... 9 more
On Friday, February 21, 2014 11:18 AM, java8964 wrote:
What is your stack trace? Can you paste it here? It may be a different bug.
What if you put e.f3 <> null in an outer query? Does that work? Or maybe you
have to enhance your UDTF to push tha
it works there. If you get a chance to reproduce this problem on Hive 0.10,
please let me know.
Thanks.
On Monday, February 24, 2014 10:59 PM, java8964 wrote:
My guess is that your UDTF returns an array of structs. I don't have Hive 0.10
handy right now, but I write a s
one query won't work, as totalcount is not in "group by".
You have 2 options:
1) use the sub query
select a.timestamp_dt, a.totalcount / b.total_sum
from daily_count_per_kg_domain a
join (select timestamp_dt, sum(totalcount) as total_sum
      from daily_count_per_kg_domain
      group by timestamp_dt) b
on (a.tim
Yes, it is good if the file sizes are close to even, but it is not very important,
unless some files are very small (compared to the block size).
The reasons are:
Your files should be splittable to be used in Hadoop (or in Hive; it is the same
thing). If they are splittable, then a 1G file will use 10 bl
Hi,
I tried to run all the tests of the current Hive trunk code on my local Linux x64 box.
My "mvn clean package -DskipTests -Phadoop-2 -Pdist" works fine if I skip
the tests.
The following unit test failed, and then it stopped.
I traced the code down to a native method invoked
at "org.apache.hadoop.sec
OK. Now I understand that this error is due to a missing Hadoop native
library.
If I manually add "libhadoop.so" to java.library.path for this unit test, it
passes.
So the Hadoop 2.2.0 coming from the Maven repository either includes a 32-bit
Hadoop native library or is missing it entirely.
Now the
That is good to know.
We are using Hive 0.9. Right now the biggest table contains 2 years of data, and
we partition it by hour, as the data volume is big.
So right now, it has 2*365*24, around 17000+, partitions. So far we haven't seen
too many problems yet, but I do have some concerns about it.
We are us
Can you reproduce with an empty table? I can't reproduce it.
Also, can you paste the stack trace?
Yong
From: krishnanj...@gmail.com
Date: Thu, 27 Feb 2014 12:44:28 +
Subject: Hive query parser bug resulting in "FAILED: NullPointerException null"
To: user@hive.apache.org
Hi all,
we've experien
Hi, Wolli:
A cross join doesn't mean Hive has to use one reducer.
From the query point of view, the following cases will use one reducer:
1) Order by in your query (instead of using sort by; see the sketch below)
2) Only one reducer group, which means all the data has to be sent to one
reducer, as there is only one reducer gro
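A small sketch of the first case (the table and column are illustrative):
-- ORDER BY enforces a total order, so its stage runs with a single reducer:
SELECT keyword FROM t_keywords ORDER BY keyword;
-- SORT BY only sorts within each reducer, so multiple reducers can be used:
SELECT keyword FROM t_keywords SORT BY keyword;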
t_keywords) prep_kw;
...
Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 1
What could be set up wrong here? Or can this ugly cross join be avoided
at all? I mean, my original problem is actually something else ;-)
Cheers
Wolli
2014-03-05 15:07 GMT+01:0
If you want to set some Hive properties, just run the command as-is over your JDBC
connection.
Any command sent through the Hive JDBC driver goes to the server the same as if
you ran "set hive.server2.async.exec.threads=50;" in a Hive session.
Run the command "set hive.server2.async.exec.threads=50;" as a SQL
I don't know, from a syntax point of view, whether Hive will allow "columnA IN
UDF(columnB)".
What I do know is that even if the above works, it won't do partition
pruning.
Partition pruning in Hive is strictly static; any dynamic value provided for the
partition column won't enable partition pru
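A small illustration of the difference (the table, UDF, and column names are made
up; dt is the partition column):
-- static literal on the partition column: partition pruning applies
SELECT count(*) FROM foo WHERE dt = '2013-07-01';
-- the partition predicate depends on a runtime value: no pruning, all partitions scanned
SELECT count(*) FROM foo WHERE dt = my_udf(columnB);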
but a static value.
Thanks,
Petter
2014-03-11 0:16 GMT+01:00 java8964 :
I don't know, from a syntax point of view, whether Hive will allow "columnA IN
UDF(columnB)".
What I do know is that even if the above works, it won't do partition
pruning.
The partition
I am not sure about your question.
Do you mean the query runs very fast if you run something like 'select * from
hbase_table', but very slow for 'select * from hbase where row_key = ?'?
I think it should be the other way around, right?
Yong
Date: Wed, 19 Mar 2014 11:42:39 -0700
From: sunil_ra...@yahoo.com
S
Your UDF object will only be initialized once per mapper or reducer.
When you say your UDF object is being initialized for each row, why do you think
so? Do you have logs that make you think that way?
If possible, please provide more information so we can help you, like your
example code, logs, etc.
Yong
Date
It looks like his job failed with OOM in the mapper tasks:
Job failed as tasks failed. failedMaps:1 failedReduces:0
So what he needs is to increase the mapper heap size request.
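A minimal sketch of the kind of settings involved (these are the MR2/YARN property
names; the exact names depend on the Hadoop version, and the values are only illustrative):
set mapreduce.map.memory.mb=2048;      -- container size requested per mapper
set mapreduce.map.java.opts=-Xmx1638m; -- JVM heap inside that container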
Yong
Date: Mon, 24 Mar 2014 16:16:50 -0400
Subject: Re: Joins Failing
From: divakarredd...@gmail.com
To: user@hive.apache.org
C
The reason you saw that is that when you provided the evaluate() method, you
didn't specify the type of column it can be used on. So Hive will just create a
test instance again and again for every new row, as it doesn't know how or to
which column to apply your UDF.
I changed your code as below:
public
Hi, Narayanan:
The current problem is that for a generic solution, there is no way for us to
know that an element in the JSON is an array. Keep in mind that any element of
JSON could be any valid structure, so it could be an array, another structure,
or a map, etc.
You know your data, so you can sa
From the Hive manual, there is only "left semi join"; there is no "semi join" nor
"inner semi join".
From the database world, it is just the traditional name for this kind of join:
"LEFT semi join", a reminder to the reader that the result set comes
from the LEFT table ONLY.
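A small sketch of the syntax (table and column names are made up):
-- returns rows from orders that have at least one matching customer;
-- only columns from the left table may appear in the SELECT list
SELECT o.order_id, o.amount
FROM orders o
LEFT SEMI JOIN customers c ON (o.customer_id = c.customer_id);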
Yong
> From: lukas.e..
When you turn "vectorized" on, does the following query consistently return
1 in the output?
select ten_thousand() from testTabOrc
Yong
Date: Fri, 30 May 2014 08:24:43 -0400
Subject: Vectorization with UDFs returns incorrect results
From: bbowman...@gmail.com
To: user@hive.apache.org
Hive 0.
Your "alias_host" column is an array, from your Avro specification, right?
If so, just use [] to access the specified element in the array
select alias_host[0] from array_tests where aliat_host[0] like '%test%'
If you want to query all the elements in the array, google "explode lateral
view" of hi
(18.778 seconds)
On Fri, May 30, 2014 at 10:52 AM, java8964 wrote:
When you turn "vectorized" on, does the following query consistently return
1 in the output?
select ten_thousand() from testTabOrc
Yong
Date: Fri, 30 May 2014 08:24:43 -0400
Subject: Vectorizat
I agree that the original request is not very clear.
From my understanding, the reference_id is unique in both the Ad load and Ad
click tables, but both tables could contain a huge amount of data. (But in
theory, the click table should be much smaller than the load table, right? But
let's just
Are you trying to read the Avro file directly in your UDF? If so, that is not
the correct way to do it in a UDF.
Hive supports Avro files natively. I don't know your UDF requirement, but here
is what I would normally do:
Create the table in Hive using AvroContainerInputFormat:
create external tabl
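A minimal sketch of such a table definition (the table name, location, and schema
URL are made up):
CREATE EXTERNAL TABLE my_avro_table
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/data/my_avro_table'
TBLPROPERTIES ('avro.schema.url'='hdfs:///schemas/my_avro_table.avsc');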
I don't think the HBase-Hive integration is smart enough to utilize an index
that exists in HBase. But I think it depends on the version you are using.
From my experience, there is a lot of room for improvement in the HBase-Hive
integration, especially "push down" logic into the HBase engi
Hi,
Currently our production is using Hive 0.9.0. There is already a complex Hive
query running on Hadoop daily to generate millions of output records. What I
want to do is transfer this result to Cassandra.
I tried to do it in a UDF, as then I can send the data at the reducer level, to
maximize the t
Based on this wiki page:
https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-TypeSystem
the string will go through an implicit conversion to double, as "double" is the
only common ancestor of bigint and string.
So the result is unpredictable if you are comparing doubles.
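A small illustration of why that matters (the table and column are made up):
-- both sides are implicitly promoted to double; large values can lose precision,
-- so the comparison may match or miss unexpectedly
SELECT * FROM events WHERE id_bigint = '1234567890123456789';
-- safer: compare as bigint explicitly
SELECT * FROM events WHERE id_bigint = CAST('1234567890123456789' AS BIGINT);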
Yong
Date: