That particular OutOfMemoryError is happening on one of your hadoop nodes.
It's the heap within the process forked by the Hadoop TaskTracker, I think.
Phil.
On 30 January 2013 14:28, John Omernik wrote:
> So just a follow-up. I am less looking for specific troubleshooting on how
> to fix my pr
This is a known (recently fixed) bug:
https://issues.apache.org/jira/browse/HIVE-3699
Phil.
On 26 January 2013 15:17, John Omernik wrote:
> I ran into an interesting bug. Basically, if your FROM() source is
> a partitioned table and you use a where clause that prunes, all of the
> INSERT HERE
Hive doesn't support theta joins. Your best bet is to do a full cross join
between the tables, and put your range conditions into the WHERE clause.
This may or may not work, depending on the respective sizes of your tables.
The fundamental problem is that parallelising a theta (or range) join via
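As a sketch, the cross-join rewrite looks something like this (table and column names here are hypothetical, not from the original question; note that strict mode, hive.mapred.mode=strict, blocks Cartesian products, and newer Hive versions also accept an explicit CROSS JOIN):

```sql
-- Hypothetical range join: match each row of t2 to the t1 rows
-- whose [lo, hi] interval contains t2.val.
SELECT t1.id, t2.val
FROM t1
JOIN t2                       -- no ON clause: a full cross product
WHERE t2.val >= t1.lo
  AND t2.val <= t1.hi;
```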
You could use collect_set() and GROUP BY. That wouldn't preserve order
though.
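As a rough sketch (table and column names are made up, since the original question is truncated here):

```sql
-- Collapse to one row per key; collect_set() gathers the distinct
-- values into an array, in no particular order.
SELECT user_id, collect_set(item) AS items
FROM events
GROUP BY user_id;
```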
Phil.
On Oct 31, 2012 9:18 PM, "qiaoresearcher" wrote:
> Hi all,
>
> here is the question. Assume we have a table like:
>
> -
I'm really not convinced that there's no skew in your data. Look at
the counters from the Hadoop TaskTracker pages, and thoroughly check
that the numbers of reducer input records / groups and output records
are all similar.
Phil.
On 18 October 2012 09:56, Saurabh Mishra wrote:
> any views on the
Is your data heavily skewed towards certain values of a.x etc?
On 15 October 2012 15:23, Saurabh Mishra wrote:
> The queries are simple joins, something on the lines of
> select a, b, c, count(D) from tableA join tableB on a.x=b.y join group
> by a, b,c;
>
>
>> From: liy...@gmail.com
>> Date:
How about:
select name from ABC order by grp desc limit 1?
Phil.
On Sep 27, 2012 9:02 PM, "yogesh dhari" wrote:
> Hi Bejoy,
>
> I tried this one also but here it throws horrible error:
>
> i.e:
>
> hive: select name from ABD where grp=MAX(grp);
>
> FAILED: Hive Internal Error: java.lang.NullPoi
> select value,COALESCE(value,3) from testtest;
> 1 1
> 1 1
> 2 2
> NULL    3
> NULL    3
>
> On Wed, Sep 5, 2012 at 7:52 PM, Philip Tromans
> wrote:
> > You could do something with the coalesce UDF?
> >
> > Phil.
> >
> >
You could do something with the coalesce UDF?
Phil.
On Sep 5, 2012 12:24 AM, "MiaoMiao" wrote:
> I have a file whose content is:
> 1,1
> 2,1
> 3,2
> 4,
> 5,
> Then I import in into a hive table.
> create external table testtest (id int,value int) row format delimited
> fields terminated by ',' s
insert into table originalTable
select uniqueId, collect_set(whatever) from explodedTable group by uniqueId
will probably do the trick.
Phil.
On 23 August 2012 17:45, Mike Fleming wrote:
> I see that Hive has a way to take a table and produce multiple rows.
>
> Is there a built in way to do the revers
There's a case-sensitivity bug in Hive. Put all the names into lower case.
I've got a JIRA open about it somewhere.
Phil.
On Aug 22, 2012 4:39 AM, "Lin" wrote:
> Hi,
>
> I build a compact index IX for table A as follows,
>
> create index IX on table A(a, b) as 'COMPACT'
> with deferred rebuild
> in table A_
What you're trying to do can be achieved with:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions
with a "D" in a format string. See:
http://docs.oracle.com/javase/1.4.2/docs/api/java/text/SimpleDateFormat.html
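For example, something along these lines gives the day of the year (the literal date is just for illustration):

```sql
SELECT from_unixtime(unix_timestamp('2012-08-14', 'yyyy-MM-dd'), 'D');
-- 'D' is SimpleDateFormat's day-in-year pattern, so this yields 227
```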
Phil.
On 14 August 2012 07:30, Deep
https://cwiki.apache.org/Hive/languagemanual-joins.html
On 14 August 2012 10:29, Prakrati Agrawal wrote:
> Dear Phil,
>
> Can you be a liitle more specific about using the left outer join?
>
> Thanks and Regards,
> Prakrati
>
> -Original Message
Hive doesn't support IN subqueries. You'll need to rewrite your query as a left
outer join, and check whether the RHS is null.
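A sketch of the rewrite, with hypothetical table and column names:

```sql
-- Instead of: SELECT * FROM t1 WHERE t1.k IN (SELECT k FROM t2)
SELECT t1.*
FROM t1
LEFT OUTER JOIN t2 ON (t1.k = t2.k)
WHERE t2.k IS NOT NULL;   -- use IS NULL to emulate NOT IN
```

Bear in mind that if t2 contains duplicate keys the join will multiply rows; a LEFT SEMI JOIN avoids that where the IN-style semantics are what you want.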
Phil.
On 14 August 2012 10:20, Bertrand Dechoux wrote:
> According to the error message, you are not using the correct syntax:
> https://cwiki.apache.org/confluence/display/Hive/
I think you're ordering by a constant. Give your concat column an
alias, and then order by that.
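Something along these lines (column names are illustrative):

```sql
SELECT concat(surname, ', ', forename) AS full_name
FROM people
ORDER BY full_name;   -- order by the alias, not the expression
```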
Phil.
On 10 August 2012 12:26, Joshi, Rekha wrote:
> Manisha, when you say concat issue, did you verify the stmt without concat
> (just any few fields to test) and that gives ordered data correctly?
Your rank() is being evaluated map side. Put your distribute by and sort by
in an inner query, and then evaluate your rank() in an outer query.
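A sketch, using the BID/PID/TIME columns from the quoted table and assuming a custom rank() UDF of the kind discussed elsewhere on this list:

```sql
SELECT bid, pid, time, rank(bid) AS r   -- evaluated reduce-side
FROM (
  SELECT bid, pid, time
  FROM Table1
  DISTRIBUTE BY bid      -- all rows for a bid go to one reducer
  SORT BY bid, time      -- and arrive there in order
) ranked_input;
```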
Phil.
On Jul 19, 2012 9:00 PM, "comptech geeky" wrote:
> This is the below data in my Table1
>
>
> BID PID TIME
> --
A really quick (but by no means as good) solution is to use screen.
http://www.gnu.org/software/screen/
Phil.
On 14 June 2012 13:38, dong.yajun wrote:
> Hi Praveenesh
>
> have a look at
> http://blog.milford.io/2010/06/daemonizing-the-apache-hive-thrift-server-on-centos/
> :)
>
> Thanks .
>
>
>
Hi all,
I'm interested in knowing how everyone is importing their data into
their production Hive clusters.
Let me explain a little more. At the moment, I have log files (which
are divided into 5 minute chunks, per event type (of which there are
around 10), per server (a few 10s) arriving on one
Is there anything interesting in the datanode logs?
Phil.
On 29 May 2012 10:37, Nitin Pawar wrote:
> can you check that at least one datanode is running and is not part of blacklisted
> nodes
>
>
> On Tue, May 29, 2012 at 3:01 PM, Nimra Choudhary
> wrote:
>>
>>
>>
>> We are using Dynamic partitioning
Hi Ranjith,
I haven't checked the code (so this might not be true), but I think that
the map-side aggregation stuff uses its own hash map within the map phase
to do the aggregation, instead of using a combiner, so you wouldn't expect
to see any combine input records. Have a look for parameters
li
I knocked up the following when we were experimenting with Hive. I've been
meaning to go and tidy it up for a while, but using it with a separator of
"" (empty string) should have the desired effect. (Obviously the UDF throws
an exception if the array is empty, been meaning to fix that for a while.
Have a read of the thread "Lag function in Hive", linked from:
http://mail-archives.apache.org/mod_mbox/hive-user/201204.mbox/thread
There's an example of how to force a function to run reduce-side. I've
written a UDF which replicates RANK () OVER (...), but it requires the
syntactic sugar given
that left hand side should be evaluated at
compile time, which means you have two different values of
unix_timestamp() floating around, which can only end badly.
Cheers,
Phil.
On 19 April 2012 16:35, Philip Tromans wrote:
> I don't know what the state of Hive's partition pruning is
I don't know what the state of Hive's partition pruning is, but I
would imagine that the problem is that the two examples you're giving
are fundamentally different. In
1) WHERE local_date = date_add('2011-12-07',3)
the UDF is a function of some constants, so the constant gets
evaluated at compile
Hi,
Hive supports EXISTS-style queries via LEFT SEMI JOIN. Have a look at:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins
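For the query quoted below, that would look something like this (assuming the join column is field1 on both sides, since the original message is truncated):

```sql
-- Only columns from the left-hand table may appear in the SELECT
-- list of a LEFT SEMI JOIN.
SELECT a.*
FROM tblA a
LEFT SEMI JOIN tblB b ON (a.field1 = b.field1);
```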
Cheers,
Phil.
On 11 April 2012 13:59, Bhavesh Shah wrote:
> Hello all,
> I want to query like below in Hive:
> Select a.* FROM tblA a JOIN tblB b ON a.field1 = b.field
fine. While the stand alone script works fine, when
>>> the record is created in hive using std output from perl - I see 2 records
>>> for some of the unique identifiers. I explored the possibility of default
>>> data type changes but that does not solve the problem.
>>&
Hi Karan,
To the best of my knowledge, there isn't one. It's also unlikely to
happen because it's hard to parallelise in a map-reduce way (it
requires knowing where you are in a result set, and who your
neighbours are and they in turn need to be present on the same node as
you which is difficult t
You are running into: https://issues.apache.org/jira/browse/HIVE-1579
I've been meaning to submit a patch for this. I emailed the dev list
concerning a patch for it but got no reply...
Hive is crashing because it can't pull the debug logs for the failed
task, because it's trying to pull them from
I've used Hive in a multiple connections per server instance setup. It
works ok, but it is a little flakey. I have some snapshot of trunk >
0.8.0 deployed. When I have some time, I'd like to help increase the
test coverage for multithreaded clients.
Phil.
On 28 March 2012 19:19, Abhishek Pratap S
Is that not just a COUNT(1) and a GROUP BY?
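That is, something like the following, with illustrative table/column names:

```sql
-- One output row per distinct label, with its frequency.
SELECT label, COUNT(1) AS cnt
FROM some_table
GROUP BY label;
```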
Phil.
2012/3/12 Richard :
> I have noticed histogram_numeric(col, n), but it seems to require numeric
> column.
> I have a string column, they are numeric like string but are category label,
> e.g,
>
> 11, 200034
>
> two different strings are two di
I guess that split(...)[1] is giving you what's in between the 1st and
2nd '/' characters, which is nothing. Try split(...)[2].
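For instance, with the URL from the quoted message:

```sql
SELECT split('http://www.google.com/anything/goes', '/')[2];
-- elements are: 'http:', '', 'www.google.com', 'anything', 'goes'
-- so index 2 returns www.google.com
```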
Phil.
On 1 March 2012 21:19, Saurabh S wrote:
> Hello,
>
> I have a set of URLs which I need to parse. For example, if the url is,
> http://www.google.com/anything/goes/h
Hi all,
I'm having a problem, where I'm trying to insert into a table which
has ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe', and is
STORED AS RCFILE. The exception:
java.lang.UnsupportedOperationException: Currently the writer can only
accept BytesRefArrayWritable