Silly question…
What about using COUNT() with a GROUP BY instead?
I’m going from memory, so this may or may not work. You only want the row_id
in order to de-dupe, right?
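Roughly what I have in mind, as a minimal sketch only. It runs against an in-memory SQLite database from Python purely so it is self-contained; the table name `events` and its columns are made up for illustration, and I have not checked how the same query behaves in Hive.

```python
import sqlite3

# Hypothetical table with duplicate rows and no row_id column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, action TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "click"), (1, "click"), (2, "view"), (1, "view"), (2, "view"), (2, "view")],
)

# Grouping by every column collapses duplicates into one row each;
# COUNT(*) reports how many copies of that row existed.
rows = conn.execute(
    """
    SELECT user_id, action, COUNT(*) AS copies
    FROM events
    GROUP BY user_id, action
    ORDER BY user_id, action
    """
).fetchall()

# The de-duplicated data is the grouped rows without the count.
deduped = [(user_id, action) for user_id, action, _ in rows]

# Rows that appeared more than once in the source.
duplicates = [row for row in rows if row[2] > 1]
```

Whether this is cheaper than generating a row_id depends on the data volume and the engine, so treat it as a starting point.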
On Jun 12, 2017, at 3:59 PM, Premal Shah <premal.j.s...@gmail.com> wrote:
Thanx Gopal.
Sorry, took me a few d
Sorry. Need to send via right email address.
Begin forwarded message:
From: Michael Segel <mse...@segel.com>
Subject: Re: Pro and Cons of using HBase table as an external table in HIVE
Date: June 9, 2017 at 7:37:22 AM CDT
To: user@hive.apache.org
than plain hive querying over ORC / Text
file formats
In other words, is querying plain Hive tables (ORC or Text) always faster than
querying through HiveStorageHandler?
Regards,
Amey
On 9 June 2017 at 15:08, Michael Segel <msegel_had...@hotmail.com> wrote:
The pro is that you have the ability to update a table without having to
worry about duplication of the row. Tez is doing some form of compaction for
you that already exists in HBase.
The cons:
1) It’s slower. Reads from HBase carry more overhead than just reading
a file. Read Lar
On Oct 19, 2016, at 11:00 AM, Michael Segel wrote:
>
> Hi,
> Since I am not on the ORC mailing list… and since the ORC Java code is in the
> Hive APIs… this seems like a good place to start. ;-)
>
>
> So…
>
> Ran into a little problem…
>
> One of my develo
Hi,
Since I am not on the ORC mailing list… and since the ORC Java code is in the
Hive APIs… this seems like a good place to start. ;-)
So…
Ran into a little problem…
One of my developers was writing a map/reduce job to read records from a source
and after some filter, write the result se
Just a clarification.
Tez is ‘vendor’ independent. ;-)
Yeah… I know… Anyone can support it. Only Hortonworks has stacked the deck in
their favor.
Drill could be in the same boat, although there are now more committers who are
not working for MapR. I’m not sure who outside of HW is supporting
> On Jun 8, 2016, at 3:35 PM, Eugene Koifman wrote:
>
> if you split “create table test.dummy as select * from oraclehadoop.dummy;”
> into create table statement, followed by insert into test.dummy as select…
> you should see the behavior you expect with Hive.
> Drop statement will block while
you try a select * from foo; and in another shell try
dropping foo? If you want to simulate an M/R job, add something like an
order by 1 clause.
HTH
-Mike
> On Jun 8, 2016, at 2:36 PM, Michael Segel wrote:
>
> Hi,
>
> Let’s take a step back…
>
> Which version of Hive
ile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>
>
> http://talebzadehmich.wordpress.com
>
>
> On 30 May 2016 at 20:19, Michael Segel <msegel_had...@hotmail.com> wrote:
> Mich,
>
> Most people use vendor releases because they
Mich,
Most people use vendor releases because they need to have the support.
Hortonworks is the vendor who has the most skin in the game when it comes to
Tez.
If memory serves, Tez isn’t going to be M/R but a local execution engine? Then
LLAP is the in-memory piece to speed up Tez?
HTH
-M
I am currently using Hive 0.13.x.
I would like to select from an existing table and write the output to one or
more external files in Avro format via Hive.
Is there a simple way to do this?
From what I’ve seen online, the docs tend to imply you need to know the Avro
schema when you specify the table.
Could you copy from a
Hi,
While I haven’t upgraded to HDP 2.2, I have to ask whether the transaction
processing introduced in Hive 0.14 has been tested at scale, in terms of both
number of users and data size.
I am curious how well it copes with a long-running transaction.
Thx
-Mike