Re: Hive query on ORC table is really slow compared to Presto

2017-06-12 Thread Michael Segel
Silly question… What about using COUNT() and a GROUP BY instead? I’m going from memory… this may or may not work. You only want the row_id in order to de-dupe, right? On Jun 12, 2017, at 3:59 PM, Premal Shah <premal.j.s...@gmail.com> wrote: Thanks Gopal. Sorry, took me a few d
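
The COUNT() plus GROUP BY idea above can be sketched outside Hive. This is only a rough illustration, assuming a hypothetical `events` table with `user_id` and `event` columns (neither name comes from the thread), and it uses Python's built-in SQLite as a stand-in engine rather than Hive or Presto. Grouping on every column that defines a duplicate collapses the copies into one row, so no row_id is needed:

```python
import sqlite3


def dedupe_with_group_by(rows):
    """Return one row per distinct (user_id, event) pair plus its copy count.

    The table and column names are hypothetical; SQLite stands in for Hive.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user_id INTEGER, event TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    # Grouping on all duplicate-defining columns yields one row per group;
    # COUNT(*) reports how many copies each group had.
    result = conn.execute(
        "SELECT user_id, event, COUNT(*) AS copies "
        "FROM events GROUP BY user_id, event "
        "ORDER BY user_id, event"
    ).fetchall()
    conn.close()
    return result


sample = [(1, "click"), (1, "click"), (2, "view"), (1, "view"), (2, "view")]
deduped = dedupe_with_group_by(sample)
for row in deduped:
    print(row)
```

Whether this is faster than a row_id based de-dupe on a real ORC table would have to be measured on the actual cluster; the sketch only shows the shape of the query.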

Fwd: Pro and Cons of using HBase table as an external table in HIVE

2017-06-09 Thread Michael Segel
Sorry. Need to send via the right email address. Begin forwarded message: From: Michael Segel <mse...@segel.com> Subject: Re: Pro and Cons of using HBase table as an external table in HIVE Date: June 9, 2017 at 7:37:22 AM CDT To: user@hive.apache.org

Re: Pro and Cons of using HBase table as an external table in HIVE

2017-06-09 Thread Michael Segel
than plain Hive querying over ORC / Text file formats. In other words, is querying over plain Hive (ORC or Text) always faster than through HiveStorageHandler? Regards, Amey On 9 June 2017 at 15:08, Michael Segel <msegel_had...@hotmail.com> wrote: The pro is that you have the ab

Re: Pro and Cons of using HBase table as an external table in HIVE

2017-06-09 Thread Michael Segel
The pro is that you have the ability to update a table without having to worry about duplication of the row. Tez is doing some form of compaction for you that already exists in HBase. The cons: 1) It’s slower. Reads from HBase have more overhead than just reading a file. Read Lar

Re: Bug in ORC file code? (OrcSerde)?

2016-10-19 Thread Michael Segel
On Oct 19, 2016, at 11:00 AM, Michael Segel wrote: > > Hi, > Since I am not on the ORC mailing list… and since the ORC Java code is in the > Hive APIs… this seems like a good place to start. ;-) > > > So… > > Ran into a little problem… > > One of my develo

Bug in ORC file code? (OrcSerde)?

2016-10-19 Thread Michael Segel
Hi, Since I am not on the ORC mailing list… and since the ORC Java code is in the Hive APIs… this seems like a good place to start. ;-) So… Ran into a little problem… One of my developers was writing a map/reduce job to read records from a source and, after some filtering, write the result se

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-11 Thread Michael Segel
Just a clarification. Tez is ‘vendor’ independent. ;-) Yeah… I know… Anyone can support it. Only Hortonworks has stacked the deck in their favor. Drill could be in the same boat, although there are now more committers who are not working for MapR. I’m not sure who outside of HW is supporting

Re: Creating a Hive table through Spark and potential locking issue (a bug)

2016-06-08 Thread Michael Segel
> On Jun 8, 2016, at 3:35 PM, Eugene Koifman wrote: > > if you split “create table test.dummy as select * from oraclehadoop.dummy;” > into create table statement, followed by insert into test.dummy as select… > you should see the behavior you expect with Hive. > Drop statement will block while

Re: Creating a Hive table through Spark and potential locking issue (a bug)

2016-06-08 Thread Michael Segel
you try a select * from foo; and in another shell try dropping foo? And if you want to simulate an M/R job, add something like an ORDER BY 1 clause. HTH -Mike > On Jun 8, 2016, at 2:36 PM, Michael Segel wrote: > > Hi, > > Let’s take a step back… > > Which version of Hive

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-05-30 Thread Michael Segel
ile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw> > > http://talebzadehmich.wordpress.com > > > On 30 May 2016 at 20:19, Michael Segel <msegel_had...@hotmail.com> wrote: > Mich, > > Most people use vendor releases because they

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-05-30 Thread Michael Segel
Mich, Most people use vendor releases because they need to have the support. Hortonworks is the vendor who has the most skin in the game when it comes to Tez. If memory serves, Tez isn’t going to be M/R but a local execution engine? Then LLAP is the in-memory piece to speed up Tez? HTH -M

Simple way to export data from a Hive table in to Avro?

2015-02-02 Thread Michael Segel
Currently using Hive 13.x. Would like to select from a table that exists and output to external file(s) in Avro via Hive. Is there a simple way to do this? From what I’ve seen online, the docs tend to imply you need to know the Avro schema when you specify the table. Could you copy from a

Hive 14 performance and scalability?

2014-12-11 Thread Michael Segel
Hi, While I haven’t upgraded to HDP 2.2, I have to ask whether the transaction processing introduced in 14 has been tested at scale in terms of both users and data size. I am curious how well it copes with a long transaction. Thx -Mike