Hi Furcy, Thanks.
Apologies for being late on this. You are absolutely correct: I tried it, and BQ can read compressed ORC files.

Still referring to my original thread, BQ's handling of Doubles and Dates is problematic. I tend to create these types of fields as String and do the ETL in BQ, converting them into the desired type.

I am not much concerned about what Hive itself does. I run Hive on the Spark execution engine on-prem and use Spark for anything on-prem that interacts with Hive. On BQ one can achieve the same, although my Spark code (written in Scala) has to be modified. In general I have found that using Spark both on-prem and on GCP, against Hive and BQ respectively, makes things easier. Also, as far as my tests go, Spark's analytical functions are identical on-prem and in Dataproc.

HTH,

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

*Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.


On Mon, 14 Jan 2019 at 09:18, Furcy Pin <pin.fu...@gmail.com> wrote:

> Hi Mich,
>
> Contrary to what you said, I can confirm that BQ is able to read ORC
> files compressed with Snappy. However, BQ requires a loading operation
> from the ORC file on Google Storage, converting it into a BQ table.
>
> The main advantages I see with BQ are the guaranteed very high
> scalability and low query latency, without having to manage a Hadoop
> cluster yourself.
>
> I would not say, however, that you can simply plug your existing HQL
> queries into BQ.
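(Editor's note: the load-as-String-then-convert approach Mich describes above can be done in BQ SQL with SAFE_CAST and SAFE.PARSE_DATE, which return NULL instead of failing on bad input. Here is a minimal Python sketch of that tolerant conversion logic; the field names `price` and `ds` are made up for illustration.)

```python
from datetime import datetime

def safe_cast_float(s):
    """Like BigQuery's SAFE_CAST(s AS FLOAT64): None on failure, not an error."""
    try:
        return float(s)
    except (TypeError, ValueError):
        return None

def safe_cast_date(s, fmt="%Y-%m-%d"):
    """Like BigQuery's SAFE.PARSE_DATE('%Y-%m-%d', s): None if s doesn't parse."""
    try:
        return datetime.strptime(s, fmt).date()
    except (TypeError, ValueError):
        return None

# Rows loaded with everything as String; bad values become NULL, not load errors.
rows = [{"price": "12.5", "ds": "2019-01-14"},
        {"price": "n/a",  "ds": "14/01/2019"}]

cleaned = [{"price": safe_cast_float(r["price"]),
            "ds": safe_cast_date(r["ds"])} for r in rows]
```

The point of the pattern is that a single malformed value does not fail the whole load; the bad cell becomes NULL and can be dealt with downstream.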
> All useful analytics functions are indeed there, but in many cases they
> have a different name. For instance, the equivalent of Hive's UDF *trunc*
> in BQ is *date_trunc*.
>
> In my use case I use pyspark for complex transformations and use BQ as a
> warehouse to plug Power BI on it. So for a fair comparison, I think you
> should compare BQ with Vertica, Presto, Impala or Hive LLAP rather than
> just Hive.
>
> Regards,
>
> Furcy
>
>
> On Fri, 11 Jan 2019 at 11:18, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
>> Hi,
>>
>> Has anyone got some benchmarks comparing Hive with Google Cloud
>> Platform (GCP) BigQuery (BQ)?
>>
>> From my experience, BQ supports both Avro and ORC file types, but there
>> is no support for compressed ORC or Avro. So if you want to load a Hive
>> table into BQ, you will need to create a table with no compression. In
>> short, you need to perform ETL to move a Hive table to BQ.
>>
>> On the other hand, BQ seems to support all the analytical functions
>> available in Hive, so your queries should run without any modification
>> in BQ.
>>
>> The Dataproc tool in GCP also supports Hive (though I have not tried it
>> myself). So the question is: are there any advantages to taking a Hive
>> table into BQ itself?
>>
>> Thanks,
>>
>> Dr Mich Talebzadeh
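(Editor's note: Furcy's renaming example is worth spelling out, since the two functions compute the same thing. Hive's `trunc(d, 'MM')` and BigQuery's `DATE_TRUNC(d, MONTH)` both snap a date back to the start of its month; likewise `'YYYY'` / `YEAR` snap to the start of the year. A small Python sketch of that shared semantics:)

```python
from datetime import date

def date_trunc_month(d: date) -> date:
    """What Hive trunc(d, 'MM') and BigQuery DATE_TRUNC(d, MONTH) both return."""
    return d.replace(day=1)

def date_trunc_year(d: date) -> date:
    """What Hive trunc(d, 'YYYY') and BigQuery DATE_TRUNC(d, YEAR) both return."""
    return d.replace(month=1, day=1)
```

So porting HQL to BQ here is a mechanical rename, not a semantic change, which is what makes the "different name, same function" situation merely tedious rather than dangerous.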