Re: How useful are tools for Hive data modeling

2020-11-11 Thread Panos Garefalakis
Hey Mich, I agree with Austin's reply. A fundamental way to skip reading data that the query does not need is table partitioning, so that would be the first thing to check (along with data skew). Columnar formats such as Parquet and ORC come with row group statistics (such as min/max

Re: How useful are tools for Hive data modeling

2020-11-11 Thread Austin Hackett
Hi Mich, Understood. I was thinking along the lines of the tool being able to auto-generate SQL join syntax etc., rather than in terms of scan performance. I’m not so familiar with Parquet with Hive. I know that Parquet also has min and max indexes and, more recently, bloom filters. However, I rec

Re: How useful are tools for Hive data modeling

2020-11-11 Thread Mich Talebzadeh
Many thanks Austin. The challenge, I have been told, is how to query a subset of the data effectively while avoiding a full table scan. I believe the tables are Parquet. I know performance in Hive is not that great, so anything that could help would be welcome. Cheers,

Re: How useful are tools for Hive data modeling

2020-11-11 Thread Austin Hackett
Hi Mich, Hive also has non-validated primary key, foreign key etc. constraints. Whilst I’m not too familiar with the modelling tools you mention, perhaps they’re able to use these for generating SQL etc.? ORC files have indexes (min, max, bloom filters) - not particularly relevant to the data mod

Re: How useful are tools for Hive data modeling

2020-11-11 Thread Mich Talebzadeh
Many thanks Peter.

Re: How useful are tools for Hive data modeling

2020-11-11 Thread Peter Vary
Hi Mich, Index support was removed from Hive: https://issues.apache.org/jira/browse/HIVE-21968 https://issues.apache.org/jira/browse/HIVE-18715 Thanks, Peter > On Nov 11, 2020, at 17:25, Mich

Fwd: How useful are tools for Hive data modeling

2020-11-11 Thread Mich Talebzadeh
Hi all, I wrote these notes earlier this year. I heard today that someone mentioned Hive 1 does not support indexes but Hive 2 does. I still believe that Hive does not support indexing, as per below. Has this changed? Regards, Mich -- Forwarded message - From: Mich Ta