Re: issue with hive wide tables/views

Viral Bajaria Sun, 30 Nov 2014 22:45:12 -0800

Any help here would be appreciated.

This issue becomes a bigger pain when you have a VIEW referencing another
VIEW(s) which have 1000s of columns.


It seems that query-plan generation hits an unoptimized code path when there
are 1000s of columns.
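
You can test whether plan generation really scales badly with width. One way is to time the same CREATE VIEW at a few different column counts and estimate how fast the time grows. The sketch below is plain Python and assumes you collected the (column count, seconds) pairs yourself. `scaling_exponent` is a hypothetical helper, not part of Hive. It fits a straight line to log(time) against log(columns). A slope near 1 suggests roughly linear cost, and a slope near 2 suggests roughly quadratic cost.

```python
import math

def scaling_exponent(samples):
    """Least-squares slope of log(seconds) against log(num_columns).

    samples: list of (num_columns, seconds) pairs measured by hand.
    ~1.0 means time grows linearly with width, ~2.0 quadratically.
    """
    if len(samples) < 2:
        raise ValueError("need at least two measurements")
    xs = [math.log(n) for n, _ in samples]
    ys = [math.log(t) for _, t in samples]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Ordinary least squares slope on the log-log points.
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    if den == 0:
        raise ValueError("column counts must not all be equal")
    return num / den
```

Two or three measurements give only a rough estimate. Several sizes spread over an order of magnitude give a more trustworthy slope.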

A jstack of the running process (after more than 30 minutes) shows this:
https://gist.github.com/vbajaria/2b46eb015eb5f97954fc

I ran jstack on the running process multiple times, and every time the
SemanticAnalyzer stack trace showed up with the same frames, so I am
guessing the underlying issue is in there.
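
Running jstack repeatedly and looking for the frame that keeps showing up is essentially sampling profiling by hand. Here is a minimal sketch for tallying those samples. It assumes the dumps were saved as text files (e.g. `jstack <pid> > dump1.txt`). It also assumes the analysis runs on the thread named "main", which is this sketch's assumption for the Hive CLI; under HiveServer2 the thread name would differ. `hot_frames` is a hypothetical helper.

```python
import re
from collections import Counter

# Thread header line in jstack output, e.g. '"main" #1 prio=5 ...'
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"')
# Stack frame line, e.g. '\tat org.example.Foo.bar(Foo.java:42)'
FRAME = re.compile(r'^\s+at\s+(?P<method>[^(\s]+)')

def hot_frames(dumps, thread_name="main", depth=5):
    """Count how often each method appears in the top `depth` frames
    of `thread_name`, across several jstack dump texts."""
    counts = Counter()
    for dump in dumps:
        current, taken = None, 0
        for line in dump.splitlines():
            header = THREAD_HEADER.match(line)
            if header:
                # New thread block: reset the per-thread frame counter.
                current, taken = header.group("name"), 0
                continue
            frame = FRAME.match(line)
            if frame and current == thread_name and taken < depth:
                counts[frame.group("method")] += 1
                taken += 1
    return counts
```

Calling `hot_frames([open(p).read() for p in paths]).most_common(5)` on a handful of dumps taken a few seconds apart lists the methods the thread was sitting in most often.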

Let me know if any more details are needed. Would it help if I reached out
to the dev list about this?
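
If more details are needed for a JIRA or the dev list, a synthetic reproduction avoids sharing the real schema. Below is a minimal sketch that generates HiveQL for a wide table plus a chain of views, each selecting every column of the one below it. The names `wide_t`, `wide_v1`, `wide_v2`, the STRING column type, ORC storage and the helper `wide_repro_ddl` are all illustrative assumptions.

```python
def wide_repro_ddl(num_columns, view_depth=2, prefix="wide"):
    """Return HiveQL statements: one wide table, then `view_depth`
    nested views that each select all columns of the previous level."""
    cols = [f"c{i}" for i in range(num_columns)]
    col_defs = ",\n  ".join(f"{c} STRING" for c in cols)
    select_list = ", ".join(cols)
    stmts = [f"CREATE TABLE {prefix}_t (\n  {col_defs}\n) STORED AS ORC;"]
    source = f"{prefix}_t"
    for level in range(1, view_depth + 1):
        view = f"{prefix}_v{level}"
        stmts.append(f"CREATE VIEW {view} AS SELECT {select_list} FROM {source};")
        source = view  # the next view is built on top of this one
    return stmts
```

Write `"\n\n".join(wide_repro_ddl(3000))` to a file and run it with `hive -f`. This should show whether view creation alone takes minutes without any of the real data or view logic involved.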

Thanks,
Viral



On Wed, Nov 26, 2014 at 11:21 AM, Viral Bajaria <viral.baja...@gmail.com>
wrote:

> Hi,
>
> I have a table that ended up with 3K+ columns. Building the table wasn't
> that painful, but where things suck is creating VIEWs on top of that
> table.
>
> One of the views I want to create needs complex operations and references
> a ton of the columns, almost all of them.
>
> When applying this view to Hive, it takes over 25 minutes for the view
> definition to be applied. That is acceptable if the view doesn't need
> frequent updates, but not if we plan to change the view often or have
> multiple such views.
>
> So the questions:
> 1) Should it take this long for Hive to create a view with so many
> columns? If not, should we open a JIRA and investigate this issue?
> 2) The underlying tables are CSV (raw data) or ORC (after some
> processing). Would we benefit from changing from 3K+ columns to a single
> List<Object> or Map<String, Object> column holding all the values, and
> then using only the required columns?
>
> We are on Hive 0.13.0 and our metastore is backed by MariaDB 10
>
> Thanks,
> Viral
>
>
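
The second question can be made concrete. With a single map column, each row becomes one key/value map and a view reads only the keys it needs. Hive maps have a single value type, so a `Map<String, Object>` would in practice be something like MAP<STRING,STRING>. The Python below only models that layout on CSV input. `rows_as_maps` and `project` are hypothetical names. The sketch says nothing about whether Hive would plan such a view faster; that would need measuring.

```python
import csv
import io

def rows_as_maps(csv_text):
    """Turn each CSV data row into one {column_name: value} map,
    mirroring a single MAP<STRING,STRING> column per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]

def project(rows, wanted):
    """A 'view' that reads only the keys it needs; absent keys
    come back as None, like a lookup of a missing map key."""
    return [{k: row.get(k) for k in wanted} for row in rows]
```

The trade-off is visible in `project`: every value comes back as a string, so typed columns would need casting in each view that uses them.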
