Any help would be appreciated here. This issue becomes a bigger pain when a VIEW references other VIEWs that each have 1000s of columns.
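In case it helps anyone reproduce this, here is a minimal sketch of a script that generates a wide table, a view on it, and a second view on top of the first. The names `wide_table`, `wide_view`, `wide_view_2` and `col_N` are hypothetical, as is the all-STRING schema. My real view applies complex operations to the columns, so a plain projection like this one may not be slow in the same way.

```python
# Sketch: generate HiveQL DDL for a wide table and two stacked views.
# All table, view, and column names here are hypothetical placeholders.

def column_names(n):
    """Return n synthetic column names: col_0, col_1, ..."""
    return ["col_%d" % i for i in range(n)]


def create_table_ddl(table, n):
    """Build a CREATE TABLE statement with n STRING columns stored as ORC."""
    cols = ",\n  ".join("%s STRING" % c for c in column_names(n))
    return "CREATE TABLE %s (\n  %s\n) STORED AS ORC;" % (table, cols)


def create_view_ddl(view, source, n):
    """Build a CREATE VIEW statement that projects all n columns of source."""
    cols = ",\n  ".join(column_names(n))
    return "CREATE VIEW %s AS\nSELECT\n  %s\nFROM %s;" % (view, cols, source)


def repro_script(n):
    """Return the full DDL: one table, a view on it, and a view on that view."""
    return "\n\n".join([
        create_table_ddl("wide_table", n),
        create_view_ddl("wide_view", "wide_table", n),
        create_view_ddl("wide_view_2", "wide_view", n),
    ])
```

Writing the output of `print(repro_script(3000))` to a file and running it with `hive -f` should show how long each CREATE VIEW takes as the column count grows.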
It seems the generation of the query plan hits an unoptimized code path when there are 1000s of columns. A jstack of a process that had been running for more than 30 minutes shows this:

https://gist.github.com/vbajaria/2b46eb015eb5f97954fc

I ran jstack multiple times on the running process, and every time the SemanticAnalyzer stack trace showed up with the same results, so I am guessing that the underlying issue could be in there.

Let me know if any more details are needed to get help on this. Would it help if I reached out to the dev list about this?

Thanks,
Viral

On Wed, Nov 26, 2014 at 11:21 AM, Viral Bajaria <viral.baja...@gmail.com> wrote:

> Hi,
>
> I have a table which ended up having 3K+ columns. The building of the
> table wasn't that painful, but the part where things suck is when creating
> VIEWs on top of that table.
>
> One of the views that I want to create needs complex operations and
> references a ton of columns, or almost all of the columns.
>
> When applying this view to Hive, it takes over 25 minutes for the view
> definition to get applied. Acceptable if the view didn't need frequent
> updates, but not acceptable if we plan to change the view often or have
> multiple such views.
>
> So the questions:
> 1) Should it take so long for Hive to create a view that has so many
> columns? If not, should we open a JIRA and investigate this issue?
> 2) The underlying tables are CSV (raw data) or ORC (after some
> processing)... would we benefit if we change it from 3K+ columns to a
> single List<Object> or Map<String, Object> column holding all
> the values, and then use only the required columns?
>
> We are on Hive 0.13.0 and our metastore is backed by MariaDB 10.
>
> Thanks,
> Viral