Re: Improve performance of Analyze table compute statistics

2018-09-13 Thread Prabhakar Reddy
Thank you, Gopal, for this information. Currently I am using EMR to run this query. As this operation is CPU-intensive, could you please let me know whether increasing the RAM/cores can speed up this process? On Tue, Aug 28, 2018 at 8:56 PM Gopal Vijayaraghavan wrote: > > > Will it be referring to orc met

Re: Improve performance of Analyze table compute statistics

2018-08-28 Thread Gopal Vijayaraghavan
> Will it be referring to orc metadata or it will be loading the whole file and > then counting the rows. It depends on the partial-scan setting and on whether it is computing full column stats (full column stats compute an nDV, the number of distinct values, which reads all rows). hive> analyze table compute statistics ... partialsc

Re: Improve performance of Analyze table compute statistics

2018-08-28 Thread Prabhakar Reddy
Yeah, partition-level statistics are good. I see in the Hive server log that the Hive ORC reader is reading rows from S3 for each file. Will it be referring to the ORC metadata, or will it be loading the whole file and then counting the rows? Is there any place to cache this information so that I don't need to scan

Re: Improve performance of Analyze table compute statistics

2018-08-26 Thread Jörn Franke
You can partition it and only compute statistics for new partitions... > On 26. Aug 2018, at 12:43, Prabhakar Reddy wrote: > > Hello, > > Are there any properties that I can set to improve the performance of Analyze > table compute statistics statement. My data sits in s3 and I see it's taking
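
The suggestion to compute statistics only for new partitions could look roughly like the sketch below. It is an illustration, not something posted in the thread: the helper `analyze_statements`, the table name `events`, and the partition column `dt` are all hypothetical, and the script only builds the statement text (standard Hive `ANALYZE TABLE ... PARTITION (...) COMPUTE STATISTICS` syntax) without running anything against a cluster.

```python
def analyze_statements(table, partition_col, new_values, column_stats=False):
    """Build one ANALYZE TABLE statement per newly added partition value.

    table, partition_col and new_values are placeholders chosen for this
    example; substitute the real table and partition column.
    """
    # FOR COLUMNS requests full column statistics, which read all rows.
    suffix = " FOR COLUMNS" if column_stats else ""
    return [
        f"ANALYZE TABLE {table} PARTITION ({partition_col}='{value}') "
        f"COMPUTE STATISTICS{suffix};"
        for value in new_values
    ]


# Hypothetical example: only the two most recent daily partitions are new.
statements = analyze_statements("events", "dt", ["2018-08-25", "2018-08-26"])
for statement in statements:
    print(statement)
```

The generated statements can then be submitted through whatever client is already in use (for example the Hive CLI or Beeline), so that partitions that already have statistics are not scanned again.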