You can partition the table and compute statistics only for the new partitions, so files that were already analyzed are not read again.
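
As a rough sketch of what that could look like: the small Python helper below just builds one ANALYZE TABLE ... PARTITION statement per newly loaded partition, which you would then run through your usual Hive client. The table name lc_credit, the partition column load_date and the column names are made up for illustration (your table is not partitioned yet as far as I can tell), and the optional column list reflects your observation that analyzing fewer columns is faster.

```python
def analyze_statements(table, partition_column, new_values, columns=None):
    """Build one HiveQL ANALYZE TABLE statement per new partition.

    table, partition_column and columns are hypothetical names chosen for
    this sketch; replace them with the ones from your own schema.
    """
    statements = []
    for value in new_values:
        stmt = (
            f"ANALYZE TABLE {table} "
            f"PARTITION ({partition_column}='{value}') "
            "COMPUTE STATISTICS"
        )
        if columns:
            # Restrict column statistics to the listed columns only.
            stmt += " FOR COLUMNS " + ", ".join(columns)
        statements.append(stmt + ";")
    return statements


if __name__ == "__main__":
    # Only the partition loaded most recently is analyzed.
    for statement in analyze_statements(
        "dept_data_services.lc_credit", "load_date", ["2018-08-20"]
    ):
        print(statement)
```

Older partitions keep the statistics computed when they were loaded, so each run only touches the files of the new partition.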

> On 26. Aug 2018, at 12:43, Prabhakar Reddy <prabha.cl...@gmail.com> wrote:
> 
> Hello,
> 
> Are there any properties that I can set to improve the performance of the 
> ANALYZE TABLE ... COMPUTE STATISTICS statement? My data sits in S3 and I see 
> it's taking one second per file to read the schema of each file from S3.
> 
> 2018-08-24T03:25:57,525 INFO  [2b2c0a06-7da5-4fcd-83a7-4931b8e1b4b1 
> main([])]: orc.ReaderImpl (ReaderImpl.java:rowsOptions(79)) - Reading ORC 
> rows from s3://file_1
> 2018-08-24T03:25:57,526 INFO  [2b2c0a06-7da5-4fcd-83a7-4931b8e1b4b1 
> main([])]: impl.RecordReaderImpl (RecordReaderImpl.java:<init>(187)) - Reader 
> schema not provided -- using file schema 
> 
> 2018-08-24T03:25:58,395 INFO  [2b2c0a06-7da5-4fcd-83a7-4931b8e1b4b1 
> main([])]: orc.ReaderImpl (ReaderImpl.java:rowsOptions(79)) - Reading ORC 
> rows from s3://file_2
> 2018-08-24T03:25:58,395 INFO  [2b2c0a06-7da5-4fcd-83a7-4931b8e1b4b1 
> main([])]: impl.RecordReaderImpl (RecordReaderImpl.java:<init>(187)) - Reader 
> schema not provided -- using file schema
> 
> It takes around 80 seconds for 76 files with a total size of 23 GB.
> 
> 
> 2018-08-24T03:27:07,673 INFO  [2b2c0a06-7da5-4fcd-83a7-4931b8e1b4b1 
> main([])]: exec.Task (SessionState.java:printInfo(1111)) - Table 
> dept_data_services.lc_credit_2018_08_20_temp stats: [numFiles=76, 
> numRows=101341845, totalSize=26500166568, rawDataSize=294491898741]
> 2018-08-24T03:27:07,673 INFO  [2b2c0a06-7da5-4fcd-83a7-4931b8e1b4b1 
> main([])]: ql.Driver (Driver.java:execute(2050)) - Completed executing 
> command(queryId=lcapp_20180824032545_aba21e71-ea4b-4214-8793-705a5e0367f0); 
> Time taken: 81.169 seconds
> 2018-08-24T03:27:07,674 INFO  [2b2c0a06-7da5-4fcd-83a7-4931b8e1b4b1 
> main([])]: ql.Driver (SessionState.java:printInfo(1111)) - OK
> 2018-08-24T03:27:07,681 INFO  [2b2c0a06-7da5-4fcd-83a7-4931b8e1b4b1 
> main([])]: CliDriver (SessionState.java:printInfo(1111)) - Time taken: 81.992 
> seconds
> 
> If I run the same command with fewer columns, the query runs 60% faster. Is 
> there any property that I can modify to reduce the time taken for this read?
> 
> Regards
> Prabhakar Reddy
> 
> 
>  
