You can partition it and only compute statistics for new partitions...
> On 26. Aug 2018, at 12:43, Prabhakar Reddy <prabha.cl...@gmail.com> wrote: > > Hello, > > Are there any properties that I can set to improve the performance of Analyze > table compute statistics statement.My data sits in s3 and I see it's taking > one second per file to read the schema of each file from s3. > > 2018-08-24T03:25:57,525 INFO [2b2c0a06-7da5-4fcd-83a7-4931b8e1b4b1 > main([])]: orc.ReaderImpl (ReaderImpl.java:rowsOptions(79)) - Reading ORC > rows from s3://file_1 > 2018-08-24T03:25:57,526 INFO [2b2c0a06-7da5-4fcd-83a7-4931b8e1b4b1 > main([])]: impl.RecordReaderImpl (RecordReaderImpl.java:<init>(187)) - Reader > schema not provided -- using file schema > > 2018-08-24T03:25:58,395 INFO [2b2c0a06-7da5-4fcd-83a7-4931b8e1b4b1 > main([])]: orc.ReaderImpl (ReaderImpl.java:rowsOptions(79)) - Reading ORC > rows from s3://file_2 > 2018-08-24T03:25:58,395 INFO [2b2c0a06-7da5-4fcd-83a7-4931b8e1b4b1 > main([])]: impl.RecordReaderImpl (RecordReaderImpl.java:<init>(187)) - Reader > schema not provided -- using file schema > > It takes around 80 seconds for 76 files with total size of 23 GB. > > > 2018-08-24T03:27:07,673 INFO [2b2c0a06-7da5-4fcd-83a7-4931b8e1b4b1 > main([])]: exec.Task (SessionState.java:printInfo(1111)) - Table > dept_data_services.lc_credit_2018_08_20_temp stats: [numFiles=76, > numRows=101341845, totalSize=26500166568, rawDataSize=294491898741] > 2018-08-24T03:27:07,673 INFO [2b2c0a06-7da5-4fcd-83a7-4931b8e1b4b1 > main([])]: ql.Driver (Driver.java:execute(2050)) - Completed executing > command(queryId=lcapp_20180824032545_aba21e71-ea4b-4214-8793-705a5e0367f0); > Time taken: 81.169 seconds > 2018-08-24T03:27:07,674 INFO [2b2c0a06-7da5-4fcd-83a7-4931b8e1b4b1 > main([])]: ql.Driver (SessionState.java:printInfo(1111)) - OK > 2018-08-24T03:27:07,681 INFO [2b2c0a06-7da5-4fcd-83a7-4931b8e1b4b1 > main([])]: CliDriver (SessionState.java:printInfo(1111)) - Time taken: 81.992 > seconds > > If I run the same command with few columns then the query runs 60% faster.Is > there any property that I can modify to reduce the time taken for this read? > > Regards > Prabhakar Reddy > > >