RE: Spark 3.0 using S3 taking long time for some set of TPC DS Queries

2020-09-10 Thread Rao, Abhishek (Nokia - IN/Bangalore)
…were seeing a discrepancy in query execution time on S3 with Spark 3.0.0.
Thanks and Regards,
Abhishek
From: Gourav Sengupta
Sent: Wednesday, August 26, 2020 5:49 PM
To: Rao, Abhishek (Nokia - IN/Bangalore)
Cc: user
Subject: Re: Spark 3.0 using S3 taking long time for some set of TPC DS Queries

Re: Spark 3.0 using S3 taking long time for some set of TPC DS Queries

2020-08-26 Thread Gourav Sengupta
> Thanks and Regards,
> Abhishek
>
> From: Gourav Sengupta
> Sent: Wednesday, August 26, 2020 2:35 PM
> To: Rao, Abhishek (Nokia - IN/Bangalore)
> Cc: user@spark.apache.org
> Subject: Re: Spark 3.0 using S3 taking long time for some set…

RE: Spark 3.0 using S3 taking long time for some set of TPC DS Queries

2020-08-26 Thread Rao, Abhishek (Nokia - IN/Bangalore)
…ngupta <gourav.sengu...@gmail.com>
Sent: Wednesday, August 26, 2020 1:18 PM
To: Rao, Abhishek (Nokia - IN/Bangalore) <abhishek@nokia.com>
Cc: user@spark.apache.org
Subject: Re: Spark 3.0 using S3 taking long time for some set of TPC…

Re: Spark 3.0 using S3 taking long time for some set of TPC DS Queries

2020-08-26 Thread Gourav Sengupta
> From: Gourav Sengupta
> Sent: Wednesday, August 26, 2020 1:18 PM
> To: Rao, Abhishek (Nokia - IN/Bangalore)
> Cc: user@spark.apache.org
> Subject: Re: Spark 3.0 using S3 taking long time for some set of TPC DS Queries
>
> Hi,

RE: Spark 3.0 using S3 taking long time for some set of TPC DS Queries

2020-08-26 Thread Rao, Abhishek (Nokia - IN/Bangalore)
Hi Gourav,
Yes, we’re using s3a.
Thanks and Regards,
Abhishek
From: Gourav Sengupta
Sent: Wednesday, August 26, 2020 1:18 PM
To: Rao, Abhishek (Nokia - IN/Bangalore)
Cc: user@spark.apache.org
Subject: Re: Spark 3.0 using S3 taking long time for some set of TPC DS Queries
Hi, are you using…

Re: Spark 3.0 using S3 taking long time for some set of TPC DS Queries

2020-08-26 Thread Gourav Sengupta
Hi, are you using s3a, which does not use EMRFS? In that case, these results do not make sense to me.
Regards,
Gourav Sengupta
On Mon, Aug 24, 2020 at 12:52 PM Rao, Abhishek (Nokia - IN/Bangalore) <abhishek@nokia.com> wrote:
> Hi All,
>
> We’re doing some performance comparisons betw…

RE: Spark 3.0 using S3 taking long time for some set of TPC DS Queries

2020-08-25 Thread Rao, Abhishek (Nokia - IN/Bangalore)
… GB of data, whereas in the case of HDFS it is only 4.5 GB. Any idea why there is this difference?
Thanks and Regards,
Abhishek
From: Luca Canali
Sent: Monday, August 24, 2020 7:18 PM
To: Rao, Abhishek (Nokia - IN/Bangalore)
Cc: user@spark.apache.org
Subject: RE: Spark 3.0 using S3 taking long time fo…

RE: Spark 3.0 using S3 taking long time for some set of TPC DS Queries

2020-08-24 Thread Luca Canali
Hi Abhishek,
Just a few ideas/comments on the topic:
When benchmarking/testing, I find it useful to collect a more complete view of resource usage and Spark metrics, beyond just measuring query elapsed time. Something like this: https://github.com/cerndb/spark-dashboard
I'd rather not use dyn…
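As a rough illustration of collecting metrics beyond elapsed time, here is a minimal Python sketch (not from the thread, and not a reproduction of the linked dashboard). It assumes the application's stage data is available from Spark's monitoring REST API at `/api/v1/applications/<app-id>/stages` on the driver UI or history server; the helper names `fetch_stages`, `summarize` and `report` are hypothetical, and the metric field names and units should be checked against the documentation for your Spark version.

```python
import json
from urllib.request import urlopen

# Stage-level fields summed by this sketch (assumed to match the REST API's
# StageData for your Spark version; verify before relying on them).
FIELDS = ("inputBytes", "shuffleReadBytes", "shuffleWriteBytes",
          "executorRunTime", "executorCpuTime", "jvmGcTime")


def fetch_stages(ui_url, app_id):
    """Fetch stage data for one application from the monitoring REST API.

    ui_url is assumed to be e.g. "http://localhost:4040" for a running driver,
    or the history server address for a finished application.
    """
    with urlopen(f"{ui_url}/api/v1/applications/{app_id}/stages") as resp:
        return json.load(resp)


def summarize(stages):
    """Sum the selected metrics over all stages; missing fields count as 0."""
    return {f: sum(s.get(f, 0) for s in stages) for f in FIELDS}


def report(totals):
    """Format totals, assuming bytes for sizes, ms for run/GC time, ns for CPU time."""
    gib = 1024 ** 3
    return (f"input={totals['inputBytes'] / gib:.2f} GiB, "
            f"shuffle read={totals['shuffleReadBytes'] / gib:.2f} GiB, "
            f"run time={totals['executorRunTime'] / 1000:.1f} s, "
            f"CPU time={totals['executorCpuTime'] / 1e9:.1f} s, "
            f"GC time={totals['jvmGcTime'] / 1000:.1f} s")
```

Running `report(summarize(fetch_stages(...)))` for the S3 run and the HDFS run of the same query gives a side-by-side view: comparing `inputBytes` is one way to look into the difference in data read mentioned earlier in the thread, and a large gap between executor run time and CPU time can point to time spent waiting, for example on storage I/O.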