Re: [SparkSQL] SparkSQL performance on small TPCDS tables is very low when compared to Drill or Presto

2018-03-31 Thread Tin Vu
> different use cases. Have you tried using the JDBC connector to Drill from within SparkSQL?
>
> Regards,
> Gourav Sengupta
>
> On Thu, Mar 29, 2018 at 1:03 AM, Tin Vu wrote:
>
>> Hi,
>>
>> I am executing a benchmark to compare performance of SparkSQL, Apac

Re: [SparkSQL] SparkSQL performance on small TPCDS tables is very low when compared to Drill or Presto

2018-03-29 Thread Tin Vu
> ecords. Creation and distribution of tasks has a noticeable overhead on smaller datasets.
>
> You might want to look at the driver logs, or the Spark Application Detail UI.
>
> *From:* Tin Vu
> *Date:* Wednesday, March 28, 2018 at 8:04 PM
> *To:* "
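
To make the point about task overhead on small datasets concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch and does not measure Spark itself: the function names, the 50 ms per-task scheduling cost, and the 100 MB/s scan rate are all assumed for illustration. Only the figure of 12 parallel workers comes from the benchmark setup described in this thread.

```python
import math


def estimated_runtime_s(data_mb, num_tasks,
                        per_task_overhead_s=0.05,  # assumed scheduling cost per task
                        scan_mb_per_s=100.0,       # assumed per-task scan throughput
                        parallel_slots=12):        # 12 workers, as in the benchmark setup
    """Toy estimate: tasks run in waves; each task pays a fixed overhead plus scan time."""
    waves = math.ceil(num_tasks / parallel_slots)
    scan_per_task_s = (data_mb / num_tasks) / scan_mb_per_s
    return waves * (per_task_overhead_s + scan_per_task_s)


def overhead_fraction(data_mb, num_tasks,
                      per_task_overhead_s=0.05,
                      scan_mb_per_s=100.0,
                      parallel_slots=12):
    """Share of the estimated runtime spent on per-task overhead rather than scanning."""
    waves = math.ceil(num_tasks / parallel_slots)
    total = estimated_runtime_s(data_mb, num_tasks, per_task_overhead_s,
                                scan_mb_per_s, parallel_slots)
    return (waves * per_task_overhead_s) / total


if __name__ == "__main__":
    # Compare a hypothetical 1 MB table with a hypothetical 100 GB table.
    for label, size_mb in [("small (1 MB)", 1.0), ("large (100 GB)", 100_000.0)]:
        frac = overhead_fraction(size_mb, num_tasks=200)
        print(f"{label}: overhead share of runtime = {frac:.1%}")
```

The task count of 200 mirrors the default value of spark.sql.shuffle.partitions, but whether that setting is what dominates in this particular benchmark is an assumption; the driver logs or the application UI would show the actual task counts and timings.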

Re: [SparkSQL] SparkSQL performance on small TPCDS tables is very low when compared to Drill or Presto

2018-03-28 Thread Tin Vu
> the query and immediately return (similarly count might immediately return by using some statistics).
>
> On 29. Mar 2018, at 02:03, Tin Vu wrote:
>
> Hi,
>
> I am executing a benchmark to compare performance of SparkSQL, Apache Drill and Presto. My experimental setu

[SparkSQL] SparkSQL performance on small TPCDS tables is very low when compared to Drill or Presto

2018-03-28 Thread Tin Vu
Hi,

I am executing a benchmark to compare performance of SparkSQL, Apache Drill and Presto. My experimental setup:
- TPCDS dataset with scale factor 100 (size 100GB).
- Spark, Drill, Presto have the same number of workers: 12.
- Each worker has the same allocated amount of memory: 4GB.
- Da