Re: How to tune the performance of Tpch query5 within Spark

2017-07-17 Thread Pralabh Kumar
Hi,

To read files in parallel, you can follow the code below.

case class readData(fileName: String, spark: SparkSession) extends Callable[Dataset[Row]] {
  override def call(): Dataset[Row] = {
    spark.read.parquet(fileName)
    // spark.read.csv(fileName)
  }
}
val spark = SparkSession.buil
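The message above is cut off before it shows how the Callable instances are submitted. A minimal sketch of one way to do that with a fixed thread pool follows. The `ParallelLoad` object, the `loadAll` helper, and the `parallelism` parameter are hypothetical names introduced here for illustration, not part of the original post, and the helper is generic so it can be tried without a cluster.

```scala
import java.util.concurrent.{Callable, Executors}

// Hypothetical helper (not from the thread): runs `load` for every path on a
// fixed-size thread pool and returns the results in the same order as `paths`.
object ParallelLoad {
  def loadAll[A](paths: Seq[String], parallelism: Int)(load: String => A): Seq[A] = {
    val pool = Executors.newFixedThreadPool(parallelism)
    try {
      // Submit one Callable per path. toVector forces a strict collection so
      // that every task is submitted before we start waiting on any result.
      val futures = paths.toVector.map { path =>
        pool.submit(new Callable[A] {
          override def call(): A = load(path)
        })
      }
      // Block until each task finishes; a failed task surfaces here as an
      // ExecutionException thrown by get().
      futures.map(_.get())
    } finally {
      pool.shutdown()
    }
  }
}
```

With Spark, the call would presumably look like `ParallelLoad.loadAll(files, 4)(path => spark.read.parquet(path))`, assuming `spark` is an existing SparkSession and `files` is a list of paths. This only overlaps the driver-side work of setting up each read; whether it helps a given query is something to measure rather than assume.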

Re: How to tune the performance of Tpch query5 within Spark

2017-07-17 Thread vaquar khan
Verify your configuration; the following link covers the Spark tuning points: https://spark.apache.org/docs/latest/tuning.html

Regards,
Vaquar khan

On Jul 17, 2017 6:56 AM, "何文婷" wrote:
2.1.1
Sent from NetEase Mail Master

On 07/17/2017 20:55, vaquar khan wrote:
Could you please let us know your Spark version? Re

Re: How to tune the performance of Tpch query5 within Spark

2017-07-17 Thread 何文婷
2.1.1
Sent from NetEase Mail Master

On 07/17/2017 20:55, vaquar khan wrote:
Could you please let us know your Spark version?
Regards, vaquar khan

On Jul 17, 2017 12:18 AM, "163" wrote:
I changed the UDF, but the performance still seems slow. What else can I do?

On July 14, 2017, at 8:34 PM, Wenchen Fan wrote:
Try

Re: How to tune the performance of Tpch query5 within Spark

2017-07-17 Thread vaquar khan
Could you please let us know your Spark version?

Regards,
vaquar khan

On Jul 17, 2017 12:18 AM, "163" wrote:
> I changed the UDF, but the performance still seems slow. What else can I do?
>
> On July 14, 2017, at 8:34 PM, Wenchen Fan wrote:
>
> Try to replace your UDF with Spark built-in expressions, it s

Re: How to tune the performance of Tpch query5 within Spark

2017-07-16 Thread 163
I changed the UDF, but the performance still seems slow. What else can I do?

> On July 14, 2017, at 8:34 PM, Wenchen Fan wrote:
>
> Try to replace your UDF with Spark built-in expressions, it should be as
> simple as `$"x" * (lit(1) - $"y")`.
>
>> On 14 Jul 2017, at 5:46 PM, 163

Re: How to tune the performance of Tpch query5 within Spark

2017-07-14 Thread Wenchen Fan
Try to replace your UDF with Spark built-in expressions; it should be as simple as `$"x" * (lit(1) - $"y")`.

> On 14 Jul 2017, at 5:46 PM, 163 wrote:
>
> I rewrote TPC-H query 5 using the DataFrame API:
> val forders =
> spark.read.parquet("hdfs://dell127:20500/SparkParquetDoubleTimestamp100G/orders
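To make the suggestion above concrete, here is a small sketch that computes the same aggregate once through a UDF and once through built-in column expressions. The original UDF is not visible in this thread, so the UDF body here is an assumption; the column names `l_extendedprice` and `l_discount` come from the TPC-H schema, and the `BuiltinVsUdf` object and its method names are hypothetical. `col("x")` is used in place of `$"x"` so the helpers do not need `spark.implicits._`.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, lit, sum, udf}

// Hypothetical illustration, not the poster's actual code.
object BuiltinVsUdf {
  // Assumed shape of the UDF: price * (1 - discount), computed in a Scala closure.
  private val discounted =
    udf((price: Double, discount: Double) => price * (1 - discount))

  // UDF version: Spark treats the closure as a black box.
  def revenueUdf(lineitem: DataFrame): DataFrame =
    lineitem.select(
      sum(discounted(col("l_extendedprice"), col("l_discount"))).as("revenue"))

  // Built-in version: the same arithmetic expressed with Column operators,
  // which the optimizer and code generation can see through.
  def revenueBuiltin(lineitem: DataFrame): DataFrame =
    lineitem.select(
      sum(col("l_extendedprice") * (lit(1) - col("l_discount"))).as("revenue"))
}
```

Calling `explain()` on both results is one way to compare the physical plans. As the follow-up messages in this thread suggest, removing the UDF may not be the dominant cost in every query, so the plans and stage timings are worth checking before drawing conclusions.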

How to tune the performance of Tpch query5 within Spark

2017-07-14 Thread 163
I rewrote TPC-H query 5 using the DataFrame API:

val forders = spark.read.parquet("hdfs://dell127:20500/SparkParquetDoubleTimestamp100G/orders")
  .filter("o_orderdate < 1995-01-01 and o_orderdate >= 1994-01-01")
  .select("o_custkey", "o_orderkey")
val flineitem = spark.read.parquet("hdfs://dell127:20500/Spa

How to tune the performance of Tpch query5 within Spark

2017-07-14 Thread 163
> I rewrote TPC-H query 5 using the DataFrame API:
> val forders = spark.read.parquet("hdfs://dell127:20500/SparkParquetDoubleTimestamp100G/orders")
>   .filter("o_orderdate < 1995-01-01 and o_orderdate >= 1994-01-01")
>   .select("o_custkey", "o_orderkey")
> val flineitem = spark.read.parquet("