Hello, I recently encountered a problem that confuses me when using Spark 3.0.
I used the TPCx-BB dataset (200 GB) and executed Query #5 from it. The SQL
reads about 65.7 GB of table data.
Query #5 is as
follows (https://github.com/NVIDIA/spark-rapids/blob/branch-0.3/integration_tests/src/main/s
Do you mean sparkSession.streams.awaitAnyTermination()? May I see your
code? Or you can look at the following:
my demo code:
val hourDevice = beginTimeDevice
  .groupBy($"subsId", $"eventBeginHour", $"serviceType")
  .agg("duration" -> "sum")
  .withColumnRenamed("sum(duration)", "durationForHo
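The aggregation in the Scala snippet above (group by the three key columns, sum the durations, then rename the generated "sum(duration)" column) can be sketched in plain Python to show the intended result shape. The column names come from the snippet; the sample data and the renamed column name are made up for illustration.

```python
# Plain-Python sketch of: groupBy(subsId, eventBeginHour, serviceType)
#                         .agg("duration" -> "sum")
#                         .withColumnRenamed(...)
from collections import defaultdict

# Hypothetical input rows; field names follow the Scala demo.
rows = [
    {"subsId": "A", "eventBeginHour": 10, "serviceType": "voice", "duration": 30},
    {"subsId": "A", "eventBeginHour": 10, "serviceType": "voice", "duration": 45},
    {"subsId": "B", "eventBeginHour": 11, "serviceType": "data",  "duration": 20},
]

# Group by the three key columns and sum "duration".
totals = defaultdict(int)
for r in rows:
    key = (r["subsId"], r["eventBeginHour"], r["serviceType"])
    totals[key] += r["duration"]

# Equivalent of renaming the auto-generated "sum(duration)" column.
result = [
    {"subsId": k[0], "eventBeginHour": k[1], "serviceType": k[2],
     "durationSum": v}
    for k, v in totals.items()
]
print(result)
```

In Spark itself the rename step can be avoided by using a typed aggregate with an alias, e.g. `.agg(sum($"duration").as("durationSum"))`, which produces the final column name directly.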
Hi, I am using a 16-node Spark cluster with the config below:
1. Executor memory: 8 GB
2. 5 cores per executor
3. Driver memory: 12 GB
We have a streaming job. Usually we do not see a problem, but sometimes we
get an exception on executor-1: a heap memory issue. I do not understand
this, since the data size is the same and this job rece
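For reference, the cluster settings listed above would typically be passed to spark-submit roughly as follows. This is only a sketch: the master URL and application jar are placeholders, and the flags shown are the standard spark-submit options matching the three settings.

```shell
# Sketch of a spark-submit matching the configuration described above.
# "yarn" and "your-streaming-app.jar" are placeholders.
spark-submit \
  --master yarn \
  --executor-memory 8g \
  --executor-cores 5 \
  --driver-memory 12g \
  your-streaming-app.jar
```

Note that with 8 GB per executor, only a fraction is available for execution and storage (governed by spark.memory.fraction, 0.6 by default), so occasional heap pressure on one executor often points to data skew in a particular partition rather than an overall data-size change.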