Re: [Spark SQL] Memory problems with packing too many joins into the same WholeStageCodegen

2020-02-25 Thread Jianneng Li
…can't do all joins this way. Best, Jianneng. From: yeikel valdes; Sent: Tuesday, February 25, 2020 5:48 AM; To: Jianneng Li; Cc: user@spark.apache.org, genie_...@outlook.com; Subject: Re: [Spark SQL] Memory problems with packing too many joins into the same WholeStageCodegen

Re: [Spark SQL] Memory problems with packing too many joins into the same WholeStageCodegen

2020-02-24 Thread Jianneng Li
…many joins into the same WholeStageCodegen: I have encountered the too-many-joins problem before. Since the joined dataframe was small enough, I converted the join to a UDF operation, which is much faster and didn't cause out-of-memory problems. On Feb 25, 2020, at 10:15, Jianneng Li <jianneng...@workday.com> wrote: …
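The workaround suggested above, replacing a join against a small dataframe with a per-row UDF lookup, can be sketched in plain Python. In Spark this would typically be a broadcast variable consulted inside a UDF; the function name `build_lookup_udf` and the sample tables here are hypothetical, not from the thread:

```python
# Sketch: replace `facts JOIN dims ON facts.dim_id = dims.id`
# with a per-row lookup into an in-memory dict (broadcast-style).
def build_lookup_udf(small_table):
    # small_table: iterable of (key, value) rows; assumed small enough
    # to fit in memory, mirroring a Spark broadcast variable.
    lookup = dict(small_table)

    def udf(key):
        # Returns the joined value, or None for a left-outer-style miss.
        return lookup.get(key)

    return udf

dims = [(1, "US"), (2, "DE")]           # hypothetical small dimension table
facts = [(100, 1), (101, 2), (102, 3)]  # (fact_id, dim_id)

country_of = build_lookup_udf(dims)
joined = [(fid, country_of(d)) for fid, d in facts]
```

Because the lookup runs inside a single projection rather than as a join operator, it avoids adding another join to the generated WholeStageCodegen stage.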

[Spark SQL] Memory problems with packing too many joins into the same WholeStageCodegen

2020-02-24 Thread Jianneng Li
Hello everyone, WholeStageCodegen generates code that appends results into a BufferedRowIterator, which keeps the results in an in-memory linked list…
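The memory problem described above stems from generated code appending every produced row into BufferedRowIterator's heap-resident buffer before the consumer drains it. A rough Python sketch of that append-then-drain contract; the class mimics the shape of Spark's Java class but is illustrative, not the actual generated code:

```python
from collections import deque

class BufferedRowIterator:
    # Mimics the append/hasNext/next contract of Spark's
    # BufferedRowIterator: generated code calls append() for each
    # produced row, and the buffer lives entirely on the JVM heap
    # (Spark backs it with a LinkedList of rows).
    def __init__(self):
        self._buffer = deque()

    def append(self, row):
        self._buffer.append(row)

    def has_next(self):
        return bool(self._buffer)

    def next(self):
        return self._buffer.popleft()

it = BufferedRowIterator()
for row in range(5):   # stand-in for the codegen produce loop; packing
    it.append(row)     # many joins into one stage multiplies buffered rows
drained = []
while it.has_next():
    drained.append(it.next())
```

When many joins are fused into one WholeStageCodegen stage, a single produce loop can buffer a large intermediate result here before anything is consumed, which is the memory pressure the thread describes.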

Re: Where does the Driver run?

2019-03-29 Thread Jianneng Li
…requests and call SparkContext accordingly. Best, Jianneng. From: Pat Ferrel; Sent: Thursday, March 28, 2019 10:10 AM; To: Jianneng Li; Cc: user@spark.apache.org, ak...@hacked.work, andrew.m...@gmail.com, and...@actionml.com; Subject: Re: Where does the Driver run?

Re: Where does the Driver run?

2019-03-28 Thread Jianneng Li
Hi Pat, The driver runs in the same JVM as SparkContext. You didn't go into detail about how you "launch" the job (i.e. how the SparkContext is created), so it's hard for me to guess where the driver is. For reference, we've had success launching Spark programmatically to YARN in cluster mode
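Launching Spark programmatically to YARN in cluster mode, as mentioned above, usually amounts to assembling a `spark-submit` invocation (or using `org.apache.spark.launcher.SparkLauncher`, which does the same under the hood). A minimal sketch that only builds the argument list; the jar path, class name, and helper function are hypothetical:

```python
def build_spark_submit(app_jar, main_class, master="yarn",
                       deploy_mode="cluster", conf=None):
    # In cluster mode the driver runs inside a YARN container,
    # not in the JVM/process that invokes spark-submit.
    cmd = ["spark-submit",
           "--master", master,
           "--deploy-mode", deploy_mode,
           "--class", main_class]
    for key, value in (conf or {}).items():
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app_jar)  # application jar goes last
    return cmd

cmd = build_spark_submit("app.jar", "com.example.Main",
                         conf={"spark.executor.memory": "4g"})
```

With `--deploy-mode client` instead, the driver (and its SparkContext) would run in the launching process itself, which is the distinction the thread turns on.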

Questions about Spark Shuffle and Heap

2015-12-04 Thread Jianneng Li
Hi, On the Spark Configuration page (http://spark.apache.org/docs/1.5.2/configuration.html), the documentation for spark.shuffle.memoryFraction mentions that the fraction is taken from the Java heap. However, the documentation for spark.shuffle.io.preferDirectBufs implies that off-heap memory might…
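For the heap side of the question above: in Spark 1.x's legacy (pre-unified) memory model, the shuffle budget is a fraction of the Java heap, further scaled by a safety fraction, while `spark.shuffle.io.preferDirectBufs` concerns off-heap direct buffers used by network I/O and is accounted separately. A small sketch of the heap-side arithmetic; the default values (0.2 and 0.8) are assumptions based on the 1.x configuration docs:

```python
def shuffle_heap_budget(heap_bytes, memory_fraction=0.2,
                        safety_fraction=0.8):
    # Legacy model: shuffle memory =
    #   heap * spark.shuffle.memoryFraction * spark.shuffle.safetyFraction
    # Both fractions refer to the on-heap executor memory only.
    return int(heap_bytes * memory_fraction * safety_fraction)

budget = shuffle_heap_budget(4 * 1024**3)  # e.g. a 4 GiB executor heap
```

Under these assumed defaults, roughly 16% of the heap is available to shuffle; off-heap direct buffers are on top of, not inside, this figure.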

Record-at-a-time model for Spark Streaming

2014-10-07 Thread Jianneng Li
Hello, I understand that Spark Streaming uses micro-batches to implement streaming, while traditional streaming systems use the record-at-a-time processing model. The performance benefit of the former is throughput, and that of the latter is latency. I'm wondering what it would take to implement record-at-a-time…
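The throughput/latency trade-off described above can be made concrete with two toy processing loops; the function names and data are illustrative only, not Spark APIs:

```python
def micro_batch(stream, batch_size, process_batch):
    # Micro-batch model: buffer records, then process a batch at a time.
    # Latency is bounded below by the batch interval, but per-record
    # overhead is amortized across the batch (higher throughput).
    out, batch = [], []
    for rec in stream:
        batch.append(rec)
        if len(batch) == batch_size:
            out.extend(process_batch(batch))
            batch = []
    if batch:                       # flush the final partial batch
        out.extend(process_batch(batch))
    return out

def record_at_a_time(stream, process_record):
    # Record-at-a-time model: each record flows through immediately,
    # minimizing latency at the cost of per-record scheduling overhead.
    return [process_record(rec) for rec in stream]

stream = [1, 2, 3, 4, 5]
mb = micro_batch(stream, 2, lambda batch: [x * 10 for x in batch])
rt = record_at_a_time(stream, lambda x: x * 10)
```

Both models compute the same results; the difference is when each result becomes available, which is exactly the latency question the message raises.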