Re: Debugging tools for Spark Structured Streaming

Artemis User Fri, 30 Oct 2020 08:01:19 -0700

Spark distributes load to executors, and the executors are usually pre-configured with a number of cores. You may want to check with your Spark admin how many executors (or slaves) your Spark cluster is configured with and how many cores are pre-configured per executor. The debugging tool for performance tuning in Spark would be the built-in Web UI.
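
If you would rather script the check than click through the Web UI, the same executor information is exposed by Spark's monitoring REST API under /api/v1/applications/[app-id]/executors. Below is a minimal Python sketch, not a definitive tool. The helper names (fetch_executors, summarize_executors) and the sample numbers are mine, and the idle-core figure assumes one core per task (spark.task.cpus left at its default of 1).

```python
import json
from urllib.request import urlopen


def fetch_executors(ui_base_url, app_id):
    """Fetch executor summaries from the Spark monitoring REST API.

    ui_base_url is the driver UI address, e.g. "http://driver-host:4040"
    (hypothetical host; 4040 is the default UI port).
    """
    url = f"{ui_base_url}/api/v1/applications/{app_id}/executors"
    with urlopen(url) as resp:
        return json.load(resp)


def summarize_executors(executors):
    """Total up cores and running tasks, skipping the driver entry."""
    workers = [e for e in executors if e.get("id") != "driver"]
    total_cores = sum(e.get("totalCores", 0) for e in workers)
    active_tasks = sum(e.get("activeTasks", 0) for e in workers)
    return {
        "executors": len(workers),
        "total_cores": total_cores,
        "active_tasks": active_tasks,
        # Assumes one core per task (spark.task.cpus = 1).
        "idle_cores": max(total_cores - active_tasks, 0),
    }


if __name__ == "__main__":
    # Made-up sample shaped like the REST response, for illustration only.
    sample = [
        {"id": "driver", "totalCores": 0, "activeTasks": 0},
        {"id": "1", "totalCores": 64, "activeTasks": 6},
        {"id": "2", "totalCores": 64, "activeTasks": 5},
    ]
    print(summarize_executors(sample))
```

Polling this during a high-load period and comparing total_cores against active_tasks would show whether the cores are really allocated but idle, or were never handed to the application in the first place.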

The level of parallel processing in structured streaming isn't as straightforward as in standard ETL processing. It depends on the data source, the streaming mode (continuous or microbatch), your trigger timing, etc. We have experienced similar scaling problems with structured streaming. Please note that Spark is designed for processing large data chunks, not for streaming data one piece at a time. It doesn't like small pieces of data (the default partition size is set to 128 MB), period! The partition mechanism and its RDD-driven DAG job scheduler are all designed for processing large-scale data for ETL. It has to accumulate streaming data into a large chunk first, before scaling can take place. Apparently Spark can't distribute the read operation either (only one worker reads, which has to do with preserving the order of the stream data). So your data ingestion becomes a bottleneck that prevents scaling further down the chain. An alternative may be to look into other streaming frameworks, like Apache Ignite.
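
To make the partition point concrete: a stage runs one task per partition, so the number of tasks that can be active at once is bounded by both the partition count of the micro-batch and the available task slots. Here is a back-of-the-envelope Python sketch. The function name is mine, and the partition count of 12 is purely an assumed figure for illustration, not something reported in this thread.

```python
def max_concurrent_tasks(num_partitions, total_cores, cpus_per_task=1):
    """Upper bound on tasks a single stage can run at the same time.

    One task is scheduled per partition, and each task occupies
    cpus_per_task cores (the spark.task.cpus setting, default 1).
    """
    if cpus_per_task < 1:
        raise ValueError("cpus_per_task must be at least 1")
    task_slots = total_cores // cpus_per_task
    return min(num_partitions, task_slots)


if __name__ == "__main__":
    # Assumed: the micro-batch arrives in 12 partitions on a 128-core cluster.
    print(max_concurrent_tasks(12, 128))   # 12: most cores sit idle
    # Only with at least as many partitions as cores can all cores be busy.
    print(max_concurrent_tasks(200, 128))  # 128
```

If the Web UI shows the stage's task count matching the input partition count rather than the core count, that would point at ingestion-side partitioning as the limit.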


-- ND

On 10/29/20 8:02 PM, Eric Beabes wrote:

We're using Spark 2.4. We recently pushed to production a product that's using Spark Structured Streaming. It's working well most of the time but occasionally, when the load is high, we've noticed that there are only 10+ 'Active Tasks' even though we've provided 128 cores. Would like to debug this further. Why are all the cores not getting used? How do we debug this? Please help. Thanks.


---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
