RE: pyspark4.0.0 still includes "jackson-mapper-asl.jar" that was supposed to be removed according to release note

2025-06-26 Thread Haibo.Wang
+ d...@spark.apache.org Hi all, could someone help look into this item? I would appreciate it if you could forward this thread to the correct team if this is not the right contact. Thanks. Regards, Harper From: Wang, Harper (FRPPE) Sent: Wednesday, June 25, 2025 10:

Technical Guidance: Dynamic Resource Allocation + External Shuffle Storage

2025-06-26 Thread Andrew M.
I'm having trouble getting dynamic resource allocation to properly terminate idle executors when using FSx Lustre for shuffle persistence on EMR 7.8 (Spark 3.5.4) on EKS. I'm trying this strategy out to battle cost via very severe data skew (I don't really care if a couple nodes run for hours while

Re: What is the current canonical way to join more than 2 watermarked streams (Spark 3.5.6)?

2025-06-26 Thread Jungtaek Lim
Hi, Starting from Spark 4.0.0, we support multiple stateful operators in append mode, so you can chain stream-stream joins. One thing to watch is that the output of a stream-stream join has two different event time columns, which is ambiguous w.r.t. which column has to b

Inquiry About User Impersonation Support in Spark Thrift Server (Spark 1.x to 4.x)

2025-06-26 Thread Allen Chu
Dear [Team / Support / Apache Spark Community], I hope this message finds you well. I'm reaching out to inquire about the support for *user impersonation* in the *Spark Thrift Server* across different versions of Apache Spark, specifically from *Spark 1.x through Spark 4.x*. We are currently eva