Re: Inquiry in regards to a New onQuery Method for StreamingQueryListener

2025-04-06 Thread Jevon Cowell
I've been thinking about this quite a bit today, and about what an implementation on the Spark side would look like. After some deliberation, I concluded that we should instead have an `onQueryTriggerStart` method that is published every time a micro-batch is triggered. This should of course be disabled by d
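To make the proposal concrete, here is a small pure-Python model of it. It is not Spark code. A listener gets a callback named after the proposed `onQueryTriggerStart`, and a hypothetical `ListenerBus` posts a hypothetical `QueryTriggerStartEvent` at the start of each micro-batch. The message says the event should be disabled (the rest of that sentence is cut off), so the sketch gates it behind an assumed `emit_trigger_start` flag that starts out `False`. Apart from `onQueryTriggerStart`, every class, function, and flag name below is illustrative and does not exist in Spark.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QueryTriggerStartEvent:
    """Hypothetical event payload: which query and which micro-batch is starting."""
    query_id: str
    batch_id: int


class Listener:
    """Hypothetical listener base; the hook is a no-op unless a subclass overrides it."""

    def onQueryTriggerStart(self, event: QueryTriggerStartEvent) -> None:
        pass


@dataclass
class ListenerBus:
    # Hypothetical opt-in flag: trigger-start events are only posted when True.
    emit_trigger_start: bool = False
    listeners: List[Listener] = field(default_factory=list)

    def post_trigger_start(self, query_id: str, batch_id: int) -> int:
        """Notify listeners that a micro-batch trigger began; return how many were notified."""
        if not self.emit_trigger_start:
            return 0
        event = QueryTriggerStartEvent(query_id, batch_id)
        for listener in self.listeners:
            listener.onQueryTriggerStart(event)
        return len(self.listeners)


def run_micro_batches(bus: ListenerBus, query_id: str, num_batches: int) -> None:
    """Simulated streaming loop: post the trigger-start event, then run the batch."""
    for batch_id in range(num_batches):
        bus.post_trigger_start(query_id, batch_id)
        # ... the micro-batch itself would execute here ...


class Recorder(Listener):
    """Example listener that records the batch ids it was told about."""

    def __init__(self) -> None:
        self.seen: List[int] = []

    def onQueryTriggerStart(self, event: QueryTriggerStartEvent) -> None:
        self.seen.append(event.batch_id)


if __name__ == "__main__":
    recorder = Recorder()
    run_micro_batches(ListenerBus(emit_trigger_start=True, listeners=[recorder]), "q1", 3)
    print(recorder.seen)
```

With the flag left at `False`, the `Recorder` sees nothing. With it set to `True`, the `Recorder` sees one event per micro-batch. Putting the check in the bus rather than in each listener means a disabled hook costs only one boolean test per trigger. That is one plausible way to keep a per-batch event cheap, not necessarily how Spark would do it.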

Re: Spark Shuffle - in kubeflow spark operator installation on k8s

2025-04-06 Thread karan alang
One issue I've seen is that after about 24 hours, the SparkApplication job pods seem to be getting evicted. I've installed the Spark History Server and am verifying the case. It could be due to resource constraints; I'm checking this. Please note: the Kubeflow Spark Operator is installed in namespace - so35

Re: Spark Shuffle - in kubeflow spark operator installation on k8s

2025-04-06 Thread karan alang
Thanks, Megh! I did some research and realized the same: a PVC is not a good option for Spark shuffle, primarily because of latency. The same is the case with S3 or MinIO. I've implemented option 2 ("Storing data in host path is possible") and am currently testing it out. Regards, Karan Alang O

Re: Spark Shuffle - in kubeflow spark operator installation on k8s

2025-04-06 Thread megh vidani
Hello Karan, Apart from Celeborn, there is also Apache Uniffle (Incubating). We have a similar setup to yours and are trying out a PoC with Uniffle right now. What I've gathered so far with Uniffle: 1. Storing data in PVCs is not well supported. 2. Storing data in host path is possible
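For comparison with the Uniffle host-path option, plain Spark on Kubernetes can also put shuffle files on the node's disk. Volumes whose names start with `spark-local-dir-` are used as Spark local storage, and `hostPath` is one of the supported volume types. The helper below is a sketch that only builds the `spark.kubernetes.executor.volumes.hostPath.*` settings. Note that this is Spark's built-in local-dir mechanism, not Uniffle's storage. The function name and the example directories are hypothetical. How these settings map onto a Kubeflow Spark Operator `SparkApplication` is an assumption; the operator may expect volumes to be declared in its own spec instead.

```python
from typing import Dict


def host_path_local_dir_conf(host_dir: str, mount_path: str, index: int = 1) -> Dict[str, str]:
    """Hypothetical helper: Spark-on-Kubernetes settings that mount a node directory
    (hostPath) into each executor as a Spark local dir, where shuffle and spill
    files are written."""
    # Spark uses volumes named "spark-local-dir-<something>" as local storage.
    name = f"spark-local-dir-{index}"
    prefix = f"spark.kubernetes.executor.volumes.hostPath.{name}"
    return {
        f"{prefix}.mount.path": mount_path,   # path inside the executor container
        f"{prefix}.mount.readOnly": "false",  # shuffle files must be writable
        f"{prefix}.options.path": host_dir,   # directory on the Kubernetes node
    }


if __name__ == "__main__":
    # Example directories are placeholders; print them as spark-submit flags.
    for key, value in host_path_local_dir_conf("/mnt/disks/shuffle", "/tmp/spark-local").items():
        print(f"--conf {key}={value}")
```

Because hostPath ties data to a specific node, this design trades portability for local-disk latency. That trade-off matches the thread's reason for preferring host path over PVCs or object storage.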

kubernetes spark connect iceberg SparkWrite$WriterFactory not found

2025-04-06 Thread Razvan Mihai
Hello, I'm trying to run a simple Python client against a Spark Connect server running in Kubernetes as a proof of concept. The client writes a couple of records to a local Iceberg table. The Iceberg runtime is provisioned using the "--packages" argument to "start-connect-server.sh", and I see