Thanks, Megh! I did some research and came to the same conclusion: PVC is not a good option for Spark shuffle, primarily because of latency. The same is true of S3 or MinIO.
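For comparison, stock Spark on Kubernetes can also keep its local (shuffle and spill) directories on node-local disk through a hostPath volume instead of a PVC. Below is a minimal sketch of the relevant conf keys, not the Uniffle setup itself. The helper names and the host directory `/mnt/spark-local` are assumptions made up for illustration. The `spark.kubernetes.executor.volumes.hostPath.*` keys and the `spark-local-dir-` volume-name prefix come from the Spark "Running on Kubernetes" documentation.

```python
def hostpath_local_dir_conf(host_dir, mount_path="/tmp/spark-local", index=1):
    """Build Spark conf entries that mount a node directory as executor scratch space.

    host_dir   -- directory on the Kubernetes node (hypothetical example value)
    mount_path -- where the directory appears inside the executor pod
    """
    # Volumes whose name starts with "spark-local-dir-" are picked up by
    # Spark on Kubernetes as local storage for shuffle and spill files.
    name = f"spark-local-dir-{index}"
    prefix = f"spark.kubernetes.executor.volumes.hostPath.{name}"
    return {
        f"{prefix}.mount.path": mount_path,
        f"{prefix}.options.path": host_dir,
    }


def to_submit_args(conf):
    """Flatten a conf dict into spark-submit style ['--conf', 'k=v', ...] arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # /mnt/spark-local is an assumed path; use whatever disk the nodes expose.
    conf = hostpath_local_dir_conf("/mnt/spark-local")
    print(" ".join(to_submit_args(conf)))
```

With the Spark Operator, the same key/value pairs would go under `sparkConf` in the SparkApplication spec rather than on a spark-submit command line.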
I've implemented option 2, and am testing this out currently:
Storing data in host path is possible

regds,
Karan Alang

On Sun, Apr 6, 2025 at 7:08 PM megh vidani <vidanimeg...@gmail.com> wrote:

> Hello Karan,
>
> Apart from Celeborn, there is Apache Uniffle (Incubating) as well. We also
> have similar setup as yours and we're trying out a PoC with Uniffle right
> now.
>
> What I've gathered so far is, with Uniffle:
> 1. Storing data in PVCs is not well supported
> 2. Storing data in host path is possible
> 3. Storing data in HDFS is possible, but I'm not sure about HDFS
> compatible S3 (e.g. MinIO) storage yet, we're trying it out
>
> Thanks,
> Megh
>
> On Tue, Apr 1, 2025, 02:43 karan alang <karan.al...@gmail.com> wrote:
>
>> seems apache-celeborn is also an option, if anyone has used this pls let
>> me know.
>>
>> thanks!
>>
>>
>> On Mon, Mar 31, 2025 at 1:58 PM karan alang <karan.al...@gmail.com>
>> wrote:
>>
>>> hello all - checking to see if anyone has any input on this
>>>
>>> thanks!
>>>
>>>
>>> On Tue, Mar 25, 2025 at 12:22 PM karan alang <karan.al...@gmail.com>
>>> wrote:
>>>
>>>> hello All,
>>>>
>>>> I have kubeflow Spark Operator installed on k8s and from what i
>>>> understand - Spark Shuffle is not officially supported on kubernetes.
>>>>
>>>> Looking for feedback from the community on what approach is being taken
>>>> to handle this issue - especially since dynamicAllocation cannot be
>>>> enabled without Spark Shuffle.
>>>>
>>>> for eg.
>>>> Does storing the shuffle data in PVC help ?
>>>>
>>>> Pls let me know.
>>>>
>>>> tia!