Thanks, Megh!

I did some research and came to the same conclusion: PVCs are not a good
option for Spark shuffle, primarily because of latency.
The same is true of S3 and MinIO.

I've implemented option 2 (storing data in a host path) and am currently
testing it.
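
For anyone following along, here is a minimal sketch of what host-path-backed
shuffle scratch space can look like on the Spark side. It is not the Uniffle
server setup discussed in this thread. It only builds the executor volume
settings described in the Spark "Running on Kubernetes" docs, where a volume
whose name starts with spark-local-dir- is used as local storage. The helper
name and both paths are hypothetical placeholders.

```python
def host_path_local_dir_conf(volume_suffix, host_path, mount_path):
    """Build Spark-on-Kubernetes settings that mount a hostPath volume
    into each executor as a local scratch directory."""
    # Volumes named "spark-local-dir-*" are treated by Spark as local storage.
    name = f"spark-local-dir-{volume_suffix}"
    prefix = f"spark.kubernetes.executor.volumes.hostPath.{name}"
    return {
        f"{prefix}.mount.path": mount_path,    # path inside the executor pod
        f"{prefix}.mount.readOnly": "false",   # shuffle files must be writable
        f"{prefix}.options.path": host_path,   # directory on the k8s node
    }


# Hypothetical paths, chosen only for illustration.
conf = host_path_local_dir_conf("1", "/mnt/spark-shuffle", "/data/spark-shuffle")
for key, value in sorted(conf.items()):
    print(f"--conf {key}={value}")
```

The printed lines can be passed to spark-submit, or the same keys can be set
under sparkConf in a SparkApplication manifest. The node directory has to
exist and be writable by the executor's user.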

Regards,
Karan Alang



On Sun, Apr 6, 2025 at 7:08 PM megh vidani <vidanimeg...@gmail.com> wrote:

> Hello Karan,
>
> Apart from Celeborn, there is Apache Uniffle (Incubating) as well. We also
> have a similar setup to yours and we're trying out a PoC with Uniffle right
> now.
>
> What I've gathered so far is, with Uniffle:
> 1. Storing data in PVCs is not well supported
> 2. Storing data in host path is possible
> 3. Storing data in HDFS is possible, but I'm not sure about
> HDFS-compatible S3 storage (e.g. MinIO) yet; we're trying it out
>
> Thanks,
> Megh
>
> On Tue, Apr 1, 2025, 02:43 karan alang <karan.al...@gmail.com> wrote:
>
>> It seems Apache Celeborn is also an option. If anyone has used it, please
>> let me know.
>>
>> thanks!
>>
>>
>> On Mon, Mar 31, 2025 at 1:58 PM karan alang <karan.al...@gmail.com>
>> wrote:
>>
>>> Hello all, checking to see if anyone has any input on this.
>>>
>>> thanks!
>>>
>>>
>>> On Tue, Mar 25, 2025 at 12:22 PM karan alang <karan.al...@gmail.com>
>>> wrote:
>>>
>>>> Hello all,
>>>>
>>>> I have the Kubeflow Spark Operator installed on k8s and, from what I
>>>> understand, Spark Shuffle is not officially supported on Kubernetes.
>>>>
>>>> Looking for feedback from the community on what approach is being taken
>>>> to handle this issue - especially since dynamicAllocation cannot be
>>>> enabled without Spark Shuffle.
>>>>
>>>> For example, does storing the shuffle data in a PVC help?
>>>>
>>>> Please let me know.
>>>>
>>>> tia!
>>>>
>>>>
>>>
