There is the Apache incubator project Uniffle:
https://github.com/apache/incubator-uniffle
It stores shuffle data on remote servers: in memory, on local disk, and in HDFS.
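As a rough sketch of how a Spark job might be pointed at such a remote shuffle cluster: to my knowledge the Uniffle README describes the two settings used below (spark.shuffle.manager and spark.rss.coordinator.quorum), but please check them against the version you deploy. The coordinator host names and the port 19999 are placeholders I made up for illustration, and the helper functions are hypothetical, not part of Uniffle or Spark.

```python
# Sketch only: build spark-submit flags that point Spark at a Uniffle cluster.
# The coordinator addresses used in the demo are hypothetical placeholders.


def uniffle_conf(coordinators):
    """Return Spark settings that route shuffle data through Uniffle."""
    if not coordinators:
        raise ValueError("at least one coordinator address is required")
    return {
        # Replace Spark's built-in shuffle manager with the Uniffle client.
        "spark.shuffle.manager": "org.apache.spark.shuffle.RssShuffleManager",
        # Comma-separated host:port list of Uniffle coordinators.
        "spark.rss.coordinator.quorum": ",".join(coordinators),
    }


def to_submit_flags(conf):
    """Render the settings as a flat list of spark-submit --conf arguments."""
    flags = []
    for key, value in sorted(conf.items()):
        flags += ["--conf", f"{key}={value}"]
    return flags


if __name__ == "__main__":
    conf = uniffle_conf(["coordinator-1:19999", "coordinator-2:19999"])
    print(" ".join(to_submit_flags(conf)))
```

The printed flags can be appended to a spark-submit command line. Where the shuffle data ends up (memory, local disk, HDFS) is decided on the Uniffle side, so it is not set here.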
Cheers,
Enrico
On 06.04.24 at 15:41, Mich Talebzadeh wrote:
I have seen some older references to a shuffle service for k8s,
although it is not clear whether they are talking about a generic
shuffle service for k8s.
Anyhow, with the advent of GenAI and the need to handle larger
volumes of data, I was wondering if there has been any more work on
this matter. Specifically, large and scalable file systems like HDFS,
GCS, S3, etc. offer significantly more storage capacity than the local
disks on individual worker nodes in a k8s cluster, thus allowing
much larger datasets to be handled more efficiently. The degree of
parallelism and fault tolerance of these file systems also comes into
it. I would be interested in hearing more about any progress on this.
Thanks
Mich Talebzadeh,
Technologist | Solutions Architect | Data Engineer | Generative AI
London
United Kingdom
https://en.everybodywiki.com/Mich_Talebzadeh
Disclaimer: The information provided is correct to the best of my
knowledge but of course cannot be guaranteed. It is essential to note
that, as with any advice, "one test result is worth one-thousand
expert opinions" (Wernher von Braun).
---------------------------------------------------------------------
To unsubscribe e-mail: [email protected]