I understand that non-developers could be confused and that some
sort of signposting or warning makes sense.
I certainly consider my personal registry on gitlab.com ephemeral and
not intended for publishing.
We have our own private instance of gitlab where I put artifacts that are
derived and t
Hi Wenchen,
Glad to know that you like this idea.
We also looked into making this pluggable in our early design phase.
While the ShuffleManager API for pluggable shuffle systems provides
considerable room for customizing Spark shuffle behavior, we feel that it
is still not enough for this ca
Yeah the color on this is that 'snapshot' or 'nightly' builds are not
quite _discouraged_ by the ASF, but need to be something only devs are
likely to find and clearly signposted, because they aren't officially
blessed releases. It gets into a gray area if the project is
'officially' hosting a way to
Hi, Jim.
Thank you for the proposal. I understand the request.
However, the following key benefit sounds like unofficial snapshot binary
releases.
> For example, this was used to build a version of spark that included
> SPARK-28938 which has yet to be released and was necessary for
> spark-operator t
This story [1] proposes adding a .gitlab-ci.yml file to make it easy to
create artifacts and images for Spark.
Using this mechanism, people can submit any subsequent version of Spark for
building and image hosting on gitlab.com.
There is a companion WIP branch [2] with a candidate and example f
I don't think we want to add a lot of flexibility to the PARTITION BY
expressions. It's usually just columns or nested fields, or some common
functions like year, month, etc.
If you look at the parser, we create DS V2 Expression directly.
The partition-specific expressions are for
`DataFrameWriter
The name "push-based shuffle" is a little misleading. This seems like a
better shuffle service that co-locates shuffle blocks of one reducer at the
map phase. I think this is a good idea. Is it possible to make it
completely external via the shuffle plugin API? This looks like a good use
case of th