I’ll chime in as an actual implementor of a custom DataSource who is keeping an
eye on the 3.0 DSv2 changes.
We started implementing DSv2 in the 2.4 branch, but quickly discovered that the
DSv2 in 3.0 was a complete breaking change (to the point where it could have
been named DSv3 and it wouldn
I'd like to bump this. I agree with Carlos that there is very little
information at the DataSourceWriter/DataSourceReader level. To me, ideally, the
DataSourceWriter/Reader should have as much information as possible. Not only
the number of partitions, but also ideally the whole execution plan.
Great! Please add
joaquin.guantergonzal...@telefonica.com
to the list of attendees.
Thanks,
Ximo
From: Ryan Blue
Sent: Monday, December 10, 2018 18:46
To: JOAQUIN GUANTER GONZALBEZ
CC: Wenchen Fan; Spark Dev List
Subject: Re:
: Wednesday, December 5, 2018 15:51
To: JOAQUIN GUANTER GONZALBEZ
CC: Spark dev list
Subject: Re: [SPARK-26160] Make assertNotBucketed call in DataFrameWriter::save
optional
The bucket feature is designed to only work with data sources with table
support, and currently the table support is
Hello,
I have a proposal for a small improvement in the Datasource API and I'd like to
know if it sounds like a change the Spark project would accept.
Currently, the `.save` method in DataFrameWriter will fail if the DataFrame is
bucketed and/or sorted. This makes sense, since there is no way o
[mailto:daniel.dara...@lynxanalytics.com]
Sent: Monday, March 21, 2016 16:20
To: Ted Yu
CC: JOAQUIN GUANTER GONZALBEZ; dev@spark.apache.org
Subject: Re: Performance improvements for sorted RDDs
There is related discussion in
https://issues.apache.org/jira/browse/SPARK-8836. It's not too ha
Hello devs,
I have found myself in a situation where Spark is doing sub-optimal
computations for my RDDs, and I was wondering whether a patch to enable
improved performance for this scenario would be a welcome addition to Spark or
not.
The scenario happens when trying to cogroup two RDDs that