RE: [DISCUSS] Spark 2.5 release

2019-09-25 Thread JOAQUIN GUANTER GONZALBEZ
I’ll chime in as an actual implementor of a custom DataSource who is keeping an eye on the 3.0 DSv2 changes. We started implementing DSv2 in the 2.4 branch, but quickly discovered that the DSv2 in 3.0 was a complete breaking change (to the point where it could have been named DSv3 and it wouldn

RE: Partitions at DataSource API V2

2019-03-13 Thread JOAQUIN GUANTER GONZALBEZ
I'd like to bump this. I agree with Carlos that there is very little information at the DataSourceWriter/DataSourceReader level. To me, ideally, the DataSourceWriter/Reader should have as much information as possible. Not only the number of partitions, but also ideally the whole execution plan.

RE: [SPARK-26160] Make assertNotBucketed call in DataFrameWriter::save optional

2018-12-13 Thread JOAQUIN GUANTER GONZALBEZ
Great! Please add joaquin.guantergonzal...@telefonica.com to the list of attendees. Thanks, Ximo From: Ryan Blue Sent: Monday, December 10, 2018 18:46 To: JOAQUIN GUANTER GONZALBEZ CC: Wenchen Fan; Spark Dev List Subject: Re:

RE: [SPARK-26160] Make assertNotBucketed call in DataFrameWriter::save optional

2018-12-10 Thread JOAQUIN GUANTER GONZALBEZ
: Wednesday, December 5, 2018 15:51 To: JOAQUIN GUANTER GONZALBEZ CC: Spark dev list Subject: Re: [SPARK-26160] Make assertNotBucketed call in DataFrameWriter::save optional The bucket feature is designed to only work with data sources with table support, and currently the table support is

[SPARK-26160] Make assertNotBucketed call in DataFrameWriter::save optional

2018-11-26 Thread JOAQUIN GUANTER GONZALBEZ
Hello, I have a proposal for a small improvement in the Datasource API and I'd like to know if it sounds like a change the Spark project would accept. Currently, the `.save` method in DataFrameWriter will fail if the dataframe is bucketed and/or sorted. This makes sense, since there is no way o

RE: Performance improvements for sorted RDDs

2016-03-21 Thread JOAQUIN GUANTER GONZALBEZ
[mailto:daniel.dara...@lynxanalytics.com] Sent: Monday, March 21, 2016 16:20 To: Ted Yu CC: JOAQUIN GUANTER GONZALBEZ; dev@spark.apache.org Subject: Re: Performance improvements for sorted RDDs There is related discussion in https://issues.apache.org/jira/browse/SPARK-8836. It's not too ha

Performance improvements for sorted RDDs

2016-03-21 Thread JOAQUIN GUANTER GONZALBEZ
Hello devs, I have found myself in a situation where Spark is doing sub-optimal computations for my RDDs, and I was wondering whether a patch to enable improved performance for this scenario would be a welcome addition to Spark or not. The scenario happens when trying to cogroup two RDDs that