Re: [VOTE] SPIP: Constraints in DSv2

2025-04-05 Thread Wenchen Fan
+1 As Gengliang explained, the API allows the connectors to request Spark to perform data validations, but connectors can also choose to do validation by themselves. I think it's a reasonable design as not all connectors have the ability to do data validation by themselves, such as file formats th

Re: [VOTE] SPIP: Constraints in DSv2

2025-04-05 Thread Wenchen Fan
Hi Angel, This feature involves 3 parties: - The end-user specifies constraints for their tables, via the SQL syntax provided by Spark. - Spark propagates the constraints to the backend connector of the tables, and performs data validation during data writing if the connector asks Spark to do so.
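
A rough illustration of the division of labor described above, for readers who prefer code. This is only a toy model in plain Python, not the DSv2 API: the names CheckConstraint, Connector, engine_validates and engine_write are all hypothetical and do not come from the SPIP or from Spark. It shows the engine validating rows during a write only when the connector asks it to.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


@dataclass
class CheckConstraint:
    # Hypothetical stand-in for a user-declared CHECK constraint.
    name: str
    predicate: Callable[[Row], bool]


@dataclass
class Connector:
    # Hypothetical backend: stores the table's constraints and says
    # whether it wants the engine to validate rows on its behalf.
    constraints: List[CheckConstraint]
    engine_validates: bool
    rows: List[Row] = field(default_factory=list)

    def write(self, rows: Iterable[Row]) -> None:
        self.rows.extend(rows)


class ConstraintViolation(Exception):
    pass


def engine_write(connector: Connector, rows: Iterable[Row]) -> None:
    # Toy "engine" write path: validate every row first, but only if
    # the connector requested engine-side validation.
    batch = list(rows)
    if connector.engine_validates:
        for row in batch:
            for constraint in connector.constraints:
                if not constraint.predicate(row):
                    raise ConstraintViolation(
                        f"row {row!r} violates {constraint.name}"
                    )
    connector.write(batch)


# The end user declares a constraint; the connector opts in to validation.
table = Connector(
    constraints=[CheckConstraint("positive_amount", lambda r: r["amount"] > 0)],
    engine_validates=True,
)
engine_write(table, [{"amount": 5}])
```

With engine_validates set to False the same call writes the batch unchecked, which corresponds to a connector that chooses to do validation by itself.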

Re: [VOTE] SPIP: Constraints in DSv2

2025-04-05 Thread huaxin gao
+1 On Fri, Mar 21, 2025 at 12:08 PM Denny Lee wrote: > +1 (non-binding) > > On Fri, Mar 21, 2025 at 11:52 Gengliang Wang wrote: > >> +1 >> >> On Fri, Mar 21, 2025 at 11:46 AM Anton Okolnychyi >> wrote: >> >>> Hi all, >>> >>> I would like to start a vote on adding support for constraints to DSv

Re: [VOTE] SPIP: Constraints in DSv2

2025-04-05 Thread Szehon Ho
+1 (non binding) Agree with Anton, data sources like the open table formats define the requirement, and definitely need engines to write to it accordingly. Thanks, Szehon On Fri, Mar 21, 2025 at 1:31 PM Anton Okolnychyi wrote: > -1 (non-binding): Breaks the Chain of Responsibility. Constraints

Re: Re: [VOTE] SPIP: Constraints in DSv2

2025-03-27 Thread Anton Okolnychyi
Casting my own +1 (non-binding). Angel, I echo what Wenchen said. Connectors and Spark interact via DSv2, therefore it requires changes in that layer. It is going to be optional but will make a ton of sense for many connectors, especially in modern open table formats that decouple table metadata f

Re:Re: [VOTE] SPIP: Constraints in DSv2

2025-03-27 Thread beliefer
+1 At 2025-03-26 14:45:09, "Chao Sun" wrote: +1 On Tue, Mar 25, 2025 at 10:22 PM Ángel Álvarez Pascua wrote: I meant ... a data validation API would be great, but why in the DSv2? Isn't data validation something more general? Do we have to use DSv2 to have our data validated? On Wed, 26

Re: [VOTE] SPIP: Constraints in DSv2

2025-03-25 Thread Chao Sun
+1 On Tue, Mar 25, 2025 at 10:22 PM Ángel Álvarez Pascua < angel.alvarez.pas...@gmail.com> wrote: > I meant ... a data validation API would be great, but why in the DSv2? > Isn't data validation something more general? Do we have to use DSv2 to > have our data validated? > > On Wed, Mar 26, 2025,

Re: [VOTE] SPIP: Constraints in DSv2

2025-03-25 Thread Ángel Álvarez Pascua
I meant ... a data validation API would be great, but why in the DSv2? Isn't data validation something more general? Do we have to use DSv2 to have our data validated? On Wed, Mar 26, 2025, 6:15, Ángel Álvarez Pascua < angel.alvarez.pas...@gmail.com> wrote: > For me, data validation is one thi

Re: [VOTE] SPIP: Constraints in DSv2

2025-03-25 Thread Ángel Álvarez Pascua
For me, data validation is one thing, and exporting that data to an external system is something entirely different. Should data validation be coupled with the external system? I don't think so. But since I'm the only one arguing against this proposal, does that mean I'm wrong? On Wed, Mar 26, 2025

Re: [VOTE] SPIP: Constraints in DSv2

2025-03-25 Thread Gengliang Wang
Hi Ángel, Thanks for the feedback. Besides the existing NOT NULL constraint, the proposal suggests enforcing only *check constraints* by default in Spark, as they’re straightforward and practical to validate at the engine level. Additionally, the SPIP proposes allowing connectors (like JDBC) to ha

Re: [VOTE] SPIP: Constraints in DSv2

2025-03-24 Thread Hyukjin Kwon
+1 On Mon, 24 Mar 2025 at 09:57, Jungtaek Lim wrote: > +1 (non-binding) > > Thanks for initiating this! > > On Sun, Mar 23, 2025 at 3:45 AM serge rielau.com wrote: > >> +1 (non binding) >> >> On Mar 21, 2025, at 12:52 PM, Jules Damji wrote: >> >> +1 (non-binding) >> — >> Sent from my iPhone >>

Re: [VOTE] SPIP: Constraints in DSv2

2025-03-23 Thread Jungtaek Lim
+1 (non-binding) Thanks for initiating this! On Sun, Mar 23, 2025 at 3:45 AM serge rielau.com wrote: > +1 (non binding) > > On Mar 21, 2025, at 12:52 PM, Jules Damji wrote: > > +1 (non-binding) > — > Sent from my iPhone > Pardon the dumb thumb typos :) > > On Mar 21, 2025, at 11:47 AM, Anton O

Re: [VOTE] SPIP: Constraints in DSv2

2025-03-22 Thread serge rielau . com
+1 (non binding) On Mar 21, 2025, at 12:52 PM, Jules Damji wrote: +1 (non-binding) — Sent from my iPhone Pardon the dumb thumb typos :) On Mar 21, 2025, at 11:47 AM, Anton Okolnychyi wrote: Hi all, I would like to start a vote on adding support for constraints to DSv2. Discussion thread:

Re: [VOTE] SPIP: Constraints in DSv2

2025-03-22 Thread Anurag Mantripragada
+1 (non-binding) Thanks for working on this Anton! Some links to other engines that also did something similar: HIVE-13076 - https://issues.apache.org/jira/browse/HIVE-13076 IMPALA-3531 - https://issues.apache.org/jira/browse/IMPALA-3531 In fact, Spark had a very old Jira SPARK-19842 - https://

Re: [VOTE] SPIP: Constraints in DSv2

2025-03-22 Thread Yuming Wang
+1 On Sat, Mar 22, 2025 at 7:01 PM Peter Toth wrote: > +1 > > On Fri, Mar 21, 2025 at 10:24 PM Szehon Ho > wrote: > >> +1 (non binding) >> >> Agree with Anton, data sources like the open table formats define the >> requirement, and definitely need engines to write to it accordingly. >> >> Thank

Re: [VOTE] SPIP: Constraints in DSv2

2025-03-22 Thread Peter Toth
+1 On Fri, Mar 21, 2025 at 10:24 PM Szehon Ho wrote: > +1 (non binding) > > Agree with Anton, data sources like the open table formats define the > requirement, and definitely need engines to write to it accordingly. > > Thanks, > Szehon > > On Fri, Mar 21, 2025 at 1:31 PM Anton Okolnychyi > wr

Re: [VOTE] SPIP: Constraints in DSv2

2025-03-22 Thread DB Tsai
+1 Sent from my iPhone On Mar 21, 2025, at 2:25 PM, Szehon Ho wrote: +1 (non binding) Agree with Anton, data sources like the open table formats define the requirement, and definitely need engines to write to it accordingly. Thanks, Szehon On Fri, Mar 21, 2025 at 1:31 PM Anton Okolnychyi

Re: [VOTE] SPIP: Constraints in DSv2

2025-03-22 Thread Ángel Álvarez Pascua
One thing is enforcing the quality of the data Spark is producing, and another thing entirely is defining an external data model from Spark. The proposal doesn’t necessarily facilitate data accuracy and consistency. Defining constraints does help with that, but the question remains: Is Spark trul

Re: [VOTE] SPIP: Constraints in DSv2

2025-03-21 Thread Zhou Jiang
+1 On Mar 21, 2025, at 12:15, huaxin gao wrote: +1 On Fri, Mar 21, 2025 at 12:08 PM Denny Lee wrote: +1 (non-binding) On Fri, Mar 21, 2025 at 11:52 Gengliang Wang wrote: +1 On Fri, Mar 21, 2025 at 11:46 AM Anton Okolnychyi wrote: Hi all,

Re: [VOTE] SPIP: Constraints in DSv2

2025-03-21 Thread Anton Okolnychyi
> > -1 (non-binding): Breaks the Chain of Responsibility. Constraints should > be defined and enforced by the data sources themselves, not Spark. Spark is > a processing engine, and enforcing constraints at this level blurs > architectural boundaries, making Spark responsible for something it does

Re: [VOTE] SPIP: Constraints in DSv2

2025-03-21 Thread Denny Lee
+1 (non-binding) On Fri, Mar 21, 2025 at 11:52 Gengliang Wang wrote: > +1 > > On Fri, Mar 21, 2025 at 11:46 AM Anton Okolnychyi > wrote: > >> Hi all, >> >> I would like to start a vote on adding support for constraints to DSv2. >> >> *Discussion thread: * >> https://lists.apache.org/thread/njqj

Re: [VOTE] SPIP: Constraints in DSv2

2025-03-21 Thread Ángel Álvarez Pascua
-1 (non-binding): Breaks the Chain of Responsibility. Constraints should be defined and enforced by the data sources themselves, not Spark. Spark is a processing engine, and enforcing constraints at this level blurs architectural boundaries, making Spark responsible for something it does not contro

Re: [VOTE] SPIP: Constraints in DSv2

2025-03-21 Thread Jules Damji
+1 (non-binding) — Sent from my iPhone Pardon the dumb thumb typos :) > On Mar 21, 2025, at 11:47 AM, Anton Okolnychyi wrote: > > Hi all, > > I would like to start a vote on adding support for constraints to DSv2. > > Discussion thread: > https://lists.apache.org/thread/njqjcryq0lot9rkbf

Re: [VOTE] SPIP: Constraints in DSv2

2025-03-21 Thread L. C. Hsieh
+1 On Fri, Mar 21, 2025 at 12:13 PM huaxin gao wrote: > > +1 > > On Fri, Mar 21, 2025 at 12:08 PM Denny Lee wrote: >> >> +1 (non-binding) >> >> On Fri, Mar 21, 2025 at 11:52 Gengliang Wang wrote: >>> >>> +1 >>> >>> On Fri, Mar 21, 2025 at 11:46 AM Anton Okolnychyi >>> wrote: Hi all,

Re: [VOTE] SPIP: Constraints in DSv2

2025-03-21 Thread Gengliang Wang
+1 On Fri, Mar 21, 2025 at 11:46 AM Anton Okolnychyi wrote: > Hi all, > > I would like to start a vote on adding support for constraints to DSv2. > > *Discussion thread: * > https://lists.apache.org/thread/njqjcryq0lot9rkbf10mtvf7d1t602bj > *SPIP:* > https://docs.google.com/document/d/1EHjB4W1Lj

[VOTE] SPIP: Constraints in DSv2

2025-03-21 Thread Anton Okolnychyi
Hi all, I would like to start a vote on adding support for constraints to DSv2. *Discussion thread: * https://lists.apache.org/thread/njqjcryq0lot9rkbf10mtvf7d1t602bj *SPIP:* https://docs.google.com/document/d/1EHjB4W1LjiXxsK_G7067j9pPX0y15LUF1Z5DlUPoPIo *PR with the API changes:* https://github.