Enforcing the quality of the data Spark produces is one thing; defining an
external data model from Spark is another thing entirely.


The proposal doesn’t necessarily facilitate data accuracy and consistency on
its own. Defining constraints does help with that, but the question remains:
is Spark truly responsible for enforcing those constraints on an external
system?
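
For concreteness, here is a minimal sketch of what defining and then violating
a constraint looks like today through a connector's own SQL extensions (Delta
Lake in this case; the table, constraint, and session config are illustrative
and assume the delta-spark package is on the classpath):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("constraint-definition-sketch")
      // Delta's extensions provide ALTER TABLE ... ADD CONSTRAINT ... CHECK.
      .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
      .config("spark.sql.catalog.spark_catalog",
        "org.apache.spark.sql.delta.catalog.DeltaCatalog")
      .getOrCreate()

    spark.sql("CREATE TABLE people (id BIGINT, birthDate DATE) USING delta")

    // The constraint definition lives in the connector's table metadata ...
    spark.sql("ALTER TABLE people ADD CONSTRAINT dateWithinRange " +
      "CHECK (birthDate > '1900-01-01')")

    // ... but it is checked during the Spark write, so this INSERT fails.
    spark.sql("INSERT INTO people VALUES (1, DATE '1800-01-01')")

Whether that check should live in each connector's write path or behind a
common Spark-level API is essentially the question being debated here.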

On Fri, Mar 21, 2025 at 21:29, Anton Okolnychyi (<aokolnyc...@gmail.com>)
wrote:

> -1 (non-binding): Breaks the Chain of Responsibility. Constraints should
>> be defined and enforced by the data sources themselves, not Spark. Spark is
>> a processing engine, and enforcing constraints at this level blurs
>> architectural boundaries, making Spark responsible for something it does
>> not control.
>>
>
> I disagree that this breaks the chain of responsibility. It may be quite
> the opposite, in fact. Spark is already responsible for enforcing NOT NULL
> constraints by adding AssertNotNull for required columns today. Connectors
> like Iceberg and Delta store constraint definitions but rely on engines
> like Spark to enforce them during INSERT, DELETE, UPDATE, and MERGE
> operations. Without this API, each connector would need to reimplement the
> same logic, creating duplication.
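>
> For illustration, here is a minimal sketch of that existing NOT NULL
> behavior (the catalog and table names are hypothetical and assume an
> Iceberg catalog named demo is already configured in the session):
>
>     import org.apache.spark.sql.SparkSession
>
>     val spark = SparkSession.builder().appName("not-null-sketch").getOrCreate()
>
>     // The connector stores the NOT NULL definition in its table metadata.
>     spark.sql(
>       "CREATE TABLE demo.db.events (id BIGINT NOT NULL, payload STRING) USING iceberg")
>
>     // Spark wraps the required column in AssertNotNull when planning the write,
>     // so this row is rejected by the engine at execution time, not by the connector.
>     spark.sql("INSERT INTO demo.db.events VALUES (NULL, 'oops')")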
>
> The proposal is aligned with the SQL standard and other relational
> databases. In my view, it simply makes Spark a better engine, facilitates
> data accuracy and consistency, and enables performance optimizations.
>
> - Anton
>
> On Fri, Mar 21, 2025 at 12:59 Ángel Álvarez Pascua <
> angel.alvarez.pas...@gmail.com> wrote:
>
>> -1 (non-binding): Breaks the Chain of Responsibility. Constraints should
>> be defined and enforced by the data sources themselves, not Spark. Spark is
>> a processing engine, and enforcing constraints at this level blurs
>> architectural boundaries, making Spark responsible for something it does
>> not control.
>>
>> On Fri, Mar 21, 2025 at 20:18, L. C. Hsieh (<vii...@gmail.com>)
>> wrote:
>>
>>> +1
>>>
>>> On Fri, Mar 21, 2025 at 12:13 PM huaxin gao <huaxin.ga...@gmail.com>
>>> wrote:
>>> >
>>> > +1
>>> >
>>> > On Fri, Mar 21, 2025 at 12:08 PM Denny Lee <denny.g....@gmail.com>
>>> wrote:
>>> >>
>>> >> +1 (non-binding)
>>> >>
>>> >> On Fri, Mar 21, 2025 at 11:52 Gengliang Wang <ltn...@gmail.com>
>>> wrote:
>>> >>>
>>> >>> +1
>>> >>>
>>> >>> On Fri, Mar 21, 2025 at 11:46 AM Anton Okolnychyi <
>>> aokolnyc...@gmail.com> wrote:
>>> >>>>
>>> >>>> Hi all,
>>> >>>>
>>> >>>> I would like to start a vote on adding support for constraints to
>>> DSv2.
>>> >>>>
>>> >>>> Discussion thread:
>>> https://lists.apache.org/thread/njqjcryq0lot9rkbf10mtvf7d1t602bj
>>> >>>> SPIP:
>>> https://docs.google.com/document/d/1EHjB4W1LjiXxsK_G7067j9pPX0y15LUF1Z5DlUPoPIo
>>> >>>> PR with the API changes: https://github.com/apache/spark/pull/50253
>>> >>>> JIRA: https://issues.apache.org/jira/browse/SPARK-51207
>>> >>>>
>>> >>>> Please vote on the SPIP for the next 72 hours:
>>> >>>>
>>> >>>> [ ] +1: Accept the proposal as an official SPIP
>>> >>>> [ ] +0
>>> >>>> [ ] -1: I don’t think this is a good idea because …
>>> >>>>
>>> >>>> - Anton
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>
>>>
