Re: [DISCUSS] SPIP: Declarative Pipelines

2025-04-05 Thread Khalid Mammadov
Looks great! QQ: will users be able to run this pipeline from normal code? I.e., can I trigger a pipeline from *driver* code based on some condition, etc., or must it be executed via a separate shell command? As background, Databricks imposes a similar limitation, whereby you cannot run normal Spark code an

[DISCUSS] SPIP: Declarative Pipelines

2025-04-05 Thread Sandy Ryza
Hi all – starting a discussion thread for a SPIP that I've been working on with Chao Sun, Kent Yao, Yuming Wang, and Jie Yang: [JIRA ] [Doc ]. The SPIP

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-04-05 Thread Ángel Álvarez Pascua
Hi Jia, I really appreciate your very instructive answer. I truly believe that discussing topics with people who know far more than I do is a great way to learn new and interesting things. Your explanations are quite logical and make perfect sense to me. Sh**, I'm not that sure about this proposal

Re: [VOTE] SPIP: Constraints in DSv2

2025-04-05 Thread Wenchen Fan
+1 As Gengliang explained, the API allows the connectors to request Spark to perform data validations, but connectors can also choose to do validation by themselves. I think it's a reasonable design as not all connectors have the ability to do data validation by themselves, such as file formats th

Re: setuptools 78.0.0 does not work with pyspark 3.x releases

2025-04-05 Thread Sean Owen
It'll have to come with the next 3.5 release regardless; that will happen later as needed. I think we should just get it in soonish to enable that if needed. On Mon, Mar 24, 2025, 12:14 PM Bjørn Jørgensen wrote: > Yes, I did make that PR a long time ago. It was merged on 20.11.2023. > > I can

Re: [VOTE] SPIP: Constraints in DSv2

2025-04-05 Thread Wenchen Fan
Hi Angel, This feature involves 3 parties: - The end-user specifies constraints for their tables, via the SQL syntax provided by Spark. - Spark propagates the constraints to the backend connector of the tables, and performs data validation during data writing if the connector asks Spark to do so.

Re: [VOTE] SPIP: Constraints in DSv2

2025-04-05 Thread huaxin gao
+1 On Fri, Mar 21, 2025 at 12:08 PM Denny Lee wrote: > +1 (non-binding) > > On Fri, Mar 21, 2025 at 11:52 Gengliang Wang wrote: > >> +1 >> >> On Fri, Mar 21, 2025 at 11:46 AM Anton Okolnychyi >> wrote: >> >>> Hi all, >>> >>> I would like to start a vote on adding support for constraints to DSv

Re: [VOTE] SPIP: Constraints in DSv2

2025-04-05 Thread Szehon Ho
+1 (non-binding) Agree with Anton: data sources like the open table formats define the requirement, and definitely need engines to write to them accordingly. Thanks, Szehon On Fri, Mar 21, 2025 at 1:31 PM Anton Okolnychyi wrote: > -1 (non-binding): Breaks the Chain of Responsibility. Constraints

Re: [VOTE] SPIP: Support NanoSecond Timestamps

2025-04-05 Thread Szehon Ho
Trying to catch up on this: Serge's suggestion in the doc seems the best way forward, https://docs.google.com/document/d/1wjFsBdlV2YK75x7UOk2HhDOqWVA0yC7iEiqOMnNnxlA/edit?disco=AAABe5AUnWU. Spark would support the full ANSI SQL timestamp range, and Iceberg / Parquet / other data sources will throw ru

Re: Spark build failed> File line length exceeds 100 characters

2025-04-05 Thread Ángel Álvarez Pascua
I've noticed that the check is set in *scalastyle-config.xml*: true Given this configuration, how is it possible that some people have been able to commit changes violating this rule? Moreover, how were these changes even merged despite failing this validation? It seems like