[DISCUSS] SPIP: Row-level operations in Data Source V2

2021-06-24 Thread Anton Okolnychyi
Hey everyone, I'd like to start a discussion on adding support for executing row-level operations such as DELETE, UPDATE, and MERGE for v2 tables (SPARK-35801). The execution should be the same across data sources, and the best way to achieve that is to implement it in Spark. Right now, Spark can only parse…
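For context, these are the kinds of commands the SPIP targets. A minimal sketch, assuming a hypothetical v2 catalog table `cat.db.events` and a source table `updates`; whether a given connector supports these commands depends on its DSv2 implementation:

```scala
// Illustrative only: row-level commands against an assumed v2 table.
// Assumes an active SparkSession named `spark`.
spark.sql("DELETE FROM cat.db.events WHERE event_date < '2020-01-01'")
spark.sql("UPDATE cat.db.events SET status = 'archived' WHERE status = 'old'")
spark.sql("""
  MERGE INTO cat.db.events t
  USING updates s
  ON t.id = s.id
  WHEN MATCHED THEN UPDATE SET *
  WHEN NOT MATCHED THEN INSERT *
""")
```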

Re: [DISCUSS] SPIP: Row-level operations in Data Source V2

2021-11-12 Thread Anton Okolnychyi
t; > > my first time to shepherd a SPIP, so please let me know if > anything I can > > >> > > improve. > > >> > > > > >> > > This looks great features and the rationale claimed by the > proposal makes > > >> > > sens

Re: [VOTE] SPIP: Row-level operations in Data Source V2

2021-11-12 Thread Anton Okolnychyi
+1 from me too to indicate my commitment (non-binding) - Anton. On 12 Nov 2021, at 18:27, Liang Chi Hsieh wrote: I'd vote my +1 first. On 2021/11/13 02:25:05, "L. C. Hsieh" wrote: Hi all, I'd like to start a vote for SPIP: Row-level operations in Data Source V2. The pr…

Re: [VOTE] Release Apache Spark 3.4.0 (RC5)

2023-04-05 Thread Anton Okolnychyi
Sorry, I think my last message did not land on the list. I have a question about changes to exceptions used in the public connector API, such as NoSuchTableException and TableAlreadyExistsException. I consider those as part of the public Catalog API (TableCatalog uses them in method definitions

Re: [VOTE] Release Apache Spark 3.4.0 (RC5)

2023-04-05 Thread Anton Okolnychyi
…Gengliang Wang wrote: Hi Anton, +1 for adding the old constructors back! Could you raise a PR for this? I will review it ASAP. Thanks, Gengliang. On Wed, Apr 5, 2023 at 9…

[DISCUSS][SPARK-23889] DataSourceV2: required sorting and clustering for writes

2020-03-06 Thread Anton Okolnychyi
Hi devs, I want to follow up on the dev list discussion [1] and the JIRA issue [2] created as a result of it and propose a s
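A rough sketch of the kind of write-side interface the proposal describes: a v2 write that can tell Spark how incoming data must be clustered and sorted before it is handed to the sink. The trait and member names below mirror the API that later landed in Spark's DSv2 connector package, but they postdate this email, so treat them as illustrative:

```scala
// Sketch (names illustrative): a v2 write declares its required distribution
// and ordering, and Spark adds the shuffle/sort on the write path.
import org.apache.spark.sql.connector.distributions.Distribution
import org.apache.spark.sql.connector.expressions.SortOrder

trait RequiresDistributionAndOrdering {
  def requiredDistribution: Distribution // e.g. cluster by partition columns
  def requiredOrdering: Array[SortOrder] // e.g. sort within each write task
}
```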

[SS] Custom Sinks

2017-11-01 Thread Anton Okolnychyi
Hi all, I have a question about the future of custom data sinks in Structured Streaming. In particular, I want to know how continuous processing and the Datasource API V2 will impact them. Right now, it is possible to have custom data sinks via the current Datasource API (V1) by implementing Stre
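A minimal sketch of a DSv1 streaming sink, the mechanism the question is about. Note that `Sink` lives in an internal package (`org.apache.spark.sql.execution.streaming`), which is part of why its future under continuous processing and DSv2 was unclear; the provider class name and toy behavior below are invented:

```scala
// Hypothetical DSv1 custom sink: StreamSinkProvider creates a Sink whose
// addBatch() is invoked once per micro-batch.
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.execution.streaming.Sink
import org.apache.spark.sql.sources.StreamSinkProvider
import org.apache.spark.sql.streaming.OutputMode

class ConsoleLikeSinkProvider extends StreamSinkProvider {
  override def createSink(
      sqlContext: SQLContext,
      parameters: Map[String, String],
      partitionColumns: Seq[String],
      outputMode: OutputMode): Sink = new Sink {
    override def addBatch(batchId: Long, data: DataFrame): Unit =
      data.collect().foreach(println) // toy behavior for illustration
  }
}
```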

Re: [VOTE] Release Apache Spark 2.3.3 (RC1)

2019-01-23 Thread Anton Okolnychyi
Recently, I came across this bug: https://issues.apache.org/jira/browse/SPARK-26706. It seems appropriate to include it in 2.3.3, doesn't it? Thanks, Anton. On Wed, 23 Jan 2019 at 13:08, Takeshi Yamamuro wrote: Thanks for the check, Felix! Yea, I'll wait for the new test report. But, it never…

Re: [VOTE] Release Apache Spark 2.3.3 (RC1)

2019-01-23 Thread Anton Okolnychyi
…no, it does not need to go into 2.3.3. If it's a real bug, sure, it can be merged to 2.3.x. On Wed, Jan 23, 2019 at 7:54 AM Anton Okolnychyi wrote: Recently, I came across this bug: https://issues.apache.org/jira/browse/SPARK-26706. …

Code Style Formatting

2016-07-01 Thread Anton Okolnychyi
…defined configurations that I can import into IntelliJ IDEA to adjust how it does the formatting. Is it possible to avoid the manual configuration? Best regards, Anton Okolnychyi

Fwd:

2016-11-15 Thread Anton Okolnychyi
Hi, I have experienced a problem using the Datasets API in Spark 1.6, while almost identical code works fine in Spark 2.0. The problem is related to encoders and custom aggregators. *Spark 1.6 (the aggregation produces an empty map):* implicit val intStringMapEncoder: Encoder[Map[Int, String]]
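The thread's snippet is cut off, but the shape of the code it describes can be sketched in the Spark 2.x style of the typed `Aggregator` API (the Spark 1.6 variant had no encoder members, which is close to the heart of the reported discrepancy). Class, column, and encoder choices below are invented for illustration:

```scala
// Hedged sketch: a typed aggregator that collects (Int, String) pairs into a
// Map, similar in spirit to the email's custom aggregator.
import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.expressions.Aggregator

class MapAggregator
    extends Aggregator[(Int, String), Map[Int, String], Map[Int, String]] {
  def zero: Map[Int, String] = Map.empty
  def reduce(buf: Map[Int, String], row: (Int, String)): Map[Int, String] =
    buf + row
  def merge(b1: Map[Int, String], b2: Map[Int, String]): Map[Int, String] =
    b1 ++ b2
  def finish(buf: Map[Int, String]): Map[Int, String] = buf
  // Kryo encoders used here because Map encoders are a known pain point.
  def bufferEncoder: Encoder[Map[Int, String]] = Encoders.kryo[Map[Int, String]]
  def outputEncoder: Encoder[Map[Int, String]] = Encoders.kryo[Map[Int, String]]
}
```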

Typo in the programming guide?

2016-11-27 Thread Anton Okolnychyi
Hi guys, I am looking at the Accumulator section in the latest programming guide. Is there a typo in the sample code? Shouldn't the add() method accept only one param in Spark 2.0? It looks like the signature is inherited from AccumulatorParam, which was there before. object VectorAccumulatorV2 e
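The question can be made concrete with a sketch of the Spark 2.x `AccumulatorV2` API, where `add()` takes a single value (unlike the older `AccumulatorParam.addInPlace`, which took two). The accumulator body below is invented; only the method signatures reflect the actual abstract class:

```scala
// Sketch of an AccumulatorV2: note add(v) takes exactly one parameter.
import scala.collection.mutable.ArrayBuffer
import org.apache.spark.util.AccumulatorV2

class VectorAccumulatorV2 extends AccumulatorV2[Double, ArrayBuffer[Double]] {
  private val buf = ArrayBuffer.empty[Double]
  def isZero: Boolean = buf.isEmpty
  def copy(): VectorAccumulatorV2 = {
    val acc = new VectorAccumulatorV2
    acc.buf ++= buf
    acc
  }
  def reset(): Unit = buf.clear()
  def add(v: Double): Unit = buf += v // one parameter in Spark 2.0
  def merge(other: AccumulatorV2[Double, ArrayBuffer[Double]]): Unit =
    buf ++= other.value
  def value: ArrayBuffer[Double] = buf
}
```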

Expand the Spark SQL programming guide?

2016-12-15 Thread Anton Okolnychyi
Hi, I am wondering whether it makes sense to expand the Spark SQL programming guide with examples of aggregations (including user-defined via the Aggregator API) and window functions. For instance, there might be a separate subsection under "Getting Started" for each functionality. SPARK-16046 s

Re: Expand the Spark SQL programming guide?

2016-12-15 Thread Anton Okolnychyi
…I did test out implementing a distributed convex hull as a UserDefinedAggregateFunction, and that seemed to work sensibly. Cheers, Jim. On 12/15/2016 03:28 AM, Anton Okolnychyi wrote: Hi, I am wondering whether it makes se…

Re: Expand the Spark SQL programming guide?

2016-12-18 Thread Anton Okolnychyi
…Thanks! On 12/16/2016 08:39 AM, Thakrar, Jayesh wrote: Yes, that sounds good Anton, I can work on documenting the window functions. From: Anton Okolnychyi. Date: Thursday, December 15, 2016 at 4:34 PM. To:…

Re: Expand the Spark SQL programming guide?

2016-12-18 Thread Anton Okolnychyi
Any comments/suggestions are more than welcome. Thanks, Anton. 2016-12-18 15:08 GMT+01:00 Anton Okolnychyi: Here is the pull request: https://github.com/apache/spark/pull/16329. 2016-12-16 20:54 GMT+01:00 Jim Hughes: I'd be happy to review a PR. At t…

[SPARK-16046] PR Review

2017-01-24 Thread Anton Okolnychyi
Hi all, there is a pull request that I would like to bring back to life. It is related to the SQL programming guide and can be found here. I believe the PR should be helpful. The initial review is already done. Also, I updated it recently and checked t…

[Spark SQL] ceil and floor functions on doubles

2017-05-19 Thread Anton Okolnychyi
Hi all, I am wondering why the results of ceil and floor functions on doubles are internally casted to longs. This causes loss of precision since doubles can hold bigger numbers. Consider the following example: // 9.223372036854786E20 is greater than Long.MaxValue val df = sc.parallelize(Array((
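The reported behavior can be reproduced with a one-liner; this is a hedged reproduction (the exact behavior depends on the Spark version in use). Since 9.223372036854786E20 exceeds `Long.MaxValue`, a `ceil` that casts its result to `LongType` clamps the value:

```scala
// Assumes an active SparkSession named `spark`. With a long-returning ceil,
// the result is clamped to Long.MaxValue (9223372036854775807) rather than
// staying near 9.22e20, which is the precision loss the email describes.
spark.sql("SELECT ceil(9.223372036854786E20)").show(truncate = false)
```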

Re: [Spark SQL] ceil and floor functions on doubles

2017-05-19 Thread Anton Okolnychyi
…select 9.223372036854786E20, ceil(9.223372036854786E20); OK; _c0: 9.223372036854786E20, _c1: 9223372036854775807; Time taken: 2.041 seconds, Fetched: 1 row(s). Bests, Dongjoon. From: Anton Okolnychyi. Date: Friday…

[Spark SQL] Nanoseconds in Timestamps are set as Microseconds

2017-06-01 Thread Anton Okolnychyi
Hi all, I would like to ask what the community thinks regarding the way how Spark handles nanoseconds in the Timestamp type. As far as I see in the code, Spark assumes microseconds precision. Therefore, I expect to have a truncated to microseconds timestamp or an exception if I specify a timestam
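A hedged illustration of the behavior under discussion: Spark's `TimestampType` stores microseconds, so the nanosecond digits of a literal cannot be represented, and whether they are truncated or misinterpreted is exactly what the thread questions. Values and column names below are invented:

```scala
// Assumes an active SparkSession named `spark`.
import java.sql.Timestamp
import spark.implicits._

// java.sql.Timestamp carries nanosecond precision...
val ts = Timestamp.valueOf("2017-06-01 10:00:00.123456789")
val df = Seq(ts).toDF("ts")
// ...but with microsecond storage, at most ".123456" can survive in Spark.
df.show(truncate = false)
```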

Re: [Spark SQL] Nanoseconds in Timestamps are set as Microseconds

2017-06-02 Thread Anton Okolnychyi
Then let me provide a PR so that we can discuss an alternative way. 2017-06-02 8:26 GMT+02:00 Reynold Xin: Seems like a bug we should fix? I agree some form of truncation makes more sense. On Thu, Jun 1, 2017 at 1:17 AM, Anton Okolnychyi <anton.okolnyc...@gmai…

[SQL] Return Type of Round Func

2017-07-04 Thread Anton Okolnychyi
Hi all, I have a question regarding the round() function, which was developed a long time ago as SPARK-8159. Currently, the return type matches the input type exactly. That is reasonable, but does not match Hive. As I understand, Hive produces either double or decimal as output (see here…
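The difference is easiest to see by inspecting the result schema. A hedged sketch, assuming an active SparkSession named `spark`; per the email, Hive would widen these results to double or decimal instead:

```scala
// In Spark, round()'s result type follows its input type; checking the
// schema makes the contrast with Hive's double/decimal output visible.
spark.sql("SELECT round(2.5)").printSchema()
spark.sql("SELECT round(CAST(2.5 AS DOUBLE), 1)").printSchema() // double in, double out
```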

[DISCUSS] SPIP: Stored Procedures API for Catalogs

2024-04-19 Thread Anton Okolnychyi
Hi folks, I'd like to start a discussion on SPARK-44167 that aims to enable catalogs to expose custom routines as stored procedures. I believe this functionality will enhance Spark’s ability to interact with external connectors and allow users to perform more operations in plain SQL. SPIP [1] con
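Illustrative only: the kind of SQL the SPIP would enable. The catalog, procedure name, and arguments below are hypothetical; real procedures are whatever a catalog (for example, an Iceberg catalog) chooses to expose, and the final syntax is defined by the SPIP, not this sketch:

```scala
// Assumes an active SparkSession `spark` and a catalog `cat` that exposes
// a stored procedure via the proposed API (names are invented).
spark.sql(
  """CALL cat.system.expire_snapshots(
    |  table => 'db.events',
    |  older_than => TIMESTAMP '2024-01-01 00:00:00')""".stripMargin)
```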

Re: [DISCUSS] SPIP: Stored Procedures API for Catalogs

2024-05-09 Thread Anton Okolnychyi
Thanks to everyone who commented on the design doc. I updated the proposal, and it is ready for another look. I hope we can converge and move forward with this effort! - Anton. On Fri, 19 Apr 2024 at 15:54, Anton Okolnychyi wrote: Hi folks, I'd like to start a discussion on SP…

Re: [DISCUSS] SPIP: Stored Procedures API for Catalogs

2024-05-11 Thread Anton Okolnychyi
…ided is correct to the best of my knowledge but of course cannot be guaranteed. It is essential to note that, as with any advice, "one test result is worth one-thousand expert opinions" (Wernher von Braun)…

Re: [VOTE] SPIP: Stored Procedures API for Catalogs

2024-05-13 Thread Anton Okolnychyi
+1. On 2024/05/13 15:33:33, Ryan Blue wrote: +1. On Mon, May 13, 2024 at 12:31 AM Mich Talebzadeh wrote: +0. For reasons I outlined in the discussion thread: https://lists.apache.org/thread/7r04pz544c9qs3gc8q2nyj3fpzfnv8oo. Mich Talebzadeh, Technologist | Arc…

[DISCUSS] SPIP: Constraints in DSv2

2025-02-13 Thread Anton Okolnychyi
Hi folks, I'd like to start a discussion on SPARK-51207 that aims to extend the DSv2 API to let users define, modify, and enforce table constraints in connectors that support them. SPIP [1] contains proposed API changes and parser extensions. Any feedback is more than welcome! Wenchen was kind e
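Illustrative only: the style of constraint DDL the SPIP proposes to parse and hand off to DSv2 connectors. Table, column, and constraint names are invented, and the final syntax is defined by the SPIP, not by this sketch:

```scala
// Assumes an active SparkSession `spark` and a v2 table in catalog `cat`
// whose connector supports the proposed constraint API (hypothetical).
spark.sql("ALTER TABLE cat.db.events ADD CONSTRAINT positive_amount CHECK (amount > 0)")
spark.sql("ALTER TABLE cat.db.events DROP CONSTRAINT positive_amount")
```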

Re: [VOTE] SPIP: Constraints in DSv2

2025-03-21 Thread Anton Okolnychyi
…L. C. Hsieh () wrote: +1. On Fri, Mar 21, 2025 at 12:13 PM huaxin gao wrote: +1. On Fri, Mar 21, 2025 at 12:08 PM Denny Lee wrote: +1 (non-binding)…

[VOTE] SPIP: Constraints in DSv2

2025-03-21 Thread Anton Okolnychyi
Hi all, I would like to start a vote on adding support for constraints to DSv2. *Discussion thread: * https://lists.apache.org/thread/njqjcryq0lot9rkbf10mtvf7d1t602bj *SPIP:* https://docs.google.com/document/d/1EHjB4W1LjiXxsK_G7067j9pPX0y15LUF1Z5DlUPoPIo *PR with the API changes:* https://github.

Re: Re: [VOTE] SPIP: Constraints in DSv2

2025-03-27 Thread Anton Okolnychyi
…approach should help improve data accuracy and consistency by clearly defining responsibilities and enforcing constraints closer to where they're best managed. On Sat, Mar 22, 2025 at…

Re: [VOTE] SPIP: Declarative Pipelines

2025-04-09 Thread Anton Okolnychyi
+1 (non-binding) - Anton. On Wed, 9 Apr 2025 at 15:01, Jungtaek Lim wrote: Btw, who is going to shepherd this SPIP? I don't see this in the doc/JIRA/discussion thread. I understand there are PMC members in the author list, but it is probably good to be explicit about who is shepherding this SPI…

[VOTE][RESULT] SPIP: Constraints in DSv2

2025-03-28 Thread Anton Okolnychyi
…Jungtaek Lim, Hyukjin Kwon (*), Wenchen Fan (*), Chao Sun (*), beliefer, Anton Okolnychyi; +0: None; -1: Angel Alvarez Pascua

Re: [DISCUSS] SPIP: Declarative Pipelines

2025-04-08 Thread Anton Okolnychyi
+1. On Tue, 8 Apr 2025 at 23:36, Jacky Lee wrote: +1. I'm delighted that it will be open-sourced, enabling greater integration with Iceberg/Delta to unlock more value. Jungtaek Lim wrote on Wed, Apr 9, 2025 at 10:47: +1, looking forward to seeing this make progress! On Wed, Apr 9, 2025 at…

Re: [DISCUSS] SPIP: Constraints in DSv2

2025-03-12 Thread Anton Okolnychyi
+1, the proposal will unify constraint management in DSv2 and reduce redundant work across connectors. On Thu, Feb 13, 2025 at 9:20 PM Anton Okolnychyi wrote: Hi folks, I'd like to start a discussion on SPARK…

Re: [VOTE] Release Spark 4.1.0-preview1 (RC1)

2025-07-09 Thread Anton Okolnychyi
+1 (non-binding). On Wed, Jul 9, 2025 at 8:07 AM Max Gekk wrote: +1. On Wed, Jul 9, 2025 at 4:04 PM Sandy Ryza wrote: +1 (non-binding). On Wed, Jul 9, 2025 at 6:57 AM Wenchen Fan wrote: +1. On Wed, Jul 9, 2025 at 1:16 AM Kousuke Saruta wrote: +1

Re: [DISCUSS] SPIP: Monthly preview release

2025-07-01 Thread Anton Okolnychyi
Having monthly preview releases for Spark is going to be huge for projects like Iceberg and Delta. - Anton. On Tue, Jul 1, 2025 at 5:43 PM Dongjoon Hyun wrote: Thank you for the clarification, Hyukjin. Also, thank you for sharing your direction, DB. I agree with you folks that the AS-IS…