Time for Spark 3.3.3 release?

2023-07-28 Thread Yuming Wang
Hi Spark devs,

Since the Apache Spark 3.3.2 tag was created (Feb 11), 60 patches have
landed on branch-3.3.

Shall we make a new release, Apache Spark 3.3.3, as the third maintenance
release on branch-3.3?
I'd like to volunteer as the release manager for Apache Spark 3.3.3.


Re: Apache Arrow integration issue with Spark involving Netty

2023-07-28 Thread Dane Pitkin
Update! Netty reverted the offending change in v4.1.96. See the Netty
commit[1] and the Arrow PR for the upgrade[2].

The upcoming release of arrow-memory-netty v13 should work with netty
versions <4.1.94 and >=4.1.96.

[1]
https://github.com/netty/netty/commit/dc16c5818a5cd0711f17e0a966783cdc84c9db01
[2] https://github.com/apache/arrow/pull/36926
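
The version range stated above can be written down as a small check. This is
only a sketch of the rule as described in this message (Netty <4.1.94 or
>=4.1.96); the function names are hypothetical and are not part of Arrow,
Netty, or Spark, and it assumes versions look like "4.1.94.Final".

```python
def parse_netty_version(version: str) -> tuple:
    """Turn a string such as '4.1.94.Final' into (4, 1, 94).

    Parsing stops at the first non-numeric piece, so the '.Final'
    qualifier is ignored.
    """
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def in_stated_compatible_range(netty_version: str) -> bool:
    """Return True if the version is <4.1.94 or >=4.1.96.

    This encodes only the range quoted in the message; it does not
    test actual runtime compatibility.
    """
    v = parse_netty_version(netty_version)
    return v < (4, 1, 94) or v >= (4, 1, 96)


if __name__ == "__main__":
    for candidate in ("4.1.93.Final", "4.1.94.Final", "4.1.95.Final", "4.1.96.Final"):
        print(candidate, in_stated_compatible_range(candidate))
```

Comparing integer tuples rather than raw strings avoids ordering mistakes
such as "4.1.100" sorting before "4.1.96".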

On Thu, Jul 13, 2023 at 11:47 AM Dane Pitkin wrote:

> I just want to add that there is a Spark Jira issue[1] for upgrading Netty
> once Arrow v13.0.0 is released this month.
>
> [1] https://issues.apache.org/jira/projects/SPARK/issues/SPARK-44212
>
> On Thu, Jul 6, 2023 at 2:25 PM Dane Pitkin wrote:
>
>> Hi all,
>>
>> The next release of Apache Arrow v13.0.0 coming this month[1] has
>> upgraded Netty to v4.1.94.Final[2] due to a moderate severity CVE[3]. We
>> are seeing that Spark using Netty v4.1.93.Final is not compatible with
>> Arrow v13.0.0, throwing an exception at runtime[4]. There has been some
>> talk in a Spark PR about upgrading to Netty v4.1.94.Final once the new
>> arrow-memory-netty is released[5].
>>
>> Should the Spark POM be updated to shade arrow-memory-netty?
>>
>> Thanks,
>> Dane
>>
>> [1] https://lists.apache.org/thread/f9r0dsd65ohdtcvc7fnnlfs23n3z0n7f
>> [2] https://github.com/apache/arrow/pull/36211
>> [3] https://github.com/advisories/GHSA-6mjq-h674-j845
>> [4] https://github.com/apache/arrow/issues/36332
>> [5] https://github.com/apache/spark/pull/41681
>>
>>


[VOTE] SPIP: XML data source support

2023-07-28 Thread Sandip Agarwala
Dear Spark community,

I would like to start the vote for "SPIP: XML data source support".

XML is a widely used data format. An external spark-xml package (
https://github.com/databricks/spark-xml) is available to read and write XML
data in Spark. Making spark-xml built-in will provide a better user
experience for Spark SQL and Structured Streaming. The proposal is to
inline code from the spark-xml package.

SPIP link:
https://docs.google.com/document/d/1ZaOBT4-YFtN58UCx2cdFhlsKbie1ugAn-Fgz_Dddz-Q/edit?usp=sharing

JIRA:
https://issues.apache.org/jira/browse/SPARK-44265

Discussion Thread:
https://lists.apache.org/thread/q32hxgsp738wom03mgpg9ykj9nr2n1fh

Please vote on the SPIP for the next 72 hours:
[ ] +1: Accept the proposal as an official SPIP
[ ] +0
[ ] -1: I don’t think this is a good idea because __.

Thanks, Sandip


Re: [VOTE] SPIP: XML data source support

2023-07-28 Thread Sean Owen
+1 I think that porting the package 'as is' into Spark is probably
worthwhile.
That's relatively easy; the code is already pretty battle-tested and not
that big and even originally came from Spark code, so is more or less
similar already.

One thing it never got was DSv2 support, which means XML reading would
still be somewhat behind other formats. (I was not able to implement it.)
This isn't a necessary goal right now, but it could be part of the
rationale for moving it into the Spark code base.

On Fri, Jul 28, 2023 at 5:38 PM Sandip Agarwala wrote:

> Dear Spark community,
>
> I would like to start the vote for "SPIP: XML data source support".
>
> XML is a widely used data format. An external spark-xml package (
> https://github.com/databricks/spark-xml) is available to read and write
> XML data in spark. Making spark-xml built-in will provide a better user
> experience for Spark SQL and structured streaming. The proposal is to
> inline code from the spark-xml package.
>
> SPIP link:
>
> https://docs.google.com/document/d/1ZaOBT4-YFtN58UCx2cdFhlsKbie1ugAn-Fgz_Dddz-Q/edit?usp=sharing
>
> JIRA:
> https://issues.apache.org/jira/browse/SPARK-44265
>
> Discussion Thread:
> https://lists.apache.org/thread/q32hxgsp738wom03mgpg9ykj9nr2n1fh
>
> Please vote on the SPIP for the next 72 hours:
> [ ] +1: Accept the proposal as an official SPIP
> [ ] +0
> [ ] -1: I don’t think this is a good idea because __.
>
> Thanks, Sandip
>


Re: [VOTE] SPIP: XML data source support

2023-07-28 Thread Xiao Li
+1

On Fri, Jul 28, 2023 at 15:54 Sean Owen wrote:

> +1 I think that porting the package 'as is' into Spark is probably
> worthwhile.
> That's relatively easy; the code is already pretty battle-tested and not
> that big and even originally came from Spark code, so is more or less
> similar already.
>
> One thing it never got was DSv2 support, which means XML reading would
> still be somewhat behind other formats. (I was not able to implement it.)
> This isn't a necessary goal right now, but would be possibly part of the
> logic of moving it into the Spark code base.
>
> On Fri, Jul 28, 2023 at 5:38 PM Sandip Agarwala wrote:
>
>> Dear Spark community,
>>
>> I would like to start the vote for "SPIP: XML data source support".
>>
>> XML is a widely used data format. An external spark-xml package (
>> https://github.com/databricks/spark-xml) is available to read and write
>> XML data in spark. Making spark-xml built-in will provide a better user
>> experience for Spark SQL and structured streaming. The proposal is to
>> inline code from the spark-xml package.
>>
>> SPIP link:
>>
>> https://docs.google.com/document/d/1ZaOBT4-YFtN58UCx2cdFhlsKbie1ugAn-Fgz_Dddz-Q/edit?usp=sharing
>>
>> JIRA:
>> https://issues.apache.org/jira/browse/SPARK-44265
>>
>> Discussion Thread:
>> https://lists.apache.org/thread/q32hxgsp738wom03mgpg9ykj9nr2n1fh
>>
>> Please vote on the SPIP for the next 72 hours:
>> [ ] +1: Accept the proposal as an official SPIP
>> [ ] +0
>> [ ] -1: I don’t think this is a good idea because __.
>>
>> Thanks, Sandip
>>
>


Re: [VOTE] SPIP: XML data source support

2023-07-28 Thread Adrian Pop-Tifrea
+1, the more data source formats, the better, and if the solution is
already thoroughly tested, I say we should go for it.

On Sat, Jul 29, 2023, 06:35 Xiao Li wrote:

> +1
>
> On Fri, Jul 28, 2023 at 15:54 Sean Owen wrote:
>
>> +1 I think that porting the package 'as is' into Spark is probably
>> worthwhile.
>> That's relatively easy; the code is already pretty battle-tested and not
>> that big and even originally came from Spark code, so is more or less
>> similar already.
>>
>> One thing it never got was DSv2 support, which means XML reading would
>> still be somewhat behind other formats. (I was not able to implement it.)
>> This isn't a necessary goal right now, but would be possibly part of the
>> logic of moving it into the Spark code base.
>>
>> On Fri, Jul 28, 2023 at 5:38 PM Sandip Agarwala wrote:
>>
>>> Dear Spark community,
>>>
>>> I would like to start the vote for "SPIP: XML data source support".
>>>
>>> XML is a widely used data format. An external spark-xml package (
>>> https://github.com/databricks/spark-xml) is available to read and write
>>> XML data in spark. Making spark-xml built-in will provide a better user
>>> experience for Spark SQL and structured streaming. The proposal is to
>>> inline code from the spark-xml package.
>>>
>>> SPIP link:
>>>
>>> https://docs.google.com/document/d/1ZaOBT4-YFtN58UCx2cdFhlsKbie1ugAn-Fgz_Dddz-Q/edit?usp=sharing
>>>
>>> JIRA:
>>> https://issues.apache.org/jira/browse/SPARK-44265
>>>
>>> Discussion Thread:
>>> https://lists.apache.org/thread/q32hxgsp738wom03mgpg9ykj9nr2n1fh
>>>
>>> Please vote on the SPIP for the next 72 hours:
>>> [ ] +1: Accept the proposal as an official SPIP
>>> [ ] +0
>>> [ ] -1: I don’t think this is a good idea because __.
>>>
>>> Thanks, Sandip
>>>
>>


Re: [VOTE] SPIP: XML data source support

2023-07-28 Thread Jia Fan

+1


> On Jul 29, 2023, at 13:06, Adrian Pop-Tifrea wrote:
> 
> +1, the more data source formats, the better, and if the solution is already 
> thoroughly tested, I say we should go for it.
> 
> On Sat, Jul 29, 2023, 06:35 Xiao Li wrote:
>> +1
>> 
>> On Fri, Jul 28, 2023 at 15:54 Sean Owen wrote:
>>> +1 I think that porting the package 'as is' into Spark is probably 
>>> worthwhile.
>>> That's relatively easy; the code is already pretty battle-tested and not 
>>> that big and even originally came from Spark code, so is more or less 
>>> similar already.
>>> 
>>> One thing it never got was DSv2 support, which means XML reading would 
>>> still be somewhat behind other formats. (I was not able to implement it.)
>>> This isn't a necessary goal right now, but would be possibly part of the 
>>> logic of moving it into the Spark code base.
>>> 
>>> On Fri, Jul 28, 2023 at 5:38 PM Sandip Agarwala wrote:
>>>> Dear Spark community,
>>>>
>>>> I would like to start the vote for "SPIP: XML data source support".
>>>>
>>>> XML is a widely used data format. An external spark-xml package
>>>> (https://github.com/databricks/spark-xml) is available to read and write
>>>> XML data in spark. Making spark-xml built-in will provide a better user
>>>> experience for Spark SQL and structured streaming. The proposal is to
>>>> inline code from the spark-xml package.
>>>>
>>>> SPIP link:
>>>> https://docs.google.com/document/d/1ZaOBT4-YFtN58UCx2cdFhlsKbie1ugAn-Fgz_Dddz-Q/edit?usp=sharing
>>>>
>>>> JIRA:
>>>> https://issues.apache.org/jira/browse/SPARK-44265
>>>>
>>>> Discussion Thread:
>>>> https://lists.apache.org/thread/q32hxgsp738wom03mgpg9ykj9nr2n1fh
>>>>
>>>> Please vote on the SPIP for the next 72 hours:
>>>> [ ] +1: Accept the proposal as an official SPIP
>>>> [ ] +0
>>>> [ ] -1: I don’t think this is a good idea because __.
>>>>
>>>> Thanks, Sandip