Re: Scala 2.13 actual class used for Seq

2020-10-19 Thread Sean Owen
Scala 2.13 changed the typedef of Seq to immutable.Seq, yes, so lots of
things will now return an immutable Seq. Almost all code doesn't care which
Seq it returns, and we didn't change any of that in the code, so this is just
what we get as a 'default' from whatever operations produce the Seq. (But a
user app expecting a Seq on 2.13 will still just work, as it will be
expecting an immutable.Seq there too.)
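
For concreteness, a minimal sketch (an illustration, not code from Spark) of
what the typedef change means in practice:

// Scala 2.12: scala.Seq is scala.collection.Seq (parent of mutable and immutable)
// Scala 2.13: scala.Seq is scala.collection.immutable.Seq
def takesSeq(xs: Seq[Int]): Int = xs.sum

takesSeq(List(1, 2, 3))  // compiles on both 2.12 and 2.13
// takesSeq(scala.collection.mutable.ArrayBuffer(1, 2))        // 2.12 only
// takesSeq(scala.collection.mutable.ArrayBuffer(1, 2).toSeq)  // portable fix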

You're right that many things won't necessarily return a WrappedArray anymore
(I think that doesn't exist anymore in 2.13? ArraySeq now?), so user apps may
need to change for 2.13; but that's one of the N things any 2.13 app would
have to change anyway.
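
A rough sketch of the distinction (assuming a DataFrame df with an array
column "_3", as in the example quoted below):

import org.apache.spark.sql.Row

val row: Row = df.select("_3").head
// brittle: assumes the concrete implementation class
// val xs = row.getAs[scala.collection.mutable.WrappedArray[Int]](0)
// portable: depends only on the Seq interface
val xs: Seq[Int] = row.getSeq[Int](0)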

On Mon, Oct 19, 2020 at 12:29 AM Koert Kuipers  wrote:

> I have gotten used to Spark always returning a WrappedArray for Seq. At
> some point I think I even read this was guaranteed to be the case; not sure
> if it still is...
>
> In Spark 3.0.1 with Scala 2.12 I get a WrappedArray as expected:
>
> scala> val x = Seq((1,2),(1,3)).toDF
> x: org.apache.spark.sql.DataFrame = [_1: int, _2: int]
>
> scala>
> x.groupBy("_1").agg(collect_list(col("_2")).as("_3")).withColumn("class_of_3",
> udf{ (s: Seq[Int]) => s.getClass.toString }.apply(col("_3"))).show(false)
> +---+------+-------------------------------------------------+
> |_1 |_3    |class_of_3                                       |
> +---+------+-------------------------------------------------+
> |1  |[2, 3]|class scala.collection.mutable.WrappedArray$ofRef|
> +---+------+-------------------------------------------------+
>
> But when I build current master with Scala 2.13 I get:
>
> scala> val x = Seq((1,2),(1,3)).toDF
> warning: 1 deprecation (since 2.13.3); for details, enable `:setting
> -deprecation' or `:replay -deprecation'
> val x: org.apache.spark.sql.DataFrame = [_1: int, _2: int]
>
> scala>
> x.groupBy("_1").agg(collect_list(col("_2")).as("_3")).withColumn("class",
> udf{ (s: Seq[Int]) => s.getClass.toString }.apply(col("_3"))).show(false)
> +---+------+---------------------------------------------+
> |_1 |_3    |class                                        |
> +---+------+---------------------------------------------+
> |1  |[2, 3]|class scala.collection.immutable.$colon$colon|
> +---+------+---------------------------------------------+
>
> I am curious whether we are planning on returning an immutable Seq going
> forward (which is nice)? And if so, is List the best choice? I was sort of
> guessing it would be an immutable ArraySeq (given it provides efficient ways
> to wrap an array and access the underlying array)?
>
> best
>
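
A minimal sketch (ours, under Scala 2.13) of the zero-copy wrapping Koert
alludes to:

import scala.collection.immutable.ArraySeq

val arr = Array(2, 3)
// O(1) wrap, no copy; the caller must not mutate arr afterwards
val seq: ArraySeq[Int] = ArraySeq.unsafeWrapArray(arr)
seq(1)  // O(1) indexed access, unlike List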


Re: Scala 2.13 actual class used for Seq

2020-10-19 Thread Koert Kuipers
I rebuilt master for Scala 2.12 and I see it also uses List instead of
WrappedArray. So the change is in master (compared to 3.0.1), and it is not
limited to Scala 2.13.
This might impact user programs somewhat? List has different performance
characteristics than WrappedArray... for starters, it is not an IndexedSeq.
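
To make the performance point concrete, a rough sketch (ours): positional
access that is O(1) on a WrappedArray or ArraySeq is O(i) on a List, so a
loop like this goes quadratic:

def sumByIndex(xs: Seq[Int]): Int = {
  var total = 0
  var i = 0
  while (i < xs.length) {  // length itself is O(n) on List
    total += xs(i)         // O(1) on an IndexedSeq, O(i) on List
    i += 1
  }
  total
}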


Re: Scala 2.13 actual class used for Seq

2020-10-19 Thread Sean Owen
It's possible the changes do alter the concrete return type on 2.12 too,
though no API interface types should change. I recall that, because 2.13
makes WrappedArray a typedef (not gone, actually), some code that expected it
had to change in order to work on both 2.12 and 2.13. Apps shouldn't depend
on the concrete implementation, of course, but yes, that could be an issue if
some code expects a particular collection class.
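
A hedged sketch (ours) of the alias and the portable alternative: in 2.13,
scala.collection.mutable.WrappedArray[X] is a deprecated alias for
scala.collection.mutable.ArraySeq[X], so code naming it still compiles (with
a warning), while code that depends only on the interface works unchanged:

def describe(v: Any): String = v match {
  case s: Seq[_] => s"a Seq of ${s.length} elements"  // same on 2.12 and 2.13
  case other     => other.toString
}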



RE: IPv6 support

2020-10-19 Thread Rao, Abhishek (Nokia - IN/Bangalore)
Hi,

I see that some implementation has been done in Spark towards supporting IPv6.

https://issues.apache.org/jira/browse/SPARK-32103

As per the description, it covers IPv6 support in YARN mode. Do we have any
plans to add IPv6 support for other resource managers (especially Kubernetes)?


Thanks and Regards,
Abhishek

From: Steve Loughran 
Sent: Wednesday, July 17, 2019 4:52 PM
To: dev@spark.apache.org
Subject: Re: IPv6 support


Fairly neglected Hadoop patch, FWIW:
https://issues.apache.org/jira/browse/HADOOP-11890

FB have been running HDFS &c on IPv6 for a while, but their codebase has
diverged; getting the stuff into trunk is going to take effort. At least the
JDK has moved on and should be better.

On Wed, Jul 17, 2019 at 6:42 AM Pavithra R <pavithr...@huawei.com> wrote:
I came across some issues which were fixed for IPv6 support, but I can't find
any documentation that claims Spark supports IPv6 completely.

Hadoop has a separate JIRA to work on IPv6 support. Is there any such task
planned in Spark too?

Pavithra R