Re: Lineage between Datasets

2017-04-12 Thread Chang Chen
Does it mean any two Datasets' physical plans are independent? Thanks Chang On Thu, Apr 13, 2017 at 12:53 AM, Reynold Xin wrote: > The physical plans are not subtrees, but the analyzed plan (before the > optimizer runs) is actually similar to "lineage". You can get that by > calling explain(tr

Re: internal unit tests failing against the latest spark master

2017-04-12 Thread Koert Kuipers
i confirmed that an Encoder[Array[Int]] is no longer serializable, and with my spark build from march 7 it was. i believe the issue is commit 295747e59739ee8a697ac3eba485d3439e4a04c3 and i sent wenchen an email about it. On Wed, Apr 12, 2017 at 4:31 PM, Koert Kuipers wrote: > i believe the erro

Re: internal unit tests failing against the latest spark master

2017-04-12 Thread Koert Kuipers
i believe the error is related to an org.apache.spark.sql.expressions.Aggregator where the buffer type (BUF) is Array[Int] On Wed, Apr 12, 2017 at 4:19 PM, Koert Kuipers wrote: > hey all, > today i tried upgrading the spark version we use internally by creating a > new internal release from the

internal unit tests failing against the latest spark master

2017-04-12 Thread Koert Kuipers
hey all, today i tried upgrading the spark version we use internally by creating a new internal release from the spark master branch. last time i did this was march 7. with this updated spark i am seeing some serialization errors in the unit tests for our own libraries. looks like a scala reflecti

Re: Lineage between Datasets

2017-04-12 Thread Reynold Xin
The physical plans are not subtrees, but the analyzed plan (before the optimizer runs) is actually similar to "lineage". You can get that by calling explain(true) and looking at the analyzed plan. On Wed, Apr 12, 2017 at 3:03 AM Chang Chen wrote: > Hi All > > I believe that there is no lineage bet
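
A minimal sketch of the explain(true) suggestion above, reusing the people / ageGreatThan30 names from the original question. The Person fields, the in-memory sample rows (used here in place of the parquet read), the local master and the app name are all assumptions made for illustration; they are not taken from the thread.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical record type: the thread does not show the Person class.
case class Person(name: String, age: Int)

val spark = SparkSession.builder()
  .master("local[*]")            // assumption: local run, for illustration only
  .appName("lineage-sketch")     // hypothetical app name
  .getOrCreate()
import spark.implicits._

// Stand-in for spark.read.parquet("...").as[Person]; the sample rows are made up.
val people = Seq(Person("ann", 25), Person("bob", 41)).toDS()
val ageGreatThan30 = people.filter("age > 30")

// Prints the parsed, analyzed and optimized logical plans plus the physical plan.
ageGreatThan30.explain(true)

// The analyzed plan can also be inspected directly: the Filter node sits
// on top of the plan that produced `people`.
val analyzed = ageGreatThan30.queryExecution.analyzed
println(analyzed.treeString)
```

In the printed output, the "Analyzed Logical Plan" section is the part that reads like lineage; the optimized and physical plans may look quite different once the optimizer has run.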

[SparkSession] Any Listener available in Spark 2.1 to get notified when the SparkSession is getting closed

2017-04-12 Thread Naresh P R
Hi, I have a use case where multiple users connect to 1 thriftserver, and I want to get notified when one of the users exits beeline. Can someone suggest whether any SparkSession Open/Close listeners are available in Spark 2.1? -- Regards, Naresh P R

Lineage between Datasets

2017-04-12 Thread Chang Chen
Hi All I believe that there is no lineage between datasets. Consider this case: val people = spark.read.parquet("...").as[Person] val ageGreatThan30 = people.filter("age > 30") Since the second DS can push down the condition, they are obviously different logical plans and hence are different ph

Access multiple dictionaries inside list in Scala

2017-04-12 Thread Srabasti Banerjee
Hi All, Is there a way to access multiple dictionaries with different schema structures inside a list in a txt file, individually or in combination as needed, from the Spark shell using Scala? The need is to use information from different combinations of the dictionaries to calculate for repor