RDD.cacheDataSet() not working intermittently

2017-05-08 Thread jasbir.sing
Hi, I have a scenario in which I am caching my RDDs for future use. But I observed that when I use my RDD, the complete DAG is re-executed and the RDD gets created again. How can I avoid this scenario and make sure that RDD.cacheDataSet() caches the RDD every time? Regards, Jasbir Singh

Re: Uploading PySpark 2.1.1 to PyPi

2017-05-08 Thread Holden Karau
So I have a PR to add this to the release process documentation - I'm waiting on the necessary approvals from the PyPi folks before I merge that, in case anything changes as a result of the discussion (like uploading to the legacy host or something). As for conda-forge, it's not something we need to do,

Re: Uploading PySpark 2.1.1 to PyPi

2017-05-08 Thread cloud0fan
Hi Holden, Thanks for working on it! Do we have a JIRA ticket to track this? We should make it part of the release process in all the following Spark releases, and it will be great if we have a JIRA ticket to record the detailed steps of doing this and even automate it. Thanks, Wenchen

Uploading PySpark 2.1.1 to PyPi

2017-05-08 Thread Holden Karau
Just a heads up: I'm in the process of trying to upload the latest PySpark to PyPi (we are blocked on a ticket with the PyPi folks around file size, but I'll follow up with them). Relatedly, PySpark is available in Conda-forge, currently 2.1.0, and there is a PR to update to 2.1.1 in progress. Happy

Re: [VOTE] Apache Spark 2.2.0 (RC2)

2017-05-08 Thread Ricardo Almeida
As for build and tests, all pass on both macOS 10 and Ubuntu 16.10, with Java 8. ./build/mvn -Phadoop-2.7 -Dhadoop.version=2.7.3 -Pyarn -Phive -Phive-thriftserver -Pscala-2.11 clean package On 8 May 2017 at 23:18, Joseph Bradley wrote: > I'll work on resolving some of the ML QA blockers this we

Re: [VOTE] Apache Spark 2.2.0 (RC2)

2017-05-08 Thread Joseph Bradley
I'll work on resolving some of the ML QA blockers this week, but it'd be great to get help. *@committers & contributors who work on ML*, many of you have helped in the past, so please help take QA tasks wherever possible. (Thanks Yanbo & Felix for jumping in already.) Anyone is welcome to chip i