Hi,
I have a scenario in which I am caching my RDDs for future use, but I have
observed that when I use the RDD, the complete DAG is re-executed and the RDD
gets created again. How can I avoid this and make sure that RDD.cacheDataSet()
caches the RDD every time?
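For reference, a minimal sketch of the pattern (simplified; cacheDataSet()
stands in for the standard RDD.cache(), and the file and app names are
placeholders):

    from pyspark import SparkContext

    sc = SparkContext("local[2]", "cache-demo")

    rdd = sc.textFile("data.txt").map(lambda line: line.split(","))
    rdd.cache()    # cache() is lazy: it only marks the RDD for caching

    rdd.count()    # first action executes the full DAG and populates the cache
    rdd.count()    # reusing the SAME reference should hit the cache

    # Note: re-deriving the RDD (e.g. calling sc.textFile(...) again) creates
    # a new, uncached RDD, so the whole DAG re-executes.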
Regards,
Jasbir Singh
__
So I have a PR to add this to the release process documentation. I'm
waiting on the necessary approvals from the PyPI folks before I merge it,
in case anything changes as a result of the discussion (like uploading to
the legacy host or something). As for conda-forge, it's not something we
need to do ourselves; the conda-forge community keeps that package up to date.
Hi Holden,
Thanks for working on it! Do we have a JIRA ticket to track this? We should
make it part of the release process for all subsequent Spark releases, and
it would be great to have a JIRA ticket recording the detailed steps of
doing this, and eventually to automate it.
Thanks,
Wenchen
Just a heads up: I'm in the process of trying to upload the latest PySpark
to PyPI (we are blocked on a ticket with the PyPI folks around file size,
but I'll follow up with them).
Relatedly, PySpark is available in conda-forge (currently 2.1.0), and there
is a PR in progress to update it to 2.1.1.
Happy
As for the build and tests, everything passes on both macOS 10 and Ubuntu
16.10 with Java 8:
./build/mvn -Phadoop-2.7 -Dhadoop.version=2.7.3 -Pyarn -Phive \
  -Phive-thriftserver -Pscala-2.11 clean package
On 8 May 2017 at 23:18, Joseph Bradley wrote:
I'll work on resolving some of the ML QA blockers this week, but it'd be
great to get help. *@committers & contributors who work on ML*, many of
you have helped in the past, so please help take QA tasks wherever
possible. (Thanks Yanbo & Felix for jumping in already.) Anyone is
welcome to chip in.