Hi, is there any work going on to remove the Spark dependency from Catalyst?
Thanks
To prevent this in the future, you could set up something that checks whether
each worker has the right PATH. If a worker doesn't satisfy the criteria, just
mark the worker offline or restart the process automatically. It's doable with
a maintenance script on the Jenkins master. When it fails like this ...
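For illustration, here is a minimal sketch (in Python) of the kind of check such a
maintenance script could run against each worker; the "anaconda" path fragment and
the exit-code convention are assumptions about your setup, not a description of it:

#!/usr/bin/env python
# Hypothetical worker sanity check: fail loudly if the PATH this process
# sees no longer contains Anaconda.
import os
import sys

REQUIRED_FRAGMENT = "anaconda"  # assumption: the expected tools live under a path containing this

path = os.environ.get("PATH", "")
if REQUIRED_FRAGMENT not in path.lower():
    sys.stderr.write("PATH is missing Anaconda: %s\n" % path)
    sys.exit(1)  # non-zero exit lets the master mark the worker unhealthy or restart it
sys.exit(0)

Run from the master (e.g. over SSH or as a periodic job on each node), a non-zero
exit would be the signal to take the worker offline or bounce its agent.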
Disconnecting and reconnecting each Jenkins worker appears to have resolved
the PATH issue: in the System Info page for each worker, I now see a PATH
which includes Anaconda.
To restart the worker processes, I only needed to hit the "Disconnect"
button in the Jenkins master UI for each worker, wait ...
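If this ever needs to be scripted rather than clicked through, something like the
following untested sketch could do the same disconnect/reconnect cycle, assuming the
stock Jenkins REST endpoints, an API token with the right permissions, and ignoring
the CSRF crumb for brevity; the worker names and credentials are placeholders:

# Untested sketch: bounce a list of Jenkins workers via the REST API.
import time
import requests

JENKINS_URL = "https://amplab.cs.berkeley.edu/jenkins"
WORKERS = ["worker-02", "worker-06"]  # placeholder node names
AUTH = ("user", "api-token")          # placeholder credentials

for name in WORKERS:
    base = "%s/computer/%s" % (JENKINS_URL, name)
    # equivalent of the "Disconnect" button in the master UI
    requests.post(base + "/doDisconnect",
                  params={"offlineMessage": "resetting PATH"}, auth=AUTH)
    time.sleep(10)
    # ask the master to relaunch the agent
    requests.post(base + "/launchSlaveAgent", auth=AUTH)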
hello from the canary islands! ;)
i just saw this thread, and another one about a quick power loss at the
colo where our machines are hosted. the master is on UPS but the workers
aren't... and when they come back, the PATH variable specified in the
workers' configs gets dropped and we see behavior ...
Also, another thing to look at is whether you guys have any kind of nightly
cleanup scripts for these workers that completely nuke the conda environments.
If there is one, maybe that's why some of them recover after a while. I don't
know enough about your infra right now to understand all the things th...
So, right now it looks like 2 and 6 are still broken, but 7 has recovered:
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/SparkPullRequestBuilder/buildTimeTrend
What I am suggesting is to perhaps just modify the SparkPullRequestBuilder
configuration and run "which python" and then ...
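Concretely, a throwaway debug step along these lines (purely illustrative) dropped
into the build would show in the console log what each worker actually resolves:

# Illustrative debug step: print what the build environment actually resolves,
# so a worker with a broken PATH is obvious straight from the console log.
import os
import shutil
import sys

print("which python   ->", shutil.which("python"))
print("sys.executable ->", sys.executable)
print("JAVA_HOME      ->", os.environ.get("JAVA_HOME"))
print("PATH           ->", os.environ.get("PATH"))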
Thanks, I actually don't have access to the machines or build configs to do
proper debugging on this. It looks like these workers are shared with
other build configurations like avocado and cannoli as well, and really any
of the shared configs could be changing your JAVA_HOME and Python
environment ...
Thank you for your response, Anurag.
I am not sure I get your point. Are you suggesting that the UDF somehow
serializes not only a reference to the Dataset, but also all of the data?
This is expected. You are not accessing the Dataset dict when calling the UDF
countPositiveSimilarity. The dict DataFrame as it existed when the UDF was created
is encoded into the UDF. If you change dict later on, the changes will not
automatically be picked up in the UDF countPositiveSimilarity.
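A minimal PySpark sketch of that behaviour; the names here (dict_df, snapshot, and
the signature of count_positive_similarity) are illustrative, not your actual code:

# Illustrative only: the lookup data is captured when the UDF is defined,
# so later changes to the source DataFrame are not seen inside the UDF.
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import IntegerType

spark = SparkSession.builder.getOrCreate()

dict_df = spark.createDataFrame([(1, 0.9), (2, -0.4)], ["id", "similarity"])
snapshot = {row["id"]: row["similarity"] for row in dict_df.collect()}  # captured here

@udf(returnType=IntegerType())
def count_positive_similarity(ids):
    # uses the snapshot taken at definition time, not the live dict_df
    return sum(1 for i in ids if snapshot.get(i, 0.0) > 0)

# Anything added to dict_df after this point is invisible to the UDF
# unless count_positive_similarity is redefined with a fresh snapshot.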