Marcelo, I can see that we might be misunderstanding what this change
implies for performance and some of the deeper implementation details here.
We have a community meeting tomorrow (at 10am PT), and we'll be sure to
explore this idea in detail, understand the implications, and then get
back to the list.
One thing I forgot in my previous e-mail is that if a resource is
remote, I'm pretty sure (but haven't double-checked the code) that
executors will download it directly from the remote server, and not
from the driver. So there: distributed download without an init
container.
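
To make that concrete, here's a minimal sketch of the mechanism in
question (the URL and jar name are hypothetical): when a jar is added by
a remote URI, each executor fetches that URI itself rather than going
through the driver.

    import org.apache.spark.sql.SparkSession

    // Sketch only: assumes a reachable HTTP server hosting the jar.
    val spark = SparkSession.builder().appName("remote-jar-sketch").getOrCreate()

    // For a remote URI like this, executors should fetch the jar directly
    // from the remote server, not from the driver (pending a check of the
    // code, as noted above).
    spark.sparkContext.addJar("https://repo.example.com/libs/my-dep.jar")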
Marcelo, to address the points you raised:

> k8s uses docker images. Users can create docker images with all the
> dependencies their app needs, and submit the app using that image.

There is a reason we support additional methods of localizing
dependencies beyond baking everything into Docker images.
A few reasons to prefer init-containers come to mind:

First, if we used spark-submit from within the driver container, the
executors wouldn't receive the jars on their class loaders until after the
executor starts, because the executor has to launch before it can localize
resources.
The init-container is required for use with the resource staging server (
https://github.com/apache-spark-on-k8s/userdocs/blob/master/src/jekyll/running-on-kubernetes.md#resource-staging-server).
The resource staging server (RSS) is a spark-on-k8s component running in a
Kubernetes cluster for staging local application dependencies so that
driver and executor pods can download them.
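
For reference, a minimal sketch of pointing a submission at an RSS,
assuming the fork's spark.kubernetes.resourceStagingServer.uri property
(the master and RSS hosts below are hypothetical):

    import org.apache.spark.SparkConf

    // Sketch only: local dependencies get uploaded to the staging server,
    // then pulled down by the pods' init-containers.
    val conf = new SparkConf()
      .setMaster("k8s://https://kubernetes.example.com:443")
      .set("spark.kubernetes.resourceStagingServer.uri",
           "http://rss.example.com:10000")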
On Tue, Jan 9, 2018 at 6:25 PM, Nicholas Chammas wrote:
> You can argue that executors downloading from
> external servers would be faster than downloading from the driver, but
> I’m not sure I’d agree - it can go both ways.
>
> On a tangentially related note, one of the main reasons spark-ec2 is ...
I’d like to point out the output of “git show --stat” for that diff:
29 files changed, 130 insertions(+), 1560 deletions(-)
+1 for that and generally for the idea of leveraging spark-submit.
You can argue that executors downloading from
external servers would be faster than downloading from the driver, but
I’m not sure I’d agree - it can go both ways.
We were running a change in our fork which was similar to this at one point
early on. My biggest concerns off the top of my head with this change would
be localization performance with large numbers of executors, and what we
lose in terms of separation of concerns. Init containers are a standard
Kubernetes construct.
Hello,
Me again. I was playing some more with the kubernetes backend, and the
whole init container thing seemed unnecessary to me.
Currently it's used to download remote jars and files, mount the
volume into the driver / executor, and place those jars on the
classpath / move the files to the working directory.
That source repo is at https://github.com/palantir/spark/ with artifacts
published to Palantir's bintray at
https://palantir.bintray.com/releases/org/apache/spark/
If you're seeing any of them in Maven Central, please flag it, as that's a
mistake!
Andrew
On Tue, Jan 9, 2018 at 10:10 AM, Sean Owen wrote:
Just to follow up -- those are actually in a Palantir repo, not Central.
Deploying to Central would be uncourteous, but this approach is legitimate
and how it has to work for vendors to release distros of Spark etc.
On Tue, Jan 9, 2018 at 11:43 AM Nan Zhu wrote:
> Hi, all
>
> Out of curiosity, I just found a bunch of Palantir releases under
> org.apache.spark in Maven Central
> (https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.11)?
nvm
On Tue, Jan 9, 2018 at 9:42 AM, Nan Zhu wrote:
> Hi, all
>
> Out of curiosity, I just found a bunch of Palantir releases under
> org.apache.spark in Maven Central
> (https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.11)?
>
> Is it on purpose?
>
> Best,
>
> Nan
Hi, all
Out of curiosity, I just found a bunch of Palantir releases under
org.apache.spark in Maven Central
(https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.11)?
Is it on purpose?
Best,
Nan
If we can actually get our act together and have integration tests in
Jenkins (perhaps not run on every commit, but weekly or as pre-release
smoke tests), that'd be great. Then we'd rely less on contributors testing
manually.
On Tue, Jan 9, 2018 at 8:09 AM, Timothy Chen wrote:
> 2) will ...
SPARK-15463 (https://issues.apache.org/jira/browse/SPARK-15463) was implemented
in 2.2.0, and it allows you to take a Dataset[String] of raw CSV/JSON and
convert it into a DataFrame. Should we have a way to go the other way too,
i.e. a way to convert a DataFrame to a Dataset[String]?
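
A short sketch of both directions as of 2.2 (toy data; the CSV direction
hand-rolls what a hypothetical toCSV would do):

    import org.apache.spark.sql.{Dataset, SparkSession}

    val spark = SparkSession.builder().appName("roundtrip").getOrCreate()
    import spark.implicits._

    // Dataset[String] of raw CSV -> DataFrame (SPARK-15463, since 2.2.0).
    val rawCsv: Dataset[String] = Seq("name,age", "alice,30", "bob,25").toDS()
    val df = spark.read.option("header", "true").csv(rawCsv)

    // The other direction already exists for JSON:
    val asJson: Dataset[String] = df.toJSON

    // For CSV it currently has to be rolled by hand (naive: no header,
    // no quoting):
    val asCsv: Dataset[String] = df.map(_.mkString(","))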