On Wed, Jan 10, 2018 at 3:00 PM, Anirudh Ramanathan wrote:
> We can start by getting a PR going perhaps, and start augmenting the
> integration testing to ensure that there are no surprises - with/without
> credentials, accessing GCS, S3, etc. as well.
When we get enough confidence and test coverage...
It seems we have two standard practices for resource distribution in place
here:
- the Spark way is that the application (Spark) distributes the resources
*during* app execution, and does this by exposing files/jars on an http
server on the driver (or pre-staged elsewhere), and executors downloading...
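To make the "Spark way" concrete, here is a minimal sketch of that flow using the standard SparkContext API (the paths are placeholders):

    import org.apache.spark.SparkFiles
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("dependency-demo").getOrCreate()
    val sc = spark.sparkContext

    // A local jar is exposed to executors via the driver's internal file server.
    sc.addJar("/opt/app/libs/my-udfs.jar")

    // A pre-staged remote file is fetched by each executor directly.
    sc.addFile("s3a://my-bucket/conf/app.conf")

    // On an executor, resolve the local copy of a distributed file.
    val confPath = SparkFiles.get("app.conf")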
Thanks for this discussion everyone. It has been very useful in getting an
overall understanding here.
I think in general, consensus is that this change doesn't introduce
behavioral changes, and it's definitely an advantage to reuse the
constructs that Spark provides to us.
Moving on to a different...
On Wed, Jan 10, 2018 at 2:51 PM, Matt Cheah wrote:
> those sidecars may perform side effects that are undesirable if the main
> Spark application failed because dependencies weren’t available
If the contract is that the Spark driver pod does not have an init
container, and the driver handles its...
With regards to separation of concerns, there’s a fringe use case here – if
more than one main container is on the pod, then none of them will run if the
init-containers fail. A user can have a Pod Preset that attaches more sidecar
containers to the driver and/or executors. In that case, those sidecars may
perform side effects that are undesirable if the main Spark application
failed because dependencies weren't available...
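For reference, a hedged sketch of the pod shape under discussion, written with the fabric8 builders the K8s backend already depends on (image names are placeholders): the kubelet runs all init-containers to completion before any app container, so if spark-init fails, neither the driver container nor a preset-injected sidecar ever starts.

    import io.fabric8.kubernetes.api.model.{Pod, PodBuilder}

    val driverPod: Pod = new PodBuilder()
      .withNewMetadata().withName("spark-driver").endMetadata()
      .withNewSpec()
        .withRestartPolicy("Never")
        .addNewInitContainer()              // must succeed before anything else runs
          .withName("spark-init")
          .withImage("example/spark-init:latest")
        .endInitContainer()
        .addNewContainer()                  // main driver container
          .withName("spark-kubernetes-driver")
          .withImage("example/spark-driver:latest")
        .endContainer()
        .addNewContainer()                  // e.g. a sidecar attached by a Pod Preset
          .withName("logging-sidecar")
          .withImage("example/log-shipper:latest")
        .endContainer()
      .endSpec()
      .build()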
On Wed, Jan 10, 2018 at 2:30 PM, Yinan Li wrote:
> 1. Retries of init-containers are automatically supported by k8s through pod
> restart policies. For this point, sorry I'm not sure how spark-submit
> achieves this.
Great, add that feature to spark-submit, everybody benefits, not just k8s.
> 2.
> Sorry, but what are those again? So far all the benefits are already
> provided by spark-submit...
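On the retry point, a purely hypothetical sketch of what a submission-side equivalent could look like: a bounded retry around the dependency-download step, approximating what a pod restart policy gives init-containers for free (downloadDependency is made up).

    def withRetries[T](maxAttempts: Int, backoffMs: Long = 1000L)(op: () => T): T = {
      var attempt = 0
      var result: Option[T] = None
      while (result.isEmpty) {
        attempt += 1
        try {
          result = Some(op())
        } catch {
          case _: Exception if attempt < maxAttempts =>
            Thread.sleep(backoffMs * attempt)  // linear backoff between attempts
        }
      }
      result.get
    }

    // e.g. withRetries(3)(() => downloadDependency("s3a://bucket/dep.jar"))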
1. Retries of init-containers are automatically supported by k8s through
pod restart policies. For this point, sorry I'm not sure how spark-submit
achieves this.
2. The ability to use credentials to...
On Wed, Jan 10, 2018 at 2:16 PM, Yinan Li wrote:
> but we can not rule out the benefits init-containers bring either.
Sorry, but what are those again? So far all the benefits are already
provided by spark-submit...
> Again, I would suggest we look at this more thoroughly post 2.3.
Actually, one...
> 1500 less lines of code trump all of the arguments given so far for
> what the init container might be a good idea.
We can also reduce the #lines of code by simply refactoring the code in
such a way that a lot of code can be shared between configuration of the
main container and that of the init-container...
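One possible shape for that refactoring, as a sketch only (the step trait and class names here are hypothetical): a single configuration step applied to both the main container and the init-container, so the shared logic lives in one place.

    import io.fabric8.kubernetes.api.model.ContainerBuilder

    trait ContainerConfigStep {
      def configure(builder: ContainerBuilder): ContainerBuilder
    }

    // Example step: mount the Spark conf volume and point SPARK_CONF_DIR at it.
    class MountSparkConfStep(confDir: String) extends ContainerConfigStep {
      override def configure(builder: ContainerBuilder): ContainerBuilder =
        builder
          .addNewVolumeMount()
            .withName("spark-conf-volume")
            .withMountPath(confDir)
          .endVolumeMount()
          .addNewEnv()
            .withName("SPARK_CONF_DIR")
            .withValue(confDir)
          .endEnv()
    }

    // The same steps can then be folded over both containers:
    //   steps.foldLeft(mainContainer)((b, s) => s.configure(b))
    //   steps.foldLeft(initContainer)((b, s) => s.configure(b))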
On Wed, Jan 10, 2018 at 2:00 PM, Yinan Li wrote:
> I want to re-iterate on one point, that the init-container achieves a clear
> separation between preparing an application and actually running the
> application. It's a guarantee provided by the K8s admission control and
> scheduling components that...
I want to re-iterate on one point, that the init-container achieves a clear
separation between preparing an application and actually running the
application. It's a guarantee provided by the K8s admission control and
scheduling components that if the init-container fails, the main container
won't be...
On Wed, Jan 10, 2018 at 1:47 PM, Matt Cheah wrote:
>> With a config value set by the submission code, like what I'm doing to
>> prevent client mode submission in my p.o.c.?
>
> The contract for what determines the appropriate scheduler backend to
> instantiate is then going to be different in Kubernetes...
> With a config value set by the submission code, like what I'm doing to
> prevent client mode submission in my p.o.c.?
The contract for what determines the appropriate scheduler backend to
instantiate is then going to be different in Kubernetes versus the other
cluster managers. The cluster ma
On Wed, Jan 10, 2018 at 1:33 PM, Matt Cheah wrote:
> If we use spark-submit in client mode from the driver container, how do we
> handle needing to switch between a cluster-mode scheduler backend and a
> client-mode scheduler backend in the future?
With a config value set by the submission code, like what I'm doing to
prevent client mode submission in my p.o.c.
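As a sketch of that idea (the property name and the client-mode backend class are hypothetical; only the cluster-mode backend exists today, and constructor arguments are approximated):

    // Assumes `scheduler`, `sc` and `kubernetesClient` are in scope, as in the
    // existing backend setup code.
    val inClusterMode =
      sc.conf.getBoolean("spark.kubernetes.submittedInClusterMode", defaultValue = false)

    val backend =
      if (inClusterMode) {
        new KubernetesClusterSchedulerBackend(scheduler, sc, kubernetesClient)
      } else {
        // hypothetical future class for in-cluster client mode
        new KubernetesClientModeSchedulerBackend(scheduler, sc, kubernetesClient)
      }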
If we use spark-submit in client mode from the driver container, how do we
handle needing to switch between a cluster-mode scheduler backend and a
client-mode scheduler backend in the future?
Something else re: client mode accessibility – if we make client mode
accessible to users even if it's...
On Wed, Jan 10, 2018 at 1:10 PM, Matt Cheah wrote:
> I’d imagine this is a reason why YARN hasn’t went with using spark-submit
> from the application master...
I wouldn't use YARN as a template to follow when writing a new
backend. A lot of the reason why the YARN backend works the way it
does is...
A crucial point here is considering whether we want to have a separate
scheduler backend code path for client mode versus cluster mode. If we need
such a separation in the code paths, it would be difficult to make it possible
to run spark-submit in client mode from the driver container.
We discussed...
Hi, All.
Vectorized ORC Reader is now supported in Apache Spark 2.3.
https://issues.apache.org/jira/browse/SPARK-16060
It has been a long journey. From now on, Spark can read ORC files faster
without giving up any features.
Thank you for all your support, especially Wenchen Fan.
It's done by two commits...
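For anyone who wants to try it, something like this should exercise the new reader (in 2.3 the native ORC implementation still has to be selected explicitly; the path is a placeholder):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("orc-demo").getOrCreate()
    spark.conf.set("spark.sql.orc.impl", "native")
    spark.conf.set("spark.sql.orc.enableVectorizedReader", "true")

    val df = spark.read.orc("/path/to/data.orc")
    df.show()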
i just noticed we're starting to see the once-yearly rash of git timeouts
when building.
i'll be looking into this today... i'm at our lab retreat, so my
attention will be divided during the day but i will report back here once i
have some more information.
in the meantime, if your jobs have a...
On a side note, while it's great that you guys have meetings to
discuss things related to the project, it's general Apache practice to
discuss these things on the mailing list - or at the very least send
detailed info about what was discussed in these meetings to the mailing
list. Not everybody can attend...
In the class CachedKafkaConsumer.scala
https://github.com/apache/spark/blob/master/external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/CachedKafkaConsumer.scala
what is the purpose of the following condition check in the method
get(offset: Long, timeout: Long): ConsumerRecord[K, V]?
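For context, here is a condensed, self-contained sketch of the pattern around that check (a paraphrase, not a verbatim copy of the class): the cached consumer tracks the next offset it expects to serve, so it only seeks and re-polls when a requested offset is not sequential.

    import java.{util => ju}
    import org.apache.kafka.clients.consumer.{Consumer, ConsumerRecord}
    import org.apache.kafka.common.TopicPartition

    class SketchedCachedConsumer[K, V](consumer: Consumer[K, V], tp: TopicPartition) {
      private var nextOffset: Long = -2L  // sentinel: position unknown
      private var buffer: ju.Iterator[ConsumerRecord[K, V]] =
        ju.Collections.emptyIterator[ConsumerRecord[K, V]]()

      def get(offset: Long, timeout: Long): ConsumerRecord[K, V] = {
        if (offset != nextOffset) {
          // Non-sequential request: reposition the underlying consumer and refill.
          consumer.seek(tp, offset)
          buffer = consumer.poll(timeout).records(tp).iterator()
        }
        val record = buffer.next()
        nextOffset = record.offset + 1
        record
      }
    }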