Hi,
I built apache-spark-on-k8s from source on Ubuntu 16.04 and it built
without errors. Next, I wanted to create the Docker images, so, as explained at
https://apache-spark-on-k8s.github.io/userdocs/running-on-kubernetes.html, I
used sbin/build-push-docker-images.sh to create them. While using th
Are you building on the fork or on the official release now? I built v2.3.0
from source without issue. One thing I noticed is that I needed to run the
build-image command from the copy placed in dist/, as opposed to the one in
the repo, since that's where the necessary targets get copied.
As a carry-over from the apache-spark-on-k8s project, it would be useful to
have a configurable restart policy for submitted jobs with the Kubernetes
resource manager. See the following issues:
https://github.com/apache-spark-on-k8s/spark/issues/133
https://github.com/apache-spark-on-k8s/spark/issues
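A minimal sketch of what this could look like from the user's side, assuming a
submission-time property (the key name below is hypothetical; no such setting
exists today, it only illustrates the kind of knob being proposed):

  import org.apache.spark.SparkConf

  // Hypothetical key: "spark.kubernetes.driver.pod.restartPolicy" is not an
  // existing Spark property. The valid Kubernetes values would be Always,
  // OnFailure and Never.
  val conf = new SparkConf()
    .setAppName("restart-policy-example")
    .set("spark.kubernetes.driver.pod.restartPolicy", "OnFailure")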
How would Spark determine whether or not to apply a recommendation - a cost
threshold? And yes, it would be good to flesh out what information we get
from Spark in the datasource when providing these
recommendations/requirements - I could see statistics and the existing
outputPartitioning/Ordering
We discussed this early on in our fork and I think we should have this in a
JIRA and discuss it further. It's something we want to address in the
future.
One proposed method is using a StatefulSet of size 1 for the driver. This
ensures recovery but at the same time takes away from the completion
s
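For illustration only, a rough sketch of a size-1 driver StatefulSet built with
the fabric8 Kubernetes client (the client library the K8s scheduler backend is
based on); the names, labels and image below are made up:

  import io.fabric8.kubernetes.api.model.apps.StatefulSetBuilder
  import io.fabric8.kubernetes.client.DefaultKubernetesClient

  // A single-replica StatefulSet wrapping the driver pod, so Kubernetes
  // recreates the driver if its pod or node dies.
  val driverSet = new StatefulSetBuilder()
    .withNewMetadata().withName("spark-driver").endMetadata()
    .withNewSpec()
      .withReplicas(1)
      .withServiceName("spark-driver-svc")
      .withNewSelector().addToMatchLabels("app", "spark-driver").endSelector()
      .withNewTemplate()
        .withNewMetadata().addToLabels("app", "spark-driver").endMetadata()
        .withNewSpec()
          .addNewContainer()
            .withName("driver")
            .withImage("spark-driver:latest") // illustrative image name
          .endContainer()
        .endSpec()
      .endTemplate()
    .endSpec()
    .build()

  new DefaultKubernetesClient().apps().statefulSets()
    .inNamespace("default").create(driverSet)

The trade-off is the one hinted at above: a StatefulSet keeps its pod running,
so a job that finishes successfully never reaches a "Completed" state the way
a bare driver pod would.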
As Lucas said, those directories are generated and copied when you run a
full Maven build with the -Pkubernetes flag specified (or follow the
instructions at
https://spark.apache.org/docs/latest/building-spark.html#building-a-runnable-distribution
).
Also, using the Kubernetes integration in the main Ap
I think the difference is py4j is a public library whereas the R backend is
specific to SparkR.
Can you elaborate on what you need JVMObjectTracker for? We have provided
convenient R APIs to call into the JVM: sparkR.callJMethod, for example
From: Jeremy Liu
Sent: Tuesday
If you need the functionality, I would recommend just copying the code over
to your project and using it that way.
On Wed, Mar 28, 2018 at 9:02 AM Felix Cheung wrote:
> I think the difference is py4j is a public library whereas the R backend
> is specific to SparkR.
>
> Can you elaborate what y
Thanks for starting this discussion.
When I was troubleshooting Spark on K8s, I often needed to turn on debug
messages on the driver and executor pods of my jobs, which would be possible
if I somehow got the right log4j.properties file inside the pods.
I know I can build custom Docker images
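As a sketch of the mechanics, assuming the right log4j.properties is already
inside the pods (baked into the image or mounted some other way), the JVMs can
be pointed at it with the existing extraJavaOptions properties; the path below
is just an example, and the keys are normally passed at spark-submit time
rather than set in code:

  import org.apache.spark.SparkConf

  // Assumes /opt/spark/conf/log4j.properties already exists in the driver and
  // executor containers; the location is illustrative.
  val conf = new SparkConf()
    .set("spark.driver.extraJavaOptions",
      "-Dlog4j.configuration=file:/opt/spark/conf/log4j.properties")
    .set("spark.executor.extraJavaOptions",
      "-Dlog4j.configuration=file:/opt/spark/conf/log4j.properties")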
How would Spark determine whether or not to apply a recommendation - a cost
threshold?
Spark would always apply the required clustering and sort order because
they are required by the data source. It is reasonable for a source to
reject data that isn’t properly prepared. For example, data must be
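As a rough Scala sketch of the idea (not an existing Spark interface; the trait
and method names are made up for illustration), a V2 writer could advertise the
clustering and sort order it requires, and Spark would insert the corresponding
shuffle and sort before the write:

  import org.apache.spark.sql.catalyst.expressions.{Expression, SortOrder}
  import org.apache.spark.sql.sources.v2.writer.DataSourceWriter

  // Hypothetical mix-in for the DataSourceV2 write path (Spark 2.3-era classes).
  trait SupportsRequiredDistributionAndOrdering { self: DataSourceWriter =>
    // Expressions the incoming rows must be clustered (partitioned) by.
    def requiredClustering(): Seq[Expression]
    // Per-partition sort order the rows must satisfy, e.g. HTable keys in order.
    def requiredOrdering(): Seq[SortOrder]
  }

Recommendations, as opposed to requirements, could be exposed the same way but
only honored when the planner decides the extra shuffle/sort is worth it, which
is where the cost-threshold question above comes in.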
>
> Spark would always apply the required clustering and sort order because
> they are required by the data source. It is reasonable for a source to
> reject data that isn’t properly prepared. For example, data must be written
> to HTable files with keys in order or else the files are invalid. Sort
For added color, one thing that I may want to consider as a data source
implementer is the cost / benefit of applying a particular clustering. For
example, a dataset with low cardinality in the clustering key could benefit
greatly from clustering on that key before writing to Cassandra since
Cassan
bq. this shuffle could outweigh the benefits of the organized data if the
cardinality is lower.
I wonder if you meant higher in place of the last word above.
Cheers
On Wed, Mar 28, 2018 at 7:50 PM, Russell Spitzer wrote:
> For added color, one thing that I may want to consider as a data source
Ah yeah sorry I got a bit mixed up.
On Wed, Mar 28, 2018 at 7:54 PM Ted Yu wrote:
> bq. this shuffle could outweigh the benefits of the organized data if the
> cardinality is lower.
>
> I wonder if you meant higher in place of the last word above.
>
> Cheers
>
> On Wed, Mar 28, 2018 at 7:50 PM,