Hi Akash,
Such errors might appear in large Spark pipelines; the root cause is the JVM's 64KB limit on the bytecode size of a single method, which Spark's generated code can exceed.
The reason your job isn't failing in the end is Spark's fallback: if code generation fails, Spark will build the execution flow without the generated code (less optimized).
If you do not want to hit these errors at all, you can disable the code generation.
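For example, a minimal way to avoid the codegen path entirely (assuming Spark 2.x; note this turns the optimization off for every stage, so it is a trade-off):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("no-wholestage-codegen")
  // disable whole-stage code generation so the generated-method size limit is never hit
  .config("spark.sql.codegen.wholeStage", "false")
  .getOrCreate()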
> Thank you for your help. The following commands worked in terms of running
> multiple executors simultaneously. However, Spark repeats the same 10
> jobs consecutively. It had been doing it before as well. The jobs are
> extracting data from MSSQL. Why would it run the same job 10 times?
Hi,
I'm not familiar with the Kinetica Spark driver, but it seems that your job has a single task, which might indicate that you have a single partition in the DataFrame.
I would suggest creating the DataFrame with more partitions; this can be done by adding partitioning options when reading the source.
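For example, assuming the table is read over Spark's generic JDBC interface (the Kinetica connector may well expose its own options instead), the standard way to get a partitioned read looks like this, where the URL, table and column names are placeholders:

val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:...")       // placeholder connection string
  .option("dbtable", "my_table")   // placeholder table name
  // split the read into 20 tasks by ranges of a numeric column
  .option("partitionColumn", "id")
  .option("lowerBound", "0")
  .option("upperBound", "1000000")
  .option("numPartitions", "20")
  .load()

With numPartitions > 1 the resulting DataFrame has one task per partition instead of a single task.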
Hi,
If you're interested in stopping your Spark application externally, you will probably need a way to communicate with the Spark driver (which starts and holds a reference to the SparkContext).
This can be done by adding some code to the driver app, for example:
- you can expose a REST endpoint that stops the SparkContext when it is called
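A minimal sketch of that idea, using the JDK's built-in HTTP server so no extra dependency is needed (the port and path are arbitrary choices here, not anything Spark prescribes):

import java.net.InetSocketAddress
import com.sun.net.httpserver.{HttpExchange, HttpHandler, HttpServer}
import org.apache.spark.sql.SparkSession

object StoppableApp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("stoppable-app").getOrCreate()

    // tiny endpoint inside the driver; hitting http://<driver-host>:8090/stop shuts the app down
    val server = HttpServer.create(new InetSocketAddress(8090), 0)
    server.createContext("/stop", new HttpHandler {
      override def handle(exchange: HttpExchange): Unit = {
        exchange.sendResponseHeaders(200, -1) // 200 with no response body
        exchange.close()
        spark.stop()
      }
    })
    server.start()

    // ... the actual job logic runs here ...
  }
}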
Hi,
It seems that you're missing the kafka-clients jar (and probably some other dependencies as well).
How did you package your application jar? Does it include all the required dependencies (as an uber jar)?
If it's not an uber jar, you need to pass the missing jars via the driver class path and the executor class path.
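For example, with spark-submit the driver side is the --driver-class-path flag and the executor side is the spark.executor.extraClassPath conf (the class name and jar paths below are placeholders):

# placeholder main class and jar paths
spark-submit \
  --class com.example.MyStreamingApp \
  --driver-class-path /path/to/kafka-clients.jar \
  --conf spark.executor.extraClassPath=/path/to/kafka-clients.jar \
  my-app.jar

Alternatively, passing the jar with --jars ships it to both the driver and the executors in one go.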
You can then repartition again in order to get better parallelism.
Example:
import org.apache.spark.sql.functions.monotonically_increasing_id

spark
  .read
  .csv("data")
  .repartition(1)  // one partition makes the generated ids consecutive
  .withColumn("rowid", monotonically_increasing_id())
  .repartition(20) // spread the data back out so downstream work is parallel
  .write.csv("output")
Regards
Eyal Zituny