Re: [Help] Codegen Stage grows beyond 64 KB

2018-06-17 Thread Eyal Zituny
Hi Akash, such errors might appear in large Spark pipelines; the root cause is the JVM's 64 KB limit on the bytecode of a single generated method. The reason your job isn't failing in the end is Spark's fallback: if code generation fails, the Spark compiler will try to build the plan without code gen (less optimized). If you do not w…
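
A minimal sketch of the settings involved, assuming Spark 2.x: "spark.sql.codegen.fallback" (on by default) is what lets the job keep running without generated code, and "spark.sql.codegen.wholeStage" can switch whole-stage code generation off entirely. The app name below is hypothetical.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("codegen-fallback-demo")   // hypothetical app name
      // Keep the fallback on so that stages whose generated code exceeds the
      // 64 KB JVM method limit run in interpreted (non-codegen) mode instead
      // of failing. This is the default behaviour.
      .config("spark.sql.codegen.fallback", "true")
      // Or disable whole-stage code generation for the whole job:
      // .config("spark.sql.codegen.wholeStage", "false")
      .getOrCreate()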

Re: Run jobs in parallel in standalone mode

2018-01-16 Thread Eyal Zituny
> Thank you for your help. The following commands worked in terms of running multiple executors simultaneously. However, Spark repeats the same 10 jobs consecutively. It had been doing this before as well. The jobs are extracting data from MSSQL. Why would it run the same job 10 times…

Re: Run jobs in parallel in standalone mode

2018-01-16 Thread Eyal Zituny
Hi, I'm not familiar with the Kinetica Spark driver, but it seems that your job has a single task, which might indicate that you have a single partition in the DataFrame. I would suggest trying to create your DataFrame with more partitions; this can be done by adding the following options when reading the source:
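
The options themselves are cut off in the archive. As an illustration only, here is a sketch of the standard JDBC partitioning options that produce a multi-partition DataFrame; the URL, table and column names are hypothetical, and the Kinetica connector may expose its own equivalents.

    val df = spark.read
      .format("jdbc")
      .option("url", "jdbc:sqlserver://db-host:1433;databaseName=mydb")  // hypothetical URL
      .option("dbtable", "dbo.source_table")                             // hypothetical table
      .option("partitionColumn", "id")   // numeric column to split the read on
      .option("lowerBound", "1")
      .option("upperBound", "1000000")
      .option("numPartitions", "20")     // 20 partitions -> up to 20 parallel tasks
      .load()

    println(df.rdd.getNumPartitions)     // should print 20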

Re: Apache Spark - Structured Streaming graceful shutdown

2017-12-27 Thread Eyal Zituny
Hi, if you're interested in stopping your Spark application externally, you will probably need a way to communicate with the Spark driver (which starts and holds a reference to the Spark context). This can be done by adding some code to the driver app, for example: - you can expose a REST API that stops…
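
The rest of the suggestion is truncated. A minimal sketch of one way to do it, using the JDK's built-in HTTP server so no extra dependency is needed; the endpoint path and port are arbitrary choices, not part of the original reply.

    import java.net.InetSocketAddress
    import com.sun.net.httpserver.{HttpExchange, HttpHandler, HttpServer}
    import org.apache.spark.sql.streaming.StreamingQuery

    // query: the StreamingQuery returned by df.writeStream...start()
    def exposeStopEndpoint(query: StreamingQuery, port: Int): HttpServer = {
      val server = HttpServer.create(new InetSocketAddress(port), 0)
      server.createContext("/stop", new HttpHandler {
        override def handle(exchange: HttpExchange): Unit = {
          query.stop()                                  // signal the streaming query to stop
          val body = "stopping".getBytes("UTF-8")
          exchange.sendResponseHeaders(200, body.length.toLong)
          exchange.getResponseBody.write(body)
          exchange.close()
        }
      })
      server.start()
      server
    }

Once the endpoint is up (e.g. exposeStopEndpoint(query, 8085)), hitting http://driver-host:8085/stop tells the driver to stop the query, awaitTermination() in the main thread returns, and the application can exit cleanly.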

Re: Standalone Cluster: ClassNotFound org.apache.kafka.common.serialization.ByteArrayDeserializer

2017-12-27 Thread Eyal Zituny
Hi, it seems that you're missing the kafka-clients jar (and probably some other dependencies as well). How did you package your application jar? Does it include all the required dependencies (as an uber jar)? If it's not an uber jar, you need to pass them via the driver-class-path and the executor-class-path…
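
The reply is truncated. As a sketch of the alternative route (letting spark-submit resolve the Kafka dependencies instead of shading them into an uber jar), assuming Spark 2.2 on Scala 2.11; the master URL, main class and jar name are hypothetical.

    # --packages pulls spark-sql-kafka-0-10 and its kafka-clients dependency
    # onto both the driver and executor classpaths at submit time.
    spark-submit \
      --master spark://master-host:7077 \
      --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0 \
      --class com.example.MyKafkaApp \
      my-kafka-app.jar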

Re: [Spark SQL & Core]: RDD to Dataset 1500 columns data with createDataFrame() throw exception of grows beyond 64 KB

2017-03-19 Thread Eyal Zituny
…repartition again in order to get better parallelism. Example:

    spark
      .read
      .csv("data")
      .repartition(1)
      .withColumn("rowid", monotonically_increasing_id())
      .repartition(20)
      .write.csv("output")

Regards, Eyal Zituny

On Sat, Mar 18, 2017 at 11:58 AM, Kazuaki Ishizaki wrote…