Re: Joining an RDD to a DataFrame

2016-05-08 Thread Ashish Dubey
Is there any reason you don't want to convert this? I don't think a join between an RDD and a DataFrame is supported.

On Sat, May 7, 2016 at 11:41 PM, Cyril Scetbon wrote:
> Hi,
>
> I have an RDD built during a Spark Streaming job and I'd like to join it to
> a DataFrame (E/S input) to enrich it.
> It seems that I...
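[Editor's note: a minimal sketch of the suggested workaround — convert the RDD to a DataFrame first, then join. The Event case class, column names, and values are placeholders, not details from the thread; the API shown is the Spark 1.6-era SQLContext.]

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical record type; the real schema from the thread is unknown.
case class Event(id: String, payload: String)

object RddToDfJoin {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("rdd-df-join").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // RDD produced upstream, e.g. inside a streaming batch.
    val eventsRdd = sc.parallelize(Seq(Event("a", "x"), Event("b", "y")))

    // Enrichment DataFrame, standing in for the E/S input from the thread.
    val enrichDf = Seq(("a", "extra-a"), ("b", "extra-b")).toDF("id", "info")

    // Convert the RDD to a DataFrame, then join on the shared key.
    val joined = eventsRdd.toDF().join(enrichDf, "id")
    joined.show()
  }
}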

Re: sqlCtx.read.parquet yields lots of small tasks

2016-05-08 Thread Ashish Dubey
> spark.sql.parquet.filterPushdown: true
> spark.sql.parquet.mergeSchema: true
>
> Thanks,
> J.
>
> On Sat, May 7, 2016 at 4:20 PM, Ashish Dubey wrote:
>> How big is your file and can you also share the code snippet
>>
>> On Saturday, May 7, 2016, Johnny W. wrote: ...
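[Editor's note: since the thread surfaces spark.sql.parquet.mergeSchema: true, a hedged sketch of one plausible culprit — with schema merging on, Spark 1.6 can spend a task per part-file footer just to reconcile schemas, which shows up as many tiny tasks. Whether that applies to this exact single-file source is not confirmed in the thread; the path is a placeholder.]

// Paste into a Spark 1.6 spark-shell, where sqlContext already exists.
// Turn Parquet schema merging off globally (safe when all files share one schema):
sqlContext.setConf("spark.sql.parquet.mergeSchema", "false")

// ...or disable it for a single read:
val df = sqlContext.read.option("mergeSchema", "false").parquet("/path/to/data.parquet")

// Inspect how many partitions (hence tasks) the read produces:
println(df.rdd.partitions.length)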

Re: BlockManager crashing applications

2016-05-08 Thread Ashish Dubey
> On May 8, 2016 5:55 PM, "Ashish Dubey" wrote:
>
> Brandon,
>
> How much memory are you giving to your executors? Did you check if there
> were dead executors in your application logs? Most likely you need more
> memory for your executors.
>
> Ashish

Re: BlockManager crashing applications

2016-05-08 Thread Ashish Dubey
Brandon,

How much memory are you giving to your executors? Did you check if there
were dead executors in your application logs? Most likely you need more
memory for your executors.

Ashish

On Sun, May 8, 2016 at 1:01 PM, Brandon White wrote:
> Hello all,
>
> I am running a Spark application...
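[Editor's note: a minimal sketch of the sizing knobs implied here. The values are placeholders to tune, not recommendations from the thread, and the memoryOverhead key shown is the Spark 1.x YARN name.]

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("block-manager-tuning")
  .set("spark.executor.memory", "8g")                 // heap per executor (placeholder)
  .set("spark.yarn.executor.memoryOverhead", "1024")  // off-heap headroom in MB, YARN-only

val sc = new SparkContext(conf)

The same settings can go on spark-submit as --executor-memory 8g and --conf spark.yarn.executor.memoryOverhead=1024; dead executors from memory pressure often show up in the container logs as OOM kills.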

Re: Parse JSON in Spark

2016-05-08 Thread Ashish Dubey
This limit is due to the underlying InputFormat implementation. You can always write your own InputFormat and then use Spark's newAPIHadoopFile API to pass your InputFormat class. You will have to place the jar file in the /lib location on all the nodes.

Ashish

On Sun, May 8, 2016 at 4:02 PM, Hyukji...
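[Editor's note: a minimal sketch of the newAPIHadoopFile wiring being described, using Hadoop's stock TextInputFormat as a stand-in. In the thread's scenario you would substitute your own InputFormat class, e.g. one that understands multi-line JSON records, and ship its jar to every node. The path is a placeholder.]

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

val sc = new SparkContext(new SparkConf().setAppName("custom-input-format").setMaster("local[*]"))

// Key/value types are dictated by the InputFormat; TextInputFormat yields
// (byte offset, line). A custom format could yield whole JSON records instead.
val records = sc.newAPIHadoopFile[LongWritable, Text, TextInputFormat]("/path/to/input.json")

// Text is not serializable across stages, so map to String before collecting.
val lines = records.map { case (_, text) => text.toString }
lines.take(5).foreach(println)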

Re: How to verify if Spark is using Kryo serializer for shuffle

2016-05-07 Thread Ashish Dubey
The driver maintains the complete metadata of the application (scheduling of executors and maintaining the messaging to control the execution). This code seems to be failing in that code path only. With that said, there is JVM overhead based on the number of executors, stages, and tasks in your app. Do you know yo...
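[Editor's note: on the subject-line question itself, a small sketch of one way to check — set the serializer explicitly and read it back; it is also visible on the Spark UI's Environment tab. The app name and master are placeholders.]

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("kryo-check")
  .setMaster("local[*]")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")

val sc = new SparkContext(conf)

// Should print org.apache.spark.serializer.KryoSerializer if Kryo is active.
println(sc.getConf.get("spark.serializer"))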

Re: sqlCtx.read.parquet yields lots of small tasks

2016-05-07 Thread Ashish Dubey
How big is your file, and can you also share the code snippet?

On Saturday, May 7, 2016, Johnny W. wrote:
> hi spark-user,
>
> I am using Spark 1.6.0. When I call sqlCtx.read.parquet to create a
> dataframe from a parquet data source with a single parquet file, it yields
> a stage with lots of sma...
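[Editor's note: not the thread's diagnosis, but a generic mitigation sketch while the root cause is investigated — collapse the small partitions after the read. Assumes a Spark 1.6 spark-shell sqlContext; the path and target count are placeholders.]

val df = sqlContext.read.parquet("/path/to/data.parquet")

// coalesce() merges partitions without a shuffle; size the count to the data.
val compact = df.coalesce(4)
println(compact.rdd.partitions.length)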