Is there any reason you don't want to convert this? I don't think a join
between an RDD and a DataFrame is supported.
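A minimal sketch of what I mean, assuming a Spark 1.6 SQLContext, a
hypothetical case class for the stream records, and an esDF DataFrame
already loaded from the E/S input (all names here are placeholders):

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SQLContext}

// Hypothetical record type for the streaming RDD
case class Event(id: String, payload: String)

def enrich(sqlContext: SQLContext, rdd: RDD[Event], esDF: DataFrame): DataFrame = {
  import sqlContext.implicits._
  // Convert the RDD to a DataFrame so both sides of the join are DataFrames
  val rddDF = rdd.toDF()
  // Join on a shared key column ("id" is an assumed column name)
  rddDF.join(esDF, Seq("id"))
}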
On Sat, May 7, 2016 at 11:41 PM, Cyril Scetbon
wrote:
> Hi,
>
> I have an RDD built during a Spark Streaming job and I'd like to join it to
> a DataFrame (E/S input) to enrich it.
> It seems that I
> spark.sql.parquet.filterPushdown: true
> spark.sql.parquet.mergeSchema: true
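For reference, a minimal sketch of setting those two options
programmatically on a Spark 1.6 SQLContext (the values are the ones
listed above; setting them in spark-defaults.conf works just as well):

import org.apache.spark.sql.SQLContext

def configureParquet(sqlContext: SQLContext): Unit = {
  // Enable Parquet filter pushdown and schema merging, as listed above
  sqlContext.setConf("spark.sql.parquet.filterPushdown", "true")
  sqlContext.setConf("spark.sql.parquet.mergeSchema", "true")
}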
>
> Thanks,
> J.
>
> On Sat, May 7, 2016 at 4:20 PM, Ashish Dubey wrote:
>
>> How big is your file, and can you also share the code snippet?
>>
>>
>> On Saturday, May 7, 2016, Johnny W. wrote:
On May 8, 2016 5:55 PM, "Ashish Dubey" wrote:
Brandon,
How much memory are you giving to your executors? Did you check if there
were dead executors in your application logs? Most likely you need
more memory for your executors.
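If it helps, a minimal sketch of bumping executor memory through SparkConf
(the 8g figure is just a placeholder to tune for your workload; passing
--executor-memory to spark-submit does the same thing):

import org.apache.spark.{SparkConf, SparkContext}

// Give each executor a larger heap; 8g is an assumed placeholder value.
// Equivalent to passing --executor-memory 8g to spark-submit.
// Master and deploy mode are expected to come from spark-submit.
val conf = new SparkConf()
  .setAppName("my-app")
  .set("spark.executor.memory", "8g")
val sc = new SparkContext(conf)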
Ashish
On Sun, May 8, 2016 at 1:01 PM, Brandon White
wrote:
> Hello all,
>
> I am running a Spark application
This limit is due to the underlying InputFormat implementation. You can always
write your own InputFormat and then use Spark's newAPIHadoopFile API to pass
your InputFormat class. You will have to place the jar file in the /lib
location on all the nodes.
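A minimal sketch of what that call looks like; TextInputFormat is only
standing in for your own InputFormat class here, and the path is a
placeholder:

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.SparkContext

def readWithCustomFormat(sc: SparkContext, path: String) =
  // Substitute your own InputFormat class for TextInputFormat; the jar
  // containing it has to be available on every node.
  sc.newAPIHadoopFile[LongWritable, Text, TextInputFormat](path)
    .map { case (_, line) => line.toString }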
Ashish
On Sun, May 8, 2016 at 4:02 PM, Hyukji
The driver maintains the complete metadata of the application (scheduling of
executors and maintaining the messaging to control the execution).
This code seems to be failing in that code path only. With that said, there
is JVM overhead based on the number of executors, stages, and tasks in your app.
Do you know yo
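In case it turns out to be driver heap, a minimal sketch of raising driver
memory (4g is just a placeholder; in client mode this has to come from
spark-submit or spark-defaults.conf, since the driver JVM is already running
by the time a SparkConf set in code is applied):

import org.apache.spark.SparkConf

// Raise the driver heap; 4g is an assumed placeholder value.
// In client mode pass --driver-memory 4g to spark-submit instead.
val conf = new SparkConf()
  .setAppName("my-app")
  .set("spark.driver.memory", "4g")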
How big is your file, and can you also share the code snippet?
On Saturday, May 7, 2016, Johnny W. wrote:
> hi spark-user,
>
> I am using Spark 1.6.0. When I call sqlCtx.read.parquet to create a
> dataframe from a parquet data source with a single parquet file, it yields
> a stage with lots of sma
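For what it's worth, a minimal sketch of that read plus a coalesce() as one
common way to cut down a huge number of tiny partitions; the path and
partition count are placeholders, and coalesce is my assumption rather than
something confirmed in this thread:

import org.apache.spark.sql.SQLContext

def readParquet(sqlCtx: SQLContext) = {
  // The call from the question above; the path is a placeholder
  val df = sqlCtx.read.parquet("/path/to/single-file.parquet")
  // Collapse many tiny partitions into fewer, larger ones
  // (16 is an arbitrary placeholder)
  df.coalesce(16)
}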