Re: Convert each partition of RDD to Dataframe

Manjunath Shetty H Thu, 27 Feb 2020 05:54:29 -0800

Hi Vinodh,

Thanks for the quick response. I didn't get exactly what you meant; any reference
or snippet would be helpful.


To explain the problem further:

  *   I have 10 partitions; each partition loads data from a different table
and a different SQL shard.
  *   Most of the partitions have different schemas.
  *   Before persisting the data, I want to do some column-level manipulation
using DataFrames.

That's why I want to create 10 DataFrames from the RDD (one per partition), each
mapping to one of the 10 different tables/shards.

Regards
Manjunath
________________________________
From: Charles vinodh <mig.flan...@gmail.com>
Sent: Thursday, February 27, 2020 7:04 PM
To: manjunathshe...@live.com <manjunathshe...@live.com>
Cc: user <user@spark.apache.org>
Subject: Re: Convert each partition of RDD to Dataframe

Just split the single RDD into multiple individual RDDs using a filter
operation, and then convert each individual RDD to its respective DataFrame.
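A minimal sketch of the suggestion above, assuming the Spark 1.6.x Java API (`SQLContext`, `DataFrame`) and Java 8 lambdas. Because rows loaded from different tables may share no column to filter on, this variant selects rows by partition index with `mapPartitionsWithIndex` instead of `filter`; that substitution is an assumption, as are the hypothetical names `PartitionFrames`, `keepPartition` and `toFrames`, and the premise that one `StructType` per partition is known up front. Rows stay in Spark's iterators rather than being copied into a `List`, and the DataFrames are created on the driver, not inside `foreachPartition` (where `SQLContext` is not available).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.types.StructType;

// Hypothetical helper class; assumes Spark 1.6.x and one known StructType per partition.
public final class PartitionFrames {

    private PartitionFrames() {}

    /** Returns an RDD in which only partition `target` keeps its rows; all others are empty. */
    static JavaRDD<Row> keepPartition(JavaRDD<Row> rdd, final int target) {
        return rdd.mapPartitionsWithIndex(
            // The partition's iterator is passed through unchanged, so rows stream
            // through Spark and are never copied into a List.
            (Integer index, Iterator<Row> rows) ->
                index == target ? rows : Collections.<Row>emptyIterator(),
            true); // preservesPartitioning: the partition layout does not change
    }

    /** Builds one DataFrame per partition; schemas.get(i) must describe partition i's rows. */
    static List<DataFrame> toFrames(SQLContext sqlContext, JavaRDD<Row> rdd,
                                    List<StructType> schemas) {
        int numPartitions = rdd.partitions().size();
        if (schemas.size() != numPartitions) {
            throw new IllegalArgumentException(
                "expected " + numPartitions + " schemas, got " + schemas.size());
        }
        List<DataFrame> frames = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            // Runs on the driver; no data is read until an action runs on the DataFrame.
            frames.add(sqlContext.createDataFrame(keepPartition(rdd, i), schemas.get(i)));
        }
        return frames;
    }
}
```

Each returned DataFrame can then be registered with `registerTempTable(...)` and queried through `sqlContext.sql(...)`. One caveat: every DataFrame re-evaluates the RDD's lineage, and `mapPartitionsWithIndex` still asks the parent RDD for an iterator on every partition, so depending on how the custom RDD's `compute()` is written, each shard may be read once per DataFrame. Persisting the source RDD first (for example with `StorageLevel.MEMORY_AND_DISK()`) is one way to avoid the repeated reads, at the cost of caching the data.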

On Thu, Feb 27, 2020, 7:29 AM Manjunath Shetty H
<manjunathshe...@live.com> wrote:

Hello All,


In Spark, I am creating custom partitions with a custom RDD, and each partition
has a different schema. In the transformation step, we need to get the schema
and run some DataFrame SQL queries per partition, because each partition's data
has a different schema.

How can I get a DataFrame per partition of an RDD?

Currently I call foreachPartition on the RDD, convert each partition's
Iterator<Row> to a List, and convert that List to a DataFrame. The problem is
that converting the Iterator to a List brings all of the partition's data into
memory, which might crash the process.

Is there a known way to do this? Or is there a way to handle custom partitions
in DataFrames instead of using an RDD?

I am using Spark version 1.6.2.

Any pointers would be helpful. Thanks in advance.
