Gil, 

I’d say that DataFrame is a result of transformation of any other RDD. Your 
input RDD might contains strings and numbers. But as a result of transformation 
you end up with RDD that contains GenericRowWithSchema, which is what DataFrame 
actually is. So, I’d say that DataFrame is just sort of wrapper around simple 
RDD, which provides some additional and pretty useful stuff.

Hope, this helps.
 
On 14 Jul 2015, at 11:59, Gil Vernik <g...@il.ibm.com> wrote:

> I see that most recent code doesn't has RDDApi anymore. 
> But i still would like to understand the logic of partitions of DataFrame. 
> Does DataFrame has it's own partitions and is sort of RDD by itself, or it 
> depends on the partitions of the underline RDD that was used to load the 
> data? 
> 
> For example, if i create DataFrame from HadoopRDD - does it means that 
> DataFrame has the same partitions as HadoopRDD? 
> 
> Thanks 
> Gil. 
> 
> 
> 
> 
> From:        Gil Vernik/Haifa/IBM@IBMIL 
> To:        Dev <dev@spark.apache.org> 
> Date:        12/07/2015 13:06 
> Subject:        question related partitions of the DataFrame 
> 
> 
> 
> Hi, 
> 
> DataFrame extends RDDApi, that provides RDD like methods. 
> My question is, does DataFrame is sort of  stand alone RDD with it’s own 
> partitions or it depends on the underlying RDD that was used to load the data 
> into its partitions? It's written that DataFrame has ability to scale from 
> kilobytes of data on a single laptop to petabytes on a large cluster, but i 
> don't understand if the partitions of data frame are independent of the 
> partitions of the data source that was used to load the data. 
> 
> So assume theoretically that i used external DataSource API and wrote code  
> that load 1GB of data into single partition. Then I map this DataSource to 
> DataFrame and perform some SQL that returns all the records. Will this  
> DataFrame also has one partition in memory or Spark somehow will divide this 
> DataFrame into various partitions? If so, how it will be divide it into 
> partitions? By size? (can someone point me to the code to see some example)? 
> 
> 
> Thanks, 
> Gil. 
> 

Eugene Morozov
fathers...@list.ru




Reply via email to