Re: [RDDs and Dataframes] Equivalent expressions for RDD API

2017-03-05 Thread ayan guha
Just as a best practice, DataFrames and Datasets are the preferred way, so try not to resort to RDDs unless you absolutely have to... On Sun, 5 Mar 2017 at 7:10 pm, khwunchai jaengsawang wrote: > Hi Old-School, > > > For the first question, you can specify the number of partitions in any > DataFrame by us

Re: [RDDs and Dataframes] Equivalent expressions for RDD API

2017-03-05 Thread khwunchai jaengsawang
Hi Old-School, For the first question, you can specify the number of partitions in any DataFrame by using repartition(numPartitions: Int, partitionExprs: Column*). Example: val partitioned = data.repartition(numPartitions = 10).cache() For your second question, you can transform your RDD in
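
The repartition call mentioned in this reply can be sketched as follows. This is a minimal example, not the poster's own code: the local SparkSession, the object name RepartitionSketch, the helper partitionCounts and the "value" column are all assumptions made for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical sketch: names below are illustrative, not from the thread.
object RepartitionSketch {

  // Returns the partition counts after the two repartition variants.
  def partitionCounts(spark: SparkSession): (Int, Int) = {
    import spark.implicits._

    // Small in-memory DataFrame with a single column named "value" (assumed).
    val data = (1 to 1000).toSeq.toDF("value")

    // Variant 1: only a target number of partitions, as in the reply.
    val partitioned = data.repartition(10).cache()

    // Variant 2: a target number plus a partitioning expression.
    val byColumn = data.repartition(4, $"value")

    (partitioned.rdd.getNumPartitions, byColumn.rdd.getNumPartitions)
  }

  def main(args: Array[String]): Unit = {
    // A local session is assumed here; a cluster job would not set master in code.
    val spark = SparkSession.builder()
      .appName("repartition-sketch")
      .master("local[2]")
      .getOrCreate()
    try {
      val (n1, n2) = partitionCounts(spark)
      println(s"repartition(10): $n1 partitions, repartition(4, col): $n2 partitions")
    } finally {
      spark.stop()
    }
  }
}
```

Checking getNumPartitions on the underlying RDD is one way to confirm the setting took effect before timing anything.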

Re: [RDDs and Dataframes] Equivalent expressions for RDD API

2017-03-04 Thread bryan . jeffrey
RDD operation: rdd.map(x => (word, count)).reduceByKey(_ + _) On Sat, Mar 4, 2017 at 8:59 AM -0500, "Old-School" wrote: Hi, I want to perform some simple transformations and check the execution time, under various configurations (e.g. number of
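
For readers comparing the two APIs, here is a rough side-by-side sketch of the map/reduceByKey pattern from this message and one possible DataFrame counterpart. The word-count task, the object name WordCountSketch, the helper names and the "line"/"word" column names are assumptions chosen for illustration; the original question is truncated, so this may not match the exact transformation being asked about.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode, split}

// Hypothetical sketch: a word count written once per API.
object WordCountSketch {

  // RDD version: emit (word, 1) pairs and sum them per key.
  def rddCounts(spark: SparkSession, lines: Seq[String]): Map[String, Int] =
    spark.sparkContext.parallelize(lines)
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .collect()
      .toMap

  // DataFrame version: split each line into words, then group and count.
  def dfCounts(spark: SparkSession, lines: Seq[String]): Map[String, Long] = {
    import spark.implicits._
    lines.toDF("line")
      .select(explode(split(col("line"), "\\s+")).as("word"))
      .filter(col("word") =!= "")
      .groupBy("word")
      .count()
      .as[(String, Long)]
      .collect()
      .toMap
  }

  def main(args: Array[String]): Unit = {
    // Local session assumed for the sketch.
    val spark = SparkSession.builder()
      .appName("wordcount-sketch")
      .master("local[2]")
      .getOrCreate()
    try {
      val lines = Seq("a b a", "b c")
      println(rddCounts(spark, lines))
      println(dfCounts(spark, lines))
    } finally {
      spark.stop()
    }
  }
}
```

Both helpers collect to the driver only to keep the sketch small; for timing experiments on real data, an action such as count() or a write would be a more realistic trigger.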