Just as a best practice, DataFrames and Datasets are the preferred way, so
try not to resort to RDDs unless you absolutely have to...
On Sun, 5 Mar 2017 at 7:10 pm, khwunchai jaengsawang
wrote:
Hi Old-School,
For the first question, you can specify the number of partitions in any
DataFrame by using repartition(numPartitions: Int, partitionExprs: Column*).
Example:
val partitioned = data.repartition(numPartitions=10).cache()
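To check that the repartitioning took effect and to get a rough execution time, something like the following sketch may help. The local master, the app name, the generated `data` DataFrame, and the helper names `repartitionAndCache` and `timeMs` are all assumptions made up for illustration; they are not from the original question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object RepartitionSketch {
  // Repartition into n partitions (this triggers a full shuffle) and mark the result for caching.
  def repartitionAndCache(data: DataFrame, n: Int): DataFrame =
    data.repartition(n).cache()

  // Rough wall-clock timing of a block, in milliseconds (hypothetical helper).
  def timeMs[A](block: => A): (A, Double) = {
    val start = System.nanoTime()
    val result = block
    (result, (System.nanoTime() - start) / 1e6)
  }

  def main(args: Array[String]): Unit = {
    // Local session purely for illustration; adjust master and app name to your setup.
    val spark = SparkSession.builder()
      .master("local[4]")
      .appName("repartition-sketch")
      .getOrCreate()

    // Hypothetical stand-in for `data` in the example above.
    val data = spark.range(0L, 1000000L).toDF("id")

    val partitioned = repartitionAndCache(data, 10)
    println(s"partitions: ${partitioned.rdd.getNumPartitions}")

    // cache() is lazy, so this first action also pays the cost of filling the cache.
    val (rows, ms) = timeMs(partitioned.count())
    println(f"count=$rows%d took $ms%.1f ms")

    spark.stop()
  }
}
```

Since cache() is lazy, the first timed action includes the shuffle and the cache fill; timing a second action on `partitioned` gives a better idea of the steady-state cost for a given partition count.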
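For your second question, you can transform your RDD into key-value pairs
and aggregate them by key.
RDD operation (word and count stand for the key and value derived from each
record x):
rdd.map(x => (word, count)).reduceByKey(_ + _)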
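As a concrete version of the one-liner above, here is a minimal sketch that counts words with reduceByKey and, since DataFrames are generally preferred, does the same aggregation with groupBy. The input words, the object name, and the helper names `rddCounts` and `dfCounts` are hypothetical; the actual records in the original question are not known.

```scala
import org.apache.spark.sql.SparkSession

object ReduceByKeySketch {
  // RDD route: map each word to a (key, value) pair, then sum the values per key.
  def rddCounts(spark: SparkSession, words: Seq[String]): Map[String, Int] =
    spark.sparkContext.parallelize(words)
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .collect()
      .toMap

  // DataFrame route: the same aggregation expressed as groupBy + count.
  def dfCounts(spark: SparkSession, words: Seq[String]): Map[String, Long] = {
    import spark.implicits._
    words.toDF("word")
      .groupBy("word")
      .count()
      .as[(String, Long)] // tuple columns are mapped by position: (word, count)
      .collect()
      .toMap
  }

  def main(args: Array[String]): Unit = {
    // Local session purely for illustration.
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("reduce-by-key-sketch")
      .getOrCreate()

    // Hypothetical input data.
    val words = Seq("a", "b", "a", "c", "b", "a")

    println(rddCounts(spark, words))
    println(dfCounts(spark, words))

    spark.stop()
  }
}
```

Both routes should give the same per-word totals; the DataFrame version leaves the choice of physical plan to the optimizer, which is one reason it is usually recommended over hand-written RDD code.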
On Sat, Mar 4, 2017 at 8:59 AM -0500, "Old-School"
wrote:
Hi,
I want to perform some simple transformations and check the execution time,
under various configurations (e.g. number of