Hi,

I have the following code, where I call mapPartitions on an RDD and then need
to convert the result back into a DataFrame. Why do I need to convert the
DataFrame into an RDD and back into a DataFrame just to call mapPartitions?
Why can't I call it directly on the DataFrame?
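sourceFrame.toJavaRDD().mapPartitions(
    new FlatMapFunction<Iterator<Row>, Row>() {
      @Override
      // In the Spark 1.x Java API, call() takes the partition's Iterator
      // and returns an Iterable of output rows.
      public Iterable<Row> call(Iterator<Row> rowIterator) throws Exception {
        // Collect the updated rows for this partition in a separate list.
        List<Row> updatedRows = new ArrayList<>();
        while (rowIterator.hasNext()) {
          Row row = rowIterator.next();
          // iterate(...) is my own helper that transforms the row's values.
          List<?> values = iterate(JavaConversions.seqAsJavaList(row.toSeq()));
          updatedRows.add(RowFactory.create(values.toArray()));
        }
        return updatedRows;
      }
    });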
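
In case it helps to see the per-partition logic on its own, here is a minimal Spark-free sketch of what I want the function to do: consume one partition's rows through an Iterator and hand back the updated rows. Everything in it is an assumption made for illustration only: rows are modelled as plain Object[] arrays instead of Row, the class and method names (PartitionSketch, processPartition) are made up, and iterate(...) is a hypothetical stand-in that upper-cases String values, since my real helper is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class PartitionSketch {

    // Hypothetical stand-in for my real iterate(...): upper-cases String values
    // and leaves every other value unchanged.
    static List<Object> iterate(List<Object> values) {
        List<Object> out = new ArrayList<>(values.size());
        for (Object v : values) {
            out.add(v instanceof String ? ((String) v).toUpperCase() : v);
        }
        return out;
    }

    // Consumes the rows of one partition and returns the updated rows.
    // Rows are plain Object[] arrays here (an assumption, not Spark's Row).
    static Iterator<Object[]> processPartition(Iterator<Object[]> rows) {
        List<Object[]> updatedRows = new ArrayList<>();
        while (rows.hasNext()) {
            Object[] row = rows.next();
            List<Object> updatedValues = iterate(Arrays.asList(row));
            // Add to a separate result list; never reuse it for per-row values.
            updatedRows.add(updatedValues.toArray());
        }
        return updatedRows.iterator();
    }

    public static void main(String[] args) {
        List<Object[]> partition = Arrays.asList(
            new Object[] {"a", 1},
            new Object[] {"b", 2});
        Iterator<Object[]> result = processPartition(partition.iterator());
        while (result.hasNext()) {
            System.out.println(Arrays.toString(result.next()));
        }
    }
}
```

The shape is Iterator in, Iterator out, which is also the shape of the Function1<Iterator<Row>,Iterator<R>> argument that the DataFrame method asks for.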

When I look at the method signature of DataFrame.mapPartitions, it is:
mapPartitions(scala.Function1<Iterator<Row>,Iterator<R>> f, ClassTag<R> evidence$5)
How do I map the above code onto dataframe.mapPartitions? Please guide me; I
am new to Spark.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-call-mapPartitions-on-DataFrame-tp25791.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.