Hi, there is a set of functions that can be used with the construct OVER (PARTITION BY col ORDER BY col).
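For example, a minimal PySpark sketch (the column names Group_Id and Points come from your example below, and rank is just one of the available window functions):

    from pyspark.sql import functions as F
    from pyspark.sql.window import Window

    # Rank rows within each Group_Id partition, ordered by Points
    w = Window.partitionBy("Group_Id").orderBy("Points")
    df = df.withColumn("rnk", F.rank().over(w))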
You can search for rank and window functions in the Spark documentation.

On Mon, 1 Oct 2018 at 5:29 am, Riccardo Ferrari <ferra...@gmail.com> wrote:

> Hi Dimitris,
>
> I believe the methods partitionBy
> <https://spark.apache.org/docs/2.2.0/api/python/pyspark.html#pyspark.RDD.partitionBy>
> and mapPartitions
> <https://spark.apache.org/docs/2.2.0/api/python/pyspark.html#pyspark.RDD.mapPartitions>
> are specific to RDDs, while you're talking about DataFrames
> <https://spark.apache.org/docs/2.2.0/api/python/pyspark.sql.html#pyspark.sql.DataFrame>.
> I guess you have a few options, including:
> 1. Use the DataFrame.rdd
> <https://spark.apache.org/docs/2.2.0/api/python/pyspark.sql.html#pyspark.sql.DataFrame.rdd>
> call and process the returned RDD. Please note the return type for this
> call is an RDD of Row.
> 2. Use the groupBy
> <https://spark.apache.org/docs/2.2.0/api/python/pyspark.sql.html#pyspark.sql.DataFrame.groupBy>
> from DataFrames and start from there; this may involve defining a UDF or
> leveraging the existing GroupedData
> <https://spark.apache.org/docs/2.2.0/api/python/pyspark.sql.html#pyspark.sql.GroupedData>
> functions.
>
> It really depends on your use case and your performance requirements.
> HTH
>
> On Sun, Sep 30, 2018 at 8:31 PM dimitris plakas <dimitrisp...@gmail.com>
> wrote:
>
>> Hello everyone,
>>
>> I am trying to split a DataFrame into partitions and apply a custom
>> function to every partition. More precisely, I have a DataFrame like the
>> one below:
>>
>> Group_Id | Id  | Points
>> 1        | id1 | Point1
>> 2        | id2 | Point2
>>
>> I want to have a partition for every Group_Id and to apply a function
>> defined by me to every partition.
>> I have tried partitionBy('Group_Id').mapPartitions(), but I receive an
>> error.
>> Could you please advise me how to do it?
>>

-- 
Best Regards,
Ayan Guha
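A minimal sketch of option 2 from the quoted reply, using a grouped-map pandas UDF (note this needs Spark 2.3+, not the 2.2.0 API linked above; the output schema and the identity body are placeholders for your own per-group logic):

    from pyspark.sql.functions import pandas_udf, PandasUDFType

    # Output schema is assumed for illustration; adapt it to what your
    # per-group function actually returns
    @pandas_udf("Group_Id long, Id string, Points string",
                PandasUDFType.GROUPED_MAP)
    def process_group(pdf):
        # pdf is a pandas DataFrame holding all rows of one Group_Id
        return pdf  # replace with your custom per-group logic

    result = df.groupBy("Group_Id").apply(process_group)

Option 1 would instead start from df.rdd, e.g. keying each Row by Group_Id, grouping with groupByKey, and applying your function with mapValues, at the cost of leaving the DataFrame API.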