I suggest using `monotonically_increasing_id` (the `monotonicallyIncreasingId` form is deprecated since Spark 2.0), which is highly efficient. But note that the IDs it generates are guaranteed to be unique and increasing, not consecutive.
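For the Java part of the question below, something like the following should work. This is an untested sketch against the Spark 2.x Java API; `ReplicateRow`, `singleRowDS`, the col1/col2 column names, and n = 100 are all placeholders, not names from your code:

    import static org.apache.spark.sql.functions.array;
    import static org.apache.spark.sql.functions.explode;
    import static org.apache.spark.sql.functions.lit;
    import static org.apache.spark.sql.functions.monotonically_increasing_id;

    import java.util.stream.IntStream;
    import org.apache.spark.sql.Column;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class ReplicateRow {
      public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("replicate-row").master("local[*]").getOrCreate();

        // Stand-in for your single-row dataset, just so the sketch runs.
        Dataset<Row> singleRowDS = spark.sql("SELECT 'a' AS col1, 42 AS col2");

        int n = 100;  // replication count (placeholder)

        // Java equivalent of the Scala array((1 until 100).map(lit): _*):
        // build array(lit(0), ..., lit(n-1)) so explode emits one row per element.
        Column[] lits = IntStream.range(0, n)
            .mapToObj(i -> lit(i))
            .toArray(Column[]::new);

        Dataset<Row> replicated =
            singleRowDS.withColumn("ROWNUM", explode(array(lits)));

        // If the key only needs to be unique across the whole dataset (not
        // consecutive), overwrite ROWNUM with monotonically_increasing_id(),
        // which avoids pulling all rows into one partition the way
        // row_number() over an empty window does.
        Dataset<Row> keyed =
            replicated.withColumn("ROWNUM", monotonically_increasing_id());

        keyed.show();
        spark.stop();
      }
    }

The explode step gives you consecutive 0..n-1 values per source row, so if you replicate only one row you can use ROWNUM directly as the key and skip the last step.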
On Fri, Sep 29, 2017 at 3:21 PM, Kanagha Kumar <kpra...@salesforce.com> wrote:

> Thanks for the response.
> I can use either row_number() or monotonicallyIncreasingId to generate
> unique IDs as in
> https://hadoopist.wordpress.com/2016/05/24/generate-unique-ids-for-each-rows-in-a-spark-dataframe/
>
> I'm looking for a Java example that uses those to replicate a single row
> n times by appending a ROWNUM column generated as above, or by using the
> explode function.
>
> Ex:
>
> ds.withColumn("ROWNUM", org.apache.spark.sql.functions.explode(columnEx));
>
> columnEx needs to be of type array in order for explode to work.
>
> Any suggestions are helpful.
> Thanks
>
> On Thu, Sep 28, 2017 at 7:21 PM, ayan guha <guha.a...@gmail.com> wrote:
>
>> How about using row_number() for the primary key?
>>
>> Select row_number() over (), * from table
>>
>> On Fri, 29 Sep 2017 at 10:21 am, Kanagha Kumar <kpra...@salesforce.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I'm trying to replicate a single row from a dataset n times and create
>>> a new dataset from it. But while replicating, I need a column's value
>>> to change for each replica, since it will end up as the primary key
>>> when the data is finally stored.
>>>
>>> Looked at the following reference:
>>> https://stackoverflow.com/questions/40397740/replicate-spark-row-n-times
>>>
>>> import org.apache.spark.sql.functions._
>>> val result = singleRowDF
>>>   .withColumn("dummy", explode(array((1 until 100).map(lit): _*)))
>>>   .selectExpr(singleRowDF.columns: _*)
>>>
>>> How can I create a column from an array of values in Java and pass it
>>> to the explode function? Suggestions are helpful.
>>>
>>> Thanks
>>> Kanagha
>>
>> --
>> Best Regards,
>> Ayan Guha