I suggest using `monotonically_increasing_id` (the `monotonicallyIncreasingId` form is deprecated since Spark 2.0), which is highly efficient. But note that the IDs it generates are guaranteed to be unique and increasing, not consecutive.
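For the Java part of the question below, something like the following should work. This is an untested sketch against the Spark 2.x Java API; `ReplicateRow`, `singleRowDS`, the col1/col2 column names, and n = 100 are all placeholders, not names from your code:

    import static org.apache.spark.sql.functions.array;
    import static org.apache.spark.sql.functions.explode;
    import static org.apache.spark.sql.functions.lit;
    import static org.apache.spark.sql.functions.monotonically_increasing_id;

    import java.util.stream.IntStream;
    import org.apache.spark.sql.Column;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class ReplicateRow {
      public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("replicate-row").master("local[*]").getOrCreate();

        // Stand-in for your single-row dataset, just so the sketch runs.
        Dataset<Row> singleRowDS = spark.sql("SELECT 'a' AS col1, 42 AS col2");

        int n = 100;  // replication count (placeholder)

        // Java equivalent of the Scala array((1 until 100).map(lit): _*):
        // build array(lit(0), ..., lit(n-1)) so explode emits one row per element.
        Column[] lits = IntStream.range(0, n)
            .mapToObj(i -> lit(i))
            .toArray(Column[]::new);

        Dataset<Row> replicated =
            singleRowDS.withColumn("ROWNUM", explode(array(lits)));

        // If the key only needs to be unique across the whole dataset (not
        // consecutive), overwrite ROWNUM with monotonically_increasing_id(),
        // which avoids pulling all rows into one partition the way
        // row_number() over an empty window does.
        Dataset<Row> keyed =
            replicated.withColumn("ROWNUM", monotonically_increasing_id());

        keyed.show();
        spark.stop();
      }
    }

The explode step gives you consecutive 0..n-1 values per source row, so if you replicate only one row you can use ROWNUM directly as the key and skip the last step.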
On Fri, Sep 29, 2017 at 3:21 PM, Kanagha Kumar <kpra...@salesforce.com> wrote:

> Thanks for the response.
> I can use either row_number() or monotonicallyIncreasingId to generate
> unique IDs as in
> https://hadoopist.wordpress.com/2016/05/24/generate-unique-ids-for-each-rows-in-a-spark-dataframe/
>
> I'm looking for a Java example that uses those to replicate a single row
> n times by appending a ROWNUM column generated as above, or by using the
> explode function.
>
> Ex:
>
> ds.withColumn("ROWNUM", org.apache.spark.sql.functions.explode(columnEx));
>
> columnEx needs to be of type array in order for explode to work.
>
> Any suggestions are helpful.
> Thanks
>
> On Thu, Sep 28, 2017 at 7:21 PM, ayan guha <guha.a...@gmail.com> wrote:
>
>> How about using row_number() for the primary key?
>>
>> Select row_number() over (), * from table
>>
>> On Fri, 29 Sep 2017 at 10:21 am, Kanagha Kumar <kpra...@salesforce.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I'm trying to replicate a single row from a dataset n times and create
>>> a new dataset from it. But while replicating, I need a column's value
>>> to change for each replica, since it will end up as the primary key
>>> when the data is finally stored.
>>>
>>> Looked at the following reference:
>>> https://stackoverflow.com/questions/40397740/replicate-spark-row-n-times
>>>
>>> import org.apache.spark.sql.functions._
>>> val result = singleRowDF
>>>   .withColumn("dummy", explode(array((1 until 100).map(lit): _*)))
>>>   .selectExpr(singleRowDF.columns: _*)
>>>
>>> How can I create a column from an array of values in Java and pass it
>>> to the explode function? Suggestions are helpful.
>>>
>>> Thanks
>>> Kanagha
>>
>> --
>> Best Regards,
>> Ayan Guha