Just in case you are more comfortable with SQL, `row_number() over ()`
should also generate a unique id (note that Spark requires the window to be ordered, so in practice you need an `order by` clause).

On Thu, Jan 12, 2017 at 7:00 PM, akbar501 <akbar...@gmail.com> wrote:

> The following are 2 different approaches to adding an id/index to RDDs and 1
> approach to adding an index to a DataFrame.
>
> Add an index column to an RDD
>
> ```scala
> // RDD
> val dataRDD = sc.textFile("./README.md")
> // Add index then set index as key in map() transformation
> // Results in RDD[(Long, String)]
> val indexedRDD = dataRDD.zipWithIndex().map(pair => (pair._2, pair._1))
> ```
>
> Add a unique id column to an RDD
>
> ```scala
> // RDD
> val dataRDD = sc.textFile("./README.md")
> // Add unique id then set id as key in map() transformation
> // Results in RDD[(Long, String)]
> val indexedRDD = dataRDD.zipWithUniqueId().map(pair => (pair._2, pair._1))
> indexedRDD.collect
> ```
>
> Add an index column to a DataFrame
>
> Note: You could use a similar approach with a Dataset.
>
> ```scala
> import spark.implicits._
> import org.apache.spark.sql.functions.monotonically_increasing_id
>
> val dataDF = spark.read.textFile("./README.md")
> val indexedDF = dataDF.withColumn("id", monotonically_increasing_id)
> indexedDF.select($"id", $"value").show
> ```
>
> -----
> Delixus.com - Spark Consulting
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Append-column-to-Data-Frame-or-RDD-tp22385p28300.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org

--
Best Regards,
Ayan Guha
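For completeness, here is a minimal sketch of the `row_number()` suggestion above in the DataFrame API, assuming a `SparkSession` named `spark` and reusing the `./README.md` input from the quoted examples. Spark insists on an ordered window for `row_number()`, so this orders by a literal; that forces all rows into a single partition, so on real data you should order by an actual column instead.

```scala
import spark.implicits._
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{lit, row_number}

val dataDF = spark.read.textFile("./README.md")
// row_number() needs an ordered window; ordering by a constant gives
// sequential ids 1, 2, 3, ... but collapses the data to one partition.
val indexedDF = dataDF.withColumn("id", row_number().over(Window.orderBy(lit(1))))
indexedDF.select($"id", $"value").show()
```

Unlike `monotonically_increasing_id`, which produces ids that are unique and increasing but not consecutive, `row_number()` yields a gap-free sequence at the cost of the single-partition shuffle.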