Just in case you are more comfortable with SQL,
row_number() over ()
should also generate a unique id.
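A minimal sketch of that idea in Scala, assuming Spark 2.x or later and a script or spark-shell style session. The local `SparkSession`, the sample data, the view name `t` and the names `df`, `byCols`, `withId` and `withIdSql` are illustrative and do not come from the thread. The window is given an explicit ordering because Spark 2.x and later generally reject `row_number()` over an unordered window. A window without `partitionBy` also pulls all rows into a single partition, so this may be slow on large data.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

// Local session for illustration only (assumption, not from the thread).
val spark = SparkSession.builder().master("local[2]").appName("row-number-id").getOrCreate()
import spark.implicits._

val df = Seq((1, 1), (1, 1), (1, 2), (2, 2)).toDF("col1", "col2")

// DataFrame API: one global window ordered by the existing columns.
// Without partitionBy, every row is numbered in a single sequence 1..n.
val byCols = Window.orderBy(col("col1"), col("col2"))
val withId = df.withColumn("id", row_number().over(byCols))

// The same thing expressed in SQL against a temporary view.
df.createOrReplaceTempView("t")
val withIdSql = spark.sql(
  "SELECT col1, col2, row_number() OVER (ORDER BY col1, col2) AS id FROM t")
```

Rows that tie on the ordering columns still receive distinct numbers, but which of the tied rows gets which number is not guaranteed to be stable between runs.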
On Thu, Jan 12, 2017 at 7:00 PM, akbar501 wrote:
> The following are 2 different approaches to adding an id/index to RDDs and 1
> approach to adding an index to a DataFrame.
>
> Add an index
The following are 2 different approaches to adding an id/index to RDDs and 1
approach to adding an index to a DataFrame.
Add an index column to an RDD
```scala
// RDD
val dataRDD = sc.textFile("./README.md")
// Add index then set index as key in map() transformation
// Results in RDD[(Long, String)]
```
RDDs, DataFrames and Datasets are all immutable. So, you cannot edit any of
these. However, the approach you should take is to call transformation
functions on the RDD/DataFrame/Dataset. RDD transformation functions will
return a new RDD, DataFrame transformations will return a new DataFrame, and
so on.
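As a rough illustration of "transformations return a new RDD", here is a sketch of two ways to attach an id to each element of an RDD. The archived messages are cut off before they show the actual code, so the choice of `zipWithIndex` and `zipWithUniqueId` is an assumption, as are the local session, the sample data and the names `lines`, `byIndex` and `byUniqueId`.

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration only (assumption, not from the thread).
val spark = SparkSession.builder().master("local[2]").appName("rdd-index").getOrCreate()
val sc = spark.sparkContext

// Stand-in for sc.textFile(...): four lines spread over two partitions.
val lines = sc.parallelize(Seq("a", "b", "c", "d"), numSlices = 2)

// Approach 1: zipWithIndex assigns consecutive indices 0..n-1 in partition order.
// It has to count the elements of earlier partitions first, which costs an extra job.
val byIndex = lines.zipWithIndex().map { case (line, idx) => (idx, line) }

// Approach 2: zipWithUniqueId assigns ids that are unique but not necessarily
// consecutive, and does not need the extra job.
val byUniqueId = lines.zipWithUniqueId().map { case (line, id) => (id, line) }

// `lines` itself is untouched: both calls produced new RDD[(Long, String)] values.
```

`zipWithIndex` is the one to reach for when the ids must be gap-free; `zipWithUniqueId` is cheaper when any unique `Long` will do.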
Hi,
I've created a simple example using the withColumn method, but it throws an
error. Try:
val df = List(
(1,1),
(1,1),
(1,2),
(2,2)
).toDF("col1", "col2")
val index_col = sqlContext.range( df.count() ).col("id")
val df_with_index = df.withColumn("index", index_col)
The error I get is:
org.
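An error here is to be expected: `withColumn` only accepts a column derived from the DataFrame it is called on, and `index_col` belongs to a different DataFrame. One possible workaround, sketched below under assumptions, is to go through the underlying RDD, zip each `Row` with its index, and rebuild the DataFrame with an extended schema. The sketch uses the Spark 2.x `SparkSession` rather than the `sqlContext` from the message, and the names `indexedRows`, `schemaWithIndex` and `dfWithIndex` are illustrative.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

// Local session for illustration only (assumption, not from the thread).
val spark = SparkSession.builder().master("local[2]").appName("df-index").getOrCreate()
import spark.implicits._

val df = List((1, 1), (1, 1), (1, 2), (2, 2)).toDF("col1", "col2")

// Append the zipWithIndex position to the values of every Row.
val indexedRows = df.rdd.zipWithIndex().map {
  case (row, idx) => Row.fromSeq(row.toSeq :+ idx)
}

// Original schema plus one non-nullable Long column for the index.
val schemaWithIndex =
  StructType(df.schema.fields :+ StructField("index", LongType, nullable = false))

val dfWithIndex = spark.createDataFrame(indexedRows, schemaWithIndex)
```

If the ids only need to be unique rather than consecutive, `monotonically_increasing_id()` from `org.apache.spark.sql.functions` can be passed to `withColumn` directly and avoids the round trip through the RDD.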
lelize(1 to oldRDD.count().toInt)
> //or (1 to oldRDD.count().toInt).toArray
>
>
> --
> If you reply to this email, your message will be added to the discussion
> below:
>
> http://apache-spark-user-list.1001560.n3.nabble.com/Add-row-IDs-colu
val colToAppend = sc.makeRDD(1 to oldRDD.count().toInt)
//or sc.parallelize(1 to oldRDD.count().toInt)
//or (1 to oldRDD.count().toInt).toArray
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Add-row-IDs-column-to-data-frame-tp22385p22430.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
ut is there
> any workaround?
>
textFile("path/file").toDF()
>> val rowDF = sc.parallelize(1 to dataDF.count().toInt).toDF("ID")
>> dataDF = dataDF.withColumn("ID", rowDF("ID"))
>>
>> Thanks
arallelize(1 to dataDF.count().toInt).toDF("ID")
dataDF = dataDF.withColumn("ID", rowDF("ID"))
Thanks