Re: Add row IDs column to data frame

2017-01-12 Thread ayan guha
Just in case you are more comfortable with SQL, row_number over () should also generate a unique id. On Thu, Jan 12, 2017 at 7:00 PM, akbar501 wrote: > The following are 2 different approaches to adding an id/index to RDDs and 1 approach to adding an index to a DataFrame. Add an index
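
A minimal sketch of the row_number idea, assuming Spark 2.x or later with a local SparkSession. The sample data, the column names and the choice of ordering column are all hypothetical. As far as I know, Spark wants an ORDER BY in the window for row_number, so the sketch orders by a column rather than using an empty over ().

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number

// Assumption: a local session for illustration; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("row-number-id").getOrCreate()
import spark.implicits._

// Hypothetical sample data; the column names are illustrative only.
val df = Seq(("a", 10), ("b", 20), ("c", 30)).toDF("name", "value")

// DataFrame API: number the rows 1, 2, 3, ... in order of "name".
// A window without partitionBy moves all rows into a single partition.
val withId = df.withColumn("id", row_number().over(Window.orderBy("name")))

// Equivalent SQL form over a temporary view.
df.createOrReplaceTempView("t")
val withIdSql = spark.sql(
  "SELECT name, value, row_number() OVER (ORDER BY name) AS id FROM t")
```

Because the window has no partitioning, this is probably only comfortable for data that fits on one executor; for larger data an RDD-based index may be the safer route.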

Re: Add row IDs column to data frame

2017-01-12 Thread akbar501
The following are 2 different approaches to adding an id/index to RDDs and 1 approach to adding an index to a DataFrame. Add an index column to an RDD ```scala // RDD val dataRDD = sc.textFile("./README.md") // Add index then set index as key in map() transformation // Results in RDD[(Long, Stri
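
The code in the message above is cut off in this listing, so the following is only a sketch of two common ways to index an RDD, not necessarily the two the poster used. It assumes a local SparkSession and uses hypothetical in-memory data in place of the README file; the names indexed and uniquelyKeyed are illustrative.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration; in spark-shell `spark` and `sc` already exist.
val spark = SparkSession.builder().master("local[*]").appName("rdd-index").getOrCreate()
val sc = spark.sparkContext

// Hypothetical in-memory data standing in for sc.textFile("./README.md").
val dataRDD = sc.parallelize(Seq("alpha", "beta", "gamma", "delta"), 2)

// zipWithIndex: consecutive 0-based indices; swap the pair so the index becomes the key.
// Results in RDD[(Long, String)].
val indexed = dataRDD.zipWithIndex().map { case (line, idx) => (idx, line) }

// zipWithUniqueId: ids are unique but not necessarily consecutive.
val uniquelyKeyed = dataRDD.zipWithUniqueId().map { case (line, id) => (id, line) }
```

zipWithIndex has to run a Spark job to count the elements in each partition when the RDD has more than one partition, whereas zipWithUniqueId avoids that job at the cost of gaps in the ids.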

Re: Add row IDs column to data frame

2017-01-11 Thread akbar501
RDDs, DataFrames and Datasets are all immutable. So, you cannot edit any of these. However, the approach you should take is to call transformation functions on the RDD/DataFrame/Dataset. RDD transformation functions will return a new RDD, DataFrame transformations will return a new DataFrame and so
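
To illustrate the point about immutability, here is a small sketch assuming Spark 2.x or later and a local SparkSession. The data and the names original and withId are hypothetical. It uses monotonically_increasing_id, which gives unique, increasing ids that are not guaranteed to be consecutive.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.monotonically_increasing_id

// Assumption: a local session for illustration; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("immutable-df").getOrCreate()
import spark.implicits._

// Hypothetical single-column DataFrame.
val original = Seq("x", "y", "z").toDF("value")

// withColumn is a transformation: it returns a new DataFrame and leaves `original` untouched.
val withId = original.withColumn("id", monotonically_increasing_id())

println(original.columns.mkString(", ")) // still only the "value" column
println(withId.columns.mkString(", "))   // "value" plus the new "id" column
```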

Fwd: Add row IDs column to data frame

2015-10-02 Thread Josh Levy-Kramer
Hi, I've created a simple example using the withColumn method, but it throws an error. Try: val df = List( (1,1), (1,1), (1,2), (2,2) ).toDF("col1", "col2") val index_col = sqlContext.range( df.count() ).col("id") val df_with_index = df.withColumn("index", index_col) The error I get is: org.
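
The error text is cut off above, but withColumn can only evaluate expressions over columns of the DataFrame it is called on, so a column taken from a separate DataFrame cannot be attached this way. One possible workaround, sketched here under the assumption of Spark 2.x and a local SparkSession, is to go through the underlying RDD with zipWithIndex and rebuild the DataFrame. The names indexedRows, schema and dfWithIndex are illustrative.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

// Assumption: a local session for illustration; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("df-index").getOrCreate()
import spark.implicits._

// Same sample data as in the message above.
val df = List((1, 1), (1, 1), (1, 2), (2, 2)).toDF("col1", "col2")

// Pair each Row with its 0-based position, then append the position as a new field.
val indexedRows = df.rdd.zipWithIndex().map { case (row, idx) =>
  Row.fromSeq(row.toSeq :+ idx)
}

// Extend the original schema with a non-nullable Long column named "index".
val schema = StructType(df.schema.fields :+ StructField("index", LongType, nullable = false))

// Rebuild a DataFrame from the indexed rows and the extended schema.
val dfWithIndex = spark.createDataFrame(indexedRows, schema)
```

The indices follow the RDD's partition order, which for a DataFrame built from an unordered source is not guaranteed to match any particular row order.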

Re: Add row IDs column to data frame

2015-04-09 Thread Bojan Kostic
lelize(1 to oldRDD.count().toInt) > //or (1 to oldRDD.count().toInt).toArray > > > -- > If you reply to this email, your message will be added to the discussion > below: > > http://apache-spark-user-list.1001560.n3.nabble.com/Add-row-IDs-colu

Re: Add row IDs column to data frame

2015-04-08 Thread barmaley
l colToAppend = sc.makeRDD(1 to oldRDD.count().toInt) //or sc.parallelize(1 to oldRDD.count().toInt) //or (1 to oldRDD.count().toInt).toArray -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Add-row-IDs-column-to-data-frame-tp22385p22430.html Sent from the Apac

Re: Add row IDs column to data frame

2015-04-08 Thread Bojan Kostic
ut is there any workaround?

Re: Add row IDs column to data frame

2015-04-08 Thread olegshirokikh

Re: Add row IDs column to data frame

2015-04-05 Thread Xiangrui Meng
textFile("path/file").toDF() >> val rowDF = sc.parallelize(1 to dataDF.count().toInt).toDF("ID") >> dataDF = dataDF.withColumn("ID", rowDF("ID")) >> >> Thanks

Re: Add row IDs column to data frame

2015-04-05 Thread Xiangrui Meng
D", rowDF("ID")) > > Thanks

Add row IDs column to data frame

2015-04-05 Thread olegshirokikh
arallelize(1 to dataDF.count().toInt).toDF("ID") dataDF = dataDF.withColumn("ID", rowDF("ID")) Thanks