Re: Adding a column to a SchemaRDD

2014-12-15 Thread Nathan Kronenfeld
Perfect, that's exactly what I was looking for. Thank you!
On Mon, Dec 15, 2014 at 3:32 AM, Yanbo Liang wrote:
> Hi Nathan,
>
> #1
> Spark SQL & DSL can satisfy your requirement. You can refer to the following
> code snippet:
>
> jdata.select(Star(None), 'seven.getField("mod"), 'eleven.getField

Re: Adding a column to a SchemaRDD

2014-12-15 Thread Yanbo Liang
Hi Nathan,
#1 Spark SQL & DSL can satisfy your requirement. You can refer to the following code snippet:
    jdata.select(Star(None), 'seven.getField("mod"), 'eleven.getField("mod"))
You need to import org.apache.spark.sql.catalyst.analysis.Star first.
#2 After you apply the transformation above, you

Re: Adding a column to a SchemaRDD

2014-12-14 Thread Tobias Pfeiffer
Nathan,
On Fri, Dec 12, 2014 at 3:11 PM, Nathan Kronenfeld <nkronenf...@oculusinfo.com> wrote:
> I can see how to do it if I can express the added values in SQL - just run
> "SELECT *,valueCalculation AS newColumnName FROM table"
>
> I've been searching all over for how to do this if my added val

Re: Adding a column to a SchemaRDD

2014-12-12 Thread Nathan Kronenfeld
(1) I understand about immutability; that's why I said I wanted a new SchemaRDD.
(2) I specifically asked for a non-SQL solution that takes a SchemaRDD and results in a new SchemaRDD with one new function.
(3) The DSL stuff is a big clue, but I can't find adequate documentation for it.
What I'm loo

Re: Adding a column to a SchemaRDD

2014-12-12 Thread Yanbo Liang
An RDD is immutable, so you cannot modify it. If you want to change a value or the schema of an RDD, use map to generate a new RDD. The following code is for your reference:
    def add(a:Int,b:Int):Int = { a + b }
    val d1 = sc.parallelize(1 to 10).map { i => (i, i+1, i+2) }
    val d2 = d1.map { i => (i._1, i
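
For reference, here is a minimal sketch of the map-based approach described above. It uses a plain Scala Seq in place of the RDD so that it runs without Spark; RDD.map takes the same kind of function. The withSum helper is hypothetical, and using add(a, b) on the first two fields as the new column is an assumption, since the snippet above is cut off before that point.

```scala
// Plain-Scala sketch: a Seq stands in for the RDD so this runs without Spark.
object AddColumnSketch {
  // Helper taken from the message above.
  def add(a: Int, b: Int): Int = a + b

  // Hypothetical helper (not from the thread): builds a NEW collection whose
  // rows carry a fourth field; the input collection is left untouched.
  // Assumption: the new column is add() applied to the first two fields.
  def withSum(rows: Seq[(Int, Int, Int)]): Seq[(Int, Int, Int, Int)] =
    rows.map { case (a, b, c) => (a, b, c, add(a, b)) }

  def main(args: Array[String]): Unit = {
    // Same shape as d1 in the message, minus sc.parallelize.
    val d1 = (1 to 10).map(i => (i, i + 1, i + 2))
    val d2 = withSum(d1)
    d2.foreach(println)
  }
}
```

Note that d1 is unchanged after the call; only d2 has the extra field, which mirrors how map on an immutable RDD yields a new RDD.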