For efficiency, the row objects don't contain the schema, so you can't get a column by name directly. I usually do a select followed by pattern matching. Something like the following:

caper.select('ran_id).map { case Row(ranId: String) => }

On Mon, Feb 16, 2015 at 8:54 AM, Eric Bell <e...@ericjbell.com> wrote:

> Is it possible to reference a column from a SchemaRDD using the column's
> name instead of its number?
>
> For example, let's say I've created a SchemaRDD from an avro file:
>
> val sqlContext = new SQLContext(sc)
> import sqlContext._
> val caper=sqlContext.avroFile("hdfs://localhost:9000/sma/raw_avro/caper")
> caper.registerTempTable("caper")
>
> scala> caper
> res20: org.apache.spark.sql.SchemaRDD = SchemaRDD[0] at RDD at
> SchemaRDD.scala:108
> == Query Plan ==
> == Physical Plan ==
> PhysicalRDD [ADMDISP#0,age#1,AMBSURG#2,apptdt_skew#3,APPTSTAT#4,
> APPTTYPE#5,ASSGNDUR#6,CANCSTAT#7,CAPERSTAT#8,COMPLAINT#9,CPT_1#10,CPT_10#11,
> CPT_11#12,CPT_12#13,CPT_13#14,CPT_2#15,CPT_3#16,CPT_4#17,
> CPT_5#18,CPT_6#19,CPT_7#20,CPT_8#21,CPT_9#22,CPTDX_1#23,
> CPTDX_10#24,CPTDX_11#25,CPTDX_12#26,CPTDX_13#27,CPTDX_2#28,
> CPTDX_3#29,CPTDX_4#30,CPTDX_5#31,CPTDX_6#32,CPTDX_7#33,
> CPTDX_8#34,CPTDX_9#35,CPTMOD1_1#36,CPTMOD1_10#37,CPTMOD1_11#38,
> CPTMOD1_12#39,CPTMOD1_13#40,CPTMOD1_2#41,CPTMOD1_3#42,
> CPTMOD1_4#43,CPTMOD1_5#44,CPTMOD1_6#45,CPTMOD1_7#46,
> CPTMOD1_8#47,CPTMOD1_9#48,CPTMOD2_1#49,CPTMOD2_10#50,
> CPTMOD2_11#51,CPTMOD2_12#52,CPTMOD2_13#53,CPTMOD2_2#54,
> CPTMOD2_3#55,CPTMOD2_4#56,CPTMOD...
> scala>
>
> Now I want to access fields, and of course the normal thing to do is to
> use a field name, not a field number.
>
> scala> val kv = caper.map(r => (r.ran_id, r))
> <console>:23: error: value ran_id is not a member of
> org.apache.spark.sql.Row
>        val kv = caper.map(r => (r.ran_id, r))
>
> How do I do this?
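
To make the select-then-pattern-match idea concrete, here is a minimal plain-Scala sketch that runs without Spark. Everything in it is a hypothetical stand-in rather than Spark's API: `SimpleRow` plays the part of a positional row, `columns` plays the part of the schema that is kept outside the rows, and the sample values are made up. It shows the projection followed by a pattern match, and also a second option (resolve the column name to a position once, then key whole rows by it), which is closer to the `(ran_id, row)` pairs asked for in the question.

```scala
// Hypothetical stand-in for a positional row: it stores values only, no column names.
final class SimpleRow(val values: Vector[Any]) {
  def apply(i: Int): Any = values(i)
}

object SimpleRow {
  def apply(values: Any*): SimpleRow = new SimpleRow(values.toVector)
  // Enables patterns such as `case SimpleRow(id: String)`.
  def unapplySeq(row: SimpleRow): Option[Seq[Any]] = Some(row.values)
}

object RowByNameSketch {
  // The schema is stored once, next to the data set, not inside every row.
  val columns: Vector[String] = Vector("ran_id", "age")

  // Made-up sample data for illustration.
  val rows: Vector[SimpleRow] = Vector(SimpleRow("a1", 30), SimpleRow("b2", 41))

  // Analogue of a select: project every row down to the named columns.
  def select(names: String*): Vector[SimpleRow] = {
    val idx = names.map(n => columns.indexOf(n))
    require(!idx.contains(-1), "unknown column name")
    rows.map(r => SimpleRow(idx.map(i => r(i)): _*))
  }

  // Option 1: project first, then bind the single remaining value by pattern matching.
  val ids: Vector[String] =
    select("ran_id").map { case SimpleRow(ranId: String) => ranId }

  // Option 2: resolve the name to a position once, then key the full rows by that column.
  val ranIdIdx: Int = columns.indexOf("ran_id")
  val keyed: Vector[(String, SimpleRow)] =
    rows.map(r => (r(ranIdIdx).toString, r))
}
```

With a real SchemaRDD the position lookup would go through its schema instead of a hand-written `columns` vector; the exact accessor depends on the Spark version, so check the API docs for the release in use.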