For efficiency, the Row objects don't carry the schema, so you can't look a
column up by name directly. I usually do a select followed by pattern
matching, something like the following:
caper.select('ran_id).map { case Row(ranId: String) => ranId }
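If you need the whole Row keyed by ran_id, as in the original question, one
option (a rough, untested sketch; it assumes ran_id is a non-null String
column) is to look the column's position up once in the schema and then index
into each Row:

// Find the position of ran_id once, from the SchemaRDD's schema.
val ranIdIndex = caper.schema.fields.map(_.name).indexOf("ran_id")
// Key each full Row by its ran_id value (assumes the column is a String).
val kv = caper.map(row => (row.getString(ranIdIndex), row))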
On Mon, Feb 16, 2015 at 8:54 AM, Eric Bell <e...@ericjbell.com> wrote:
Is it possible to reference a column from a SchemaRDD using the
column's name instead of its number?
For example, let's say I've created a SchemaRDD from an avro file:
val sqlContext = new SQLContext(sc)
import sqlContext._
// avroFile is typically provided by the spark-avro package
val caper = sqlContext.avroFile("hdfs://localhost:9000/sma/raw_avro/caper")
caper.registerTempTable("caper")
scala> caper
res20: org.apache.spark.sql.SchemaRDD = SchemaRDD[0] at RDD at
SchemaRDD.scala:108
== Query Plan ==
== Physical Plan ==
PhysicalRDD
[ADMDISP#0,age#1,AMBSURG#2,apptdt_skew#3,APPTSTAT#4,APPTTYPE#5,ASSGNDUR#6,CANCSTAT#7,CAPERSTAT#8,COMPLAINT#9,CPT_1#10,CPT_10#11,CPT_11#12,CPT_12#13,CPT_13#14,CPT_2#15,CPT_3#16,CPT_4#17,CPT_5#18,CPT_6#19,CPT_7#20,CPT_8#21,CPT_9#22,CPTDX_1#23,CPTDX_10#24,CPTDX_11#25,CPTDX_12#26,CPTDX_13#27,CPTDX_2#28,CPTDX_3#29,CPTDX_4#30,CPTDX_5#31,CPTDX_6#32,CPTDX_7#33,CPTDX_8#34,CPTDX_9#35,CPTMOD1_1#36,CPTMOD1_10#37,CPTMOD1_11#38,CPTMOD1_12#39,CPTMOD1_13#40,CPTMOD1_2#41,CPTMOD1_3#42,CPTMOD1_4#43,CPTMOD1_5#44,CPTMOD1_6#45,CPTMOD1_7#46,CPTMOD1_8#47,CPTMOD1_9#48,CPTMOD2_1#49,CPTMOD2_10#50,CPTMOD2_11#51,CPTMOD2_12#52,CPTMOD2_13#53,CPTMOD2_2#54,CPTMOD2_3#55,CPTMOD2_4#56,CPTMOD...
scala>
Now I want to access fields, and of course the normal thing to do
is to use a field name, not a field number.
scala> val kv = caper.map(r => (r.ran_id, r))
<console>:23: error: value ran_id is not a member of
org.apache.spark.sql.Row
val kv = caper.map(r => (r.ran_id, r))
How do I do this?