Re: How to retrieve the value from sql.row by column name

Eric Bell Mon, 16 Feb 2015 11:06:52 -0800

I am just learning Scala, so I don't actually understand what your code snippet is doing, but thank you. I will learn more so I can figure it out.

I am new to all of this and still trying to make the mental shift from normal programming to distributed programming, but it seems to me that the row object would know the schema object it came from and be able to ask that schema to translate a name into a column number. Am I missing something, or is this just a matter of time constraints and this one hasn't made it into the queue yet?

Barring that, do the schema classes provide methods for doing this? I've looked and didn't see anything.

I've just discovered that the Python implementation of SchemaRDD does in fact allow referencing by name and column. Why is this provided in the Python implementation but not in the Scala or Java implementations?
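
To make the name-to-column-number idea concrete, here is a minimal sketch in plain Scala. It does not use the Spark API: the field names and rows are made-up stand-ins, and indexByName is a hypothetical helper. In Spark the names would presumably come from the SchemaRDD's schema, but that wiring is an assumption and is not shown here.

```scala
// Stand-ins for illustration only: the column order and row values below are
// invented, and a Seq[Any] plays the role of org.apache.spark.sql.Row.
object ColumnByName {
  // Hypothetical helper: map each column name to its position.
  def indexByName(fieldNames: Seq[String]): Map[String, Int] =
    fieldNames.zipWithIndex.toMap

  def main(args: Array[String]): Unit = {
    val fieldNames = Seq("ADMDISP", "age", "ran_id") // assumed column order
    val rows: Seq[Seq[Any]] = Seq(
      Seq("H", 42, "r-001"),
      Seq("T", 37, "r-002")
    )

    // Resolve the name once, outside the per-row function.
    val ranIdIdx = indexByName(fieldNames)("ran_id")

    // Key each row by the looked-up column, mirroring the caper.map(...) attempt.
    val kv = rows.map(r => (r(ranIdIdx).toString, r))
    kv.foreach { case (k, r) => println(s"$k -> $r") }
  }
}
```

Resolving the index once keeps the per-row work to a positional access, so the rows themselves never need to carry the schema.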


Thanks,

--eric


On 02/16/2015 10:46 AM, Michael Armbrust wrote:

For efficiency the row objects don't contain the schema, so you can't get the column by name directly. I usually do a select followed by pattern matching. Something like the following:


caper.select('ran_id).map { case Row(ranId: String) => }
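
The select-then-match pattern can be tried without a cluster. The sketch below is plain Scala and only an analogy: select is a hypothetical stand-in that keeps the named columns, a Seq[Any] stands in for Row, and the data is invented.

```scala
// Plain-Scala analogy of "select, then pattern match"; not the Spark API.
object SelectThenMatch {
  // Hypothetical stand-in for select: keep only the wanted columns, in order.
  def select(fieldNames: Seq[String], rows: Seq[Seq[Any]], wanted: String*): Seq[Seq[Any]] = {
    val idx = wanted.map(n => fieldNames.indexOf(n))
    require(!idx.contains(-1), "unknown column name")
    rows.map(r => idx.map(i => r(i)))
  }

  def main(args: Array[String]): Unit = {
    val fieldNames = Seq("age", "ran_id") // assumed column order
    val rows: Seq[Seq[Any]] = Seq(Seq(42, "r-001"), Seq(37, "r-002"))

    // After the select each row has exactly one field, so the pattern can
    // bind it by position and check its type at the same time.
    val ranIds = select(fieldNames, rows, "ran_id").collect {
      case Seq(ranId: String) => ranId
    }
    ranIds.foreach(println)
  }
}
```

The name is used only in the select step; the match then relies on the known order and types of the selected fields.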

On Mon, Feb 16, 2015 at 8:54 AM, Eric Bell <e...@ericjbell.com> wrote:


    Is it possible to reference a column from a SchemaRDD using the
    column's name instead of its number?

    For example, let's say I've created a SchemaRDD from an avro file:

    val sqlContext = new SQLContext(sc)
    import sqlContext._
    val caper = sqlContext.avroFile("hdfs://localhost:9000/sma/raw_avro/caper")
    caper.registerTempTable("caper")

    scala> caper
    res20: org.apache.spark.sql.SchemaRDD = SchemaRDD[0] at RDD at SchemaRDD.scala:108
    == Query Plan ==
    == Physical Plan ==
    PhysicalRDD
    
[ADMDISP#0,age#1,AMBSURG#2,apptdt_skew#3,APPTSTAT#4,APPTTYPE#5,ASSGNDUR#6,CANCSTAT#7,CAPERSTAT#8,COMPLAINT#9,CPT_1#10,CPT_10#11,CPT_11#12,CPT_12#13,CPT_13#14,CPT_2#15,CPT_3#16,CPT_4#17,CPT_5#18,CPT_6#19,CPT_7#20,CPT_8#21,CPT_9#22,CPTDX_1#23,CPTDX_10#24,CPTDX_11#25,CPTDX_12#26,CPTDX_13#27,CPTDX_2#28,CPTDX_3#29,CPTDX_4#30,CPTDX_5#31,CPTDX_6#32,CPTDX_7#33,CPTDX_8#34,CPTDX_9#35,CPTMOD1_1#36,CPTMOD1_10#37,CPTMOD1_11#38,CPTMOD1_12#39,CPTMOD1_13#40,CPTMOD1_2#41,CPTMOD1_3#42,CPTMOD1_4#43,CPTMOD1_5#44,CPTMOD1_6#45,CPTMOD1_7#46,CPTMOD1_8#47,CPTMOD1_9#48,CPTMOD2_1#49,CPTMOD2_10#50,CPTMOD2_11#51,CPTMOD2_12#52,CPTMOD2_13#53,CPTMOD2_2#54,CPTMOD2_3#55,CPTMOD2_4#56,CPTMOD...
    scala>

    Now I want to access fields, and of course the normal thing to do
    is to use a field name, not a field number.

    scala> val kv = caper.map(r => (r.ran_id, r))
    <console>:23: error: value ran_id is not a member of org.apache.spark.sql.Row
           val kv = caper.map(r => (r.ran_id, r))

    How do I do this?

