I am just learning Scala, so I don't fully understand what your code snippet is doing, but thank you. I will learn more so I can figure it out.

I am new to all of this and still trying to make the mental shift from normal programming to distributed programming, but it seems to me that the row object should know the schema it came from and be able to ask that schema to translate a column name into a column number. Am I missing something, or is this just a matter of time constraints, and this feature hasn't made it to the front of the queue yet?

Barring that, do the schema classes provide methods for doing this? I've looked and didn't see anything.
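
One way to do the name-to-position lookup by hand is to build a map from the schema's field names once and reuse the resulting index inside the closure. The sketch below is plain Scala with no Spark dependency: `Row` is modeled as a `Seq[Any]`, and the field list and values are made up for illustration. On a real SchemaRDD the names would presumably come from something like `caper.schema.fieldNames`; treat that as an assumption to check against your Spark version.

```scala
object ColumnByName {
  // Stand-in for a Spark Row: positional values with no schema attached.
  type Row = Seq[Any]

  // Hypothetical field names; on a SchemaRDD these would come from its schema.
  val fieldNames: Seq[String] = Seq("ran_id", "age", "APPTSTAT")

  // Build the name -> position lookup once, outside any map closure.
  val indexOf: Map[String, Int] = fieldNames.zipWithIndex.toMap

  // Made-up sample rows laid out in the same order as fieldNames.
  val rows: Seq[Row] = Seq(
    Seq("r-001", 34, "KEPT"),
    Seq("r-002", 51, "CANCELLED")
  )

  // Resolve the column position a single time, then key each row by that column.
  val ranIdIdx: Int = indexOf("ran_id")
  val kv: Seq[(String, Row)] =
    rows.map(r => (r(ranIdIdx).asInstanceOf[String], r))
}
```

Resolving the index once and capturing only the integer in the closure means no schema has to travel with each row.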

I've just discovered that the Python implementation of SchemaRDD does in fact allow for referencing by name and column. Why is this provided in the Python implementation but not in the Scala or Java implementations?

Thanks,

--eric


On 02/16/2015 10:46 AM, Michael Armbrust wrote:
For efficiency the row objects don't contain the schema so you can't get the column by name directly. I usually do a select followed by pattern matching. Something like the following:

caper.select('ran_id).map { case Row(ranId: String) => }
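
The case body in the snippet above is left empty, so as written it maps every row to `Unit`. A fuller version would return the bound value, along the lines of `case Row(ranId: String) => ranId`. The sketch below shows that select-then-match idea in plain Scala with no Spark dependency. The `Row` extractor is a hypothetical stand-in, not Spark's class, and the values are made up for illustration.

```scala
object SelectThenMatch {
  // Hypothetical stand-in for Spark's Row extractor so the pattern compiles without Spark.
  object Row {
    def unapplySeq(row: Seq[Any]): Option[Seq[Any]] = Some(row)
  }

  // Rows as they might look after selecting only the ran_id column (made-up values).
  val selected: Seq[Seq[Any]] = Seq(
    Seq[Any]("r-001"),
    Seq[Any]("r-002"),
    Seq[Any](null) // a missing value does not match the String type pattern
  )

  // collect applies the partial function and skips rows that do not match,
  // whereas map with the same single case would throw a MatchError on them.
  val ranIds: Seq[String] = selected.collect { case Row(ranId: String) => ranId }
}
```

Using `collect` rather than `map` here is a design choice for the sketch: it silently drops rows whose value is null or not a `String`, which may or may not be what you want for your data.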

On Mon, Feb 16, 2015 at 8:54 AM, Eric Bell <e...@ericjbell.com> wrote:

    Is it possible to reference a column from a SchemaRDD using the
    column's name instead of its number?

    For example, let's say I've created a SchemaRDD from an avro file:

    val sqlContext = new SQLContext(sc)
    import sqlContext._
    val caper = sqlContext.avroFile("hdfs://localhost:9000/sma/raw_avro/caper")
    caper.registerTempTable("caper")

    scala> caper
    res20: org.apache.spark.sql.SchemaRDD = SchemaRDD[0] at RDD at SchemaRDD.scala:108
    == Query Plan ==
    == Physical Plan ==
    PhysicalRDD
    
[ADMDISP#0,age#1,AMBSURG#2,apptdt_skew#3,APPTSTAT#4,APPTTYPE#5,ASSGNDUR#6,CANCSTAT#7,CAPERSTAT#8,COMPLAINT#9,CPT_1#10,CPT_10#11,CPT_11#12,CPT_12#13,CPT_13#14,CPT_2#15,CPT_3#16,CPT_4#17,CPT_5#18,CPT_6#19,CPT_7#20,CPT_8#21,CPT_9#22,CPTDX_1#23,CPTDX_10#24,CPTDX_11#25,CPTDX_12#26,CPTDX_13#27,CPTDX_2#28,CPTDX_3#29,CPTDX_4#30,CPTDX_5#31,CPTDX_6#32,CPTDX_7#33,CPTDX_8#34,CPTDX_9#35,CPTMOD1_1#36,CPTMOD1_10#37,CPTMOD1_11#38,CPTMOD1_12#39,CPTMOD1_13#40,CPTMOD1_2#41,CPTMOD1_3#42,CPTMOD1_4#43,CPTMOD1_5#44,CPTMOD1_6#45,CPTMOD1_7#46,CPTMOD1_8#47,CPTMOD1_9#48,CPTMOD2_1#49,CPTMOD2_10#50,CPTMOD2_11#51,CPTMOD2_12#52,CPTMOD2_13#53,CPTMOD2_2#54,CPTMOD2_3#55,CPTMOD2_4#56,CPTMOD...
    scala>

    Now I want to access fields, and of course the normal thing to do
    is to use a field name, not a field number.

    scala> val kv = caper.map(r => (r.ran_id, r))
    <console>:23: error: value ran_id is not a member of
    org.apache.spark.sql.Row
           val kv = caper.map(r => (r.ran_id, r))

    How do I do this?



