BTW we merged this today: https://github.com/apache/spark/pull/4640
This should allow us in the future to address columns by name in a Row.
On Mon, Feb 16, 2015 at 11:39 AM, Michael Armbrust
wrote:
I can unpack the code snippet a bit:
caper.select('ran_id) is the same as saying "SELECT ran_id FROM table" in
SQL. It's always a good idea to explicitly request the columns you need
right before using them. That way you are tolerant of any changes to the
schema that might happen upstream.
The n
I am just learning Scala, so I don't actually understand what your code
snippet is doing, but thank you. I will learn more so I can figure it out.
I am new to all of this and still trying to make the mental shift from
normal programming to distributed programming, but it seems to me that
the row
For efficiency, the row objects don't contain the schema, so you can't get
the column by name directly. I usually do a select followed by pattern
matching. Something like the following:
caper.select('ran_id).map { case Row(ranId: String) => }
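
To make the select-then-match idea concrete without needing a cluster, here is a minimal sketch. The Row class below is a hypothetical stand-in for org.apache.spark.sql.Row, written only so the example runs without Spark on the classpath; RowMatchSketch, ranIds, and the sample values are likewise made up for illustration. It assumes each row came from selecting a single string column such as ran_id.

```scala
// Stand-in for org.apache.spark.sql.Row (an assumption for this sketch).
// Like the real Row discussed above, it holds values by position only
// and carries no column names.
final class Row(val values: Seq[Any])

object Row {
  def apply(values: Any*): Row = new Row(values)

  // Extractor that makes `case Row(a, b) => ...` patterns possible.
  def unapplySeq(row: Row): Option[Seq[Any]] = Some(row.values)
}

object RowMatchSketch {
  // Pull the single selected column out of each row.
  // collect (rather than map) skips rows that do not match the pattern,
  // for example a row whose value is not a String or that has extra columns.
  def ranIds(rows: Seq[Row]): Seq[String] =
    rows.collect { case Row(ranId: String) => ranId }

  def main(args: Array[String]): Unit = {
    // Hypothetical rows, as if produced by selecting one column.
    val selected = Seq(Row("r-001"), Row("r-002"), Row(42))
    println(ranIds(selected))
  }
}
```

Using collect here is a design choice for the sketch: with map, a row that does not fit the pattern would throw a MatchError, which may or may not be what you want.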
On Mon, Feb 16, 2015 at 8:54 AM, Eric Bell wrote:
Is it possible to reference a column from a SchemaRDD using the column's
name instead of its number?
For example, let's say I've created a SchemaRDD from an avro file:
val sqlContext = new SQLContext(sc)
import sqlContext._
val caper=sqlContext.avroFile("hdfs://localhost:9000/sma/raw_avro/caper