Hi,
I'm processing JSON from a text file using DataFrames. Right now I'm trying to
figure out how to access a certain value within the rows of my data frame when
I only know the field name and not the field's position in the schema.
I noticed that row.schema and row.dtypes give me information about the
auto-generated schema, but I cannot see a straightforward path from there. I'm
trying to create a PairRDD out of this.
Is there an easy way to find a field's position from its field name (the
key it had in the JSON)?
So this:
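import org.apache.spark.sql.SQLContext

// `sc` is the existing SparkContext (e.g. the one provided by spark-shell)
val sqlContext = new SQLContext(sc)
val rawIncRdd =
  sc.textFile("hdfs://1.2.3.4:8020/user/hadoop/incidents/unstructured/inc-0-500.txt")
val df = sqlContext.jsonRDD(rawIncRdd)
// Access by position: prints the first column of every row
df.foreach(line => println(line.getString(0)))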
would turn into something like this:
val sqlContext = new SQLContext(sc)
val rawIncRdd =
  sc.textFile("hdfs://1.2.3.4:8020/user/hadoop/incidents/unstructured/inc-0-500.txt")
val df = sqlContext.jsonRDD(rawIncRdd)
df.foreach(line => println(line.getString("field_name")))
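For reference, the closest thing I can think of is to look the position up once from the DataFrame's schema and then keep using the positional getter. This is only a rough sketch, not a confirmed solution: it assumes an existing SparkContext `sc`, that `"field_name"` is a top-level string field in the JSON, and `idx` and `pairs` are names I made up for illustration.

```scala
import org.apache.spark.sql.SQLContext

// Assumes `sc` already exists (e.g. in spark-shell).
val sqlContext = new SQLContext(sc)
val rawIncRdd =
  sc.textFile("hdfs://1.2.3.4:8020/user/hadoop/incidents/unstructured/inc-0-500.txt")
val df = sqlContext.jsonRDD(rawIncRdd)

// Resolve the field's position once on the driver from the inferred schema.
// indexOf returns -1 when the name is not a top-level field.
val idx = df.schema.fieldNames.indexOf("field_name")
require(idx >= 0, "field_name is not a top-level field in the inferred schema")

// Same positional access as before, but driven by the field name.
df.foreach(line => println(line.getString(idx)))

// Pair RDD keyed by that field (assumes the value is a non-null string).
val pairs = df.rdd.map(row => (row.getString(idx), row))
```

If I read the API docs correctly, newer Spark releases (1.4 and later) also expose `row.fieldIndex("field_name")` and `row.getAs[String]("field_name")` on rows that carry a schema, which would remove the need for the manual lookup.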
Thanks for the advice.