I am using spark-1.6.1.
I create a data frame from a very complicated JSON file. I would assume that
the query planner would treat both versions of my transformation chain the
same way.
// Version 1 (commented out) throws:
// org.apache.spark.sql.AnalysisException: Cannot resolve column name "tag"
// among (actor, body, generator, pip, id, inReplyTo, link, object, objectType,
// postedTime, provider, retweetCount, twitter_entities, verb);
// DataFrame emptyDF = rawDF.selectExpr("*", "pip.rules.tag")
//         .filter(rawDF.col(tagCol).isNull());

// Version 2:
DataFrame emptyDF1 = rawDF.selectExpr("*", "pip.rules.tag");
DataFrame emptyDF = emptyDF1.filter(emptyDF1.col("tag").isNull());
Here is the schema for the gnip structure:
 |-- pip: struct (nullable = true)
 |    |-- _profile: struct (nullable = true)
 |    |    |-- topics: array (nullable = true)
 |    |    |    |-- element: string (containsNull = true)
 |    |-- rules: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- tag: string (nullable = true)
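One possible reading of the difference between the two versions, sketched as a toy model in plain Java with no Spark involved. It assumes that col(...) is checked immediately against the top-level columns of the frame it is called on, which is what the column list in the error message suggests, and that selectExpr names the new column after the last segment of the path. The Frame class, selectStarPlus, and resolves are hypothetical names made up for this illustration; they are not part of the Spark API, and the column list is shortened.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ColumnResolutionSketch {

    // Hypothetical stand-in for a DataFrame: it only tracks top-level column names.
    static final class Frame {
        final List<String> columns;

        Frame(List<String> columns) {
            this.columns = columns;
        }

        // Models selectExpr("*", path) under the stated assumption: keeps every
        // existing column and adds one named after the last segment of the path.
        Frame selectStarPlus(String path) {
            List<String> out = new ArrayList<>(columns);
            out.add(path.substring(path.lastIndexOf('.') + 1));
            return new Frame(out);
        }

        // Models col(name) under the stated assumption: the lookup is checked
        // right away against this frame's own top-level columns.
        String col(String name) {
            if (!columns.contains(name)) {
                throw new IllegalArgumentException(
                    "Cannot resolve column name \"" + name + "\" among " + columns);
            }
            return name;
        }
    }

    // Returns true if the lookup succeeds, false if it throws.
    static boolean resolves(Frame frame, String name) {
        try {
            frame.col(name);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Frame rawDF = new Frame(Arrays.asList("actor", "body", "pip"));
        Frame emptyDF1 = rawDF.selectStarPlus("pip.rules.tag");

        // Version 1 looks up "tag" on rawDF, which has no such top-level column.
        System.out.println(resolves(rawDF, "tag"));
        // Version 2 looks up "tag" on emptyDF1, where the select has added it.
        System.out.println(resolves(emptyDF1, "tag"));
    }
}
```

If that reading is right, the two versions differ in which frame the column is looked up on, before any query planning takes place.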
Is this a bug?
Andy