strange behavior when I chain data frame transformations

Andy Davidson Fri, 13 May 2016 11:50:10 -0700

I am using spark-1.6.1.

I create a data frame from a very complicated JSON file. I would assume that
the query planner would treat both versions of my transformation chain the
same way.



// org.apache.spark.sql.AnalysisException: Cannot resolve column name "tag"
// among (actor, body, generator, pip, id, inReplyTo, link, object, objectType,
// postedTime, provider, retweetCount, twitter_entities, verb);

// DataFrame emptyDF = rawDF.selectExpr("*", "pip.rules.tag")
//         .filter(rawDF.col(tagCol).isNull());

DataFrame emptyDF1 = rawDF.selectExpr("*", "pip.rules.tag");
DataFrame emptyDF = emptyDF1.filter(emptyDF1.col("tag").isNull());



Here is the schema for the gnip structure

 |-- pip: struct (nullable = true)
 |    |-- _profile: struct (nullable = true)
 |    |    |-- topics: array (nullable = true)
 |    |    |    |-- element: string (containsNull = true)
 |    |-- rules: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- tag: string (nullable = true)



Is this a bug?



Andy
