df.col(...) looks up a column by name in your DataFrame. It does not evaluate an XPath expression over an XML document, which is why the AnalysisException says it cannot resolve a column named "//FulfillmentOption[1]/text()". Try writing a UDF that applies your XPath query, and pass the Column produced by calling that UDF as the second argument to withColumn.
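Here is a minimal sketch of the function such a UDF could wrap, using only the JDK's javax.xml.xpath API. It assumes the DataFrame has a string column holding the raw XML for each record, which may not be the case if spark-xml has already parsed the records into columns. The class name XPathExtractor and the column name rawXml are hypothetical. The Spark registration is shown only in comments and is not compiled here.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper: evaluates an XPath expression against one XML string.
public class XPathExtractor {

    public static String extract(String xml, String expression) {
        if (xml == null) {
            return null; // null cell in, null cell out
        }
        try {
            // XPath objects are not thread-safe, so this sketch creates one per call.
            // Simple but slow; cache per thread if performance matters.
            XPath xpath = XPathFactory.newInstance().newXPath();
            // With no return type, evaluate() returns the string value of the result.
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            // Parse or evaluation failures become a null cell instead of failing the job.
            return null;
        }
    }

    // Hypothetical Spark 2.0 wiring (requires spark-sql on the classpath):
    //   spark.udf().register("applyXPath",
    //       (UDF1<String, String>) xml -> XPathExtractor.extract(xml, "//FulfillmentOption[1]/text()"),
    //       DataTypes.StringType);
    //   df.withColumn("FulfillmentOption1", functions.callUDF("applyXPath", df.col("rawXml")));

    public static void main(String[] args) {
        String xml = "<Order><FulfillmentOption>Ship</FulfillmentOption>"
                + "<FulfillmentOption>Pickup</FulfillmentOption></Order>";
        System.out.println(extract(xml, "//FulfillmentOption[1]/text()"));
    }
}
```

Hard-coding the expression inside the registered lambda keeps the UDF to a single String argument. To vary the expression per call, register a UDF2 instead and pass the expression as a literal column.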
On Tue, Oct 4, 2016 at 4:35 PM, Jean Georges Perrin <j...@jgp.net> wrote:
> Spark 2.0.0
> XML parser 0.4.0
> Java
>
> Hi,
>
> I am trying to create a new column in my data frame, based on a value of a
> sub element. I have done that several time with JSON, but not very
> successful in XML.
>
> (I know a world with less format would be easier :) )
>
> Here is the code:
> df.withColumn("FulfillmentOption1", df.col("//FulfillmentOption[1]/text()"));
>
> And here is the error:
> Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot
> resolve column name "//FulfillmentOption[1]/text()" among (x, xx, xxx,
> xxxx, a, b, FulfillmentOption, c, d, e, f, g);
> at org.apache.spark.sql.Dataset$$anonfun$resolve$1.apply(Dataset.scala:220)
> at org.apache.spark.sql.Dataset$$anonfun$resolve$1.apply(Dataset.scala:220)
> ...
>
> The XPath is valid...
>
> Thanks!
>
> jg