Hi Everyone, I've been told, that accessing DataFrame object from UserDefinedFunction is not possible, but turns out that the code shown below works fine (taken from StackOverflow). Why is it so? Is it a bug, or is it expected?
Thanks in advance. case class Target(wordListOne: Seq[String], WordListTwo: Seq[String]) val targetData = Seq(Target(Seq("Spark", "Wrong", "Something"), Seq("Java", "Grape", "Banana")), Target(Seq("Java", "Scala"), Seq("Scala", "Banana")), Target(Seq(""), Seq("Grape", "Banana")), Target(Seq(""), Seq(""))) val targets = spark.createDataset(targetData) case class WordSimilarity(first: String, second: String, similarity: Double) val similarityData = Seq(WordSimilarity("Spark", "Java", 0.8), WordSimilarity("Scala", "Spark", 0.9), WordSimilarity("Java", "Scala", 0.9), WordSimilarity("Apple", "Grape", 0.66), WordSimilarity("Scala", "Apple", -0.1), WordSimilarity("Gine", "Spark", 0.1)) val dict = spark.createDataset(similarityData) val countPositiveSimilarity = udf[Long, Seq[String], Seq[String]]((a, b) => dict.filter( (($"first".isin(a: _*) && $"second".isin(b: _*)) || ($"first".isin(b: _*) && $"second".isin(a: _*))) && $"similarity" > 0.7 ).count ) val countDF = targets.withColumn("positive_count", countPositiveSimilarity($"wordListOne", $"wordListTwo")) -- Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/ --------------------------------------------------------------------- To unsubscribe e-mail: dev-unsubscr...@spark.apache.org