Thank you for your response, Anurag.
I am not sure I get your point. Are you suggesting that the UDF somehow
serializes not only a reference to the Dataset, but also all of its data?
--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
--
This is expected. You are not accessing the Dataset dict when calling the UDF
countPositiveSimilarity. The dict DataFrame, as it existed when the UDF was
created, is encoded into the UDF. If you change dict later on, the changes will
not be picked up automatically by countPositiveSimilarity.
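
To make the claim above concrete, here is a rough plain-Python analogy, not Spark code. It assumes the captured value is serialized at the moment the function is created; whether Spark really does this for a Dataset referenced from a UDF is exactly what is being asked in this thread. The names make_udf_like, lookup and count_positive are hypothetical and used only for illustration.

```python
import pickle


def make_udf_like(lookup):
    # Assumption for illustration: the captured table is serialized once,
    # at creation time, similar to a closure being shipped to workers.
    frozen = pickle.dumps(lookup)

    def count_positive(key):
        # Each call works on the snapshot, never on the live object.
        table = pickle.loads(frozen)
        return table.get(key, 0)

    return count_positive


lookup = {"spark": 3}
count_positive = make_udf_like(lookup)

# Change the original table after the function was created.
lookup["spark"] = 10

result_after_change = count_positive("spark")
print(result_after_change)
```

Under that assumption the function keeps returning the value from the snapshot, so picking up the change would require creating the function again after the update.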