Hi all, I opened a bug on Jira a few days ago [1], but I'm starting to think this is not the right way to go, since I haven't had any response yet.
Let me try to explain it briefly: with pyspark 1.3.0, cogrouping two input files with different schemas leads (nondeterministically) to wrong behaviour: the objects coming from the first input end up with the fields of the second one (or vice versa). The important point is that the data in each row is actually correct; what is wrong is the content of __FIELDS__ on the rows.

Attached to the issue is a small snippet that reproduces the problem (it is a gist [2]).

Does this happen to others as well? Is it a known issue? Am I doing anything wrong?

Thank you all,

[1]: https://issues.apache.org/jira/browse/SPARK-6677
[2]: https://gist.github.com/armisael/e08bb4567d0a11efe2db

--
Dott. Stefano Parmesan
Backend Web Developer and Data Lover ~ SpazioDati s.r.l.
Via Adriano Olivetti, 13 – 4th floor
"Le Albere" district – 38122 Trento – Italy
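
For anyone who wants to check their own job for the symptom described above, here is a minimal sketch of such a check. It is not the code from the gist: the function name `find_field_mismatches` is hypothetical, and it assumes only that each row offers an `asDict()` method (as `pyspark.sql.Row` does) and that the input has the `(key, (left_rows, right_rows))` shape you get from collecting a cogroup result.

```python
def find_field_mismatches(cogrouped, left_fields, right_fields):
    """Report rows whose field names differ from the schema expected for their side.

    `cogrouped` is an iterable of (key, (left_rows, right_rows)) pairs.
    Returns a list of (key, side, sorted_actual_field_names) tuples.
    """
    expected = {"left": set(left_fields), "right": set(right_fields)}
    mismatches = []
    for key, (left_rows, right_rows) in cogrouped:
        for side, rows in (("left", left_rows), ("right", right_rows)):
            for row in rows:
                # Assumption: the row exposes asDict(), mapping field name -> value.
                actual = set(row.asDict().keys())
                if actual != expected[side]:
                    mismatches.append((key, side, sorted(actual)))
    return mismatches
```

With pyspark this could be called as `find_field_mismatches(rdd_a.cogroup(rdd_b).collect(), [...], [...])`, where `rdd_a` and `rdd_b` stand for two hypothetical keyed RDDs of rows with different schemas. An empty result means every row carried the field names expected for its side; since the behaviour is nondeterministic, a single clean run does not rule the problem out.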