I need to write PySpark logic equivalent to what flatMapGroupsWithState does in Scala/Java. To be precise, I need to take an incoming stream of records, group them by an arbitrary attribute, feed each group's records one at a time to a separate instance of a user-defined (so 'black-box') Python callable, and stream out its output.
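To make the semantics I'm after concrete, here is a minimal plain-Python sketch with no Spark involved. The names `process_stream` and `RunningSum`, and the record fields `key` and `value`, are made up for illustration only; `RunningSum` merely stands in for the black-box callable, which in reality I cannot inspect.

```python
from typing import Callable, Dict, Hashable, Iterable, Iterator, List


class RunningSum:
    """Hypothetical stand-in for the black-box callable (one instance per group)."""

    def __init__(self) -> None:
        self.total = 0  # per-group state lives inside the instance

    def __call__(self, record: dict) -> List[dict]:
        # May return zero or more output records per input (flatMap-like).
        self.total += record["value"]
        return [{"key": record["key"], "running_total": self.total}]


def process_stream(
    records: Iterable[dict],
    key_fn: Callable[[dict], Hashable],
    handler_factory: Callable[[], Callable[[dict], List[dict]]],
) -> Iterator[dict]:
    """Route each record to its group's own handler instance and stream the output."""
    handlers: Dict[Hashable, Callable[[dict], List[dict]]] = {}
    for record in records:
        key = key_fn(record)
        if key not in handlers:
            # First record seen for this group: create a dedicated instance.
            handlers[key] = handler_factory()
        yield from handlers[key](record)


if __name__ == "__main__":
    stream = [
        {"key": "a", "value": 1},
        {"key": "b", "value": 10},
        {"key": "a", "value": 2},
    ]
    for out in process_stream(stream, lambda r: r["key"], RunningSum):
        print(out)
```

What I am looking for is the distributed, fault-tolerant equivalent of this loop, where the per-group instances (or their state) survive across micro-batches.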
1. Is there a way to do that in Python using Structured Streaming?
2. How can I find out when flatMapGroupsWithState is coming to the Python Spark API?
3. Or can I only do something like that by using updateStateByKey(), and do I therefore have to use the DStreams API instead of the Structured Streaming API (which I'd like to avoid)?

Thanks a lot!