[Structured Streaming] Domain data refresh with flatMapGroupsWithState

2020-01-31 Thread Ashutosh Joshi
We have a structured streaming job that processes a stream of events. It needs to perform aggregation while maintaining state, for which we are using flatMapGroupsWithState. It also needs to load some domain data that needs to be refreshed periodically. To refresh domain data, we are using a solut

Re: Problems during upgrade 2.2.2 -> 2.4.4

2020-01-31 Thread bsikander
Thank you for your reply. Which resource manager supports rolling updates? YARN? Also, where can I find this information in the documentation?

Re: Possible to limit number of IPC retries on spark-submit?

2020-01-31 Thread Jeff Evans
Figured out the answer, eventually. The magic property name, in this case, is yarn.client.failover-max-attempts (prefixed with spark.hadoop. in the case of Spark, of course). For a full explanation, see the StackOverflow answer I just added.
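As a rough illustration of the prefixing described above, the sketch below builds the `--conf` arguments for a spark-submit call from plain Hadoop/YARN property names. The helper name `hadoop_conf_args`, the value `3`, and the `app.jar` placeholder are assumptions made for this example and do not come from the thread; only the property name and the `spark.hadoop.` prefix do.

```python
from typing import Dict, List

# Spark copies any property starting with this prefix into the Hadoop configuration.
SPARK_HADOOP_PREFIX = "spark.hadoop."


def hadoop_conf_args(hadoop_props: Dict[str, str]) -> List[str]:
    """Turn Hadoop/YARN properties into spark-submit --conf arguments (hypothetical helper)."""
    args: List[str] = []
    for key, value in hadoop_props.items():
        args += ["--conf", f"{SPARK_HADOOP_PREFIX}{key}={value}"]
    return args


# The value "3" and "app.jar" are illustrative placeholders, not recommendations.
submit_cmd = [
    "spark-submit",
    "--master", "yarn",
    *hadoop_conf_args({"yarn.client.failover-max-attempts": "3"}),
    "app.jar",
]

print(" ".join(submit_cmd))
```

The same setting could presumably be placed in spark-defaults.conf under the prefixed name instead of being passed on the command line.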

Re: Problems during upgrade 2.2.2 -> 2.4.4

2020-01-31 Thread Shixiong(Ryan) Zhu
The reason for this is that Spark RPC and the persisted states of HA mode both use Java serialization to serialize internal classes, which don't have any compatibility guarantee. Best Regards, Ryan

Re: Problems during upgrade 2.2.2 -> 2.4.4

2020-01-31 Thread Shixiong(Ryan) Zhu
Unfortunately, Spark standalone mode doesn't support rolling updates. All Spark components (master, worker, driver) must be updated to the same version. When using HA mode, the states persisted in ZooKeeper (or files if not using ZooKeeper) need to be cleaned because they are not compatible between