Is there someone focused on streaming work these days who would want to shepherd this?
On Sat, Feb 18, 2023 at 5:02 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:

> Thank you for considering me, but may I ask what makes you think to put me
> there, Mich? I'm curious about your reason.
>
> > I have put dongjoon.hyun as a shepherd.
>
> BTW, unfortunately, I cannot help you with that due to my ongoing
> personal stuff. I'll adjust the JIRA first.
>
> Thanks,
> Dongjoon.
>
> On Sat, Feb 18, 2023 at 10:51 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
>> https://issues.apache.org/jira/browse/SPARK-42485
>>
>> Spark Structured Streaming is a very useful tool for event-driven
>> architectures. In an event-driven architecture there is generally a
>> main loop that listens for events and triggers a callback function
>> when one of those events is detected. A streaming application waits
>> for source messages, either at a set interval or as they arrive, and
>> reacts accordingly.
>>
>> There are occasions when you may want to stop a Spark program
>> gracefully, meaning that the application processes the last streaming
>> message completely and then terminates. This is different from
>> invoking an interrupt such as CTRL-C.
>>
>> Of course, one can terminate the process with one of the following:
>>
>> 1. query.awaitTermination()          # waits for the termination of
>>    this query, by stop() or with an error
>> 2. query.awaitTermination(timeoutMs) # returns true if this query
>>    terminates within the timeout in milliseconds
>>
>> The first waits until an interrupt signal is received; the second
>> counts down the timeout and exits when the timeout in milliseconds is
>> reached. The issue is that one needs to predict how long the streaming
>> job will run, and any interrupt at the terminal or OS level (killing
>> the process) may terminate the processing without proper completion of
>> the streaming job.
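The semantics of the two `awaitTermination` forms quoted above can be sketched with a plain-Python stand-in (no Spark required; `FakeQuery` and its methods are hypothetical names for illustration, not Spark classes):

```python
import threading

class FakeQuery:
    """Hypothetical stand-in that mimics the blocking behaviour of
    StreamingQuery.awaitTermination; not real Spark code."""

    def __init__(self):
        self._terminated = threading.Event()

    def stop(self):
        # In Spark, stop() (or an error) terminates the query.
        self._terminated.set()

    def await_termination(self, timeout_ms=None):
        if timeout_ms is None:
            # Form 1: block indefinitely until stop() or an error.
            self._terminated.wait()
            return None
        # Form 2: block for at most timeout_ms; report whether the
        # query terminated within that window.
        return self._terminated.wait(timeout_ms / 1000.0)

q = FakeQuery()
print(q.await_termination(timeout_ms=50))  # False: still running after 50 ms
q.stop()
print(q.await_termination(timeout_ms=50))  # True: already terminated
```

This illustrates the problem Mich raises: the timeout form forces you to guess the job's duration, while the blocking form only returns on an external stop or an error.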
>> I have devised a method that allows one to terminate the Spark
>> application internally after the last received message has been
>> processed. Within, say, 2 seconds of the shutdown being confirmed, the
>> process invokes a graceful shutdown.
>>
>> This new feature proposes a solution that finishes the work for the
>> message currently being processed, waits for it to complete, and then
>> shuts down the streaming process for a given topic without loss of
>> data or orphaned transactions.
>>
>> I have put dongjoon.hyun as a shepherd. Kindly advise me if that is
>> the correct approach.
>>
>> JIRA ticket: https://issues.apache.org/jira/browse/SPARK-42485
>> SPIP doc: TBC
>> Discussion thread: https://lists.apache.org/list.html?dev@spark.apache.org
>>
>> Thanks.
>>
>> view my Linkedin profile
>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>
>> https://en.everybodywiki.com/Mich_Talebzadeh
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which
>> may arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary
>> damages arising from such loss, damage or destruction.

--
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
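The graceful-shutdown idea described in the proposal above — finish the in-flight message, then stop — can be sketched as a minimal plain-Python analog. This is an editor's illustration under stated assumptions, not the proposal's actual implementation: `stop_requested`, `process_batch`, and `run_stream` are hypothetical names, and a real version would hook into Spark's query lifecycle rather than a simple loop:

```python
import threading

# Hypothetical shutdown flag, checked between "micro-batches" so that the
# batch currently in flight always completes before the loop exits.
stop_requested = threading.Event()

def process_batch(batch):
    # Placeholder for the real per-message work.
    return [x * 2 for x in batch]

def run_stream(batches):
    """Drain batches until stop_requested is set. The current batch is
    always finished first, so no message is left half-processed and no
    transaction is orphaned."""
    results = []
    for batch in batches:
        results.append(process_batch(batch))   # complete the current message
        if stop_requested.is_set():            # then honour the shutdown request
            break
    return results

# Request shutdown before the run: the first batch still completes.
stop_requested.set()
print(run_stream([[1, 2], [3, 4]]))  # [[2, 4]]
```

The key design point mirrored here is the ordering: the shutdown flag is consulted only *after* a unit of work completes, which is what distinguishes a graceful stop from a CTRL-C style interrupt.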