Is there a plan to support sql window functions? I will give an example of use: Let’s say we have login logs. What we want to do is for each user we would want to add the number of failed logins for each successful login. How would you do it with structured streaming? As this is currently not supported, is there a plan on how to support it in the future? Assaf.
From: Herman van Hövell tot Westerflier-2 [via Apache Spark Developers List] [mailto:ml-node+s1001551n19933...@n3.nabble.com] Sent: Thursday, November 17, 2016 1:27 PM To: Mendelson, Assaf Subject: Re: structured streaming and window functions What kind of window functions are we talking about? Structured streaming only supports time window aggregates, not the more general sql window function (sum(x) over (partition by ... order by ...)) aggregates. The basic idea is that you use incremental aggregation and store the aggregation buffer (not the end result) in a state store after each increment. When an new batch comes in, you perform aggregation on that batch, merge the result of that aggregation with the buffer in the state store, update the state store and return the new result. This is much harder than it sounds, because you need to maintain state in a fault tolerant way and you need to have some eviction policy (watermarks for instance) for aggregation buffers to prevent the state store from reaching an infinite size. On Thu, Nov 17, 2016 at 12:19 AM, assaf.mendelson <[hidden email]</user/SendEmail.jtp?type=node&node=19933&i=0>> wrote: Hi, I have been trying to figure out how structured streaming handles window functions efficiently. The portion I understand is that whenever new data arrived, it is grouped by the time and the aggregated data is added to the state. However, unlike operations like sum etc. window functions need the original data and can change when data arrives late. So if I understand correctly, this would mean that we would have to save the original data and rerun on it to calculate the window function every time new data arrives. Is this correct? Are there ways to go around this issue? Assaf. ________________________________ View this message in context: structured streaming and window functions<http://apache-spark-developers-list.1001551.n3.nabble.com/structured-streaming-and-window-functions-tp19930.html> Sent from the Apache Spark Developers List mailing list archive<http://apache-spark-developers-list.1001551.n3.nabble.com/> at Nabble.com. ________________________________ If you reply to this email, your message will be added to the discussion below: http://apache-spark-developers-list.1001551.n3.nabble.com/structured-streaming-and-window-functions-tp19930p19933.html To start a new topic under Apache Spark Developers List, email ml-node+s1001551n1...@n3.nabble.com<mailto:ml-node+s1001551n1...@n3.nabble.com> To unsubscribe from Apache Spark Developers List, click here<http://apache-spark-developers-list.1001551.n3.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_code&node=1&code=YXNzYWYubWVuZGVsc29uQHJzYS5jb218MXwtMTI4OTkxNTg1Mg==>. NAML<http://apache-spark-developers-list.1001551.n3.nabble.com/template/NamlServlet.jtp?macro=macro_viewer&id=instant_html%21nabble%3Aemail.naml&base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace&breadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml> -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/structured-streaming-and-window-functions-tp19930p19934.html Sent from the Apache Spark Developers List mailing list archive at Nabble.com.