Assaf, I think what you are describing is actually sessionizing, by user, where a session is ended by a successful login event. On each session, you want to count number of failed login events. If so, this is tracked by https://issues.apache.org/jira/browse/SPARK-10816 (didn't start yet)
Ofir Manor Co-Founder & CTO | Equalum Mobile: +972-54-7801286 | Email: ofir.ma...@equalum.io On Thu, Nov 17, 2016 at 2:52 PM, assaf.mendelson <assaf.mendel...@rsa.com> wrote: > Is there a plan to support sql window functions? > > I will give an example of use: Let’s say we have login logs. What we want > to do is for each user we would want to add the number of failed logins for > each successful login. How would you do it with structured streaming? > > As this is currently not supported, is there a plan on how to support it > in the future? > > Assaf. > > > > *From:* Herman van Hövell tot Westerflier-2 [via Apache Spark Developers > List] [mailto:ml-node+[hidden email] > <http:///user/SendEmail.jtp?type=node&node=19934&i=0>] > *Sent:* Thursday, November 17, 2016 1:27 PM > *To:* Mendelson, Assaf > *Subject:* Re: structured streaming and window functions > > > > What kind of window functions are we talking about? Structured streaming > only supports time window aggregates, not the more general sql window > function (sum(x) over (partition by ... order by ...)) aggregates. > > > > The basic idea is that you use incremental aggregation and store the > aggregation buffer (not the end result) in a state store after each > increment. When an new batch comes in, you perform aggregation on that > batch, merge the result of that aggregation with the buffer in the state > store, update the state store and return the new result. > > > > This is much harder than it sounds, because you need to maintain state in > a fault tolerant way and you need to have some eviction policy (watermarks > for instance) for aggregation buffers to prevent the state store from > reaching an infinite size. > > > > On Thu, Nov 17, 2016 at 12:19 AM, assaf.mendelson <[hidden email] > <http:///user/SendEmail.jtp?type=node&node=19933&i=0>> wrote: > > Hi, > > I have been trying to figure out how structured streaming handles window > functions efficiently. > > The portion I understand is that whenever new data arrived, it is grouped > by the time and the aggregated data is added to the state. > > However, unlike operations like sum etc. window functions need the > original data and can change when data arrives late. > > So if I understand correctly, this would mean that we would have to save > the original data and rerun on it to calculate the window function every > time new data arrives. > > Is this correct? Are there ways to go around this issue? > > > > Assaf. > > > ------------------------------ > > View this message in context: structured streaming and window functions > <http://apache-spark-developers-list.1001551.n3.nabble.com/structured-streaming-and-window-functions-tp19930.html> > Sent from the Apache Spark Developers List mailing list archive > <http://apache-spark-developers-list.1001551.n3.nabble.com/> at > Nabble.com. > > > > > ------------------------------ > > *If you reply to this email, your message will be added to the discussion > below:* > > http://apache-spark-developers-list.1001551.n3.nabble.com/structured- > streaming-and-window-functions-tp19930p19933.html > > To start a new topic under Apache Spark Developers List, email [hidden > email] <http:///user/SendEmail.jtp?type=node&node=19934&i=1> > To unsubscribe from Apache Spark Developers List, click here. > NAML > <http://apache-spark-developers-list.1001551.n3.nabble.com/template/NamlServlet.jtp?macro=macro_viewer&id=instant_html%21nabble%3Aemail.naml&base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace&breadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml> > > ------------------------------ > View this message in context: RE: structured streaming and window > functions > <http://apache-spark-developers-list.1001551.n3.nabble.com/structured-streaming-and-window-functions-tp19930p19934.html> > > Sent from the Apache Spark Developers List mailing list archive > <http://apache-spark-developers-list.1001551.n3.nabble.com/> at > Nabble.com. >