Thanks for your answers. I know Kafka's model, but I would rather avoid having to set up both Spark and Kafka to handle my use case. I wonder whether it might be possible to handle this using Spark's standard streams?
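A minimal sketch of the two-stage workaround quoted further down, in plain Python rather than Spark, may help make the question concrete. Stage one keeps a running aggregate and re-emits the whole result table on every batch (which is what "complete" output mode amounts to); stage two re-reads that snapshot, joins it with a second table and aggregates again. Nothing here is Spark API: `StageOne`, `stage_two`, the `user_region` table and the event fields are hypothetical names chosen for illustration.

```python
from collections import defaultdict


class StageOne:
    """First aggregation: running total of amount per user."""

    def __init__(self):
        self.totals = defaultdict(int)

    def process_batch(self, events):
        # Fold the new micro-batch into the running state.
        for user, amount in events:
            self.totals[user] += amount
        # "Complete" mode: emit the full result table, not just the changes.
        return dict(self.totals)


def stage_two(snapshot, user_region):
    """Second aggregation: join per-user totals with a static table,
    then sum per region. Recomputed from scratch for every snapshot."""
    per_region = defaultdict(int)
    for user, total in snapshot.items():
        region = user_region.get(user)
        if region is not None:  # inner join: users without a region are dropped
            per_region[region] += total
    return dict(per_region)


# Hypothetical static table and two micro-batches of (user, amount) events.
user_region = {"alice": "EU", "bob": "EU", "carol": "US"}
stage_one = StageOne()

batch_1 = [("alice", 10), ("bob", 5), ("carol", 7)]
batch_2 = [("alice", 3), ("carol", 1), ("dave", 100)]

result_1 = stage_two(stage_one.process_batch(batch_1), user_region)
result_2 = stage_two(stage_one.process_batch(batch_2), user_region)
print(result_1)
print(result_2)
```

The second stage stays correct only because it sees the whole table each time and rebuilds its result from nothing. If stage one emitted only the rows that changed, stage two would also need a way to retract the old per-user values before adding the new ones, and recomputing everything on each batch becomes expensive as the first table grows.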
--
Arnaud Bailly

twitter: abailly
skype: arnaud-bailly
linkedin: http://fr.linkedin.com/in/arnaudbailly/

On Fri, Jul 8, 2016 at 12:00 AM, Andy Davidson <a...@santacruzintegration.com> wrote:

> Kafka has an interesting model that might be applicable.
>
> You can think of Kafka as enabling a queue system. Writers are called
> producers, and readers are called consumers. The server is called a broker.
> A “topic” is like a named queue.
>
> Producers are independent. They can write to a “topic” at will. Consumers
> (i.e. your nested aggregates) need to be independent of each other and of the
> broker. The broker receives data from producers and stores it using memory and
> disk. Consumers read from the broker and maintain the cursor. Because the client
> maintains the cursor, one consumer cannot impact other producers and
> consumers.
>
> I would think the tricky part for Spark would be to know when the data can be
> deleted. In the Kafka world each topic is allowed to define a TTL SLA, i.e.
> the consumer must read the data within a limited window of time.
>
> Andy
>
> From: Michael Armbrust <mich...@databricks.com>
> Date: Thursday, July 7, 2016 at 2:31 PM
> To: Arnaud Bailly <arnaud.oq...@gmail.com>
> Cc: Sivakumaran S <siva.kuma...@me.com>, "user @spark" <user@spark.apache.org>
> Subject: Re: Multiple aggregations over streaming dataframes
>
> We are planning to address this issue in the future.
>
> At a high level, we'll have to add a delta mode so that updates can be
> communicated from one operator to the next.
>
> On Thu, Jul 7, 2016 at 8:59 AM, Arnaud Bailly <arnaud.oq...@gmail.com>
> wrote:
>
>> Indeed. But nested aggregation does not work with Structured Streaming,
>> that's the point. I would like to know if there is a workaround, or what
>> the plan is regarding this feature, which seems quite useful to me. If the
>> implementation is not overly complex and it is just a matter of manpower,
>> I am fine with devoting some time to it.
>>
>> --
>> Arnaud Bailly
>>
>> On Thu, Jul 7, 2016 at 2:17 PM, Sivakumaran S <siva.kuma...@me.com>
>> wrote:
>>
>>> Arnaud,
>>>
>>> You could aggregate the first table and then merge it with the second
>>> table (assuming that they are similarly structured) and then carry out the
>>> second aggregation. Unless the data is very large, I don’t see why you
>>> should persist it to disk. IMO, nested aggregation is more elegant and
>>> readable than a complex single stage.
>>>
>>> Regards,
>>>
>>> Sivakumaran
>>>
>>> On 07-Jul-2016, at 1:06 PM, Arnaud Bailly <arnaud.oq...@gmail.com>
>>> wrote:
>>>
>>> It's aggregation at multiple levels in a query: first do some
>>> aggregation on one table, then join with another table and do a second
>>> aggregation. I could probably rewrite the query in such a way that it does
>>> the aggregation in one pass, but that would obfuscate the purpose of the
>>> various stages.
>>>
>>> On 7 Jul 2016 at 12:55, "Sivakumaran S" <siva.kuma...@me.com> wrote:
>>>
>>>> Hi Arnaud,
>>>>
>>>> Sorry for the doubt, but what exactly is multiple aggregation? What is
>>>> the use case?
>>>>
>>>> Regards,
>>>>
>>>> Sivakumaran
>>>>
>>>> On 07-Jul-2016, at 11:18 AM, Arnaud Bailly <arnaud.oq...@gmail.com>
>>>> wrote:
>>>>
>>>> Hello,
>>>>
>>>> I understand that multiple aggregations over streaming dataframes are not
>>>> currently supported in Spark 2.0. Is there a workaround? Off the top of
>>>> my head I could think of a two-stage approach:
>>>> - first query writes output to disk/memory using "complete" mode
>>>> - second query reads from this output
>>>>
>>>> Does this make sense?
>>>>
>>>> Furthermore, I would like to understand what technical hurdles
>>>> are preventing Spark SQL from implementing multiple aggregation right
>>>> now.
>>>>
>>>> Thanks,
>>>> --
>>>> Arnaud Bailly
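
For reference, the cursor-and-TTL model Andy describes above can be sketched as a toy in plain Python. This is not Kafka's client API and makes no claim about how Kafka is implemented: `Broker`, `Consumer`, `produce`, `fetch`, `poll` and the integer `now` clock are all hypothetical names invented for the sketch. It only shows the two properties from the description: each consumer owns its own cursor, and the broker deletes data by age without knowing who has read it.

```python
class Broker:
    """Toy broker: one append-only log per topic; records expire after a TTL."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.logs = {}         # topic -> list of (offset, timestamp, value)
        self.next_offset = {}  # topic -> offset for the next record

    def produce(self, topic, value, now):
        offset = self.next_offset.get(topic, 0)
        self.logs.setdefault(topic, []).append((offset, now, value))
        self.next_offset[topic] = offset + 1

    def fetch(self, topic, offset, now):
        # Expire by age only; the broker never tracks who has read what.
        live = [r for r in self.logs.get(topic, []) if now - r[1] < self.ttl]
        self.logs[topic] = live
        return [(off, value) for off, _, value in live if off >= offset]


class Consumer:
    """Toy consumer: keeps its own cursor, independent of other consumers."""

    def __init__(self, broker, topic):
        self.broker = broker
        self.topic = topic
        self.cursor = 0  # next offset to read, owned by the consumer

    def poll(self, now):
        records = self.broker.fetch(self.topic, self.cursor, now)
        if records:
            self.cursor = records[-1][0] + 1
        return [value for _, value in records]


broker = Broker(ttl=10)
fast = Consumer(broker, "agg")
slow = Consumer(broker, "agg")

broker.produce("agg", "a", now=0)
broker.produce("agg", "b", now=1)
fast_first = fast.poll(now=2)    # reads everything produced so far
broker.produce("agg", "c", now=8)
fast_second = fast.poll(now=9)   # resumes from its own cursor
slow_late = slow.poll(now=11)    # polls after "a" and "b" have expired
print(fast_first, fast_second, slow_late)
```

The slow consumer never sees the first two records: they aged out before it polled, and nothing the fast consumer did affected that. This is the trade-off behind the "when can the data be deleted" question, since a time-based TTL is simple but silently drops data for any reader that falls too far behind.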