It looks interesting; however, Scalding seems to have to be used outside of Spark?
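
For reference, the single-pass idea behind composed aggregators (tuple monoids) can be sketched in plain Scala, without Scalding or Algebird. This is only a minimal sketch: `Stats`, `merge`, `of` and `SinglePass.summarize` are hypothetical names made up for illustration, not Algebird's API, and the sketch covers a global aggregation over one collection rather than a sliding window.

```scala
// Hand-rolled stand-in for the "tuple monoid" idea; NOT Algebird's API.
// All names here (Stats, merge, of, SinglePass) are hypothetical.
final case class Stats(min: Double, max: Double, sum: Double, count: Long) {
  // Merge two partial results. The operation is associative, so partial
  // results from different partitions can be combined in any grouping.
  def merge(that: Stats): Stats =
    Stats(
      math.min(min, that.min),
      math.max(max, that.max),
      sum + that.sum,
      count + that.count
    )

  // The average is derived from sum and count, so it needs no extra pass.
  def mean: Double = if (count == 0L) Double.NaN else sum / count
}

object Stats {
  // Identity element: merging with `empty` leaves the other side unchanged.
  val empty: Stats =
    Stats(Double.PositiveInfinity, Double.NegativeInfinity, 0.0, 0L)

  // Lift a single value into a partial result.
  def of(x: Double): Stats = Stats(x, x, x, 1L)
}

object SinglePass {
  // One traversal produces min, max, sum and count together.
  def summarize(xs: Iterable[Double]): Stats =
    xs.foldLeft(Stats.empty)((acc, x) => acc.merge(Stats.of(x)))
}
```

Assuming an `RDD[Double]` named `rdd`, the same two functions should fit Spark's `rdd.aggregate(Stats.empty)((acc, x) => acc.merge(Stats.of(x)), _ merge _)`, which would give max, min and avg in one pass over the data. Whether this helps for per-row window frames is a separate question.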

On Mon, Dec 18, 2017 at 17:15, Anastasios Zouzias <zouz...@gmail.com> wrote:

> Hi Julien,
>
> I am not sure whether my answer applies to the streaming part of your
> question. However, in batch processing, if you want to perform multiple
> aggregations over an RDD in a single pass, a common approach is to use
> multiple aggregators (a.k.a. tuple monoids); see the example from Algebird:
>
> https://github.com/twitter/scalding/wiki/Aggregation-using-Algebird-Aggregators#composing-aggregators
>
> Best,
> Anastasios
>
> On Mon, Dec 18, 2017 at 10:38 AM, Julien CHAMP <jch...@tellmeplus.com>
> wrote:
>
>> I've been looking at several solutions, but I can't find an efficient
>> way to compute many window functions (optimized computation or
>> efficient parallelism).
>> Am I the only one interested in this?
>>
>> Regards,
>>
>> Julien
>>
>> On Fri, Dec 15, 2017 at 21:34, Julien CHAMP <jch...@tellmeplus.com>
>> wrote:
>>
>>> Maybe I should consider something like Impala?
>>>
>>> On Fri, Dec 15, 2017 at 11:32, Julien CHAMP <jch...@tellmeplus.com>
>>> wrote:
>>>
>>>> Hi Spark Community members!
>>>>
>>>> I want to compute several (from 1 to 10) aggregate functions using
>>>> window functions on something like 100 columns.
>>>>
>>>> Instead of doing several passes over the data to compute each
>>>> aggregate function, is there a way to do this efficiently?
>>>>
>>>> Currently it seems that doing
>>>>
>>>>     val tw =
>>>>       Window
>>>>         .orderBy("date")
>>>>         .partitionBy("id")
>>>>         .rangeBetween(-8035200000L, 0)
>>>>
>>>> and then
>>>>
>>>>     x
>>>>       .withColumn("agg1", max("col").over(tw))
>>>>       .withColumn("agg2", min("col").over(tw))
>>>>       .withColumn("aggX", avg("col").over(tw))
>>>>
>>>> is not really efficient :/
>>>> It seems that it iterates over the whole column for each
>>>> aggregation. Am I right?
>>>>
>>>> Is there a way to compute all the required operations on a column in
>>>> a single pass?
>>>> Even better, is there a way to compute all the required operations
>>>> on ALL columns in a single pass?
>>>>
>>>> Thx for your Future[Answers]
>>>>
>>>> Julien
>>>>
>>>> --
>>>> Julien CHAMP - Data Scientist
>>>> Web: www.tellmeplus.com <http://tellmeplus.com/> - Email: jch...@tellmeplus.com
>>>> Phone: 06 89 35 01 89 - LinkedIn: <https://www.linkedin.com/in/julienchamp>
>>>> TellMePlus S.A - Predictive Objects
>>>> Paris: 7 rue des Pommerots, 78400 Chatou
>>>> Montpellier: 51 impasse des églantiers, 34980 St Clément de Rivière
>>
>> This email may contain confidential and/or privileged information for
>> the intended recipient. If you are not the intended recipient, please
>> contact the sender and delete all copies.
>
> --
> Anastasios Zouzias
> <a...@zurich.ibm.com>

--
Julien CHAMP