It seems interesting; however, Scalding seems to require being used outside of
Spark?


On Mon, Dec 18, 2017 at 5:15 PM, Anastasios Zouzias <zouz...@gmail.com>
wrote:

> Hi Julien,
>
> I am not sure if my answer applies to the streaming part of your question.
> However, in batch processing, if you want to perform multiple aggregations
> over an RDD in a single pass, a common approach is to use multiple
> aggregators (a.k.a. tuple monoids); see the example from Algebird below:
>
>
> https://github.com/twitter/scalding/wiki/Aggregation-using-Algebird-Aggregators#composing-aggregators
>
> Best,
> Anastasios
>
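The tuple-monoid idea itself is not tied to Scalding: several aggregates are bundled into one value with an associative `combine`, so a single fold over the data computes all of them at once. Below is a minimal plain-Scala sketch of that idea, assuming no Algebird dependency; `TupleMonoidSketch`, `Stats`, `combine` and `aggregate` are hypothetical names, not Algebird or Spark APIs.

```scala
// Minimal sketch of the "tuple monoid" idea in plain Scala (no Algebird).
// All names are hypothetical: Stats bundles several aggregates so that a
// single fold over the data updates all of them together.
object TupleMonoidSketch {
  final case class Stats(count: Long, sum: Double, min: Double, max: Double) {
    // Associative combine: partial results (e.g. from different partitions)
    // can be merged in any grouping.
    def combine(other: Stats): Stats =
      Stats(
        count + other.count,
        sum + other.sum,
        math.min(min, other.min),
        math.max(max, other.max)
      )

    def mean: Double = if (count == 0) Double.NaN else sum / count
  }

  object Stats {
    // Identity element: combining with `empty` leaves a Stats unchanged.
    val empty: Stats =
      Stats(0L, 0.0, Double.PositiveInfinity, Double.NegativeInfinity)

    // Lifts a single value into the monoid.
    def of(x: Double): Stats = Stats(1L, x, x, x)
  }

  // One pass over the values computes count, sum, min and max together.
  def aggregate(values: Iterable[Double]): Stats =
    values.foldLeft(Stats.empty)((acc, x) => acc.combine(Stats.of(x)))
}
```

Since `combine` is associative and `Stats.empty` is its identity, results computed separately (for example per partition) can be merged afterwards; this is the shape that Spark's `RDD.aggregate(zeroValue)(seqOp, combOp)` expects.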
> On Mon, Dec 18, 2017 at 10:38 AM, Julien CHAMP <jch...@tellmeplus.com>
> wrote:
>
>> I've been looking at several solutions, but I can't find an efficient way
>> to compute many window functions (through optimized computation or
>> efficient parallelism).
>> Am I the only one interested in this?
>>
>>
>> Regards,
>>
>> Julien
>>
>
>> On Fri, Dec 15, 2017 at 9:34 PM, Julien CHAMP <jch...@tellmeplus.com>
>> wrote:
>>
>>> Maybe I should consider something like Impala?
>>>
>>> On Fri, Dec 15, 2017 at 11:32 AM, Julien CHAMP <jch...@tellmeplus.com>
>>> wrote:
>>>
>>>> Hi Spark community members!
>>>>
>>>> I want to compute several (from 1 to 10) aggregate functions using window
>>>> functions over about 100 columns.
>>>>
>>>> Instead of making several passes over the data to compute each aggregate
>>>> function, is there a way to do this efficiently?
>>>>
>>>>
>>>>
>>>> Currently it seems that doing
>>>>
>>>>
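>>>> import org.apache.spark.sql.expressions.Window
>>>> import org.apache.spark.sql.functions.{avg, max, min}
>>>>
>>>> // Assumes "date" is a numeric column (e.g. epoch milliseconds),
>>>> // since the range frame offsets below are Long values.
>>>> val tw =
>>>>   Window
>>>>     .orderBy("date")
>>>>     .partitionBy("id")
>>>>     .rangeBetween(-8035200000L, 0)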
>>>>
>>>> and then
>>>>
>>>> x
>>>>    .withColumn("agg1", max("col").over(tw))
>>>>    .withColumn("agg2", min("col").over(tw))
>>>>    .withColumn("aggX", avg("col").over(tw))
>>>>
>>>>
>>>> is not really efficient :/
>>>> It seems to iterate over the whole column for each aggregation. Am I
>>>> right?
>>>>
>>>> Is there a way to compute all the required operations on a column in
>>>> a single pass?
>>>> Even better, to compute all the required operations on ALL columns
>>>> in a single pass?
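The single pass asked about here can be illustrated outside Spark: sort each id's rows by date, slide a range frame with two pointers, and update max, min and sum for every column in the same loop. This is a minimal plain-Scala sketch, not how Spark evaluates window functions; `SlidingWindowSketch`, `Record`, `Agg`, `compute` and `WindowMillis` are hypothetical names, and "date" is assumed to hold epoch milliseconds, which makes the original bound of 8035200000 ms equal to 93 days.

```scala
import java.util.concurrent.TimeUnit

// Plain-Scala illustration (not Spark's implementation) of one traversal
// of each frame updating every aggregate of every column. All names here
// are hypothetical.
object SlidingWindowSketch {
  // Assuming "date" is epoch milliseconds, the original bound
  // -8035200000L corresponds to 93 days.
  val WindowMillis: Long = TimeUnit.DAYS.toMillis(93)

  final case class Record(id: String, date: Long, values: Array[Double])
  final case class Agg(max: Array[Double], min: Array[Double], avg: Array[Double])

  // For each record, aggregates all columns over the records of the same id
  // whose date lies in [date - windowMillis, date], mimicking
  // partitionBy("id").orderBy("date").rangeBetween(-windowMillis, 0).
  // Assumes windowMillis >= 0.
  def compute(records: Seq[Record], windowMillis: Long): Seq[(Record, Agg)] =
    records.groupBy(_.id).values.toSeq.flatMap { group =>
      val sorted = group.sortBy(_.date).toIndexedSeq
      var start = 0 // index of the first record inside the frame
      var end = 0   // index one past the last record inside the frame
      // Dates are sorted, so both pointers only ever move forward.
      sorted.map { rec =>
        // RANGE frames include peers that share the current date.
        while (end < sorted.length && sorted(end).date <= rec.date) end += 1
        // Drop records older than the lower bound of the frame.
        while (sorted(start).date < rec.date - windowMillis) start += 1
        val nCols = rec.values.length
        val mx = Array.fill(nCols)(Double.NegativeInfinity)
        val mn = Array.fill(nCols)(Double.PositiveInfinity)
        val sum = new Array[Double](nCols)
        // One scan of the frame updates max, min and sum for every column.
        var i = start
        while (i < end) {
          val v = sorted(i).values
          var c = 0
          while (c < nCols) {
            mx(c) = math.max(mx(c), v(c))
            mn(c) = math.min(mn(c), v(c))
            sum(c) += v(c)
            c += 1
          }
          i += 1
        }
        val n = end - start // always >= 1: the frame contains rec itself
        (rec, Agg(mx, mn, sum.map(_ / n)))
      }
    }
}
```

Each record still scans its whole frame, so the cost is roughly records x frame size x columns; keeping a running sum and a monotonic deque per column would make the updates incremental. Groups for different ids are independent, so they could also be processed in parallel.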
>>>>
>>>> Thx for your Future[Answers]
>>>>
>>>> Julien
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>>
>>>> Julien CHAMP — Data Scientist
>>>>
>>>>
>>>> *Web : **www.tellmeplus.com* <http://tellmeplus.com/> — *Email : 
>>>> **jch...@tellmeplus.com
>>>> <jch...@tellmeplus.com>*
>>>>
>>>> *Phone ** : **06 89 35 01 89 <0689350189> * — *LinkedIn* :  *here*
>>>> <https://www.linkedin.com/in/julienchamp>
>>>>
>>>> TellMePlus S.A — Predictive Objects
>>>>
>>>> *Paris* : 7 rue des Pommerots, 78400 Chatou
>>>> *Montpellier* : 51 impasse des églantiers, 34980 St Clément de Rivière
>>>>
>>>
>>
>>
>> Ce message peut contenir des informations confidentielles ou couvertes
>> par le secret professionnel, à l’intention de son destinataire. Si vous
>> n’en êtes pas le destinataire, merci de contacter l’expéditeur et d’en
>> supprimer toute copie.
>> This email may contain confidential and/or privileged information for the
>> intended recipient. If you are not the intended recipient, please contact
>> the sender and delete all copies.
>>
>>
>>
>
>
>
> --
> -- Anastasios Zouzias
> <a...@zurich.ibm.com>
>