Maybe I should consider something like Impala?
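
For reference, here is a minimal sketch of one variant worth trying before switching engines: build every aggregate expression up front and apply them all in a single select over the same window spec, instead of chaining withColumn calls. The object name, the helper withWindowAggs, the "_max"/"_min"/"_avg" suffixes, the sample columns col1/col2 and the local SparkSession are assumptions made for illustration; only the window definition comes from the code quoted below. Whether this changes the physical plan compared with the chained version depends on the Spark version, so the explain() output is the thing to compare.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{avg, col, max, min}

object SingleSelectWindowAggs {
  // Same frame as in the quoted code. If "date" holds epoch milliseconds
  // (an assumption), 8035200000 corresponds to 93 days.
  private val tw = Window
    .partitionBy("id")
    .orderBy("date")
    .rangeBetween(-8035200000L, 0L)

  // Hypothetical helper: adds max/min/avg over `tw` for every listed column,
  // all in one select so the expressions share a single window spec.
  def withWindowAggs(df: DataFrame, valueCols: Seq[String]): DataFrame = {
    val aggs: Seq[Column] = valueCols.flatMap { c =>
      Seq(
        max(col(c)).over(tw).as(s"${c}_max"),
        min(col(c)).over(tw).as(s"${c}_min"),
        avg(col(c)).over(tw).as(s"${c}_avg")
      )
    }
    df.select(col("*") +: aggs: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("window-aggs-sketch")
      .getOrCreate()
    import spark.implicits._

    // Tiny made-up dataset, only there to make the sketch runnable.
    val x = Seq(
      (1L, 0L, 1.0, 10.0),
      (1L, 1000L, 3.0, 30.0),
      (2L, 0L, 5.0, 50.0)
    ).toDF("id", "date", "col1", "col2")

    val result = withWindowAggs(x, Seq("col1", "col2"))
    result.explain() // count the Window, Sort and Exchange operators here
    result.show()

    spark.stop()
  }
}
```

If the plan shows a single Window operator fed by one shuffle and one sort on (id, date), the data is not re-sorted per aggregate, and the remaining cost is mostly the frame evaluation itself. If the chained withColumn version shows the same plan, the optimizer is already merging the windows and the rewrite will not help.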

On Fri, Dec 15, 2017 at 11:32, Julien CHAMP <jch...@tellmeplus.com> wrote:

> Hi Spark community members!
>
> I want to compute several (from 1 to 10) aggregate functions, using window
> functions, on something like 100 columns.
>
> Instead of making several passes over the data to compute each aggregate
> function, is there a way to do this efficiently?
>
>
>
> Currently, it seems that doing
>
>
> val tw =
>   Window
>     .orderBy("date")
>     .partitionBy("id")
>     .rangeBetween(-8035200000L, 0)
>
> and then
>
> x
>    .withColumn("agg1", max("col").over(tw))
>    .withColumn("agg2", min("col").over(tw))
>    .withColumn("aggX", avg("col").over(tw))
>
>
> is not really efficient :/
> It seems that it iterates over the whole column for each aggregation. Am I
> right?
>
> Is there a way to compute all the required operations on a column in a
> single pass?
> Even better, to compute all the required operations on ALL columns in a
> single pass?
>
> Thx for your Future[Answers]
>
> Julien
>
>
>
>
>
> --
>
>
> Julien CHAMP — Data Scientist
>
>
> *Web : **www.tellmeplus.com* <http://tellmeplus.com/> — *Email : 
> **jch...@tellmeplus.com
> <jch...@tellmeplus.com>*
>
> *Phone ** : **06 89 35 01 89 <0689350189> * — *LinkedIn* :  *here*
> <https://www.linkedin.com/in/julienchamp>
>
> TellMePlus S.A — Predictive Objects
>
> *Paris* : 7 rue des Pommerots, 78400 Chatou
> *Montpellier* : 51 impasse des églantiers, 34980 St Clément de Rivière
>

-- 

This email may contain confidential and/or privileged information for the 
intended recipient. If you are not the intended recipient, please contact 
the sender and delete all copies.

