Hello,
I have a requirement where I need to group by multiple columns and
aggregate them, but not all at the same time. I have a structure containing
accountid, some other columns, and orderid. I need to handle scenarios such
as an account having multiple orders, so grouping by account and aggregating
will work for that case.
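A minimal PySpark sketch of that idea (the column names accountid, amount, and
orderid are placeholders; the real schema may differ). Each scenario simply
uses its own groupBy with its own aggregation, so the different groupings are
computed independently rather than at the same time:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Placeholder data: accountid, one extra column, orderid.
df = spark.createDataFrame(
    [("a1", 10.0, "o1"), ("a1", 20.0, "o2"), ("a2", 5.0, "o3")],
    ["accountid", "amount", "orderid"],
)

# Scenario 1: accounts with multiple orders -> group by account only.
per_account = (
    df.groupBy("accountid")
      .agg(F.countDistinct("orderid").alias("order_count"),
           F.sum("amount").alias("total_amount"))
      .filter(F.col("order_count") > 1)
)

# Scenario 2: a different grouping, e.g. per account and order.
per_order = (
    df.groupBy("accountid", "orderid")
      .agg(F.sum("amount").alias("order_amount"))
)

per_account.show()
per_order.show()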
Dear all,
I wonder if there is a way to take the elementwise product of two matrices
(RowMatrix, DistributedMatrix, ...) in pyspark?
I cannot find a good answer or API entry on the topic.
Thank you for all the help.
Best,
Simon
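As far as I know there is no single pyspark.mllib call that takes the Hadamard
(elementwise) product of two distributed matrices; the ElementwiseProduct
transformer only multiplies each vector by one fixed scaling vector. One
workaround is to represent both matrices as IndexedRowMatrix, join the rows by
index, and multiply them with numpy. A rough sketch (the matrix contents are
made up):

import numpy as np
from pyspark.sql import SparkSession
from pyspark.mllib.linalg.distributed import IndexedRow, IndexedRowMatrix

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

# Two small example matrices, rows keyed by their row index.
A = IndexedRowMatrix(sc.parallelize([IndexedRow(0, [1.0, 2.0]),
                                     IndexedRow(1, [3.0, 4.0])]))
B = IndexedRowMatrix(sc.parallelize([IndexedRow(0, [5.0, 6.0]),
                                     IndexedRow(1, [7.0, 8.0])]))

# Join matching rows by index and multiply them elementwise with numpy.
product_rows = (
    A.rows.map(lambda r: (r.index, r.vector.toArray()))
     .join(B.rows.map(lambda r: (r.index, r.vector.toArray())))
     .map(lambda kv: IndexedRow(kv[0], kv[1][0] * kv[1][1]))
)
C = IndexedRowMatrix(product_rows)

A CoordinateMatrix joined on its (i, j) entries works the same way if the
matrices are sparse.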
I did a repartition to 1 (hardcoded) before the keyBy and it finishes in
1.2 minutes.
The questions remain open, because I don't want to hardcode parallelism.
On Fri., Feb 8, 2019 at 12:50, Pedro Tuero (tuerope...@gmail.com)
wrote:
> 128 is the default parallelism defined for the clu
128 is the default parallelism defined for the cluster.
The question now is why the keyBy operation is using the default parallelism
instead of the number of partitions of the RDD created by the previous step
(5580).
Any clues?
On Thu., Feb 7, 2019 at 15:30, Pedro Tuero (tuerope...@gmail.com)
wrote:
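If it helps: keyBy itself is a narrow map, so it should keep the upstream
partition count; what usually picks up spark.default.parallelism (128 here) is
the next shuffle (groupByKey, reduceByKey, join, ...) when it is called without
an explicit numPartitions. A rough sketch of deriving the shuffle width from
the upstream RDD instead of hardcoding a repartition (the RDD and key function
are placeholders):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

rdd = sc.parallelize(range(1000), numSlices=8)   # stand-in for the real input RDD
keyed = rdd.keyBy(lambda x: x % 100)             # narrow map: still 8 partitions

# Pass an explicit partition count to the shuffle instead of relying on
# spark.default.parallelism or a hardcoded repartition().
n = keyed.getNumPartitions()
grouped = keyed.groupByKey(numPartitions=n)      # the shuffle decides the parallelism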
Hi Noritaka,
I start clusters from Java API.
Clusters running on 5.16 have no manual configurations in the EMR console
Configuration tab, so I assume the value of this property should be the
default on 5.16.
I enabled maximize resource allocation because otherwise the number of
cores automatical
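For illustration only, a boto3 (Python) sketch of passing that setting
explicitly when starting the cluster, so it does not depend on the release
default. The original poster uses the Java API, and every name, instance type,
and role below is a placeholder:

import boto3

emr = boto3.client("emr", region_name="us-east-1")  # placeholder region

response = emr.run_job_flow(
    Name="MyCluster",                                # placeholder name
    ReleaseLabel="emr-5.16.0",
    Applications=[{"Name": "Spark"}],
    # Set maximizeResourceAllocation explicitly via the "spark" classification.
    Configurations=[
        {"Classification": "spark",
         "Properties": {"maximizeResourceAllocation": "true"}},
    ],
    Instances={
        "InstanceGroups": [
            {"InstanceRole": "MASTER", "InstanceType": "m4.large", "InstanceCount": 1},
            {"InstanceRole": "CORE", "InstanceType": "m4.large", "InstanceCount": 2},
        ],
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)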
Hi,
after upgrading from 2.3.2 to 2.4.0 we noticed a regression when using
posexplode() in conjunction with selecting fields of another struct column.
Given a structure like this:
=
>>> df = (spark.range(1)
... .withColumn("my_arr", array(lit("1"), lit("2")))
... .wit
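The quoted snippet is cut off; a hypothetical completion of the reproduction,
guessed from the description above (the struct column and its field names are
made up):

from pyspark.sql import SparkSession
from pyspark.sql.functions import array, lit, posexplode, struct

spark = SparkSession.builder.getOrCreate()

df = (spark.range(1)
      .withColumn("my_arr", array(lit("1"), lit("2")))
      .withColumn("my_struct", struct(lit("a").alias("f1"),
                                      lit("b").alias("f2"))))

# posexplode() of the array combined with selecting a field of the struct:
# the combination reported to behave differently on 2.4.0 than on 2.3.2.
df.select(posexplode("my_arr"), "my_struct.f1").show()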