Hello,
I have a requirement where I need to group by multiple columns and
aggregate them, but not all at the same time. I have a structure containing
accountid, some other columns, and orderid. I need to handle scenarios such
as an account having multiple orders, so grouping by account and aggregating
will work for that case.
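A minimal PySpark sketch of that idea (the column names accountid, amount, and
orderid are placeholders; the real schema may differ). Each scenario simply
uses its own groupBy with its own aggregation, so the different groupings are
computed independently rather than at the same time:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Placeholder data: accountid, one extra column, orderid.
df = spark.createDataFrame(
    [("a1", 10.0, "o1"), ("a1", 20.0, "o2"), ("a2", 5.0, "o3")],
    ["accountid", "amount", "orderid"],
)

# Scenario 1: accounts with multiple orders -> group by account only.
per_account = (
    df.groupBy("accountid")
      .agg(F.countDistinct("orderid").alias("order_count"),
           F.sum("amount").alias("total_amount"))
      .filter(F.col("order_count") > 1)
)

# Scenario 2: a different grouping, e.g. per account and order.
per_order = (
    df.groupBy("accountid", "orderid")
      .agg(F.sum("amount").alias("order_amount"))
)

per_account.show()
per_order.show()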
Dear all,
I wonder if there is a way to take the elementwise product of two matrices
(RowMatrix, DistributedMatrix, ...) in pyspark?
I cannot find a good answer or API entry on the topic.
Thank you for all the help.
Best,
Simon
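As far as I know there is no single pyspark.mllib call that takes the Hadamard
(elementwise) product of two distributed matrices; the ElementwiseProduct
transformer only multiplies each vector by one fixed scaling vector. One
workaround is to represent both matrices as IndexedRowMatrix, join the rows by
index, and multiply them with numpy. A rough sketch (the matrix contents are
made up):

import numpy as np
from pyspark.sql import SparkSession
from pyspark.mllib.linalg.distributed import IndexedRow, IndexedRowMatrix

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

# Two small example matrices, rows keyed by their row index.
A = IndexedRowMatrix(sc.parallelize([IndexedRow(0, [1.0, 2.0]),
                                     IndexedRow(1, [3.0, 4.0])]))
B = IndexedRowMatrix(sc.parallelize([IndexedRow(0, [5.0, 6.0]),
                                     IndexedRow(1, [7.0, 8.0])]))

# Join matching rows by index and multiply them elementwise with numpy.
product_rows = (
    A.rows.map(lambda r: (r.index, r.vector.toArray()))
     .join(B.rows.map(lambda r: (r.index, r.vector.toArray())))
     .map(lambda kv: IndexedRow(kv[0], kv[1][0] * kv[1][1]))
)
C = IndexedRowMatrix(product_rows)

A CoordinateMatrix joined on its (i, j) entries works the same way if the
matrices are sparse.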
I did a repartition to 1 (hardcoded) before the keyBy and it finishes in
1.2 minutes.
The questions remain open, because I don't want to hardcode parallelism.
On Fri., Feb 8, 2019 at 12:50, Pedro Tuero (tuerope...@gmail.com)
wrote:
> 128 is the default parallelism defined for the clu
128 is the default parallelism defined for the cluster.
The question now is why the keyBy operation is using the default parallelism
instead of the number of partitions of the RDD created by the previous step
(5580).
Any clues?
On Thu., Feb 7, 2019 at 15:30, Pedro Tuero (tuerope...@gmail.com)
wrote:
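If it helps: keyBy itself is a narrow map, so it should keep the upstream
partition count; what usually picks up spark.default.parallelism (128 here) is
the next shuffle (groupByKey, reduceByKey, join, ...) when it is called without
an explicit numPartitions. A rough sketch of deriving the shuffle width from
the upstream RDD instead of hardcoding a repartition (the RDD and key function
are placeholders):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

rdd = sc.parallelize(range(1000), numSlices=8)   # stand-in for the real input RDD
keyed = rdd.keyBy(lambda x: x % 100)             # narrow map: still 8 partitions

# Pass an explicit partition count to the shuffle instead of relying on
# spark.default.parallelism or a hardcoded repartition().
n = keyed.getNumPartitions()
grouped = keyed.groupByKey(numPartitions=n)      # the shuffle decides the parallelism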
Hi Noritaka,
I start clusters from Java API.
Clusters running on 5.16 have no manual configurations in the EMR console
Configuration tab, so I assume the value of this property should be the
default on 5.16.
I enabled maximize resource allocation because otherwise the number of
cores automatical
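For illustration only, a boto3 (Python) sketch of passing that setting
explicitly when starting the cluster, so it does not depend on the release
default. The original poster uses the Java API, and every name, instance type,
and role below is a placeholder:

import boto3

emr = boto3.client("emr", region_name="us-east-1")  # placeholder region

response = emr.run_job_flow(
    Name="MyCluster",                                # placeholder name
    ReleaseLabel="emr-5.16.0",
    Applications=[{"Name": "Spark"}],
    # Set maximizeResourceAllocation explicitly via the "spark" classification.
    Configurations=[
        {"Classification": "spark",
         "Properties": {"maximizeResourceAllocation": "true"}},
    ],
    Instances={
        "InstanceGroups": [
            {"InstanceRole": "MASTER", "InstanceType": "m4.large", "InstanceCount": 1},
            {"InstanceRole": "CORE", "InstanceType": "m4.large", "InstanceCount": 2},
        ],
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)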
Hi,
after upgrading from 2.3.2 to 2.4.0 we noticed a regression when using
posexplode() in conjunction with selecting fields of another struct column.
Given a structure like this:
=
>>> df = (spark.range(1)
... .withColumn("my_arr", array(lit("1"), lit("2")))
... .wit
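The quoted snippet is cut off; a hypothetical completion of the reproduction,
guessed from the description above (the struct column and its field names are
made up):

from pyspark.sql import SparkSession
from pyspark.sql.functions import array, lit, posexplode, struct

spark = SparkSession.builder.getOrCreate()

df = (spark.range(1)
      .withColumn("my_arr", array(lit("1"), lit("2")))
      .withColumn("my_struct", struct(lit("a").alias("f1"),
                                      lit("b").alias("f2"))))

# posexplode() of the array combined with selecting a field of the struct:
# the combination reported to behave differently on 2.4.0 than on 2.3.2.
df.select(posexplode("my_arr"), "my_struct.f1").show()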