Hi, thanks!
This works, but I also need the mean :) I'm still looking for a way to do that.
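For the mean, a minimal sketch along the same lines as the sum UDF in ayan guha's reply below (my own assumption, not from the thread: `mean_of_list` is a helper name I made up, and the Spark session setup is the pip-installed local style):

```python
# Hypothetical sketch: compute the mean of each array column with a UDF,
# mirroring the sum example from the reply below.
def mean_of_list(xs):
    # Plain-Python mean, kept separate so the logic is testable without Spark.
    return float(sum(xs)) / len(xs) if xs else None

if __name__ == "__main__":
    from pyspark.sql import SparkSession, Row
    from pyspark.sql.functions import udf
    from pyspark.sql.types import DoubleType

    spark = SparkSession.builder.master("local[*]").getOrCreate()
    df = (spark.sparkContext
          .parallelize([[1, 2, 3, 4], [10, 20, 30]])
          .map(lambda x: Row(numbers=x))
          .toDF())
    mean_udf = udf(mean_of_list, DoubleType())
    df.withColumn("mean", mean_udf(df["numbers"])).show()
```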
Regards.
2016-08-16 5:30 GMT-05:00 ayan guha:
Here is a more generic way of doing this:

from pyspark.sql import Row
df = sc.parallelize([[1, 2, 3, 4], [10, 20, 30]]).map(lambda x: Row(numbers=x)).toDF()
df.show()

from pyspark.sql.functions import udf
from pyspark.sql.types import IntegerType
u = udf(lambda c: sum(c), IntegerType())
df1 = df.withColumn('total', u(df['numbers']))
df1.show()
Assuming you know the number of elements in the list, this should work:

df.withColumn('total', df["_1"].getItem(0) + df["_1"].getItem(1) + df["_1"].getItem(2))
Mike
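One caveat with the `getItem` approach (my own observation, not from the thread): the sample data has arrays of different lengths, and `getItem` on a missing index returns null, which makes the whole sum null. Coalescing each element to 0 avoids that; the Spark usage below is a sketch under that assumption, with the indexing logic modeled as a plain helper:

```python
# Hypothetical sketch: sum the first n items of each array, counting
# missing entries as 0 so arrays of different lengths still get a total.
def sum_first_n(xs, n):
    # Plain-Python model of the same idea, testable without Spark.
    return sum(xs[i] if i < len(xs) else 0 for i in range(n))

if __name__ == "__main__":
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import coalesce, lit

    spark = SparkSession.builder.master("local[*]").getOrCreate()
    df = (spark.sparkContext
          .parallelize([([1, 2, 3, 4],), ([10, 20, 30],)])
          .toDF(["_1"]))
    # coalesce(..., lit(0)) turns the null from a missing index into 0.
    total = lit(0)
    for i in range(4):
        total = total + coalesce(df["_1"].getItem(i), lit(0))
    df.withColumn("total", total).show()
```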
On Mon, Aug 15, 2016 at 12:02 PM, Javier Rey wrote:
> Hi everyone,
>
> I have a dataframe with one column; this column is an array