You should be able to index the Row directly, either by position or by
column name, e.g.

from pyspark.sql.functions import max

rows = idDF2.select(max("col[id]")).collect()
firstRow = rows[0]

# by index (stored as maxValue so the imported max() is not shadowed)
maxValue = firstRow[0]

# by column name
maxValue = firstRow["max(col[id])"]
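
If you do this a lot, the select / first-row / first-field steps can be
wrapped in a small helper. This is only a sketch: the name scalar is made up
for illustration and is not part of PySpark. It assumes you pass a single
aggregate expression, so the result has exactly one row.

```python
def scalar(df, expr):
    """Evaluate one aggregate expression and return it as a plain Python value.

    `scalar` is a hypothetical helper name, not a PySpark API.
    """
    # An aggregate without a groupBy yields a single row; first() returns it
    # as a Row, and [0] picks its only field by position.
    return df.select(expr).first()[0]


# Usage sketch (assumes a running SparkContext and the idDF2 DataFrame
# from this thread):
#   from pyspark.sql import functions as F
#   numDimensions = 1 + scalar(idDF2, F.max("col[id]"))
```

Importing the module as F also avoids shadowing Python's built-in max. If you
prefer access by name, F.max("col[id]").alias("maxId") gives the result column
a predictable name.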

On Tue, Mar 29, 2016 at 6:58 PM, Andy Davidson <
a...@santacruzintegration.com> wrote:

> Hi Alexander
>
> Many thanks. I think the key was I needed to import that max function.
> Turns out you do not need to use col:
> df.select(max("foo")).show()
>
> To get the actual value of max you still need to write more code than I
> would expect. I wonder if there is an easier way to work with Rows?
>
> In [19]:
>
> from pyspark.sql.functions import max
> maxRow = idDF2.select(max("col[id]")).collect()
> max = maxRow[0].asDict()['max(col[id])']
> max
>
> Out[19]:
>
> 713912692155621376
>
>
> From: Alexander Krasnukhin <the.malk...@gmail.com>
> Date: Monday, March 28, 2016 at 5:55 PM
> To: Andrew Davidson <a...@santacruzintegration.com>
> Cc: "user @spark" <user@spark.apache.org>
> Subject: Re: looking for an easy to to find the max value of a column in
> a data frame
>
> e.g. select max value for column "foo":
>
> from pyspark.sql.functions import max, col
> df.select(max(col("foo"))).show()
>
> On Tue, Mar 29, 2016 at 2:15 AM, Andy Davidson <
> a...@santacruzintegration.com> wrote:
>
>> I am using pyspark 1.6.1 and python3.
>>
>>
>> *Given:*
>>
>> idDF2 = idDF.select(idDF.id, idDF.col.id )
>> idDF2.printSchema()
>> idDF2.show()
>>
>> root
>>  |-- id: string (nullable = true)
>>  |-- col[id]: long (nullable = true)
>>
>> +----------+----------+
>> |        id|   col[id]|
>> +----------+----------+
>> |1008930924| 534494917|
>> |1008930924| 442237496|
>> |1008930924|  98069752|
>> |1008930924|2790311425|
>> |1008930924|3300869821|
>>
>>
>>
>> *I have to do a lot of work to get the max value*
>>
>>
>> rows = idDF2.select("col[id]").describe().collect()
>> hack = [s for s in rows if s.summary == 'max']
>> print(hack)
>> print(hack[0].summary)
>> print(type(hack[0]))
>> print(hack[0].asDict()['col[id]'])
>> maxStr = hack[0].asDict()['col[id]']
>> ttt = int(maxStr)
>> numDimensions = 1 + ttt
>> print(numDimensions)
>>
>>
>> Is there an easier way?
>>
>>
>> Kind regards
>>
>>
>> Andy
>>
>>
>
>
> --
> Regards,
> Alexander
>
>


-- 
Regards,
Alexander
