You should be able to index into a Row directly, either by position or by column name, e.g.:

from pyspark.sql.functions import max

rows = idDF2.select(max("col[id]")).collect()
firstRow = rows[0]

# by index (named maxValue so the imported max() is not overwritten)
maxValue = firstRow[0]

# by column name
maxValue = firstRow["max(col[id])"]

On Tue, Mar 29, 2016 at 6:58 PM, Andy Davidson <
a...@santacruzintegration.com> wrote:

> Hi Alexander
>
> Many thanks. I think the key was I needed to import that max function.
> Turns out you do not need to use col
> Df.select(max("foo")).show()
>
> To get the actual value of max you still need to write more code than I
> would expect. I wonder if there is an easier way to work with Rows?
>
> In [19]:
>
> from pyspark.sql.functions import max
>
> maxRow = idDF2.select(max("col[id]")).collect()
>
> max = maxRow[0].asDict()['max(col[id])']
>
> max
>
> Out[19]:
>
> 713912692155621376
>
>
> From: Alexander Krasnukhin <the.malk...@gmail.com>
> Date: Monday, March 28, 2016 at 5:55 PM
> To: Andrew Davidson <a...@santacruzintegration.com>
> Cc: "user @spark" <user@spark.apache.org>
> Subject: Re: looking for an easy to to find the max value of a column in
> a data frame
>
> e.g. select max value for column "foo":
>
> from pyspark.sql.functions import max, col
> df.select(max(col("foo"))).show()
>
> On Tue, Mar 29, 2016 at 2:15 AM, Andy Davidson <
> a...@santacruzintegration.com> wrote:
>
>> I am using pyspark 1.6.1 and python3.
>>
>>
>> *Given:*
>>
>> idDF2 = idDF.select(idDF.id, idDF.col.id )
>> idDF2.printSchema()
>> idDF2.show()
>>
>> root
>>  |-- id: string (nullable = true)
>>  |-- col[id]: long (nullable = true)
>>
>> +----------+----------+
>> |        id|   col[id]|
>> +----------+----------+
>> |1008930924| 534494917|
>> |1008930924| 442237496|
>> |1008930924|  98069752|
>> |1008930924|2790311425|
>> |1008930924|3300869821|
>>
>>
>>
>> *I have to do a lot of work to get the max value*
>>
>>
>> rows = idDF2.select("col[id]").describe().collect()
>> hack = [s for s in rows if s.summary == 'max']
>> print(hack)
>> print(hack[0].summary)
>> print(type(hack[0]))
>> print(hack[0].asDict()['col[id]'])
>> maxStr = hack[0].asDict()['col[id]']
>> ttt = int(maxStr)
>> numDimensions = 1 + ttt
>> print(numDimensions)
>>
>>
>> Is there an easier way?
>>
>>
>> Kind regards
>>
>>
>> Andy
>>
>>
>
>
> --
> Regards,
> Alexander
>
>

--
Regards,
Alexander