On Mar 29, 2016 8:56 AM, "Alexander Krasnukhin" <the.malk...@gmail.com> wrote:
> e.g. select max value for column "foo":
>
> from pyspark.sql.functions import max, col
> df.select(max(col("foo"))).show()
>
> On Tue, Mar 29, 2016 at 2:15 AM, Andy Davidson <
> a...@santacruzintegration.com> wrote:
>
>> I am using pyspark 1.6.1 and python3.
>>
>> *Given:*
>>
>> idDF2 = idDF.select(idDF.id, idDF.col.id)
>> idDF2.printSchema()
>> idDF2.show()
>>
>> root
>>  |-- id: string (nullable = true)
>>  |-- col[id]: long (nullable = true)
>>
>> +----------+----------+
>> |        id|   col[id]|
>> +----------+----------+
>> |1008930924| 534494917|
>> |1008930924| 442237496|
>> |1008930924|  98069752|
>> |1008930924|2790311425|
>> |1008930924|3300869821|
>>
>> *I have to do a lot of work to get the max value*
>>
>> rows = idDF2.select("col[id]").describe().collect()
>> hack = [s for s in rows if s.summary == 'max']
>> print(hack)
>> print(hack[0].summary)
>> print(type(hack[0]))
>> print(hack[0].asDict()['col[id]'])
>> maxStr = hack[0].asDict()['col[id]']
>> ttt = int(maxStr)
>> numDimensions = 1 + ttt
>> print(numDimensions)
>>
>> Is there an easier way?
>>
>> Kind regards
>>
>> Andy
>>
>
> --
> Regards,
> Alexander
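
A minimal sketch (not from the thread) applying the suggested functions.max approach to Andy's idDF2: an agg() over the "col[id]" column plus collect() returns the max as a plain value, so the describe()/string-parsing step isn't needed. Variable names here are assumptions for illustration.

import pyspark.sql.functions as F

# Sketch only: assumes idDF2 with the "col[id]" column shown above.
# agg() yields a one-row DataFrame; collect()[0][0] pulls out the scalar,
# avoiding the string round-trip through describe().
max_id = idDF2.agg(F.max(F.col("col[id]"))).collect()[0][0]
numDimensions = 1 + max_id  # max_id is already a Python int for a long column
print(numDimensions)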