Hi Bob,

I tested your scenario with Spark 1.3, assuming you did not simply leave out the
second parameter of pow(x, y).
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
df = sqlContext.jsonFile("/vagrant/people.json")
# Displays the content of the DataFrame to stdout
df.show()
# These are all fine
df.select("name", (df.age)*(df.age)).show()
name (age * age)
Michael null
Andy 900
Justin 361
df.select("name", (df.age)+1).show()
name (age + 1)
Michael null
Andy 31
Justin 20
However, the following tests give the same error.
df.select("name", pow(df.age,2)).show()
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-27-ce7299d3ef76> in <module>()
----> 1 df.select("name", pow(df.age,2)).show()
TypeError: unsupported operand type(s) for ** or pow(): 'Column' and 'int'
df.select("name", (df.age)**2).show()
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-24-29540c3536bf> in <module>()
----> 1 df.select("name", (df.age)**2).show()
TypeError: unsupported operand type(s) for ** or pow(): 'Column' and 'int'
Moreover, when tested on their own, the functions work fine.
pow(2,4)
16
2**4
16
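For what it is worth, the TypeError above is the one Python raises when an object does not implement the power operator: both the built-in pow(x, y) and x ** y dispatch to x.__pow__. The sketch below reproduces that behaviour without Spark. The Column class and the _expr helper here are toy stand-ins I made up for illustration, not Spark's real implementation; I am only assuming that the Spark 1.3 Column supports * and + but not **, which is what the tests above suggest.

```python
class Column(object):
    """Toy stand-in for a column expression (hypothetical, not Spark's class)."""

    def __init__(self, expr):
        self.expr = expr

    def __mul__(self, other):
        # Build a new expression instead of computing a value.
        return Column("(%s * %s)" % (self.expr, _expr(other)))

    def __add__(self, other):
        return Column("(%s + %s)" % (self.expr, _expr(other)))

    # Deliberately no __pow__: `col ** 2` and pow(col, 2) both raise TypeError.


def _expr(value):
    """Render either a Column or a plain Python literal as expression text."""
    return value.expr if isinstance(value, Column) else str(value)


age = Column("age")
print((age * age).expr)   # multiplication is supported
print((age + 1).expr)     # addition is supported

try:
    pow(age, 2)           # built-in pow() looks for Column.__pow__
except TypeError as err:
    pow_error = str(err)
    print(pow_error)
```

So the built-in pow is never handed to Spark at all; Python rejects the operands first. With the setup above, the multiplication form (df.age)*(df.age) is the one that worked.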
Kind Regards
Salih Oztop
From: Bob Corsaro <[email protected]>
To: user <[email protected]>
Sent: Monday, June 29, 2015 7:27 PM
Subject: SparkSQL built in functions
I'm having trouble using "select pow(col) from table". It seems the function is
not registered for SparkSQL. Is this on purpose or an oversight? I'm using
pyspark.