Sorry, do you want the sum for each row or the sum for each column? Assuming all the columns are numeric:
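If it's the row-wise sum you're after, a rough sketch along these lines should work (untested; assumes Spark 2.x and illustrative column names, with coalesce() standing in for the null handling you asked about):

    from functools import reduce
    from operator import add

    from pyspark.sql import SparkSession
    import pyspark.sql.functions as F

    spark = SparkSession.builder.appName("row-sum").getOrCreate()

    # Toy dataframe standing in for yours; nulls included on purpose.
    df = spark.createDataFrame([(1, 2, None), (4, None, 6)], ["a", "b", "c"])

    # coalesce() turns nulls into 0 so they don't null out the whole sum,
    # and reduce(add, ...) folds the Column expressions together without
    # calling the builtin sum() at all.
    cols = [F.coalesce(df[c], F.lit(0)) for c in df.columns]
    df = df.withColumn("total", reduce(add, cols))
    df.show()

As an aside, the AttributeError you saw usually means the builtin sum() has been shadowed by pyspark.sql.functions.sum (e.g. after "from pyspark.sql.functions import *"); that version expects a column, not a generator. Using reduce() sidesteps the clash entirely.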
Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com

Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.

On 4 August 2016 at 14:41, Javier Rey <jre...@gmail.com> wrote:

> Hi everybody,
>
> Sorry, the last message I sent was incomplete; this is the complete one:
>
> I'm using PySpark and I have a Spark dataframe with a bunch of numeric
> columns. I want to add a column that is the sum of all the other columns.
>
> Suppose my dataframe had columns "a", "b", and "c". I know I can do this:
>
> df.withColumn('total_col', df.a + df.b + df.c)
>
> The problem is that I don't want to type out each column individually and
> add them, especially if I have a lot of columns. I want to be able to do
> this automatically or by specifying a list of column names that I want to
> add. Is there another way to do this?
>
> I found this solution:
>
> df.withColumn('total', sum(df[col] for col in df.columns))
>
> But I get this error:
>
> "AttributeError: 'generator' object has no attribute '_get_object_id'"
>
> Additionally, I want to sum only non-null values.
>
> Thanks in advance,
>
> Samir