Fwd: [Help] PySpark Dynamic mean calculation

2018-05-31 Thread Aakash Basu
e_fold_2', 'Separated_fold_2', 'Married-civ-spouse_fold_2', 'Widowed_fold_2', 'Divorced_fold_2', 'Never-married_fold_2']

for folds in range(k_folds):
    for column in orig_list:
        col_namer = []
        for fold in range(k_folds):
            if fold != folds:
                col_namer.appe

[Help] PySpark Dynamic mean calculation

2018-05-31 Thread Aakash Basu
Hi,

Using -
Python 3.6
Spark 2.3

Original DF -

key  a_fold_0  b_fold_0  a_fold_1  b_fold_1  a_fold_2  b_fold_2
1    1         2         3         4         5         6
2    7         5         3         5         2         1

I want to calculate means from the below dataframe as follows (like this for all columns and all folds) -

key a_fold_0 b_fold_0 a_fold_1 b_fold_1 a_fold_2 b_fol
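The expected output is cut off in this preview, so the following is only a sketch of one plausible reading: for each column and each fold, take the mean of that column over the other folds (the `fold != folds` pattern visible in the forwarded snippet). The function names, the `_mean` suffix, and the `base_cols` parameter are hypothetical and do not come from the thread. The first two functions are plain Python so the column-name logic can be checked without a cluster; the last one shows how the same plan might be applied to a Spark DataFrame.

```python
def out_of_fold_sources(base_cols, k_folds):
    """Map each '<col>_fold_<i>' to that column's names in every other fold."""
    plan = {}
    for held_out in range(k_folds):
        for col in base_cols:
            plan[f"{col}_fold_{held_out}"] = [
                f"{col}_fold_{fold}" for fold in range(k_folds) if fold != held_out
            ]
    return plan


def out_of_fold_means(row, base_cols, k_folds):
    """For one row (a dict), average each column over the folds other than its own.

    Output keys use a hypothetical '<col>_fold_<i>_mean' naming scheme.
    """
    result = {}
    for target, sources in out_of_fold_sources(base_cols, k_folds).items():
        values = [row[name] for name in sources]
        result[f"{target}_mean"] = sum(values) / len(values)
    return result


def add_out_of_fold_means(df, base_cols, k_folds):
    """Apply the same plan to a Spark DataFrame (assumes pyspark is installed)."""
    from pyspark.sql import functions as F  # imported lazily; only needed here

    for target, sources in out_of_fold_sources(base_cols, k_folds).items():
        # Row-wise sum of the other folds' columns, divided by how many there are.
        total = sum((F.col(name) for name in sources), F.lit(0.0))
        df = df.withColumn(f"{target}_mean", total / len(sources))
    return df
```

With the original DF above, `add_out_of_fold_means(df, ["a", "b"], 3)` would add six columns; under this reading, `a_fold_0_mean` for key 1 is the average of `a_fold_1` and `a_fold_2`. Whether that matches the output the poster wanted cannot be confirmed from the truncated message.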