Hi,
I want to obtain a grouped DataFrame from an existing DataFrame.
A snippet of the column I need to group by is shown below.
> df.select("To").show()
To
ArrayBuffer(vance...
ArrayBuffer(vance...
ArrayBuffer(rober...
ArrayBuffer(richa...
ArrayBuffer(guill...
ArrayBuffer(m..pr...
ArrayBuffer(rich....
ArrayBuffer(issue...
ArrayBuffer(jim.f...
ArrayBuffer(richa...
A sample value of that field is shown below:
> df.select("To").collect()[0]
Row(To=[u'[email protected]', u'[email protected]',
u'[email protected]', u'[email protected]',
u'[email protected]', u'[email protected]',
u'[email protected]', u'[email protected]',
u'[email protected]', u'[email protected]',
u'[email protected]', u'[email protected]',
u'[email protected]'])
I want to group by the "To" column, but by each individual recipient
of the email rather than by the entire array in the field.
Is there a way to do this using the DataFrame groupBy command?
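To make the goal concrete, here is a plain-Python sketch of the grouping I am after. It is not Spark code; the rows, the example.com addresses, and the helper name group_by_recipient are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical rows standing in for the "To" column; addresses are invented.
rows = [
    {"To": ["a@example.com", "b@example.com"]},
    {"To": ["a@example.com"]},
    {"To": ["c@example.com", "b@example.com"]},
]


def group_by_recipient(rows):
    """Map each individual recipient to the indices of the rows listing it."""
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        # Each row contributes once per recipient, not once per whole list.
        for recipient in row["To"]:
            groups[recipient].append(index)
    return dict(groups)


grouped = group_by_recipient(rows)
for recipient in sorted(grouped):
    print(recipient, grouped[recipient])
```

In other words, a row with several recipients should land in several groups, one per address, instead of forming a single group keyed by the whole array.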
Regards,
Suraj