Hi Suraj,
What will be your output after the group by? GroupBy is meant for
aggregations like sum, count, etc.
If you want to count the 2015 records, then that is possible.
Kind Regards
Salih Oztop
From: Suraj Shetiya <[email protected]>
To: [email protected]
Sent: Tuesday, June 30, 2015 3:05 PM
Subject: Spark Dataframe 1.4 (GroupBy partial match)
I have a dataset (trimmed and simplified) with 2 columns as below.
Date Subject
2015-01-14 "SEC Inquiry"
2014-02-12 "Happy birthday"
2014-02-13 "Re: Happy birthday"
2015-01-16 "Re: SEC Inquiry"
2015-01-18 "Fwd: Re: SEC Inquiry"
I have imported this into a Spark DataFrame. What I am looking for is a groupBy
on the subject field (however, I need a partial match to identify the
discussion topic).
For example, in the above case I would like to group all messages whose
subject contains "SEC Inquiry", which returns the following grouped frame:
2015-01-14 "SEC Inquiry"
2015-01-16 "Re: SEC Inquiry"
2015-01-18 "Fwd: Re: SEC Inquiry"
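
One way to get such a partial-match key, sketched here in plain Python rather than Spark, is to strip the leading reply/forward prefixes from each subject and group on what remains. The name `topic_key` and the prefix set (`Re:`, `Fwd:`, `Fw:`) are assumptions made for illustration; they are not part of the original question.

```python
import re
from collections import defaultdict

# Assumed prefix set: "Re:", "Fwd:", "Fw:" (case-insensitive), possibly repeated.
_PREFIX = re.compile(r'^(?:\s*(?:re|fwd?)\s*:\s*)+', re.IGNORECASE)

def topic_key(subject):
    """Strip leading reply/forward prefixes so related subjects share one key."""
    return _PREFIX.sub('', subject).strip()

# Sample rows taken from the dataset in the question.
rows = [
    ("2015-01-14", "SEC Inquiry"),
    ("2014-02-12", "Happy birthday"),
    ("2014-02-13", "Re: Happy birthday"),
    ("2015-01-16", "Re: SEC Inquiry"),
    ("2015-01-18", "Fwd: Re: SEC Inquiry"),
]

# Group rows by the normalized topic instead of the raw subject.
groups = defaultdict(list)
for date, subject in rows:
    groups[topic_key(subject)].append((date, subject))

for row in groups["SEC Inquiry"]:
    print(row)
```

In PySpark, assuming a DataFrame `df` with a `Subject` column, the same function could probably be wrapped with `pyspark.sql.functions.udf` and passed to `df.groupBy(...)`. Note that groupBy still needs an aggregation afterwards; to list the matching rows themselves, a filter on the derived key may be the better fit.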
Another use case for a similar problem could be grouping by year (in the above
example). That would mean a partial match on the date field, i.e. groupBy
Date by matching the year as "2014" or "2015".
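
The year case can be sketched the same way: derive the key from the first four characters of the date and aggregate on it. This is plain Python for illustration only, and it assumes the dates are `YYYY-MM-DD` strings as in the sample; `year_key` is a hypothetical helper name.

```python
from collections import Counter

def year_key(date):
    """Return the 4-character year prefix of a 'YYYY-MM-DD' date string."""
    return date[:4]

# Dates taken from the sample dataset in the question.
dates = ["2015-01-14", "2014-02-12", "2014-02-13", "2015-01-16", "2015-01-18"]

# Count records per year: the aggregation step that follows the grouping.
counts = Counter(year_key(d) for d in dates)
print(counts["2014"], counts["2015"])
```

On a Spark DataFrame with a string `Date` column, a derived key like `df.Date.substr(1, 4)` should serve the same purpose inside `groupBy(...)`, followed by `count()`; treat that as an untested assumption for 1.4.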
Keenly looking forward to a reply/solution to the above.
- Suraj