Hello Users,

I've got a real-world use case that seems common enough that I'd expect the pattern to be documented somewhere, but I can't find any reference to a simple solution. The challenge is that data is getting dumped into a directory structure, and the directory structure itself contains features that I need in my model. For example:

bank_code
    Trader
        Day-1.csv
        Day-2.csv
        ...

Each CSV file contains the list of all trades made by that trader on that day. The problem is that the bank and trader should be part of the feature set, i.e. we need the RDD to look like:

(bank, trader, day, <list-of-trades>)

Has anyone got an elegant solution for doing this?

Cheers,
- SteveN
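
P.S. To make the target shape concrete, here is a rough sketch in plain Python of the path parsing I have in mind. It is only an illustration, not a tested Spark job: the function name to_record, the sample path, and the sample trade rows are all made up, and it assumes the layout is always .../bank/trader/day.csv with one comma-separated trade per line and no header row.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def to_record(path, content):
    """Turn one (file path, file content) pair into (bank, trader, day, trades).

    Hypothetical helper: assumes the last three path components are
    bank / trader / day.csv, as in the layout described above.
    """
    # Drop any URI scheme and host (e.g. "hdfs://host") and keep the path part.
    parts = PurePosixPath(urlparse(path).path).parts
    bank, trader, filename = parts[-3], parts[-2], parts[-1]

    # "Day-1.csv" -> "Day-1"
    day = PurePosixPath(filename).stem

    # One trade per non-empty line; assumes no header row in the file.
    trades = [line.split(",") for line in content.splitlines() if line.strip()]
    return (bank, trader, day, trades)


# Made-up sample input, just to show the resulting tuple shape.
sample_path = "hdfs://namenode/data/bank_a/trader_1/Day-1.csv"
sample_content = "AAPL,100,187.5\nMSFT,50,410.0\n"
record = to_record(sample_path, sample_content)
print(record)
```

If I understand the API correctly, SparkContext.wholeTextFiles returns (path, content) pairs, so mapping a function like this over its result might give the RDD above. It reads each file whole into memory though, so I'm not sure it counts as the elegant answer.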