Compact RDD representation

Сергей Лихоман Sun, 19 Jul 2015 10:41:09 -0700

Hi,

I am looking for a suitable issue for my Master's degree project (ideally in
the area of scalability problems and improvements for Spark Streaming), and it
seems that introducing a grouped RDD (for example, instead of storing
"Spark", "Spark", "Spark", store ("Spark", 3)) could:

1. Reduce the memory needed for an RDD (roughly, memory use would scale with
   the fraction of unique messages).
2. Improve performance (no need to apply a function several times to the same
   message).
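To make the two claims concrete, here is a minimal sketch in plain Python. It does not use Spark, and `compact`, `map_once` and `CallCounter` are hypothetical helper names, not Spark APIs. Seven messages collapse to three stored entries, and the function runs once per distinct message instead of once per message.

```python
from collections import Counter

def compact(records):
    # Collapse duplicates into (value, count) pairs: "Spark" x3 -> ("Spark", 3).
    # Counter keeps first-seen order (dicts preserve insertion order in Python 3.7+).
    return list(Counter(records).items())

def map_once(pairs, f):
    # Call f once per distinct value; the count rides along unchanged.
    # If f sends two inputs to the same output, the result can hold the same
    # key twice, which is still a valid (just less compact) representation.
    return [(f(value), count) for value, count in pairs]

class CallCounter:
    # Wraps a per-message function and records how often it is invoked.
    def __init__(self, f):
        self.f = f
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return self.f(x)

messages = ["Spark", "Spark", "Spark", "Flink", "Spark", "Storm", "Flink"]
pairs = compact(messages)
print(pairs)  # [('Spark', 4), ('Flink', 2), ('Storm', 1)]
print(len(pairs), "stored entries instead of", len(messages))  # 3 stored entries instead of 7

upper = CallCounter(str.upper)
print(map_once(pairs, upper))  # [('SPARK', 4), ('FLINK', 2), ('STORM', 1)]
print("function calls:", upper.calls)  # function calls: 3
```

The saving depends entirely on the duplicate ratio: on data where every message is unique, each entry carries an extra count and the compact form costs slightly more than the plain one.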

Can I create a ticket and introduce an API for grouped RDDs? Does it make
sense? I would also appreciate any criticism and ideas.
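One possible shape for such an API, sketched in plain Python as a hypothetical `GroupedCollection` (not a Spark class; all method names are assumptions), shows a design point worth settling early: `map` runs the function once per distinct value, but it must re-merge counts afterwards, because different inputs can map to the same output.

```python
from collections import Counter

class GroupedCollection:
    """Hypothetical API sketch: a multiset stored as {value: count}."""

    def __init__(self, counts):
        self.counts = Counter(counts)

    @classmethod
    def from_records(cls, records):
        return cls(Counter(records))

    def map(self, f):
        # f runs once per distinct value; counts are re-merged because f may
        # send different inputs to one output ("Spark" and "spark" -> "spark").
        out = Counter()
        for value, count in self.counts.items():
            out[f(value)] += count
        return GroupedCollection(out)

    def filter(self, pred):
        # Keeps or drops a whole group at once; counts are unchanged.
        return GroupedCollection({v: c for v, c in self.counts.items() if pred(v)})

    def count(self):
        # Total number of logical records, same as on the expanded data.
        return sum(self.counts.values())

    def distinct_count(self):
        return len(self.counts)

    def expand(self):
        # Back to one entry per record, for operations that need every record.
        return [v for v, c in self.counts.items() for _ in range(c)]

g = GroupedCollection.from_records(["Spark", "spark", "Spark", "Flink"])
lowered = g.map(str.lower)
print(dict(lowered.counts))  # {'spark': 3, 'flink': 1}
print(lowered.count(), lowered.distinct_count())  # 4 2
print(lowered.filter(lambda w: w.startswith("s")).expand())  # ['spark', 'spark', 'spark']
```

Operations that depend on individual records or their order (for example `zipWithIndex`) would have to call `expand` first and would lose the benefit, so a real proposal would need to state which operations keep the grouped form.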
