Hi folks,
I am writing to ask how to filter and partition a set of files through Spark.
The situation is that I have N big files (too big to fit on a single machine), and
each line of the files starts with a category (say Sport, Food, etc.), while there
are fewer than 100 categories in total. I need a progr
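Since there are fewer than 100 categories, one natural Spark shape for this is keyBy on the leading category token followed by partitionBy with a HashPartitioner, so each category's lines land together. As a plain-Java sketch of just the grouping step (no Spark dependency; the class and helper names here are made up for illustration, not Spark APIs):

```java
import java.util.*;

// Illustrative sketch: bucket lines by their leading category token,
// mimicking what keyBy(...).partitionBy(...) would achieve in Spark.
public class CategoryBuckets {
    // Extract the category, assumed to be the first whitespace-delimited token.
    static String categoryOf(String line) {
        int sp = line.indexOf(' ');
        return sp < 0 ? line : line.substring(0, sp);
    }

    // Group lines into per-category buckets.
    static Map<String, List<String>> bucket(List<String> lines) {
        Map<String, List<String>> buckets = new HashMap<>();
        for (String line : lines) {
            buckets.computeIfAbsent(categoryOf(line), k -> new ArrayList<>())
                   .add(line);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Sport a", "Food b", "Sport c");
        System.out.println(bucket(lines).get("Sport").size()); // prints 2
    }
}
```

In Spark the same grouping would happen shuffle-side, and each partition could then be written out (or filtered) independently.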
from TABLE-NAME GROUP BY FIELD1, FIELD2;”
JavaSchemaRDD result = hsc.hql(hql);
List<Row> grp = result.collect();
for (Row row : grp) {
    for (int z = 2; z < row.length(); z++) {
        // Do something with the results
    }
}
Curt
From: SiMaYunRui
Date: Sunday, February 15, 2015 at 10:37 AM
To: "use
getting an exact answer this way -- the approximation is only important
for distributing work among all executors. Even if the approximation is
inaccurate, you'll still correct for it; you'll just end up with unequal partitions.
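A minimal single-machine sketch of that idea (the names are illustrative, not Spark APIs): approximate split points only decide how elements are bucketed, then a counting pass locates the bucket containing the target rank, and only that bucket is sorted. Bad splits give unequal buckets but still the exact answer:

```java
import java.util.*;

// Sketch: approximate split points balance the work; the result stays exact
// because we locate the bucket holding the target rank and sort just it.
public class ApproxRankLocator {
    // Return the value with zero-based global rank r, using the given splits.
    static double valueAtRank(List<Double> data, long r, double[] splits) {
        // Bucket i holds values in [splits[i-1], splits[i]).
        int nBuckets = splits.length + 1;
        List<List<Double>> buckets = new ArrayList<>();
        for (int i = 0; i < nBuckets; i++) buckets.add(new ArrayList<>());
        for (double v : data) {
            int b = 0;
            while (b < splits.length && v >= splits[b]) b++;
            buckets.get(b).add(v);
        }
        // Walk buckets until the cumulative count passes r.
        long seen = 0;
        for (List<Double> bucket : buckets) {
            if (seen + bucket.size() > r) {
                Collections.sort(bucket); // sort only the bucket we need
                return bucket.get((int) (r - seen));
            }
            seen += bucket.size();
        }
        throw new IllegalArgumentException("rank out of range");
    }

    public static void main(String[] args) {
        List<Double> data = Arrays.asList(9.0, 1.0, 7.0, 3.0, 5.0);
        // A deliberately bad split still yields the exact median (rank 2),
        // just with unequal buckets: {1.0} vs {9.0, 7.0, 3.0, 5.0}.
        System.out.println(valueAtRank(data, 2, new double[]{2.0})); // prints 5.0
    }
}
```

In Spark the bucketing pass corresponds to a range partition and the per-bucket sort happens inside a single partition, so the full data set is never globally sorted.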
Imran

On Sun, Feb 15, 2015 at 9:37 AM, SiMaYunRui wrote:
hello,
I am a newbie to Spark and am trying to figure out how to compute a percentile over a
big data set. I googled this topic but did not find any useful code
examples or explanations. It seems I can use the sortByKey transformation to get my
data set in order, but I'm not quite sure how I can ge
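The "sort then index" idea behind sortByKey reduces a percentile to an index lookup once the data is globally ordered (in Spark this would be sortByKey().zipWithIndex() plus a rank lookup). A plain-Java sketch of the rank arithmetic, using the nearest-rank definition (the class name is illustrative):

```java
import java.util.*;

// Sketch: once data is sorted, the p-th percentile is the element at a
// computed rank; the local sort stands in for a distributed sortByKey.
public class SortedPercentile {
    // Nearest-rank percentile, p in (0, 100].
    static double percentile(List<Double> data, double p) {
        List<Double> sorted = new ArrayList<>(data);
        Collections.sort(sorted);
        // Nearest-rank: ceil(p/100 * n), converted to a 0-based index.
        int rank = (int) Math.ceil(p / 100.0 * sorted.size());
        return sorted.get(Math.max(rank, 1) - 1);
    }

    public static void main(String[] args) {
        List<Double> data = Arrays.asList(15.0, 20.0, 35.0, 40.0, 50.0);
        System.out.println(percentile(data, 40)); // prints 20.0
    }
}
```

This does a full sort, which is fine when an exact answer is needed; the rank-location approach discussed above avoids sorting everything.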