What’s the fastest way to map/parallel read all values in a table? Kind of like a mini map only job.
I’m doing this to compute stats across our entire corpus. What I did to begin with was use token() and then split the range into the number of splits I needed. So I took the total token space, which is -2^63 to 2^63 - 1, and broke it into N parts. The queries then look like:

select * from mytable where token(primaryKey) >= x and token(primaryKey) < y

From reading this list I thought this was the correct way to handle the problem. However, I’m seeing horrible performance doing this. After about 1% it just flat out locks up. Could it be that I need to randomize the token order so that it’s not contiguous? Maybe it’s all mapping onto the first box to begin with.

--
Founder/CEO Spinn3r.com
Location: San Francisco, CA
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>
<http://spinn3r.com>
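P.S. For reference, here is a minimal sketch of the splitting I described above: dividing the full Murmur3 token range into N contiguous sub-ranges. The function name and the query template are just illustrative, not from any particular driver.

```python
# Split the full Murmur3 token range (-2**63 .. 2**63 - 1) into n
# contiguous sub-ranges for parallel full-table scans.

MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1

def token_splits(n):
    """Yield (start, end) pairs covering the whole token range.

    Each pair is meant to be queried as:
        token(primaryKey) >= start AND token(primaryKey) < end
    except the final range, whose end is MAX_TOKEN and should be
    queried with <= so the last token is not dropped.
    """
    total = MAX_TOKEN - MIN_TOKEN + 1   # 2**64 possible tokens
    step = total // n
    start = MIN_TOKEN
    for i in range(n):
        # Last range absorbs any remainder and ends at MAX_TOKEN.
        end = MAX_TOKEN if i == n - 1 else start + step
        yield (start, end)
        start = end

# Example: build the CQL strings for 16 splits (table/column names
# are placeholders).
for start, end in token_splits(16):
    op = "<=" if end == MAX_TOKEN else "<"
    cql = ("select * from mytable where "
           f"token(primaryKey) >= {start} and token(primaryKey) {op} {end}")
```

Note the ranges are contiguous and ordered, which is exactly what I suspect is the problem: consecutive splits may land on the same replica, so shuffling the split order before dispatching workers might spread the load.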