What’s the fastest way to map/parallel read all values in a table?

Kind of like a mini map only job.

I’m doing this to compute stats across our entire corpus.

What I did to begin with was use token() and then split the full range into the number
of splits I needed.

So I just took the total key range space which is -2^63 to 2^63 - 1 and
broke it into N parts.

Then the queries come back as:

select * from mytable where token(primaryKey) >= x and token(primaryKey) < y

From reading on this list I thought this was the correct way to handle this
problem.
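For reference, here’s a minimal sketch of the splitting I’m doing. The function names are just illustrative, and this assumes the Murmur3Partitioner’s full token space of -2^63 to 2^63 - 1:

```python
# Sketch: split the full Murmur3 token space into N contiguous subranges
# and build one range query per split. Pure arithmetic, no driver needed.

MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1

def token_ranges(n):
    """Yield (start, end) pairs covering [MIN_TOKEN, MAX_TOKEN] in n splits.

    Ranges here are inclusive on both ends, so the matching query is
    token(pk) >= start AND token(pk) <= end; the last split absorbs any
    remainder so the whole space is covered with no gaps or overlaps.
    """
    total = MAX_TOKEN - MIN_TOKEN + 1  # 2**64 tokens in all
    step = total // n
    start = MIN_TOKEN
    for i in range(n):
        end = MAX_TOKEN if i == n - 1 else start + step - 1
        yield (start, end)
        start = end + 1

def make_query(table, pk, start, end):
    """Build the CQL for one subrange (inclusive-end variant of the query above)."""
    return (f"SELECT * FROM {table} "
            f"WHERE token({pk}) >= {start} AND token({pk}) <= {end}")
```

So for N splits I just iterate `token_ranges(N)` and run `make_query("mytable", "primaryKey", start, end)` for each pair.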

However, I’m seeing horrible performance doing this.  After about 1% of the
range it just flat out locks up.

Could it be that I need to randomize the token order so that it’s not
contiguous?  Maybe it’s all mapping on the first box to begin with.



-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>
<http://spinn3r.com>
