On 20/05/13 13:35, Alexander Sicular wrote:
I think the following line is your problem. As others have said, you should not
be M/R'ing over an entire bucket; performance will only degrade as you store
more items in Riak. You should feed an M/R with results from a search, a
secondary index query, or a list of bucket/key pairs.
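As a rough sketch of that pattern with the Python client (the 'logs' bucket
and 'status_bin' secondary index below are made-up names for illustration):

    import riak

    client = riak.RiakClient()

    # Feed the job from a secondary index query so that only matching keys
    # become MapReduce inputs, instead of walking the whole bucket.
    mr = riak.RiakMapReduce(client)
    mr.index('logs', 'status_bin', 'error')    # hypothetical 2i equality query
    mr.map('function(v) { return [v.key]; }')  # JS map phase: emit the key
    results = mr.run(timeout=60000)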
-Alexander Sicular
@siculars
Hi Kurt,
A Riak cluster can handle very large amounts of data, and 500 000 000 keys
should not be a problem. Riak's MapReduce implementation is, however, not
designed for this type of large bulk processing, so inserting all the data and
then periodically performing MapReduce over the entire dataset is not a good
fit.
So, just to provide a bit of context:
We want a datastore that can hold over 500 000 000 keys, and those keys will be
map-reduced routinely.
I would love to use Riak for this, but the question is: can it handle this
amount of data (and possibly more), and can it be done cheaply?
What sort of hosting...
Kurt,
I'm not sure about the cause of the MapReduce crash (I suspect it's running
out of resources of some kind, even with the increased JS VM count and memory).
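For reference, those knobs live in the riak_kv section of app.config; the
values below are illustrative only, not a recommendation:

    %% app.config (riak_kv section) -- example values only
    {riak_kv, [
        {map_js_vm_count, 24},     %% JS VMs available to map phases
        {reduce_js_vm_count, 18},  %% JS VMs available to reduce phases
        {js_max_vm_mem, 32}        %% memory per JS VM, in MB
    ]}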
One word of advice about the list keys timeout, though:
Be sure to use streaming list keys.
In Python, this would look something like:

    # stream_keys() yields keys in batches rather than building the full
    # list in memory, which is what triggers the list-keys timeout.
    for keylist in bucket.stream_keys():
        for key in keylist:
            handle(key)  # hypothetical per-key handler
Hi Kurt,
In order to provide some feedback on why the MapReduce job might be timing out
and to help you address it, I will need some additional information:
- Which version of Riak are you running?
- What does your app.config file look like?
- What does your data look like?
-