I'm trying to run a very simplified key filter that's timing out. I've got
about 8M keys in a 3-node cluster, 15 GB memory, num_partitions=256, LevelDB
backend.
I'm thinking this should be pretty quick. What am I doing wrong?
Jim
Here's the query:
curl -v -d
'{"inputs":{"bucket":"nodes","key_
If you are doing just a simple equality check in the key filter, then why
not skip key filters and lookup the key directly? Key filters are not
performant over large data sets.
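To see why the direct lookup wins: a key-filter MapReduce job has to walk every key in the bucket before the filter fires, while a plain GET touches only the one key. A minimal sketch of the two request shapes, assuming Riak's 1.0-era HTTP interface (bucket and key names are hypothetical):

```python
import json

# Hypothetical bucket/key for illustration.
BUCKET = "nodes"
KEY = "node-12345"

# A key-filter MapReduce input makes Riak list all ~8M keys in the
# bucket and run the filter against each one before mapping:
keyfilter_job = {
    "inputs": {
        "bucket": BUCKET,
        "key_filters": [["eq", KEY]],
    },
    "query": [{"map": {"language": "javascript",
                       "name": "Riak.mapValuesJson"}}],
}

# A direct lookup touches only the one key -- O(1) instead of O(keys):
direct_url = "http://127.0.0.1:8098/riak/%s/%s" % (BUCKET, KEY)

print(json.dumps(keyfilter_job))
print(direct_url)
```

The equality filter is exactly the case where the MapReduce machinery buys you nothing over a GET.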
On Sun, Oct 23, 2011 at 2:38 PM, Jim Adler wrote:
> I'm trying to run a very simplified key filter that's timing out.
I will be loosening the key filter criterion after I get the basics working,
which I thought would be a simple equality check. 8M keys isn't really a
large data set, is it? I thought that keys were stored in memory and key
filters just operated on those in-memory keys, not the data.
Jim
From: Ryan
On 10/23/2011 12:11 PM, Jim Adler wrote:
> I will be loosening the key filter criterion after I get the basics
> working, which I thought would be a simple equality check. 8M keys
> isn't really a large data set, is it? I thought that keys were stored
> in memory and key filters just operated on those me
Jim,
Looks like you are possibly using both the legacy key listing option and the
legacy map reduce. Assuming all your nodes are on Riak 1.0, check your
app.config files on all nodes and make sure mapred_system is set to pipe and
legacy_keylisting is set to false. If that's not already the case
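For reference, those two settings would sit in the `riak_kv` section of each node's app.config; a sketch of the relevant excerpt, assuming a standard Riak 1.0 layout (other entries omitted):

```erlang
%% app.config excerpt (Riak 1.0) -- set on every node, then restart:
{riak_kv, [
    {mapred_system, pipe},
    {legacy_keylisting, false}
    %% ...other riak_kv settings...
]}
```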
Thanks for the pointers, guys:
@eric - I'm not sure the python library lets me do that - I'm looking at the
docs and playing with the object this morning. I probably can't query by that
header at a later stage, though, can I?
@siculars - Thanks for that - I think that's the option I'm going to go for
- i
Thanks Kelly. You were spot-on. I had upgraded app.config on my other two
nodes but had missed one.
So, I updated app.config, restarted all cluster nodes, and reran the same
query. It now looks like a pipe error:
22:56:37.697 [error] Supervisor riak_pipe_builder_sup had child undefined
started
A little context on my use-case here. I've got about 8M keys in this 3-node
cluster. I need to clean out some bad keys and some bad data, so I'm
using the key filter and search functionality to accomplish this (I tend to
use the riak python client). But, to be honest, I'm having a helluva time
By secondary index I meant the new indexing features in Riak 1.0. In order
to write out a link you need to write out the key/value/link (in the
header). Just include an indexing header for what you want to index, link
creation date in your case. Read up on secondary indexes in Riak.
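Concretely, a 2i write is just an extra header on the PUT; a sketch assuming Riak 1.0's HTTP interface, with hypothetical bucket, key, and index names:

```python
# Attach a secondary-index header on the write. The "_bin" suffix marks
# a string (binary) index; here we index the link creation date:
bucket = "nodes"
key = "node-12345"
headers = {
    "Content-Type": "application/json",
    "x-riak-index-created_bin": "2011-10-23",
}

# A PUT to this URL with the headers above stores AND indexes the object:
put_url = "http://127.0.0.1:8098/riak/%s/%s" % (bucket, key)

# Later, exact-match queries go through the index endpoint instead of a
# full key listing:
query_url = ("http://127.0.0.1:8098/buckets/%s/index/created_bin/2011-10-23"
             % bucket)
```

Unlike key filters, the index query only reads the entries under that index value, so it stays fast as the bucket grows.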
@siculars
http://siculars.posterous.com
Thanks Alex - I've been researching this today and looking at the
implications because it appears I have to change the storage backend in
order to enable it - additionally the python client library isn't up to date
with this feature so it means merging in dev code to get access to it - so
I'm not d
I think some other threads mentioned that the python client will be getting
some 1.0 feature love sooner rather than later.
@siculars
http://siculars.posterous.com
Sent from my rotary phone.
On Oct 23, 2011 10:21 PM, "Andrew Fisher" wrote:
> Thanks Alex - I've been researching this today and lo
Jim,
A couple of things to note. First, bitcask stores all keys in memory, but
eleveldb does not necessarily, so the performance of your disks could be a
factor. Not saying it is, but just a difference to be aware of between bitcask
and eleveldb.
Second, the latest error you shared was a time
Thanks Kelly. Much appreciated! I'll try your suggestions and get back.
Jim
From: Kelly McLaughlin
Date: Sun, 23 Oct 2011 22:02:52 -0600
To: Jim Adler
Cc: "riak-users@lists.basho.com"
Subject: Re: Key Filter Timeout
Jim,
A couple of things to note. First, bitcask stores all keys in memo