problem with baseball MR example?

2011-10-07 Thread Nate Lawson
I was looking at the baseball MR example on the blog. http://basho.com/blog/technical/2011/01/20/Baseball-Batting-Averages-Riak-Map-Reduce/ One thing I was wondering was how the file split mechanism is aware of record lengths. It doesn't look like the author is using any particular split functio

batch adding of keys to LevelDB backend

2011-10-07 Thread Nate Lawson
We're working on a test where we want to add batches of keys to Riak. We're using the LevelDB backend with 1.0. One suggestion we heard was to take down a node, batch insert directly into the backing store, and then bring the node up again. Could someone give us some more details on this? How w

Re: batch adding of keys to LevelDB backend

2011-10-11 Thread Nate Lawson
use one of the clients to load the data > > 3. archive the data dir on all nodes > > Next time you want to stand up a cluster with the same data simply unarchive > the data dirs before starting the cluster. > > -Ryan > > > On Fri, Oct 7, 2011 at 9:29 PM, Nate Laws

Re: Secondary index scalability

2011-10-11 Thread Nate Lawson
A related question -- does the secondary index implementation make some attempt to cluster "nearby" integer keys for range queries? In other words, if I have an integer secondary index on a set of keys, is this taken into account in the partition function? Since you have to query the full cover

Re: batch adding of keys to LevelDB backend

2011-10-11 Thread Nate Lawson
nd setting w to a low value. > > Jeremiah Peschka > Founder, Brent Ozar PLF > > On Oct 11, 2011 8:50 AM, "Nate Lawson" wrote: > It's not just for testing. We found that loading/updating keys was somewhat > slow, even with a low replication count. So we're ex

Re: Do not expose Riak to the Internet

2011-10-19 Thread Nate Lawson
The fact that it allows any client to execute arbitrary code as the database user, on the database server? You can call 'os:cmd' to shell out from a M-R job. You can't do that directly in MySQL. I think this requirement should be extended to: "Don't allow clients to connect who aren't equivalen

Re: secondary indexes: how to know the saved indexes for a key?

2011-10-20 Thread Nate Lawson
On Oct 20, 2011, at 8:35 PM, Antonio Rohman Fernandez wrote: > Hi All, > > Imagine that I store "some_item" with key "some_key" into a "products" bucket > with a category "whatever" and a price "300". > I save it using cURL like this: > > $data = 'some_item'; > $key = 'some_key'; >

automatically expiring keys with LevelDB?

2011-10-21 Thread Nate Lawson
I know Bitcask has the expiry_secs option for expiring keys, but what about LevelDB? We're thinking of using Luwak as a file cache frontend to S3, and it would be nice for older entries to be deleted in LRU order as we store newer files. This could be implemented as a storage quota also (high/lo

Re: Riak and Distributed Image Processing

2011-11-07 Thread Nate Lawson
On Nov 7, 2011, at 1:23 PM, andrew cooke wrote: > Apologies if this is a dumb idea, or I am asking in the wrong place. I'm > muddling around trying to understand various bits of technology while piecing > together a possible project. So feel free to tell me I'm wrong :o) > > I am considering ho

Re: 2i for single-result lookup

2011-11-07 Thread Nate Lawson
On Nov 7, 2011, at 5:45 PM, Greg Pascale wrote: > Hi, > > I'm thinking about using 2i for a certain piece of my system, but I'm worried > that the document-based partitioning may make it suboptimal. > > The issue is that the secondary fields I want to query over (email and > username) are uniq

Re: 2i for single-result lookup

2011-11-07 Thread Nate Lawson
olution like this. Suddenly a user create > operation requires n writes to be considered a success. If one fails, I need > to delete the others, etc… It quickly becomes a pain. > > I don't know what you mean by "some relationship between the keys" > > -- > Greg

Re: 2i for single-result lookup

2011-11-07 Thread Nate Lawson
'll never have half of a user or some dangling index or anything. The > validity checks at read-time ensure this. But, some periodically run task > that cleans up your DB with MapReduce operations would be smart. > > Maybe there's a better way to do this, but I thought I

configurable prefix for consistent hashing?

2011-11-09 Thread Nate Lawson
We have been looking into ways to cluster keys to benefit from the LevelDB backend's prefix compression. If we were doing a batch of lookups and the keys from a given document could be grouped on a partition, they could be read with less disk IO. However, consistent hashing in Riak currently spr

Re: configurable prefix for consistent hashing?

2011-11-09 Thread Nate Lawson
On Nov 9, 2011, at 3:33 PM, Elias Levy wrote: > On Wed, Nov 9, 2011 at 3:29 PM, Phil Stanhope wrote: > Tread carefully here ... by forcing locality ... you will sacrifice high > availability by algorithmically creating a bias and a single point of failure > in the cluster. > > You don't have

Re: configurable prefix for consistent hashing?

2011-11-09 Thread Nate Lawson
On Nov 9, 2011, at 3:49 PM, Nate Lawson wrote: > On Nov 9, 2011, at 3:33 PM, Elias Levy wrote: > >> On Wed, Nov 9, 2011 at 3:29 PM, Phil Stanhope wrote: >> Tread carefully here ... by forcing locality ... you will sacrifice high >> availability by algorithmically crea

Re: Problem installing Riak Python client

2011-11-10 Thread Nate Lawson
On Nov 10, 2011, at 8:25 AM, Nitish Sharma wrote: Hi, > I am trying to install Riak's python client library using Pip. But it throws > an IOError while installing: IOError: [Errno 2] No such file or directory: > 'protobuf/setup.py'. Apparently, a lot of guys are facing the same problem. > The pr

Re: Problem installing Riak Python client

2011-11-10 Thread Nate Lawson
On Nov 10, 2011, at 4:04 PM, Greg Stein wrote: > On Thu, Nov 10, 2011 at 11:51, Nate Lawson wrote: >> ... >> BTW, are there any plans for the Riak python client to use the protobuf C >> library directly via ctypes? The pure python implementation of protobuf >> se

Re: Eleveldb backend randomly chewing all disk

2011-11-16 Thread Nate Lawson
On Nov 16, 2011, at 5:50 AM, David Smith wrote: > On Tue, Nov 15, 2011 at 6:01 PM, Jeremy Raymond wrote: >> I've seen issues when leveldb runs out of file handles. The leveldb >> log then fills with error messages. > > Hmm -- this could be. However, I would expect that the Erlang VM would > even

Re: Secondary Indexes - Feedback?

2011-11-16 Thread Nate Lawson
On Nov 16, 2011, at 9:57 AM, Rusty Klophaus wrote: > Now that you've had a few weeks to investigate and experiment with > Secondary Indexes, I'm hoping to hear about your experiences to help > us focus future development efforts most effectively: > • Have you tried Secondary Indexes? >

Re: Secondary Indexes - Feedback?

2011-11-16 Thread Nate Lawson
On Nov 16, 2011, at 11:41 AM, Rusty Klophaus wrote: >> 2. We need a guaranteed order of inputs from a 2I query. If we select on a >> range, each key we get on a given node in the M-R job should be ordered >> according to the 2I values. Of course we understand that keys won't be >> ordered acros