Re: bitcask hash algo

2011-06-08 Thread David Smith
On Tue, Jun 7, 2011 at 11:39 PM, Aaron Blohowiak wrote:
> as far as i can tell, bitcask c_src is using the murmurhash2 algo, which has
> a known flaw ( https://sites.google.com/site/murmurhash/murmurhash2flaw )..
> while this is not *likely* to cause an issue, I was wondering if there was a
> reason that it does not use murmurhash3 ?
> if this is not the appropriate list for this question, please let me know!

The reason is simple -- murmurhash3 hasn't been released yet (that I
can see). :)

Once there is a final version of m3, we'll certainly upgrade. Thanks
for the pointer.

D.

-- 
Dave Smith
Director, Engineering
Basho Technologies, Inc.
diz...@basho.com

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


RFC: riak-python-client: Added a class for constructing key filters

2011-06-08 Thread Eric Moritz
Hi, I added a class to the Python Riak client that should make key
filters easier to build.  I wanted to solicit the opinions of other
Python Riak developers to make sure it smells right.

If you have a moment, add your 2 cents to the pull request's comment
thread here: 
https://github.com/basho/riak-python-client/pull/29#issuecomment-1325131
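
For readers who have not used key filters: a filter is a list of steps, each
step itself a list holding a function name and its arguments, passed next to
the bucket name in a MapReduce input.  The following is only a sketch of what
a chainable builder for such lists could look like.  The KeyFilter class, its
method names, and mapreduce_inputs are hypothetical and are not the class
from the pull request.

```python
import json


class KeyFilter:
    """Hypothetical chainable builder for Riak key-filter lists."""

    def __init__(self, steps=None):
        self._steps = list(steps or [])

    def _add(self, name, *args):
        # Return a new builder so a partially built filter can be reused.
        return KeyFilter(self._steps + [[name, *args]])

    def tokenize(self, separator, index):
        return self._add("tokenize", separator, index)

    def to_lower(self):
        return self._add("to_lower")

    def eq(self, value):
        return self._add("eq", value)

    def starts_with(self, prefix):
        return self._add("starts_with", prefix)

    def to_list(self):
        # Copy each step so callers cannot mutate the builder's state.
        return [list(step) for step in self._steps]


def mapreduce_inputs(bucket, key_filter):
    """Build the 'inputs' value of a MapReduce job that uses key filters."""
    return {"bucket": bucket, "key_filters": key_filter.to_list()}


# Keys such as "basho-20101215": split on "-", take token 1, compare to "basho".
filters = KeyFilter().tokenize("-", 1).eq("basho")
inputs = mapreduce_inputs("invoices", filters)
print(json.dumps(inputs))
```

Returning a new builder from every step keeps a shared prefix (for example the
tokenize step) safe to reuse across several filters.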


Re: A script to check bitcask keydir sizes

2011-06-08 Thread Justin Sheehy
On Thu, Mar 24, 2011 at 1:51 PM, Nico Meyer wrote:

> The bigger concern for me would be the way the bucket/key tuple is
> serialized:
>
> Eshell V5.8  (abort with ^G)
> 1> iolist_size(term_to_binary({<<>>,<<>>})).
> 13
>
> That's 13 bytes of overhead per key where only 2 bytes are needed with
> reasonable bucket/key length limits of 256 bytes each. Or if that is not
> enough, one could also use a variable-length encoding, so bucket/keys
> can be arbitrarily large and the most common cases (less than 128 bytes)
> still only use 2 bytes of overhead.

I've made a branch of bitcask that effectively does this.  It uses 3
bytes per record instead of 13, saving 10 bytes (both in RAM and on
disk) per element stored.

The tricky thing, however, is backward compatibility.  There are many
Riak installations out there with data stored in bitcask using the old
key encoding, and we shouldn't force them all to do a very costly
full-sweep of their existing data in order to get these savings.  When
we sort out the best way to manage a smooth upgrade, I would happily
push out the smaller encoding.
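
To make the variable-length idea from the quoted text concrete, here is a
minimal sketch in Python.  It assumes a 7-bits-per-byte length prefix in
front of the bucket and in front of the key; this is an illustration only and
is not the format used in the bitcask branch mentioned above.  All function
names are made up for the example.

```python
def encode_varint(n):
    """Encode a non-negative integer, 7 bits per byte, low bits first."""
    if n < 0:
        raise ValueError("length must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data, pos=0):
    """Return (value, position just after the varint)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_key(bucket, key):
    """Length-prefix the bucket and the key and concatenate them."""
    return encode_varint(len(bucket)) + bucket + encode_varint(len(key)) + key


def decode_key(blob):
    bucket_len, pos = decode_varint(blob)
    bucket = blob[pos:pos + bucket_len]
    pos += bucket_len
    key_len, pos = decode_varint(blob, pos)
    return bucket, blob[pos:pos + key_len]


# Overhead is 2 bytes while both parts are shorter than 128 bytes.
print(len(encode_key(b"", b"")))
print(len(encode_key(b"users", b"alice")) - len(b"users") - len(b"alice"))
```

A part of 128 bytes or more costs one extra prefix byte, so long buckets or
keys remain possible without penalising the common short case.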

-Justin


Re: A script to check bitcask keydir sizes

2011-06-08 Thread Nico Meyer

Hi Justin,

I wanted to write this earlier, but I just had too much on my plate:

On 08.06.2011 16:11, Justin Sheehy wrote:

> On Thu, Mar 24, 2011 at 1:51 PM, Nico Meyer wrote:
>
>> The bigger concern for me would be the way the bucket/key tuple is
>> serialized:
>>
>> Eshell V5.8  (abort with ^G)
>> 1>  iolist_size(term_to_binary({<<>>,<<>>})).
>> 13
>>
>> That's 13 bytes of overhead per key where only 2 bytes are needed with
>> reasonable bucket/key length limits of 256 bytes each. Or if that is not
>> enough, one could also use a variable-length encoding, so bucket/keys
>> can be arbitrarily large and the most common cases (less than 128 bytes)
>> still only use 2 bytes of overhead.
>
> I've made a branch of bitcask that effectively does this.  It uses 3
> bytes per record instead of 13, saving 10 bytes (both in RAM and on
> disk) per element stored.
>
> The tricky thing, however, is backward compatibility.  There are many
> Riak installations out there with data stored in bitcask using the old
> key encoding, and we shouldn't force them all to do a very costly
> full-sweep of their existing data in order to get these savings.  When
> we sort out the best way to manage a smooth upgrade, I would happily
> push out the smaller encoding.



I think the possible gains of this change are fairly limited. Shaving off 
about 10 bytes per key, compared to 43 bytes of overhead plus, let's say, at 
least 10 bytes for bucket and key combined, is already less than 20 
percent savings.
The saving seems even smaller if you consider the overhead imposed by 
the memory allocator. I wrote a small test program in C++ which 
allocates one million blocks of memory of a given size and prints the 
overhead for each allocation. Turns out the overhead ranges from 8 to 23 
bytes in a sawtooth-like pattern (on a 64-bit Linux machine):


size=56: overhead=8
size=57: overhead=23
size=58: overhead=22
size=59: overhead=21
size=60: overhead=20
size=61: overhead=19
size=62: overhead=18
size=63: overhead=17
size=64: overhead=16
size=65: overhead=15
size=66: overhead=14
size=67: overhead=13
size=68: overhead=12
size=69: overhead=11
size=70: overhead=10
size=71: overhead=9
size=72: overhead=8

Not much you can do about that, unless one wants to use unaligned 
memory, which one doesn't.
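
The numbers above are consistent with a simple model: each allocation carries
an 8-byte header and the resulting chunk is rounded up to a multiple of 16
bytes.  The Python sketch below checks that model against the measured table
and redoes the "less than 20 percent" arithmetic.  The header size, alignment
and minimum chunk size are assumptions that happen to fit this data, not a
statement about any particular allocator, and the function names are made up.

```python
# Measured values copied from the table above: requested size -> overhead.
MEASURED = {
    56: 8, 57: 23, 58: 22, 59: 21, 60: 20, 61: 19, 62: 18, 63: 17, 64: 16,
    65: 15, 66: 14, 67: 13, 68: 12, 69: 11, 70: 10, 71: 9, 72: 8,
}


def modeled_overhead(size, header=8, alignment=16, min_chunk=32):
    """Overhead if each chunk is (size + header) rounded up to `alignment`."""
    padded = -(-(size + header) // alignment) * alignment  # ceiling division
    chunk = max(min_chunk, padded)
    return chunk - size


def relative_saving(saved=10, overhead=43, payload=10):
    """Fraction saved per key, ignoring the allocator (numbers from the text)."""
    return saved / (overhead + payload)


mismatches = {s: (o, modeled_overhead(s)) for s, o in MEASURED.items()
              if modeled_overhead(s) != o}
print("mismatches:", mismatches)
print("saving without allocator overhead: %.1f%%" % (100 * relative_saving()))
```

Under this model a 10-byte reduction only frees memory when it moves the
record across a 16-byte boundary, which is why the effective saving per key
can be smaller than the raw byte count suggests.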




> -Justin



Cheers,
Nico



Bundler Version Requirement Problem

2011-06-08 Thread Keith Bennett
All -

I'm getting the following error when running my rspecs for my project:


>   /Library/Ruby/Site/1.8/rubygems.rb:274:in `activate': can't activate 
> builder (~> 2.1.2, runtime) for ["riak-client-0.9.4"], already activated 
> builder-3.0.0 for ["savon-0.9.1"] (Gem::LoadError)


Would it be possible to remove the riak client's requirement not to exceed 
version 2.1?
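
For context on the error: the RubyGems requirement "~> 2.1.2" accepts any
version that is at least 2.1.2 and below 2.2, so builder 3.0.0 can never
satisfy it.  The sketch below models that rule in Python for plain numeric
versions only (prerelease versions are ignored); the function names are made
up for the example.

```python
def parse(version):
    """Turn '2.1.2' into (2, 1, 2)."""
    return tuple(int(part) for part in version.split("."))


def pessimistic_bounds(requirement):
    """Bounds for '~> requirement': drop the last segment, bump the next one."""
    lower = parse(requirement)
    if len(lower) < 2:
        upper = (lower[0] + 1,)
    else:
        upper = lower[:-2] + (lower[-2] + 1,)
    return lower, upper


def satisfies(version, requirement):
    """True if `version` matches '~> requirement'."""
    lower, upper = pessimistic_bounds(requirement)
    return lower <= parse(version) < upper


for candidate in ("2.1.2", "2.1.9", "2.2.0", "3.0.0"):
    print(candidate, satisfies(candidate, "2.1.2"))
```

Since savon has already activated builder 3.0.0 in this report, one side of
the conflict has to change its requirement before both gems can load together.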

Thanks,
Keith





Erlang API

2011-06-08 Thread Evans, Matthew
Hi List,

I'm new to riak, and I am thinking of using riak to store log file / statistics 
information from a client application. The main benefits riak could offer here 
are its map-reduce/search capabilities.

The client side applications are developed in a variety of languages, but in 
the end all requests will be tunneled via a lightweight API to our own Erlang 
"admin" application.

I've noticed that the riak Erlang API really dispatches requests to the riak 
cluster via the gen_tcp module. The logging (insert) rate could be very high, 
so what would be neat is if I could embed our Erlang "admin" application 
directly into the riak cluster to avoid the IPC hop between the 
riak-erlang-client node and the riak node(s).

Now obviously I can fork the code and implement my own way of doing this, but 
the question is: is there an official way to embed your own code directly into a 
riak cluster to avoid that extra IPC hop?

Thanks

Matt


Fwd: Bundler Version Requirement Problem

2011-06-08 Thread Keith Bennett
All -

This may not be an issue.  If so, I apologize for wasting your time.  I'm doing 
further research.

- Keith


Begin forwarded message:

> From: Keith Bennett 
> Date: June 8, 2011 11:24:43 AM EDT
> To: riak-users users 
> Subject: Bundler Version Requirement Problem
> 
> All -
> 
> I'm getting the following error when running my rspecs for my project:
> 
> 
>>  /Library/Ruby/Site/1.8/rubygems.rb:274:in `activate': can't activate 
>> builder (~> 2.1.2, runtime) for ["riak-client-0.9.4"], already activated 
>> builder-3.0.0 for ["savon-0.9.1"] (Gem::LoadError)
> 
> 
> Would it be possible to remove the riak client's requirement not to exceed 
> version 2.1?
> 
> Thanks,
> Keith
> 
> 
> 
> 


Re: A script to check bitcask keydir sizes

2011-06-08 Thread Mike Oxford
> On 08.06.2011 16:11, Justin Sheehy wrote:
> The saving seems even smaller if you consider the overhead imposed by the
> memory allocator. I wrote a small test program in C++ which allocates one
> million blocks of memory of a given size and prints the overhead for each
> allocation. Turns out the overhead ranges from 8 to 23 bytes in a sawtooth
> like pattern (on a 64bit Linux machine):
>
--snip
>
> Not much you can do about that, unless one wants to use unaligned memory,
> which one doesn't.

Memory pools.
Preallocation+slicing.

Games to be played.

-mox


Process for acceptance of pull requests

2011-06-08 Thread Anthony Molinaro
Hi,

  I had a pretty minor pull request against riak_kv

https://github.com/basho/riak_kv/pull/105

which I sent a few weeks ago.  However, I've not seen any comments
or other activity on it, so I wanted to understand a little better what
the process for this sort of thing should be.  Is a github pull request
the preferred mode of communication, or should we also send emails
or open bugs in the bugtracker?

Maybe the process is spelled out somewhere and I'm just missing it?
Or maybe the process is just to submit a pull request and wait and
I'm impatient ;)

Thanks for the info,

-Anthony

-- 

Anthony Molinaro   


Re: Pruning (merging) after storage reaches a certain size?

2011-06-08 Thread Justin Sheehy
Hi, Steve.

Check out this page: 
http://wiki.basho.com/Bitcask-Configuration.html#Disk-Usage-and-Merging-Settings

Basically, a "merge trigger" must be met in order to have the merge process 
occur.  When it does occur, it will affect all existing files that meet a 
"merge threshold."

One note that is relevant for your specific use: the expiry_secs parameter will 
cause a given item to disappear from the client API immediately after expiry, 
and to be cleaned if it is in a file already being merged, but will not 
currently contribute toward merge triggers or thresholds on its own if not 
otherwise "dead".
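
The two-stage decision described above can be sketched as follows.  This is a
simplified model in Python, not bitcask's actual code: the parameter names,
the numeric values and the ">=" comparisons are placeholders chosen for
illustration, so consult the wiki page above for the real settings.  It also
models the note about expiry: entries that are expired but not otherwise dead
are tracked separately and do not count toward triggers or thresholds.

```python
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass
class FileStats:
    name: str
    total_keys: int
    dead_keys: int          # overwritten or deleted entries
    dead_bytes: int
    expired_keys: int = 0   # expired but not otherwise dead: not counted

    @property
    def fragmentation(self):
        """Percentage of entries in the file that are dead."""
        if self.total_keys == 0:
            return 0
        return 100 * self.dead_keys // self.total_keys


def files_to_merge(files, frag_trigger=60, dead_bytes_trigger=512 * MB,
                   frag_threshold=40, dead_bytes_threshold=128 * MB):
    """Return names of files a merge would include, or [] if none triggers."""
    triggered = any(f.fragmentation >= frag_trigger
                    or f.dead_bytes >= dead_bytes_trigger for f in files)
    if not triggered:
        return []
    return [f.name for f in files
            if f.fragmentation >= frag_threshold
            or f.dead_bytes >= dead_bytes_threshold]


files = [
    FileStats("1.data", total_keys=1000, dead_keys=700, dead_bytes=50 * MB),
    FileStats("2.data", total_keys=1000, dead_keys=450, dead_bytes=30 * MB),
    FileStats("3.data", total_keys=1000, dead_keys=100, dead_bytes=5 * MB),
]
print(files_to_merge(files))

# Files that only hold expired data never trip a trigger in this model.
only_expired = [FileStats("4.data", 1000, 0, 0, expired_keys=1000)]
print(files_to_merge(only_expired))
```

In this model a data set that only ever expires, as in the question quoted
below, keeps growing until some file becomes fragmented for another reason.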

-Justin


On Jun 7, 2011, at 4:29 PM, Steve Webb wrote:

> Hello there.
> 
> I'm loading a 2-node (1GB mem, 20GB storage, vmware VMs) riaksearch cluster 
> with the spritzer twitter feed.  I used the bitcask 'expiry_secs' to expire 
> data after 3 days.
> 
> I'm curious - I'm up to about 10GB of storage and I'm guessing that I'll be 
> full in 3-4 more days of ingesting data.  I have no idea if/when a merge will 
> run to expire the older data.
> 
> Q: Is there a method or command to force a merge at any time?
> Q: Is there a way to run a merge when the storage size reaches a specific 
> threshold?
> 
> - Steve
> 
> --
> Steve Webb - Senior System Administrator for gnip.com
> http://twitter.com/GnipWebb
> 



Re: Ubuntu repository

2011-06-08 Thread Mark Phillips
Hi Jon,

Great suggestion. This is on a long(ish) list of packaging and build
improvements we are working to put in place. We are aiming to have
this done before the next major release.

Thanks and keep the suggestions coming.

Mark


On Wed, Jun 1, 2011 at 10:37 PM, Jonathan Langevin wrote:
> Any chance of Basho making an Ubuntu Riak repository available anytime soon?
> A ppa via launchpad seems pretty painless to put together.
>
> - Jon Langevin -- sent from my Android phone


Riak Recap for June 6 - 7

2011-06-08 Thread Mark Phillips
Afternoon, Evening, Morning to All -

For today's Recap: blog posts, some chatter from #riak, slides, code, and more.

Enjoy -

Mark

Community Manager
Basho Technologies
wiki.basho.com
twitter.com/pharkmillups
--

Riak Recap for June 6 - 7
===

1) jjb, Vagabond, and ericmoritz/0 had a quick chat in #riak about how
well Riak would work as a session store.

* Read here ---> https://gist.github.com/1015798

2) Q --- Is there a simple way to reload an erlang module for
map/reduce across a cluster? (from ericmoritz\0 via #riak)

A --- Assuming the module is in your code path, you can run
c:nl(ModName) from the erlang console.

3) The Bay Area Basho Crew moved into a new space in San Francisco,
and you should all come visit.

* Read all about it here --->
http://blog.basho.com/2011/06/07/BashoWest-is-all-new-and-we-have-desks-for-you/

4) Rick Olson (@technoweenie) posted a whole host of materials related
to the talk he gave last night at the Riak Meetup in San Francisco.

* Short blog post about the Dropbox clone he built with Riak and
ZeroMQ ---> http://techno-weenie.net/2011/6/7/dropbear/
* PDF of his slides --->
http://dl.dropbox.com/u/3561619/talks/zeromq-riak-technoweenie.pdf
* Code ---> https://gist.github.com/122849a52c5b33c5d890

(Also, Rick really likes ZeroMQ. You should ask him about it if you're curious.)

5) Last reminder about the Riak talk Grant Schofield is giving
tomorrow night at the Dallas Big Data Group.

* Details here ---> http://www.dfwbigdata.org/events/17758631/
