Re: Riak newbie, need to know if I can switch from Cassandra for a messageQueue For Erlang
I am guessing that in Cassandra each user's mailbox has a single row key and
each message has a lexicographic or timestamp column key to preserve order.
This can be emulated in Riak by using the bucket name as the row key,
something like "mailbox-{user-key}" for the bucket and a lexicographic key
for the object key. A message's fully qualified URI in Riak would then be
something like:

/buckets/mailbox-ericmoritz/2012-05-01T09:41:00.0324Z-0.0.0.29/

The message id I generated is the message's timestamp as ISO-8601 with a
slugified reference() tacked on the end; this makes it unique and
lexicographically sortable.

To query the messages, you can add a secondary index for the user key on the
object. You will have to sort the objects by key at query time. If sorting
on reads is an issue, you can store an ordset() index under a different
bucket. You'll have to turn off last-write-wins and do your own conflict
resolution, but resolving a set is as easy as ordsets:union/2. This ordset()
is going to be much smaller and easier to manage than keeping a sorted list
of messages as a single document. Granted, you'll have to pair it with a
tombstone set for removing messages from the index document, and worry about
garbage-collecting the deleted items. This technique may or may not end up
performing worse than sorting in the map/reduce query, so you'll have to
benchmark it all yourself.

Since these queues are temporary, have you thought about using a persistent
message queue system? I have not really explored how to set up a highly
available message queue like RabbitMQ, but it may be worth exploring for you.

TL;DR: You can do it in Riak, Cassandra may be better, a message queue might
be best.

Eric.

On Wed, May 9, 2012 at 4:25 PM, Morgan Segalis wrote:
> Hi Bogunov,
>
> Thank you for your fast answer.
>
> If I understand your thought correctly, for every insert I should retrieve
> the list of messages, append the new message, and then store the list
> again? If so, isn't that costly for performance? Retrieving a whole list
> (which can be long if the user has not connected in a long time),
> appending one message, and storing it back makes two operations just to
> store one message. Or is there a way to append data directly to a key?
>
> Best regards,
>
> On 9 May 2012 at 22:16, Bogunov wrote:
>
> Hi, Morgan.
>
>> - Store from 1 to X messages per registered user.
> Store all messages as one key.
>
>> Get the number of stored messages per user. (may be stored in a variable)
> Yes.
>
>> Retrieve all messages from a user at once.
> Get one key. =)
>
>> Delete all messages from a user at once.
> Delete one key.
>
>> Delete all messages that are older than X months no matter the user.
> You can store an index entry like "written in month X" and find all users
> who have old messages and truncate them.
>
>> Is it realistic to have a bucket per user ?
> A bucket is a prefix; per bucket you keep your default preferences
> (R/W/N values, post-/pre-commit hooks, etc.). Not much gain in doing so.
>
> On Wed, May 9, 2012 at 11:51 PM, Morgan Segalis wrote:
>
>> Hi everyone!
>>
>> I have followed the evolution of Riak with interest.
>>
>> I have a chat server written in Erlang from scratch, with my own
>> protocol. Right now I'm using MySQL to store user credentials and friend
>> lists.
>>
>> I'm using Cassandra via Thrift to store the messages an offline user has
>> received, until the user retrieves them.
>>
>> My Cassandra data model is quite simple: a column family where each row
>> is a user and each column is a message (name = timestamp, to get
>> time-ordered data; value = message).
>>
>> The thing is, although I'm happy with Cassandra's performance and its
>> time-to-live feature, I would like to avoid the hassle of Thrift when
>> updating my code.
>>
>> Riak is, as I have seen on multiple websites, the closest thing to
>> Cassandra (but simpler).
>>
>> However, Riak's paradigm seems somewhat different, with buckets, which
>> I'm not yet familiar with.
>>
>> Before getting to know Riak better, I would like some expert opinion on
>> the matter. I need to do several things:
>>
>> - Store from 1 to X messages per registered user.
>> - Get the number of stored messages per user. (may be stored in a
>>   variable)
>> - Retrieve all messages from a user at once.
>> - Delete all messages from a user at once.
>> - Delete all messages that are older than X months, no matter the user.
>>
>> I would really love your opinion on whether Riak fits my needs, and if
>> so, what the data model would be. Is it realistic to have a bucket per
>> user?
>>
>> Best regards,
>
> --
> email: bogu...@gmail.com
> skype: i.bogunov
> phone: +7 903 131 8499
> Regards, Bogunov Ilya
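To make Eric's key scheme concrete, here is a minimal Erlang sketch. The
module and function names are hypothetical (not Eric's code), and the
timestamp format is a simplified version of his example:

    %% Sortable, unique message ids plus the ordsets-based sibling merge
    %% for the index bucket. Hypothetical module, for illustration only.
    -module(mailbox_key).
    -export([message_id/0, merge_index/2]).

    %% ISO-8601 UTC timestamp, zero-padded so that lexicographic order
    %% equals chronological order, with a slugified reference() appended
    %% for uniqueness.
    message_id() ->
        {_, _, Micro} = Now = os:timestamp(),
        {{Y, Mo, D}, {H, Mi, S}} = calendar:now_to_universal_time(Now),
        Ts = io_lib:format("~4..0B-~2..0B-~2..0BT~2..0B:~2..0B:~2..0B.~6..0BZ",
                           [Y, Mo, D, H, Mi, S, Micro]),
        %% ref_to_list/1 yields e.g. "#Ref<0.0.0.29>"; keep only "0.0.0.29".
        {match, [Slug]} = re:run(erlang:ref_to_list(make_ref()), "([0-9.]+)",
                                 [{capture, all_but_first, list}]),
        iolist_to_binary([Ts, $-, Slug]).

    %% Conflict resolution for the ordset() index object: with allow_mult
    %% enabled, merging two siblings is just a union.
    merge_index(SiblingA, SiblingB) ->
        ordsets:union(SiblingA, SiblingB).

One caveat: a reference() is only unique within a node's lifetime, so in a
multi-node setup you would want to fold a node identifier into the slug as
well.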
Re: How do I improve Level DB performance?
David,

I ran the benchmark again for 9 hours overnight, just doing puts.
Performance fell steadily from 400 puts/s to 250 puts/s.

Graph: http://twitpic.com/9jtjmu/full

Cheers,

Tim.

On Thu, May 10, 2012 at 3:01 PM, David Smith wrote:
> On Thu, May 10, 2012 at 2:33 PM, Tim Haines wrote:
>
>> I've set up a new cluster, and have been doing pre-deployment benchmarks
>> on it. The benchmark I was running slowly sank from 1000 TPS to 250 TPS
>> over the course of a single 8-hour benchmark doing 1 read + 1 update
>> with 1k values. I'm wondering if anyone might have suggestions on how I
>> can improve this.
>
> Generally, this suggests that you are becoming seek-time bound. The test
> config, as specified, will generate a pretty huge number of not_founds,
> which are (currently) crazy expensive with LevelDB, particularly as the
> dataset grows.
>
> Assuming you start with an empty database, a sample of this test will
> generate operations like so:
>
> Key 1000 - get -> not_found
> Key 1001 - update -> not_found + write
> Key 1002 - get -> not_found
> etc.
>
> I.e., the LevelDB cache never gets a chance to be useful, because you're
> always writing new values, and the cost of writing each new value goes
> up, since you have to thrash the cache to determine if you've ever seen
> the key that doesn't exist. :)
>
> The root problem here is the key_generator -- partitioned_sequential_int
> will just run through all the ints in order and never revisit a key.
>
>> {write_buffer_size, 16777216},
>> {max_open_files, 100},
>> {block_size, 262144},
>> {cache_size, 168430088}
>
> I strongly recommend not changing write_buffer_size; it can have
> unexpected latency side effects when LevelDB compaction occurs.
> Smaller == more predictable.
>
> Does that help?
>
> D.
>
> --
> Dave Smith
> VP, Engineering
> Basho Technologies, Inc.
> diz...@basho.com
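For anyone tuning the settings quoted above: they live in the eleveldb
section of each node's app.config. The sketch below reflects David's advice;
the data_root path is an assumption, and the remaining values are the ones
from Tim's config:

    %% app.config excerpt (sketch)
    {eleveldb, [
        {data_root, "/var/lib/riak/leveldb"},  %% assumed path
        %% write_buffer_size deliberately left at its default: larger
        %% buffers make compaction pauses less predictable.
        {max_open_files, 100},
        {block_size, 262144},
        {cache_size, 168430088}
    ]}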
Re: How do I improve Level DB performance?
On Thu, May 10, 2012 at 11:14 PM, Tim Haines wrote:
>
> With the adjusted ring size and settings, and adjusted to only do puts
> (so no missed reads), my cluster is doing about 400 puts per second:
> http://twitpic.com/9jnhlm/full

Actually, every put (a put at the Riak API level) does a read on the backend
[1]. This is needed to merge the contents of the two objects [2].

As Dave already mentioned, the key-generation strategy, along with LevelDB's
degrading performance on not-found, means your benchmark will just get worse
the longer it runs.

Are you testing an actual use case here? Do you envision 100M objects being
written in a constant stream? Will your objects have a median size of 1000
bytes? Basho Bench also provides a Pareto key generator, which uses a
fraction of the key space most of the time. I'm not sure it matches your use
case, but thought I'd mention it's there.

-Z

[1]: https://github.com/basho/riak_kv/blob/1.1.2/src/riak_kv_vnode.erl#L669
[2]: https://github.com/basho/riak_kv/blob/1.1.2/src/riak_kv_vnode.erl#L686
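Ryan's Pareto suggestion is a one-line change in a basho_bench config. The
sketch below is illustrative only (the concurrency, host, and sizes are
assumptions, not Tim's actual settings); the driver, generator, and option
names come from basho_bench itself:

    %% basho_bench config sketch (hypothetical values).
    {mode, max}.
    {concurrent, 8}.
    {driver, basho_bench_driver_riakc_pb}.
    {riakc_pb_ips, [{127,0,0,1}]}.
    {operations, [{put, 1}]}.
    {value_generator, {fixed_bin, 1000}}.   %% ~1 KB objects
    %% Sequential generator: never revisits a key, so every backend read
    %% is a costly LevelDB not_found:
    %%   {key_generator, {partitioned_sequential_int, 100000000}}.
    %% Pareto generator: concentrates on a hot fraction of the key space,
    %% so the cache actually gets exercised:
    {key_generator, {pareto_int, 100000000}}.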
Re: How do I improve Level DB performance?
On Fri, May 11, 2012 at 11:00 AM, Tim Haines wrote:
>
> I ran the benchmark again for 9 hours overnight, just doing puts.
> Performance fell steadily from 400 puts/s to 250 puts/s.
>
> Graph: http://twitpic.com/9jtjmu/full

1> Secs = 9 * 60 * 60.
32400
2> Secs * 400.
12960000

Given you're writing against a 100M key space, in that 9-hour span you've
still only written ~13% of it. Each write is doing a get and causing the
costly multi-seek not-found.

-Z
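Spelling the fraction out in the same shell (a continuation sketch, not part
of Ryan's original transcript):

    3> Secs * 400 / 100000000.
    0.1296

That is, roughly 13% of the 100M key space has been written.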
Re: How do I improve Level DB performance?
On Fri, May 11, 2012 at 9:13 AM, Ryan Zezeski wrote:
>
> On Thu, May 10, 2012 at 11:14 PM, Tim Haines wrote:
>>
>> With the adjusted ring size and settings, and adjusted to only do puts
>> (so no missed reads), my cluster is doing about 400 puts per second:
>> http://twitpic.com/9jnhlm/full
>
> Actually, every put (a put at the Riak API level) does a read on the
> backend [1]. This is needed to merge the contents of the two objects [2].
>
> As Dave already mentioned, the key-generation strategy, along with
> LevelDB's degrading performance on not-found, means your benchmark will
> just get worse the longer it runs.
>
> Are you testing an actual use case here? Do you envision 100M objects
> being written in a constant stream? Will your objects have a median size
> of 1000 bytes? Basho Bench also provides a Pareto key generator, which
> uses a fraction of the key space most of the time. I'm not sure it
> matches your use case, but thought I'd mention it's there.

Hi Ryan,

Thanks. Greg just mentioned the reads on puts too. I'd changed the config to
250 bytes (matching about what I store for a tweet), and reran it overnight,
and observed performance drop from 400 puts/s to 250 puts/s. Right now my
use case has me constantly writing about 200 new tweets per second, so
unless I'm missing something, this throughput measurement is a realistic
indicator for me.

Tim.
Re: How do I improve Level DB performance?
On Fri, May 11, 2012 at 9:20 AM, Tim Haines wrote:
>
> On Fri, May 11, 2012 at 9:13 AM, Ryan Zezeski wrote:
>
>> On Thu, May 10, 2012 at 11:14 PM, Tim Haines wrote:
>>>
>>> With the adjusted ring size and settings, and adjusted to only do puts
>>> (so no missed reads), my cluster is doing about 400 puts per second:
>>> http://twitpic.com/9jnhlm/full
>>
>> Actually, every put (a put at the Riak API level) does a read on the
>> backend [1]. This is needed to merge the contents of the two objects
>> [2].
>>
>> As Dave already mentioned, the key-generation strategy, along with
>> LevelDB's degrading performance on not-found, means your benchmark will
>> just get worse the longer it runs.
>>
>> Are you testing an actual use case here? Do you envision 100M objects
>> being written in a constant stream? Will your objects have a median
>> size of 1000 bytes? Basho Bench also provides a Pareto key generator,
>> which uses a fraction of the key space most of the time. I'm not sure
>> it matches your use case, but thought I'd mention it's there.
>
> Hi Ryan,
>
> Thanks. Greg just mentioned the reads on puts too. I'd changed the config
> to 250 bytes (matching about what I store for a tweet), and reran it
> overnight, and observed performance drop from 400 puts/s to 250 puts/s.
> Right now my use case has me constantly writing about 200 new tweets per
> second, so unless I'm missing something, this throughput measurement is a
> realistic indicator for me.
>
> Tim.

I guess I was hoping that someone could look at these results and say "Given
the use case and the hardware, Riak should be performing 10x what you're
seeing, so something is configured wrong." I'm not hearing that though. What
I'm hearing is "Is that a realistic use case?".

So given this use case, and the hardware I have, these are expected results?

Tim.
Re: How do I improve Level DB performance?
On Fri, May 11, 2012 at 12:42 PM, Tim Haines wrote:
>
> I guess I was hoping that someone could look at these results and say
> "Given the use case and the hardware, Riak should be performing 10x what
> you're seeing, so something is configured wrong." I'm not hearing that
> though. What I'm hearing is "Is that a realistic use case?".
>
> So given this use case, and the hardware I have, these are expected
> results?
>
> Tim.

Yeah, I don't want to come off as dodging the question. I've seen lots of
people run benchmarks for use cases they don't even have. That doesn't seem
to be the case here.

I take very little stock in absolute numbers for the most part. I'm not sure
what numbers you should see, because I've never tried this particular case
with that hardware.

One question to ask is whether it's truly LevelDB or Riak causing the
slowness. I'm assuming you chose LevelDB either for secondary indexes or for
keys-not-in-memory, but I imagine if you ran the same bench with Bitcask
you'd see much better results.

Since the application's semantics are to always write a unique key, you can
also take advantage of the `last_write_wins` bucket property. It will avoid
some work in your case, but a backend with index capabilities still has to
read the object (in order to delete the old index entries). Using it with
something like Bitcask avoids the read.

It seems to me, for use cases like this, it would be good to have a
'just_write_it' option with the semantics of "I know this key is unique, and
even if for some weird reason it isn't, I don't care, so just write what I
pass you."

All that said, there is work currently going on to add bloom filters to
LevelDB to alleviate the not-found issue. I'm not sure what the status is,
but perhaps someone else will chime in on that.

-Z
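A minimal sketch of enabling the property Ryan mentions, from an attached
console (riak attach). The bucket name <<"tweets">> is hypothetical, and
this assumes the riak_core_bucket API of the Riak 1.x series:

    %% Run inside `riak attach` on any node.
    %% Enable last_write_wins for one (hypothetical) bucket:
    riak_core_bucket:set_bucket(<<"tweets">>, [{last_write_wins, true}]).

    %% Confirm the property took effect:
    proplists:get_value(last_write_wins,
                        riak_core_bucket:get_bucket(<<"tweets">>)).

Note the trade-off Ryan describes: the property only skips work that safe
defaults would otherwise do, and with an indexing backend the
read-before-write still happens.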
Re: How do I improve Level DB performance?
On Fri, May 11, 2012 at 10:05 AM, Ryan Zezeski wrote:
>
> On Fri, May 11, 2012 at 12:42 PM, Tim Haines wrote:
>
>> I guess I was hoping that someone could look at these results and say
>> "Given the use case and the hardware, Riak should be performing 10x
>> what you're seeing, so something is configured wrong." I'm not hearing
>> that though. What I'm hearing is "Is that a realistic use case?".
>>
>> So given this use case, and the hardware I have, these are expected
>> results?
>>
>> Tim.
>
> Yeah, I don't want to come off as dodging the question. I've seen lots
> of people run benchmarks for use cases they don't even have. That
> doesn't seem to be the case here.

Ryan, thanks. I appreciate your response. I'm waiting to hear back from John
this morning to see if I can engage one of your consultants to answer the
question for me.

Tim.
Re: erlang-http-client and riak::search
On Thu, May 10, 2012 at 8:15 PM, Gregory Haskins wrote:
> Hi All,
>
> I'm working on an Erlang client of the riak::search functionality built
> into riak-1.1.2. While I do see the support for querying data that is
> already indexed (rhc:search), I do not see how to index and/or remove
> data from the database. I do see this functionality in other clients,
> such as Ruby, and in the raw HTTP/SOLR interface, so I know it's
> technically possible on the backend. Am I just missing something, or is
> the Erlang client for riak::search not yet feature complete? If it's
> not, I would be willing to submit some patches to help bring it up to
> speed, but I figured I would ask in case I am just looking in the wrong
> places.

Greg,

The best way to use Search is for indexing KV data, by installing the
precommit hook. There are APIs to write docs that don't go through KV, but
IMO they never should have been added to Search, and I plan to focus it on
indexing KV data only in the coming months.

I would love to see better Search support across all the clients. I'm
actually surprised people use the erlang-http-client. It seems if you're in
Erlang already, why not just use the protobuf client?

Looking at the client, one good thing to do right off the bat is _not_ use
map/reduce just to pull back the keys. That adds latency and uses more
resources on the server for no good reason.

Anyways, I'm supposed to be on vacation right now and not writing emails. If
you decide to write some pull requests, I'm happy to help with
questions/reviewing.

-Z
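For anyone following along, installing the precommit hook on a bucket looks
roughly like this; <<"messages">> is a hypothetical bucket name, and this
assumes the riak_search_kv_hook module shipped with Riak Search 1.x (the
search-cmd install command should have the same effect from the shell):

    %% From `riak attach` on a node running Riak Search: index all
    %% subsequent KV writes to this (hypothetical) bucket.
    riak_search_kv_hook:install(<<"messages">>).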
Re: erlang-http-client and riak::search
On May 11, 2012, at 1:20 PM, Ryan Zezeski wrote:
>
> The best way to use Search is for indexing KV data, by installing the
> precommit hook. There are APIs to write docs that don't go through KV,
> but IMO they never should have been added to Search, and I plan to focus
> it on indexing KV data only in the coming months.

Ah, OK. I hadn't thought of using it like that. Would you like me to submit
a wiki entry on using Search this way so others may benefit (if/once I
figure it out)?

> I'm actually surprised people use the erlang-http-client. It seems if
> you're in Erlang already, why not just use the protobuf client?

Actually, historically I have used the PB client. I only switched to the
HTTP client because I couldn't find _any_ reference to the search
functionality in the PB variant. I didn't notice right away that the rhc
client only seems to have rhc:search but no index/delete functions.

That said, I have often wondered whether the HTTP client might also be more
load-balancer friendly against the REST interface. Or does the PB client
have provisions for this as well?

> Looking at the client, one good thing to do right off the bat is _not_
> use map/reduce just to pull back the keys. That adds latency and uses
> more resources on the server for no good reason.

Yeah, that makes sense. I was actually a little surprised to notice via
Wireshark that it uses the MR engine for a search/3 invocation instead of
the SOLR query interface.

> Anyways, I'm supposed to be on vacation right now and not writing
> emails. If you decide to write some pull requests, I'm happy to help
> with questions/reviewing.

Great, enjoy the remainder of your vacation!

-Greg
Re: Questions about Riak Enterprise
On Thu, May 10, 2012 at 11:09:36AM -0700, Ahmed Bashir wrote:
> Hey Andrew,
>
> Can you elaborate on how EDS replication does this mirroring? Does each
> vnode have the ability to connect to the other cluster, or is there a
> coordinator that sends data to the other cluster, etc.?

There is a coordinator.

Andrew
Re: Questions about Riak Enterprise
On Wed, May 09, 2012 at 04:35:06PM +, Elias Levy wrote:
> That's problematic. We will be rolling out an EDS implementation in the
> near future. One cluster will be on metal, but we are likely to place
> the other on EC2, and we'd use elastic IPs and split-horizon DNS.
>
> When is this expected to be fixed?

It's on my shortlist. I'm not sure it'll make 1.2, though, given my current
workload. We should be able to provide a workaround that will simply cause a
delay in connecting the clusters, but not have any other harmful effects.

Andrew