Re: Questions about Riak Enterprise

2012-05-09 Thread Mark Rose
On Wed, May 9, 2012 at 12:38 AM, Andrew Thompson  wrote:
>
> > Does the approximately 1 ms of latency between av zones affect Riak's
> > performance that much?
>
> If the latency is *guaranteed* to be that low, then you should be ok,
> although I'm not sure how the networking works across zones. If the
> latency can do crazy things in outage conditions, you'll stand a decent
> chance of screwing the cluster. A downed node is better than a really,
> really slow one.
>

Well, one thing about AWS is that nothing is guaranteed. I have seen
latency spike up to 10 ms between zones, but it's brief. The zones may or
may not be in the same building, but they are close together and share the
same 10.x.x.x space. Amazon currently charges 1¢/GB for transfer between
zones, so there are obviously some network constraints between them compared
to machines inside a zone.


> > We were planning to run across av zones for fault tolerance, just beefing
> > up single nodes for the moment until rack awareness is available. So the
> > recommended solution is to use EDS to accomplish this?
>
> I'm not sure what you're describing here.
>

Basically, we are planning to run a single 3-node cluster, with 1 node
in each av zone. We use this technique with a 3-node Galera cluster
(synchronous MySQL replication). Galera handles a disappearing node very
well, so if an av zone starts acting up, the remaining machines continue
working. We run all our instance types in multiple zones so we can handle
an av zone going down.

From what you're describing, Riak/Erlang doesn't handle a flaky
node/network well, so some manual intervention would be needed if a node
or the network starts acting funny.

Because Riak doesn't offer rack awareness (we could treat each av zone as a
rack), and we still want copies of our data in multiple zones, our only
option to ensure live data is replicated in all the zones (for high
availability) is to set the number of replicas equal to the number of
nodes. We'll be fine until we outgrow the largest EC2 instance type.
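
For reference, that's only a bucket-property change; a minimal, untested
sketch assuming the riakc Erlang client, with a made-up bucket name (and
n_val is best set before the bucket holds any data):

    %% On a three-node cluster, n_val = 3 keeps a replica on every node.
    {ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087),
    ok = riakc_pb_socket:set_bucket(Pid, <<"my_bucket">>, [{n_val, 3}]),
    {ok, Props} = riakc_pb_socket:get_bucket(Pid, <<"my_bucket">>).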

Is rack awareness a planned feature? If so, when (ballpark) is it planned
for?

> Actually, it's worse than that because of some legacy behaviour. EDS
> wants to know the bind IP, not a hostname, and it will exchange node IPs
> with the other side of the connection, so internal IPs can 'leak' to the
> other cluster and cause connection problems. There is a workaround for
> this, and I do plan to address it.
>

I suppose the interim solution for EDS across EC2 regions is to use a VPC
in each region, with unique 10.x.x.x subnets in each and a VPN between
them. But for us, we're not at the point of deploying to multiple regions
yet, so there's no need to dig into this more at the moment.

Thank you for answering my questions. This really helps!

-Mark


Re: Questions about Riak Enterprise

2012-05-09 Thread Elias Levy
On Wed, May 9, 2012 at 4:00 PM,  wrote:

>
> Actually, it's worse than that because of some legacy behaviour. EDS
> wants to know the bind IP, not a hostname, and it will exchange node IPs
> with the other side of the connection, so internal IPs can 'leak' to the
> other cluster and cause connection problems. There is a workaround for
> this, and I do plan to address it.
>

That's problematic.  We will be rolling out an EDS implementation in the
near future.  One cluster will be on bare metal, but we are likely to place
the other on EC2, and we'd use elastic IPs and split-horizon DNS.

When is this expected to be fixed?

Elias


Re: Most efficient way to determine if 1000 specific keys exist?

2012-05-09 Thread Shuhao Wu
Without reading all the emails... why can't you just cache the keys in an
object and maintain that list? Then you could check against that list. This
way you don't have to go through every object in Riak.
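
Something along these lines; a rough, untested sketch assuming the riakc
Erlang client, with made-up bucket/key/module names, and ignoring siblings
and concurrent writers to keep it short:

    -module(key_cache).
    -export([add_key/2, existing/2]).

    -define(BUCKET, <<"meta">>).
    -define(KEY, <<"known_keys">>).

    %% Record a new key in the cached set (read-modify-write).
    add_key(Pid, NewKey) ->
        {Set, Obj0} =
            case riakc_pb_socket:get(Pid, ?BUCKET, ?KEY) of
                {ok, O} ->
                    {binary_to_term(riakc_obj:get_value(O)), O};
                {error, notfound} ->
                    S = sets:new(),
                    {S, riakc_obj:new(?BUCKET, ?KEY, term_to_binary(S),
                                      "application/x-erlang-binary")}
            end,
        NewSet = sets:add_element(NewKey, Set),
        riakc_pb_socket:put(Pid, riakc_obj:update_value(Obj0, term_to_binary(NewSet))).

    %% Given ~1000 candidate keys, return the ones present in the cached
    %% set, using a single Riak read.
    existing(Pid, Candidates) ->
        case riakc_pb_socket:get(Pid, ?BUCKET, ?KEY) of
            {ok, Obj} ->
                Set = binary_to_term(riakc_obj:get_value(Obj)),
                [K || K <- Candidates, sets:is_element(K, Set)];
            {error, notfound} ->
                []
        end.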

Shuhao
On May 2, 2012 2:47 PM, "Tim Haines"  wrote:

> Hey guys,
>
> Still a relative newbie here.
>
> I was hoping to be able to set up a MapReduce job that I could feed 1000
> keys to, and have it tell me which of the 1000 keys exist in the bucket.
> I was hoping this could use the key index (such a thing exists, right?)
> without having to read the objects.
>
> The methods I've tried for doing this fail when the first non-existing key
> is found though.
>
> Is there a way to do this?
>
> Or alternatively, is there a way to check for the presence of one key at a
> time without riak having to read the object?
>
> Cheers,
>
> Tim.
>


Re: Most efficient way to determine if 1000 specific keys exist?

2012-05-09 Thread Alexander Sicular
Or just do an exists/set-membership operation in Redis.

Or use a bloom filter (that you keep in Riak).

Or use your own binary encoding, n keys long, and flip bits (also kept in
Riak).

Scanning a list of keys in Riak might be one of the most inefficient ways
to do it. Also, I don't like to keep values in Riak that mutate in some
unbounded way, due to compaction issues.
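
To make the bit-flipping idea concrete: a rough, untested sketch assuming
the riakc Erlang client, a fixed slot position per key, and made-up
bucket/key/module names (concurrent writers and siblings are ignored,
which is exactly the unbounded-mutation caveat above):

    -module(key_bits).
    -export([mark_present/2, is_present/2]).

    -define(BUCKET, <<"key_index">>).
    -define(KEY, <<"bitfield">>).
    -define(N, 100000).  %% total key slots; keep it a multiple of 8

    %% Mark slot Pos (0-based) as existing by setting its bit.
    mark_present(Pid, Pos) ->
        {Bits, Obj0} =
            case riakc_pb_socket:get(Pid, ?BUCKET, ?KEY) of
                {ok, O} ->
                    {riakc_obj:get_value(O), O};
                {error, notfound} ->
                    Empty = <<0:?N>>,
                    {Empty, riakc_obj:new(?BUCKET, ?KEY, Empty,
                                          "application/octet-stream")}
            end,
        <<Head:Pos/bits, _:1, Rest/bits>> = Bits,
        New = <<Head/bits, 1:1, Rest/bits>>,
        riakc_pb_socket:put(Pid, riakc_obj:update_value(Obj0, New)).

    %% Check whether slot Pos is marked.
    is_present(Pid, Pos) ->
        case riakc_pb_socket:get(Pid, ?BUCKET, ?KEY) of
            {ok, Obj} ->
                <<_:Pos/bits, Bit:1, _/bits>> = riakc_obj:get_value(Obj),
                Bit =:= 1;
            {error, notfound} ->
                false
        end.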

-alexander

@siculars
http://siculars.posterous.com

Sent from my rotary phone.
On May 9, 2012 1:13 PM, "Shuhao Wu"  wrote:

> Without reading all the emails... why can't you just cache the keys in an
> object and maintain that list? Then you could check against that list. This
> way you don't have to go through every object in Riak.
>
> Shuhao
>


Re: Most efficient way to determine if 1000 specific keys exist?

2012-05-09 Thread Shuhao Wu
Agreed. If you need something like that, it's time to combine technologies,
or hack something together for now and switch to something like Redis when
you need the efficiency later.

Shuhao


On Wed, May 9, 2012 at 1:35 PM, Alexander Sicular wrote:

> Or just do an exists/set-membership operation in Redis.
>
> Or use a bloom filter (that you keep in Riak).
>
> Or use your own binary encoding, n keys long, and flip bits (also kept in
> Riak).
>
> Scanning a list of keys in Riak might be one of the most inefficient ways
> to do it. Also, I don't like to keep values in Riak that mutate in some
> unbounded way, due to compaction issues.
>
> -alexander
>
> @siculars
> http://siculars.posterous.com
>
> Sent from my rotary phone.


Riak newbie, need to know if I can switch from Cassandra for a messageQueue For Erlang

2012-05-09 Thread Morgan Segalis
Hi everyone !

I have been following Riak's evolution with interest.

I have a chat server written in Erlang from scratch, with my own protocol.
Right now I'm using MySQL to store users' credentials and friend lists.

I'm using Cassandra via Thrift to store the messages an offline user has
received, until the user retrieves them.

My Cassandra data model is quite simple: one Column Family, where each row is
a user and each column is a message (title = timestamp for time-ordered data,
value = message).

The thing is, even though I'm happy with Cassandra's performance and its
TimeToLive feature, I would like to avoid the hassle of Thrift when updating
my code.

From what I have seen on multiple websites, Riak is the closest thing to
Cassandra (but simpler).

However, Riak's paradigm seems to be somewhat different, with buckets, which
I'm not yet familiar with.

Before getting to know Riak better, I would like some expert opinion on the
matter.

I need to do several things:

- Store from 1 to X messages per registered user.
- Get the number of stored messages per user (may be stored in a variable).
- Retrieve all messages from a user at once.
- Delete all messages from a user at once.
- Delete all messages that are older than X months, no matter the user.
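
To make these concrete, here is the kind of thing I imagine it could look
like (a rough, untested sketch against the riakc Erlang client, with one
object per user holding all of that user's messages; all names are made up):

    -module(offline_msgs).
    -export([fetch_all/2, count/2, delete_all/2]).

    -define(BUCKET, <<"offline_messages">>).

    %% Retrieve every stored message for a user in one read
    %% (the value is a term_to_binary'd list of messages).
    fetch_all(Pid, UserId) ->
        case riakc_pb_socket:get(Pid, ?BUCKET, UserId) of
            {ok, Obj} -> binary_to_term(riakc_obj:get_value(Obj));
            {error, notfound} -> []
        end.

    %% Number of stored messages for a user.
    count(Pid, UserId) ->
        length(fetch_all(Pid, UserId)).

    %% Drop every stored message for a user in one operation.
    delete_all(Pid, UserId) ->
        riakc_pb_socket:delete(Pid, ?BUCKET, UserId).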

I would really love your opinion on whether Riak fits my needs, and if so,
what the data model would be.
Is it realistic to have a bucket per user?

Best regards,





Re: Riak newbie, need to know if I can switch from Cassandra for a messageQueue For Erlang

2012-05-09 Thread Morgan Segalis
Hi Bogunov,

Thank you for your fast answer.

If I understand your thought correctly, for every insert I should retrieve the
list of messages, append the new message, and then store the list again?
If so, isn't that performance-eating? Retrieving a whole list (which can be
long if the user has not connected for a long time), appending a new message,
and storing it again? That's two operations just for storing… Or is there a
way to append data directly to a key?
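
Concretely, I picture the write path looking something like this (a rough,
untested sketch with the riakc Erlang client; bucket and function names are
made up, and each put replaces the whole stored value):

    %% Store one new offline message: get the current list, prepend, put back.
    append_message(Pid, UserId, Msg) ->
        Bucket = <<"offline_messages">>,
        {Msgs, Obj0} =
            case riakc_pb_socket:get(Pid, Bucket, UserId) of
                {ok, O} ->
                    {binary_to_term(riakc_obj:get_value(O)), O};
                {error, notfound} ->
                    {[], riakc_obj:new(Bucket, UserId, term_to_binary([]),
                                       "application/x-erlang-binary")}
            end,
        NewVal = term_to_binary([{os:timestamp(), Msg} | Msgs]),
        riakc_pb_socket:put(Pid, riakc_obj:update_value(Obj0, NewVal)).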

Best regards,

On 9 May 2012, at 22:16, Bogunov wrote:

> Hi, Morgan.
> 
> > - Store from 1 to X messages per registered user.
> Store all messages as one key.
> 
> > - Get the number of stored messages per user. (may be stored in a variable)
> Yes.
> 
> > - Retrieve all messages from a user at once.
> Get one key =)
> 
> > - Delete all messages from a user at once.
> Delete one key.
> 
> > - Delete all messages that are older than X months, no matter the user.
> You can store an index entry like "written in X, X1 month" and find all
> users who have old messages and truncate them.
> 
> > Is it realistic to have a bucket per user?
> A bucket is a prefix; per bucket you keep your default preferences (R/W/N,
> post/pre-commit hooks, etc.). Not much gain in doing so.
> 
> 
> 
> 
> -- 
> email: bogu...@gmail.com
> skype: i.bogunov
> phone: +7 903 131 8499
> Regards, Bogunov Ilya
> 
> 



Riak Community Release Notes

2012-05-09 Thread Mark Phillips
Hi All,

We have a new repo on GitHub called The Riak Community [1] that I wanted to
bring to your attention. This repo is home of the Riak Community Release
Notes.  Here's the first installment (spanning April 1 - May 3):

https://github.com/basho/the-riak-community/blob/master/release-notes/riak-community-0.2.md

In short, it's a way to track and chronicle what we are doing as a
community. Instead of code changes, it consists of things like blog posts,
new releases, and talks. (There is also a spot for new known production
users that could use some love [2].)

There's also a blog post up on basho.com with more details [3].

Enjoy. Looking forward to your contributions and feedback.

Mark
twitter.com/pharkmillups

[1] 
https://github.com/basho/the-riak-community/
[2]
https://github.com/basho/the-riak-community/blob/master/release-notes/riak-community-0.2.md#new-known-production-deployments
[3] http://basho.com/blog/technical/2012/05/09/Riak-Community-Release-Notes/