Re: ssl support?

2012-07-13 Thread John E. Vincent
SSL is working for me (for riak-control) using self-signed
certificates. However I've not yet tried it with an external client.
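For what it's worth, the ekeyfile crash further down usually means the Erlang ssl application couldn't read or parse the configured keyfile (e.g. it's passphrase-protected, not PEM, or unreadable by the riak user). A quick sanity check I'd run, sketched here against a throwaway key pair so the commands are self-contained (all paths are examples):

```shell
# Generate a throwaway self-signed pair, then confirm the key parses as
# unencrypted PEM -- an unreadable or passphrase-protected key is a
# common cause of {error,ekeyfile}.
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout /tmp/ssl.key -out /tmp/ssl.crt \
  -days 1 -subj "/CN=localhost" 2>/dev/null
openssl rsa -in /tmp/ssl.key -check -noout   # prints "RSA key ok" if it parses
# Against the real files, also confirm the riak user can read them:
#   sudo -u riak cat /etc/riak/ssl.key > /dev/null
```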

On Fri, Jul 13, 2012 at 2:34 PM, Michael Johnson  wrote:
> I've been having problems getting riak to function via https and have not
> been able to find anything online that seems to help so far.  I am using a
> self-signed certificate (which is one I generated specifically for this
> testing, and thus could post as it will not be used for anything else) and
> have it stored as separate .crt and .key files.  I've used OpenSSL to
> verify the certificate and it appears to be all good.  Here is what the
> relevant bits of my app.config look like (I can post the rest as needed, but
> I'm trying to be concise):
>
>   {http, [{"0.0.0.0", 8091}]},
>   {https, [{"0.0.0.0", 8092}]},
>   {ssl, [
>  {certfile, "/etc/riak/ssl.crt"},
>  {keyfile, "/etc/riak/ssl.key"}
> ]},
>
> Starting riak does not generate any errors, and 'riak-admin test' works:
> [root@riak01 riak]# riak-admin test
> Attempting to restart script through sudo -u riak
> Successfully completed 1 read/write cycle to 'r...@riak01.mediatemple.net'
>
> Manually querying riak via http also works fine:
>
> [root@riak01 riak]# curl -k -vvv
> http://127.0.0.1:8091/riak/__riak_client_test__
> * About to connect() to 127.0.0.1 port 8091 (#0)
> *   Trying 127.0.0.1... connected
> * Connected to 127.0.0.1 (127.0.0.1) port 8091 (#0)
>> GET /riak/__riak_client_test__ HTTP/1.1
>> User-Agent: curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7
>> NSS/3.13.1.0 zlib/1.2.3 libidn/1.18 libssh2/1.2.2
>> Host: 127.0.0.1:8091
>> Accept: */*
>>
> < HTTP/1.1 200 OK
> < Vary: Accept-Encoding
> < Server: MochiWeb/1.1 WebMachine/1.9.0 (someone had painted it blue)
> < Date: Fri, 13 Jul 2012 18:03:13 GMT
> < Content-Type: application/json
> < Content-Length: 410
> <
> * Connection #0 to host 127.0.0.1 left intact
> * Closing connection #0
> {"props":{"name":"__riak_client_test__","allow_mult":false,"basic_quorum":false,"big_vclock":50,"chash_keyfun":{"mod":"riak_core_util","fun":"chash_std_keyfun"},"dw":1,"last_write_wins":false,"linkfun":{"mod":"riak_kv_wm_link_walker","fun":"mapreduce_linkfun"},"n_val":1,"notfound_ok":true,"old_vclock":86400,"postcommit":[],"pr":0,"precommit":[],"pw":0,"r":1,"rw":1,"small_vclock":50,"w":1,"young_vclock":20}}
>
>
> But the minute I try to connect via https I have problems:
>
> [root@riak01 riak]# curl -k -vvv
> https://127.0.0.1:8092/riak/__riak_client_test__
> * About to connect() to 127.0.0.1 port 8092 (#0)
> *   Trying 127.0.0.1... connected
> * Connected to 127.0.0.1 (127.0.0.1) port 8092 (#0)
> * Initializing NSS with certpath: sql:/etc/pki/nssdb
> * warning: ignoring value of ssl.verifyhost
> * NSS error -5938
> * Closing connection #0
> * SSL connect error
> curl: (35) SSL connect error
>
> And I see the following in the logs:
>
> console.log:
> 2012-07-13 11:05:52.023 [error] <0.5313.0> CRASH REPORT Process <0.5313.0>
> with 0 neighbours crashed with reason:
> {ekeyfile,[{gen_fsm,init_it,6},{proc_lib,init_p_do_apply,3}]}
> 2012-07-13 11:05:52.026 [error] <0.134.0> Supervisor ssl_connection_sup had
> child undefined started with {ssl_connection,start_link,undefined} at
> <0.5313.0> exit with reason ekeyfile in context child_terminated
> 2012-07-13 11:05:52.031 [error] <0.139.0> application: mochiweb, "Accept
> failed error", "{error,ekeyfile}"
> 2012-07-13 11:05:52.033 [error] <0.139.0> CRASH REPORT Process <0.139.0>
> with 0 neighbours crashed with reason: {error,accept_failed}
> 2012-07-13 11:05:52.035 [error] <0.135.0>
> {mochiweb_socket_server,310,{acceptor_error,{error,accept_failed}}}
>
> crash.log:
> 2012-07-13 11:05:52 =ERROR REPORT
> SSL: 1112: error:[] /etc/riak/ssl.key
>   [{ssl_connection,init_private_key,5},
>    {ssl_connection,ssl_init,2},
>    {ssl_connection,init,1},
>    {gen_fsm,init_it,6},
>    {proc_lib,init_p_do_apply,3}]
> 2012-07-13 11:05:52 =CRASH REPORT
>   crasher:
> initial call: ssl_connection:init/1
> pid: <0.5313.0>
> registered_name: []
> exception exit: ekeyfile
>   in function  gen_fsm:init_it/6
>   in call from proc_lib:init_p_do_apply/3
> ancestors: [ssl_connection_sup,ssl_sup,<0.130.0>]
> messages: []
> links: [<0.134.0>]
> dictionary: []
> trap_exit: false
> status: running
> heap_size: 1597
> stack_size: 24
> reductions: 1185
>   neighbours:
> 2012-07-13 11:05:52 =SUPERVISOR REPORT
>  Supervisor: {local,ssl_connection_sup}
>  Context:child_terminated
>  Reason: ekeyfile
>  Offender:
> [{pid,<0.5313.0>},{name,undefined},{mfargs,{ssl_connection,start_link,undefined}},{restart_type,temporary},{shutdown,4

Re: Riak significant downtime

2012-08-02 Thread John E. Vincent
FWIW, on ubuntu you can drop a file in /etc/security/limits.d/ called
'riak.conf' with the appropriate stanzas you would normally include in
limits.conf and it will get read.

If you use chef or puppet, this is much easier than trying to have
multiple resources attempt to manage limits.conf together. I do this
for our java apps as well.
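For concreteness, a minimal sketch of such a drop-in (the 65536 values are examples; the real file belongs at /etc/security/limits.d/riak.conf, written here to a scratch path):

```shell
# Write the stanzas you'd normally put in limits.conf to a drop-in file;
# in practice the destination is /etc/security/limits.d/riak.conf.
cat > /tmp/riak-limits.conf <<'EOF'
riak soft nofile 65536
riak hard nofile 65536
EOF
cat /tmp/riak-limits.conf
```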

On Wed, Aug 1, 2012 at 8:20 PM, Sean Carey  wrote:
> John,
>  Glad things are starting to run smoothly. This pam.d setting has tripped me
> a couple of times.
>
> Best,
>
> Sean
>
> On Wednesday, August 1, 2012 at 8:16 PM, John Roy wrote:
>
> All --
>
> The pam.d/su and limits.conf changes seem to have brought us back to
> reliability -- so far so good.  The time consuming part was the reboot.  I
> double checked the ulimit in the riak console and all came up to 8192 -- my
> new limit.
>
> thanks for all your help,
>
> John
>
>
> On Aug 1, 2012, at 2:56 PM, Sean Carey wrote:
>
> John,
> Please make sure in /etc/pam.d/su, that the following line is uncommented:
>
> session    required   pam_limits.so
>
> I have noticed lately in Ubuntu that this line is commented out by default.
>
>
> Best,
>
>
> Sean
>
> On Wednesday, August 1, 2012 at 5:47 PM, Jared Morrow wrote:
>
> You will need to make the adjustments in the /etc/security/limits.conf file
> as described here http://wiki.basho.com/Open-Files-Limit.html
>
> -Jared
>
>
> On Aug 1, 2012, at 3:33 PM, John Roy  wrote:
>
> Hi Reid --
>
> I added a riak.conf file in /etc/default with the line:
>
> ulimit -n 8192
>
> then rebooted, restarted riak and then did the attach.
>
> I got this line (which is also in the crash.log), then the limit of 1024.
> See below:
>
> 16:28:58.041 [error] Hintfile
> '/disk1/riak/bitcask/159851741583067506678528028578343455274867621888/12.bitcask.hint'
> contains pointer 118308596 570 that is greater than total data size
> 118308864
>
>> os:cmd("ulimit -n").
> "1024\n"
>
> I also set it manually prior to the reboot and got the same result.
>
>
> On Aug 1, 2012, at 2:09 PM, Reid Draper wrote:
>
> ulimit of 4096 might be too low. I'd also double-check the ulimit
> has taken effect either by attaching to the node (riak attach) or
> starting the node in the console (riak console), then type this:
>
> os:cmd("ulimit -n").
>
> Be sure to include the period (.) that
> is above as well.
>
> Reid
>
>
> On Aug 1, 2012, at 5:00 PM, John Roy wrote:
>
> Hi --
>
> Riak 1.1.1
> three nodes
> Ubuntu 10.04.1 LTS
> downtime means one node drops off then the other two follow so the entire
> cluster falls down.
>
> On Aug 1, 2012, at 1:48 PM, Mark Phillips wrote:
>
> Hey John,
>
> First questions would be:
>
> * What version of Riak?
> * How many nodes?
> * Which OS?
> * When you say "downtime" do you mean the entire cluster? Or just a subset
> of your nodes?
>
> Mark
>
> On Wed, Aug 1, 2012 at 1:42 PM, John Roy  wrote:
>
> I'm seeing significant downtime on Riak now.  Much like the "Riak Crashing
> Constantly" thread.  However in this case we get a "Too many open files"
> error, and also "contains pointer that is greater than the total data size."
> See the error messages below for more details.
>
> If others have an idea on this I'd appreciate your help.
>
> Thanks!
>
> John
>
> 2012-08-01 14:53:28 =ERROR REPORT
> Hintfile
> '/disk1/riak/bitcask/1347321821914426127719021955160323408745312813056/12.bitcask.hint'
> contains pointer 119561351 4415 that is greater than total data size
> 119562240
>
> 2012-08-01 14:54:26 =CRASH REPORT
>   crasher:
> initial call: riak_core_vnode:init/1
> pid: <0.29239.0>
> registered_name: []
> exception exit: [{riak_kv_eleveldb_backend,{db_open,"IO error:
> /disk1/riak/leveldb/753586781748746817198774991869333432010090217472/CURRENT:
> Too many open files"}}]
>   in function  gen_fsm:init_it/6
>   in call from proc_lib:init_p_do_apply/3
> ancestors: [riak_core_vnode_sup,riak_core_sup,<0.88.0>]
> messages: []
> links: [<0.92.0>]
> dictionary:
> [{#Ref<0.0.0.37922>,{bc_state,"/disk1/riak/bitcask/753586781748746817198774991869333432010090217472",fresh,undefined,[{filestate,read_only,"/disk1/riak/bitcask/753586781748746817198774991869333432010090217472/1.bitcask.data",1,<<>>,undefined,0,0},{filestate,read_only,"/disk1/riak/bitcask/753586781748746817198774991869333432010090217472/2.bitcask.data",2,<<>>,undefined,0,0}],2147483648,[{expiry_secs,-1},{read_write,true}],<<>>}},{random_seed,{17770,26756,17419}}]
> trap_exit: true
> status: running
> heap_size: 4181
> stack_size: 24
> reductions: 13734
>   neighbours:
> 2012-08-01 14:54:26 =SUPERVISOR REPORT
>  Supervisor: {local,riak_core_vnode_sup}
>  Context:child_terminated
>  Reason: [{riak_kv_eleveldb_backend,{db_open,"IO error:
> /disk1/riak/leveldb/753586781748746817198774991869333432010090217472/CURRENT:
> Too many open files"}}]
>  Offender:
> [{pid,<0.29239.0>},{name,undefined},{mfargs,{riak_core_vnode,start_link,undefined}},{restart_type,tempo

Re: Node cannot join

2012-08-21 Thread John E. Vincent
It might be worth looking at the Chef cookbook for how it does it. As
I see it on a fresh install with no data, there's probably not much
major risk in concurrent joins. On an existing install, however, I'd
think you would want to go serially.

We stand up all of our riak clusters from scratch using the chef
cookbook from Basho and it works fine but we do it serially with 3
nodes as the base. Haven't gotten to the point where we need to bump
up to 5 nodes (and thus not had to tackle the addition of those
nodes).
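A dry-run sketch of the serial flow described above (node names are examples, and the echo prefix means nothing actually runs; drop it to execute the joins one node at a time):

```shell
# Serial join sketch: each new node stages a join against a seed node,
# then the plan is reviewed and committed once from any single node.
SEED=riak@10.0.0.1
for node in node2 node3; do
  echo ssh "$node" riak-admin cluster join "$SEED"
done
echo riak-admin cluster plan    # review the staged changes
echo riak-admin cluster commit  # apply them
```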

On Tue, Aug 21, 2012 at 10:17 AM, Shane McEwan  wrote:
> Speaking of joining clusters . . .
>
> I'm in the middle of writing a CFEngine promise to automatically install
> Riak, configure it and join a cluster.
>
> Is it best to have all nodes issue a 'riak-admin cluster join' command
> before a single node issues the final 'riak-admin cluster commit' command?
> (Potentially hard to do with CFEngine) Or is it OK to have each node issue
> its own 'join' and 'commit' commands? (Reasonably easy)
>
> Or is it a bad idea to build the cluster automatically and I should really
> be doing it manually?
>
> Secondly, is there a riak-admin command or status value that definitively
> says "This node is part of a cluster" (even if all other nodes in the
> cluster are down) so that CFEngine knows not to re-issue the cluster join
> command a second time?
>
> Thanks!
>
>
> On 21/08/12 14:36, Ryan Zezeski wrote:
>>
>> Daniel,
>>
>> Cluster operations (invoked by `riak-admin cluster`) are now
>> multi-phase.  Instead of calling an operation and it taking effect
>> immediately you need to stage your changes, view your changes, and then
>> commit them.  E.g.
>>
>> riak-admin cluster join blah
>>
>> riak-admin cluster plan
>>
>> riak-admin cluster commit
>>
>> There is an `-f` option to keep the old behavior but we recommend
>> against this.  Staged clustering was put in place to keep users from
>> hurting their clusters and to make multiple changes more efficient.
>>
>> -Z
>>
>> On Tue, Aug 21, 2012 at 9:28 AM, Daniel Iwan wrote:
>>
>> Hi
>>
>> In my setup everything worked fine until I upgraded to riak 1.2
>> (although this may be a coincidence)
>> Nodes are installed from scratch with changes only to db backend (I'm
>> using eLevelDB)
>> and names.
>> For some reason node cannot join to another.
>> What am I doing wrong?
>>
>> I'm using Ubuntu 10.04 but I've seen the same behaviour on Ubuntu
>> 12.04
>> I may miss openssl dependency. I don't know if that matters
>>
>>
>> user@node2:~$ riak-admin cluster join riak@10.173.240.1
>> 
>>
>> Attempting to restart script through sudo -H -u riak
>> Success: staged join request for 'riak@10.173.240.2' to 'riak@10.173.240.1'
>>
>>
>>
>> user@node1:~$ riak-admin member-status
>> Attempting to restart script through sudo -H -u riak
>> ================================= Membership ==================================
>> Status RingPendingNode
>>
>> ---
>> joining     0.0%      --      'riak@10.173.240.2'
>> valid     100.0%      --      'riak@10.173.240.1'
>>
>>
>> ---
>>
>> Valid:1 / Leaving:0 / Exiting:0 / Joining:1 / Down:0
>> user@node1:~$
>> user@node1:~$ riak-admin transfers
>>
>> Attempting to restart script through sudo -H -u riak
>> No transfers active
>>
>> Active Transfers:
>>
>> user@node1:~$
>>
>> riak-admin status
>> Attempting to restart script through sudo -H -u riak
>> 1-minute stats for 'riak@10.173.240.1'
>>
>> ---
>> vnode_gets : 0
>> vnode_gets_total : 0
>> vnode_puts : 0
>> vnode_puts_total : 0
>> vnode_index_reads : 0
>> vnode_index_reads_total : 0
>> vnode_index_writes : 0
>> vnode_index_writes_total : 0
>> vnode_index_writes_postings : 0
>> vnode_index_writes_postings_total : 0
>> vnode_index_deletes : 0
>> vnode_index_deletes_total : 0
>> vnode_index_deletes_postings : 0
>> vnode_index_deletes_postings_total : 0
>> node_gets : 0
>> node_gets_total : 0
>> node_get_fsm_siblings_mean : 0
>> node_get_fsm_siblings_median : 0
>> node_get_fsm_siblings_95 : 0
>> node_get_fsm_siblings_99 : 0
>> node_get_fsm_siblings_100 : 0
>> node_get_fsm_objsize_mean : 0
>> node_get_fsm_objsize_median : 0
>> node_get_fsm_objsize_95 : 0
>> node_get_fsm_objsize_99 : 0
>> node_get_fsm_objsize_100 : 0
>> node_get_fsm_time_mean : 0
>> node_get_fsm_time_median : 0
>> node_get_fsm_time_95 : 0
>> node_get_fsm_time_99 : 0
>

Re: Riak Memory Usage Constantly Growing

2012-10-02 Thread John E. Vincent
I would highly suggest you upgrade to 1.2 when possible. We were, up
until recently, running on 1.4 and seeing the same problems you
describe. Take a look at this graph:

http://i.imgur.com/0RtsU.png

That's just one of our nodes but all of them exhibited the same
behavior. The falloffs are where we had to bounce riak.

This is what one of our nodes looks like now and has looked like since
the upgrade:

http://i.imgur.com/pm7Nk.png

The change was SO dramatic that I seriously thought /stats was broken.
I've verified it both outside of Riak and inside. The memory usage
change was very positive, though evidently there's still a memory leak.

We're heavy 2i users. No multi backend.
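For context, memory_total comes from the /stats endpoint mentioned above. A sketch of the kind of sampling behind those graphs, shown against a canned JSON since no node is running here (the host, port, and second field are assumptions):

```shell
# Extract memory_total from a /stats response. With a live node you'd do:
#   curl -s http://127.0.0.1:8098/stats > /tmp/stats.json
echo '{"memory_total":104857600,"mem_allocated":1073741824}' > /tmp/stats.json
python3 -c 'import json; print(json.load(open("/tmp/stats.json"))["memory_total"])'
```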

On Tue, Oct 2, 2012 at 4:08 AM, Shane McEwan  wrote:
> G'day!
>
> Just recently we've noticed memory usage in our Riak cluster constantly
> increasing.
>
> The memory usage reported by the Riak stats "memory_total" parameter has
> been less than 100MB for nearly a year but has recently increased to over
> 1GB.
>
> If we restart the cluster memory usage usually returns back to what we would
> call "normal" but after a week or so of stability the memory usage starts
> gradually growing again. Sometimes after a growth spurt over a few days the
> memory usage will plateau and be stable again for a week or two and then put
> on another growth spurt. The memory usage starts increasing at the same
> moment on all 4 nodes.
>
> This graph [http://imagebin.org/230614] shows what I mean. The green shows
> the memory usage as reported by "memory_total" (left-hand y-axis scale). The
> red line shows the memory used by Riak's beam.smp process (right-hand y-axis
> scale).
>
> Also notice that the gradient of the recent growth seems to be increasing
> compared to the memory increases we had in August.
>
> We might have just assumed that the memory usage was normal Riak behaviour.
> Perhaps we have just tipped over some sort of internal buffer or cache and
> that causes some more memory to be allocated. However, whenever we notice
> the memory usage increasing it always coincides with the "riak-admin top"
> command failing to run.
>
> We try to run "riak-admin top" to diagnose what is using the memory but it
> returns: "Output server crashed: connection_lost". If we restart the cluster
> the top command works fine (but, of course, there's nothing interesting to
> see after a restart!).
>
> So our theory at the moment is that some sort of instability or race
> condition is causing Riak to start consuming more and more memory. A side
> effect of this instability is that the internal processes needed for running
> the top command are not working correctly. The actual functionality of Riak
> doesn't seem to be affected. Our application is running fine. We see a
> slight increase in "FSM Put" times and CPU usage during the memory growth
> phases but all other parameters we're monitoring on the system seem
> unaffected.
>
> There's nothing abnormal in the logs. We get a lot of "riak_pipe_builder_sup
> {sink_died,normal}" messages but they can be ignored, apparently. The
> cluster is under constant load so we would expect to see either gradual
> memory increase or a steady state but not both. Erlang process count, open
> file handles, etc are stable.
>
> So I was wondering if anyone has seen similar behaviour before?
> Is there anything else we can do to diagnose the problem?
> I'm accessing the stats URL once per minute, could that have any side
> effects?
> We'll be upgrading to Riak 1.2 and new hardware in the next few weeks so
> should we just ignore it and hope it goes away?
> Any other ideas?
> Or is this just normal?
>
> Riak config:
> 4 VMware nodes
> ring_creation_size, 256
> n_val, 3
> eleveldb backend:
>   max_open_files, 20
>   cache_size, 15728640
> "riak_kv_version":"1.1.1",
> "riak_core_version":"1.1.1",
> "stdlib_version":"1.17.4",
> "kernel_version":"2.14.4"
> Erlang R14B03 (erts-5.8.4)
>
> Thanks!
>
> Shane.
>
>
>
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com



Re: Riak Memory Usage Constantly Growing

2012-10-02 Thread John E. Vincent
On Tue, Oct 2, 2012 at 7:55 AM, Kelly McLaughlin  wrote:
> John and Shane,
>
> I have been looking into some memory issues lately and I would be very 
> interested in more
> information about your particular problems. If either of you are able to get 
> some output
> from etop using the -sort memory option when you are having elevated memory 
> usage it
> would be very helpful to see. I know that sometimes you get the 
> connection_lost message
> when trying to use etop, but I have found that sometimes if you keep trying 
> it may succeed
> after a few attempts.
>
> Are either of you using MapReduce? I see that John is using 2I. Shane, do you 
> also use 2I?
> Finally, do you notice a lot of messages to the console or console log that 
> have the either the
> phrase 'monitor large_heap' or 'monitor long_gc'?
>
> Kelly
>

Kelly,

As it stands now, we're not seeing the problem anymore. If it crops up
again, I'll be ready to go. To answer your original question r.e. M/R,
internally we consider it a bug if our in-house ODM generates a MR
call but that's the fall back path it takes. And right now we ARE
generating considerable MR volume. When we upgraded to 1.2, we didn't
change any application code. We still get the following in the logs (I
haven't had the time to open a ticket on it) pretty frequently:

2012-10-02 15:12:37.190 [error]
<0.344.0>@riak_pipe_vnode:new_worker:766 Pipe worker startup
failed:fitting was gone before startup
2012-10-02 15:12:37.190 [error]
<0.356.0>@riak_pipe_vnode:new_worker:766 Pipe worker startup
failed:fitting was gone before startup
2012-10-02 15:12:37.191 [error]
<0.354.0>@riak_pipe_vnode:new_worker:766 Pipe worker startup
failed:fitting was gone before startup
2012-10-02 15:12:37.236 [error]
<0.344.0>@riak_pipe_vnode:new_worker:766 Pipe worker startup
failed:fitting was gone before startup
2012-10-02 15:13:13.718 [error]
<0.340.0>@riak_pipe_vnode:new_worker:766 Pipe worker startup
failed:fitting was gone before startup
2012-10-02 15:13:13.719 [error]
<0.332.0>@riak_pipe_vnode:new_worker:766 Pipe worker startup
failed:fitting was gone before startup

And that hasn't changed with the upgrade. There's other debugging
information as well that I'll need to do via ticket. I'd love to have
something to post back to the group though afterwards.





Re: Riak Memory Usage Constantly Growing

2012-10-02 Thread John E. Vincent
Forgot to paste this in the last email. We ARE seeing the long_gc
messages (but no large_heap):

erlang.log.3:10:43:21.406 [info] monitor long_gc <0.29654.5763>
[{initial_call,{riak_pipe_fitting,init,1}},{almost_current_function,{gen_fsm,loop,7}},{message_queue_len,0}]
[{timeout,102},{old_heap_block_size,0},{heap_block_size,610},{mbuf_size,0},{stack_size,24},{old_heap_size,0},{heap_size,150}]
erlang.log.3:12:50:06.214 [info] monitor long_gc <0.2047.5836>
[{initial_call,{riak_pipe_vnode_worker,init,1}},{almost_current_function,{gen_fsm,loop,7}},{message_queue_len,0}]
[{timeout,113},{old_heap_block_size,0},{heap_block_size,1597},{mbuf_size,0},{stack_size,50},{old_heap_size,0},{heap_size,688}]
erlang.log.3:14:39:06.280 [info] monitor long_gc <0.336.0>
[{initial_call,{riak_core_vnode,init,1}},{almost_current_function,{gen_fsm,loop,7}},{message_queue_len,0}]
[{timeout,150},{old_heap_block_size,0},{heap_block_size,4181},{mbuf_size,0},{stack_size,11},{old_heap_size,0},{heap_size,2093}]
erlang.log.3:14:44:45.888 [info] monitor long_gc <0.672.0>
[{name,riak_kv_stat},{initial_call,{riak_kv_stat,init,1}},{almost_current_function,{gen_server,loop,6}},{message_queue_len,0}]
[{timeout,155},{old_heap_block_size,0},{heap_block_size,233},{mbuf_size,0},{stack_size,9},{old_heap_size,0},{heap_size,55}]

On Tue, Oct 2, 2012 at 7:55 AM, Kelly McLaughlin  wrote:
> John and Shane,
>
> I have been looking into some memory issues lately and I would be very 
> interested in more
> information about your particular problems. If either of you are able to get 
> some output
> from etop using the -sort memory option when you are having elevated memory 
> usage it
> would be very helpful to see. I know that sometimes you get the 
> connection_lost message
> when trying to use etop, but I have found that sometimes if you keep trying 
> it may succeed
> after a few attempts.
>
> Are either of you using MapReduce? I see that John is using 2I. Shane, do you 
> also use 2I?
> Finally, do you notice a lot of messages to the console or console log that 
> have the either the
> phrase 'monitor large_heap' or 'monitor long_gc'?
>
> Kelly
>



Re: Riak Memory Usage Constantly Growing

2012-10-02 Thread John E. Vincent
On Tue, Oct 2, 2012 at 8:51 AM, Shane McEwan  wrote:
> Thanks John and Kelly. It's nice to know we're not the only ones. :-)
>
> As I said, we'll be upgrading to 1.2 in the coming weeks so it's good to
> know that the memory issues might go away after that. It's not a showstopper
> for us, more of a curiosity and concern it might develop into something
> worse.
>
> I'll persist with the etop and see if I can get it to run and will report
> back.
>
> We're still using key filters in our MapReduce functions but we plan to move
> to 2i at the same time as upgrading to 1.2.
>
> The word "monitor" doesn't appear in any of our logs for the last 5 days.
> Just lots of:
>
> 2012-10-02 00:10:47.869 [error] <0.31890.1344> gen_fsm <0.31890.1344> in
> state wait_pipeline_shutdown terminated with reason: {sink_died,normal}
> 2012-10-02 00:10:47.909 [error] <0.31890.1344> CRASH REPORT Process
> <0.31890.1344> with 0 neighbours crashed with reason: {sink_died,normal}
> 2012-10-02 00:10:47.981 [error] <0.166.0> Supervisor riak_pipe_builder_sup
> had child undefined started with {riak_pipe_builder,start_link,undefined} at
> <0.31890.1344> exit with reason {sink_died,normal} in context
> child_terminated
>
> Thanks!
>

We had this same error before the upgrade. It's much less noisy now,
but it's the same thing: sink_died.
>
> On 02/10/12 15:55, Kelly McLaughlin wrote:
>>
>> John and Shane,
>>
>> I have been looking into some memory issues lately and I would be very
>> interested in more
>> information about your particular problems. If either of you are able to
>> get some output
>> from etop using the -sort memory option when you are having elevated
>> memory usage it
>> would be very helpful to see. I know that sometimes you get the
>> connection_lost message
>> when trying to use etop, but I have found that sometimes if you keep
>> trying it may succeed
>> after a few attempts.
>>
>> Are either of you using MapReduce? I see that John is using 2I. Shane, do
>> you also use 2I?
>> Finally, do you notice a lot of messages to the console or console log
>> that have the either the
>> phrase 'monitor large_heap' or 'monitor long_gc'?
>>
>> Kelly
>>
>> On Oct 2, 2012, at 6:11 AM, "John E. Vincent"
>>  wrote:
>>
>>> I would highly suggest you upgrade to 1.2 when possible. We were, up
>>> until recently, running on 1.4 and seeing the same problems you
>>> describe. Take a look at this graph:
>>>
>>> http://i.imgur.com/0RtsU.png
>>>
>>> That's just one of our nodes but all of them exhibited the same
>>> behavior. The falloffs are where we had to bounce riak.
>>>
>>> This is what one of our nodes looks like now and has looked like since
>>> the upgrade:
>>>
>>> http://i.imgur.com/pm7Nk.png
>>>
>>> The change was SO dramatic that I seriously thought /stats was broken.
>>> I've verified it both outside of Riak and inside. The memory usage
>>> change was very positive, though evidently there's still a memory leak.
>>>
>>> We're heavy 2i users. No multi backend.
>>>
>>> On Tue, Oct 2, 2012 at 4:08 AM, Shane McEwan  wrote:
>>>>
>>>> G'day!
>>>>
>>>> Just recently we've noticed memory usage in our Riak cluster constantly
>>>> increasing.
>>>>
>>>> The memory usage reported by the Riak stats "memory_total" parameter has
>>>> been less than 100MB for nearly a year but has recently increased to
>>>> over
>>>> 1GB.
>>>>
>>>> If we restart the cluster memory usage usually returns back to what we
>>>> would
>>>> call "normal" but after a week or so of stability the memory usage
>>>> starts
>>>> gradually growing again. Sometimes after a growth spurt over a few days
>>>> the
>>>> memory usage will plateau and be stable again for a week or two and then
>>>> put
>>>> on another growth spurt. The memory usage starts increasing at the same
>>>> moment on all 4 nodes.
>>>>
>>>> This graph [http://imagebin.org/230614] shows what I mean. The green
>>>> shows
>>>> the memory usage as reported by "memory_total" (left-hand y-axis scale).
>>>> The
>>>> red line shows the memory used by Riak's beam.smp process (right-hand
>>>> y-axis
>

Re: Official Basho Package Repositories Now Available

2012-10-16 Thread John E. Vincent
Wow. This is very much appreciated and will simplify our stuff
considerably. Thanks!
On Oct 16, 2012 5:48 PM, "James Martin"  wrote:

> Folks,
>
> I'm pleased to announce we now have official Basho package
> repositories for CentOS/Red Hat versions 5 & 6, Debian Squeeze, and
> Ubuntu Lucid/Natty/Precise.
>
> For information how to use these repos, please see our blog announcement:
>
> http://basho.com/blog/technical/2012/10/16/Basho-Package-Repos-Now-Online/
>
>
> Official documentation is in the works.
>
> Thanks,
>
> - James
>
>


Re: riak fails to start on Ubtunu Precise

2012-10-24 Thread John E. Vincent
Brian,

Can you please contact enStratus about this? I'm not sure why this is
even happening, since our installer is tested on Precise. These
requests should come through enStratus, as the installer creates a
working Riak install that shouldn't need any changes.

You can either email supp...@enstratus.com, your internal contact here
or me directly.

On Wed, Oct 24, 2012 at 12:26 PM, brian.thoma...@gmail.com
 wrote:
> I installed enstratus which makes use of riak and unfortunately, i can't get
> riak to start.  Here is the console output:
>
> Attempting to restart script through sudo -H -u riak
> Exec: /usr/lib/riak/erts-5.9.1/bin/erlexec -boot
> /usr/lib/riak/releases/1.2.0/riak -embedded -config
> /etc/riak/app.config -pa /usr/lib/riak/basho-patches
> -args_file /etc/riak/vm.args -- console
> Root: /usr/lib/riak
> Erlang R15B01 (erts-5.9.1) [source] [64-bit] [smp:2:2] [async-threads:64]
> [kernel-poll:true]
>
>
> =INFO REPORT 24-Oct-2012::12:25:46 ===
> alarm_handler: {set,{system_memory_high_watermark,[]}}
> ** /usr/lib/riak/lib/observer-1.1/ebin/etop_txt.beam hides
> /usr/lib/riak/lib/basho-patches/etop_txt.beam
> ** Found 1 name clashes in code paths
> 12:25:46.667 [info] Application lager started on node 'riak@127.0.0.1'
> 12:25:46.762 [info] Upgrading legacy ring
> 12:25:46.991 [error] gen_server riak_core_capability terminated with reason:
> no function clause matching orddict:fetch('riak@127.0.0.1', []) line 72
> Erlang has closed
>
> =INFO REPORT 24-Oct-2012::12:25:47 ===
> alarm_handler: {clear,system_memory_high_watermark}
> /usr/lib/riak/lib/os_mon-2.2.9/priv/bin/memsup: Erlang has closed.
>{"Kernel
> pid
> terminated",application_controller,"{application_start_failure,riak_core,{bad_return,{{riak_core_app,start,[normal,[]]},{'EXIT',{{function_clause,[{orddict,fetch,['riak@127.0.0.1',[]],[{file,[111,114,100,100,105,99,116,46,101,114,108]},{line,72}]},{riak_core_capability,renegotiate_capabilities,1,[{file,[115,114,99,47,114,105,97,107,95,99,111,114,101,95,99,97,112,97,98,105,108,105,116,121,46,101,114,108]},{line,414}]},{riak_core_capability,handle_call,3,[{file,[115,114,99,47,114,105,97,107,95,99,111,114,101,95,99,97,112,97,98,105,108,105,116,121,46,101,114,108]},{line,208}]},{gen_server,handle_msg,5,[{file,[103,101,110,95,115,101,114,118,101,114,46,101,114,108]},{line,588}]},{proc_lib,init_p_do_apply,3,[{file,[112,114,111,99,95,108,105,98,46,101,114,108]},{line,227}]}]},{gen_server,call,[riak_core_capability,{register,{riak_core,vnode_routing},{capability,[proxy,legacy],legacy,{riak_core,legacy_vnode_routing,[{true,legacy},{false,proxy}]}}},infinity]}}"}
>
> Crash dump was written to: /var/log/riak/erl_crash.dump
> Kernel pid terminated (application_controller)
> ({application_start_failure,riak_core,{bad_return,{{riak_core_app,start,[normal,[]]},{'EXIT',{{function_clause,[{orddict,fetch,['riak@127.0.0.1',[]],[{
>
>
>



Re: Protection from listing buckets and other rogue queries

2013-03-06 Thread John E. Vincent
Just a thought but we've been working on disabling certain API
operations at the proxy level. We have a subsystem that uses riak that
should NEVER see a DELETE call ever and we're planning on guaranteeing
that by blocking it at the proxy level.

Combined with the actual nodes being inaccessible except through the
proxy, it seems like this would work. Fair warning: I haven't yet DONE
this. It's just on my todo list.
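As a concrete sketch of the idea, here's a hypothetical nginx fragment (the "riak_cluster" upstream and /riak/ path are assumptions, not our actual config), written to a scratch path for illustration:

```shell
# Hypothetical nginx config implementing the DELETE block described
# above: refuse the method before it ever reaches Riak's HTTP API.
cat > /tmp/block-delete.conf <<'EOF'
location /riak/ {
    if ($request_method = DELETE) {
        return 405;
    }
    proxy_pass http://riak_cluster;
}
EOF
cat /tmp/block-delete.conf
```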

On Wed, Mar 6, 2013 at 12:27 PM, John Daily  wrote:
> I've been asking our engineers for help with this, and here's what I've 
> found...
>
> In theory, you could use the standard OTP mechanism[1] via the console to 
> stop and restart the riak_kv application to stop a (non-MapReduce) query, but 
> that has not worked across all versions of Riak. The rolling cluster restart 
> seems to be your best bet.
>
>
> Regarding the HTTP API...
>
> I do not recommend doing this without careful testing, because it doesn't 
> appear to be something we or our customers have done, but you should be able 
> to comment out the http listener configuration item under riak_core in 
> app.config on each node and then restart. In my very, very limited testing, I 
> did have one node that seemed to die quietly after making the change, but 
> I've not been able to reproduce it and the logs didn't have anything useful.
>
> The administrative console appears to work properly despite the change; the 
> only functionality you should lose would be link walking.
>
>
> It is possible to disable the Webmachine routes that expose bucket and key 
> lists, but there's no built-in way to make that configuration change persist 
> across restarts, so I haven't looked into the mechanism for that.
>
>
> [1] application:stop(riak_kv), followed by application:start(riak_kv)
>
> -John Daily
> Technical Evangelist
> jda...@basho.com
>
>
>
> On Mar 5, 2013, at 10:01 AM, Chris Read  wrote:
>
>> Greetings all...
>>
>> We have had a situation where someone ran a List Buckets query on the HTTP 
>> interface on a large cluster. This caused the whole system to become 
>> unresponsive and the only way we could think of to stop the load on the 
>> system was to restart the whole cluster.
>>
>> I've spent the morning trawling through the documentation, and can't find 
>> answers to the following:
>>
>> - Is it possible, and if so how does one go about terminating a query like 
>> this? I can see things like Pid in riak-admin top, what can I do with it?
>> - Can we run useful things over HTTP like the /admin console but have the 
>> rest of the HTTP API disabled?
>>
>> Thanks,
>>
>> Chris


Re: iron.io post commit hook

2013-03-13 Thread John E. Vincent
Personally I'm interested in all manner of post-commit hook example
code. While I don't have a need for this particular hook, it would be a
great way to get more examples to learn from, as I'm keen to build
something similar myself.

On Wed, Mar 13, 2013 at 9:21 AM, Jon Brisbin  wrote:
> Just wondering if anyone else in Riakland would be interested in a post
> commit hook that could trigger an IronWorker or send an IronMQ message
> (using the REST API)?
>
> I have a need for one myself for a iOS photography app I'm experimenting
> with (when I have free time...which is almost never!) and wondered if anyone
> else is using a similar stack and would benefit from making it Apache 2.0
> and available on GitHub.
>
> Thanks!
>
> Jon Brisbin
> http://about.me/jbrisbin
>
>


Re: Receiving ulimit warning despite setting it

2013-05-16 Thread John E. Vincent
As an opposing viewpoint, I'd argue that it's NOT Riak's job to go
automatically changing things outside of its domain. Ulimits and
tunables in the same class are not things that should be blindly tweaked by
an incoming package. These are things the system administrator needs to be
aware of and scope for the system in use.

I appreciate the idea and desire that Riak work out of the box, but I'd
argue it already does. What DOESN'T work is an untuned Riak at load. And it
shouldn't. There are some things that need to be an informed decision. Is
the default ulimit in most distros too low? Absolutely, but it's in the
domain of the OS/distro provider, not a third-party package, to tweak
possibly dangerous knobs.

The only sane default here is to use what the distro sets and provide
information for users on how to change it.
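For anyone following along, the sysadmin-side change being argued over here is a small drop-in file. A sketch, where the `riak` user name and the 65536 limit are assumptions for a typical install:

```
# /etc/security/limits.d/riak.conf -- scoped to the riak user only
riak    soft    nofile    65536
riak    hard    nofile    65536
```

As noted later in this thread, limits.d entries only affect sessions opened through PAM, so after restarting Riak it's worth verifying the running beam process actually picked the limit up (e.g. `cat /proc/<pid>/limits`).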



On Thu, May 16, 2013 at 9:48 AM, Jared Morrow  wrote:

> Toby,
>
> It seems to me like it would be nice if Riak "just worked" when you
>> installed it, instead of requiring each and every user to have to track
>> something down in the docs and then configure it in their chef/puppet
>> manifests. Don't you agree that is a desirable feature of good software?
>> (ie. Sensible defaults)
>>
>
> That's a good point, and like I said above, I was willing to accept that I
> was the only one with those views.
>
> I filed an issue for my backlog
> https://github.com/basho/node_package/issues/55 to take a look at that.
>  It is probably too late for our next major release to get in, but I do
> indeed want to make this easier on everyone, so thanks for the feedback.
>
> -Jared
>
>
> On Wed, May 15, 2013 at 11:44 PM, Toby Corkindale <
> toby.corkind...@strategicdata.com.au> wrote:
>
>> On 16/05/13 15:38, Jared Morrow wrote:
>>
>>> I've considered packaging separate files for configuring the limit
>>> for people, but the user in me always felt like that was something
>>> the sysadmin should have a say in. I rather dislike packages that
>>> make system changes without my knowledge or consent.  Maybe that is
>>> just me?
>>>
>>
>> It's not making a system change though -- it's only adjusting things for
>> the riak/riakcs user.
>>
>> Can you think of any situation where a user would WANT to stick to the
>> default 1024 file-handle limit and yet be running Riak?
>>
>> Now think of how often that situation occurs, compared to the number of
>> times where the user DOES want the "good" number setup, and would just like
>> to install Riak and then get on with their work?
>>
>> It seems to me like it would be nice if Riak "just worked" when you
>> installed it, instead of requiring each and every user to have to track
>> something down in the docs and then configure it in their chef/puppet
>> manifests. Don't you agree that is a desirable feature of good software?
>> (ie. Sensible defaults)
>>
>> Cheers,
>> Toby
>>
>>
>>> On May 15, 2013, at 10:54 PM, Toby Corkindale wrote:
>>>
>>>  On 16/05/13 14:39, Toby Corkindale wrote:

> On 16/05/13 14:24, Jared Morrow wrote:
>
>> Well the riak-cs / riak / stanchion scripts all drop privileges
>> using sudo.  On RHEL/Centos this sudo exec carries the settings
>> from the calling user (in the case of init.d, root) so things
>> are fine there.  On Ubuntu/Debian that does not always work.
>> So if you set the ulimit for the root user, it might not
>> propagate through to the riak-cs / riak / stanchion users.
>>
>> So to change that, you should try to change it in
>> /etc/security/limits.conf.
>>
>
> My understanding is that only sessions opened through PAM will
> be affected by the limits.d/* config files, i.e. not daemon
> processes. (I've checked this anyway with the following
> /etc/security/limits.d/riakcs.conf: "riak-cs hard nofile
> 32002" and "riak-cs soft nofile 32001".)
>
> As noted previously, this problem was not occurring on the
> current Ubuntu LTS nodes; just the Debian Squeeze ones. Which
> makes it particularly odd.
>
> Thanks for your help so far; I'll continue to investigate and
> report back if I find a solution.
>

 I realised that the Riak CS user was called "riakcs" and not
 "riak-cs". Once I changed that in the limits.d/riakcs.conf file,
 riakcs started working without the file warning.

 I also added in a line for the regular "riak" user while I was
 there.

 May I suggest you add this to the debian/ubuntu packages by
 default? (ie. a file in /etc/security/limits.d/ )


 Cheers, Toby


>>>
>>
>

Re: Receiving ulimit warning despite setting it

2013-05-16 Thread John E. Vincent
For the cookbook, I think that's fine. It's not the actual package itself
doing the change. In fact, I'd expect the cookbook and Puppet modules
(or Ansible playbooks...whatever) to do that. By using those tools I'm making
a conscious decision to let them manage the configuration. I think
that's the distinction: installation vs. configuration.


On Thu, May 16, 2013 at 10:28 AM, Hector Castro  wrote:

> Slightly related, we just recently updated file descriptor limit
> support in the Riak cookbook [0]. As of right now, ulimits
> automatically get increased (4096 by default) for the `riak` and
> `riak-cs` users based on what cookbook you use.
>
> Perhaps we should make that increase conditional?
>
> --
> Hector
>
> [0]
> https://github.com/basho/riak-chef-cookbook/commit/2315fcc9dd31145e14526add2d8881456d191bcb
>
>
> On Thu, May 16, 2013 at 10:07 AM, John E. Vincent
>  wrote:
> > As an opposing viewpoint, I'd argue that it's NOT the requirement of
> Riak to
> > go automatically changing things outside of its domain. Ulimits and
> tunables
> > in the same class are not things that should be blindly tweaked by an
> > incoming package. These are things the system administrator needs to be
> > aware of and scope for the system in use.
> >
> > I appreciate the idea and desire that Riak work out of the box but I'd
> argue
> > it already does. What DOESN'T work is an untuned Riak at load. And it
> > shouldn't. There are some things that need to be an informed decision. Is
> > the default ulimit in most distros too low? Absolutely but it's in the
> > domain of the OS/Distro provider and not a third-party package to tweak
> > possible dangerous knobs.
> >
> > The only sane default here is to use what the distro sets and provide
> > information for users on how to change it.
> >
> >
> >
> > On Thu, May 16, 2013 at 9:48 AM, Jared Morrow  wrote:
> >>
> >> Toby,
> >>
> >>> It seems to me like it would be nice if Riak "just worked" when you
> >>> installed it, instead of requiring each and every user to have to track
> >>> something down in the docs and then configure it in their chef/puppet
> >>> manifests. Don't you agree that is a desirable feature of good
> software?
> >>> (ie. Sensible defaults)
> >>
> >>
> >> That's a good point, and like I said above, I was willing to accept
> that I
> >> was the only one with those views.
> >>
> >> I filed an issue for my backlog
> >> https://github.com/basho/node_package/issues/55 to take a look at
> that.  It
> >> is probably too late for our next major release to get in, but I do
> indeed
> >> want to make this easier on everyone, so thanks for the feedback.
> >>
> >> -Jared
> >>
> >>
> >> On Wed, May 15, 2013 at 11:44 PM, Toby Corkindale
> >>  wrote:
> >>>
> >>> On 16/05/13 15:38, Jared Morrow wrote:
> >>>>
> >>>> I've considered packaging separate files for configuring the limit
> >>>> for people, but the user in me always felt like that was something
> >>>> the sysadmin should have a say in. I rather dislike packages that
> >>>> make system changes without my knowledge or consent.  Maybe that is
> >>>> just me?
> >>>
> >>>
> >>> It's not making a system change though -- it's only adjusting things
> for
> >>> the riak/riakcs user.
> >>>
> >>> Can you think of any situation where a user would WANT to stick to the
> >>> default 1024 file-handle limit and yet be running Riak?
> >>>
> >>> Now think of how often that situation occurs, compared to the number of
> >>> times where the user DOES want the "good" number setup, and would just
> like
> >>> to install Riak and then get on with their work?
> >>>
> >>> It seems to me like it would be nice if Riak "just worked" when you
> >>> installed it, instead of requiring each and every user to have to track
> >>> something down in the docs and then configure it in their chef/puppet
> >>> manifests. Don't you agree that is a desirable feature of good
> software?
> >>> (ie. Sensible defaults)
> >>>
> >>> Cheers,
> >>> Toby
> >>>
> >>>
> >>>> On May 15, 2013, at 10:54 PM, Toby Corkindale wrote:
> >>

Re: Riak on SAN

2013-10-02 Thread John E. Vincent
I'm going to take a competing view here.

SAN is a bit of an overloaded term at this point. Nothing precludes a SAN
from being performant or having SSDs. Yes, the cost of Fibre Channel is
overkill, but iSCSI is much more realistic. Alternately you can even do ATAoE.

From a hardware perspective, if I have 5 pizza boxes as riak nodes, I can
only fit so many disks in them. Meanwhile I can add another shelf to my SAN
and expand as needed. Additionally backup of a SAN is MUCH easier than
backup of a riak node itself. It's a snapshot and you're done. Mind you
nothing precludes you from doing LVM snapshots in the OS but you still need
to get the data OFF that system for it to be truly backed up.

I love riak and other distributed stores, but backing them up is NOT a
solved problem. Walking all keys, coordinating the takedown of all your
nodes in a given order, or whatever your strategy happens to be: it's a
serious pain point.

Using a SAN or local disk also doesn't excuse you from watching I/O
performance. With a SAN I get multiple redundant paths to a block device,
which I don't necessarily get with local storage.

Just my two bits.



On Wed, Oct 2, 2013 at 2:18 AM, Jeremiah Peschka  wrote:

> Could you do it? Sure.
>
> Should you do it? No.
>
> An advantage of Riak is that you can avoid the cost of SAN storage by
> getting duplication at the machine level rather than rely on your storage
> vendor to provide it.
>
> Running Riak on a SAN also exposes you to the SAN becoming your
> bottleneck; you only have so many fiber/iSCSI ports and a fixed number of
> disks. The risk of storage contention is high, too, so you can run into
> latency issues that are difficult to diagnose without looking into both
> Riak as well as the storage system.
>
> Keeping cost in mind, too, SAN storage is about 10x the cost of consumer
> grade SSDs. Not to mention feature licensing and support... The cost
> comparison isn't favorable.
>
> Please note: Even though your vendor calls it a SAN, that doesn't mean
> it's a SAN.
>  On Oct 1, 2013 11:08 PM, "Guy Morton"  wrote:
>
>> Does this make sense?
>>
>> --
>> Guy Morton
>> Web Development Manager
>> Brüel & Kjær EMS
>>
>> This e-mail is confidential and may be read, copied and used only by the
>> intended recipient. If you have received it in error, please contact the
>> sender immediately by return e-mail. Please then delete the e-mail and do
>> not disclose its contents to any other person.
>>
>>


Re: Riak on SAN

2013-10-02 Thread John E. Vincent
Man I go away for a few hours for family time and off things go ;)

So this led to some interesting convos on Twitter and here. Others have
addressed some things. I figure it helps to explain the sad lonely world I
live in. It's called "Enterprise".

A few folks are somewhat aware that our product uses Riak under the covers.
We have a hosted SaaS version and we also allow customers to install it
entirely isolated on their own networks. The only people who do this are
traditional enterprises.

The very first question that comes up during an installation after "you
need HOW many servers?" is "How do I back this up".  Since we use LevelDB,
we have the worst of backup options - coordinated node shutdown and
tarball'ing. The thing is we can't say to them "you never need to back this
up. Just add more nodes!"

That doesn't check the box they have. That doesn't meet the legal and
industry guidelines they have to follow. So now that they've swallowed the
"you need 5 servers just for the DB" we now hit them with "your backup
strategy involves this complicated orchestrated shutdown process and some
tarballs". When faced with that, we ran into a new issue. They started
doing vm snapshots and stupid shit like vmotioning the instances (oh yes,
they virtualize it =/).

If you aren't familiar with vmotion, it's basically vmware's bullshit that
says they can somehow defy the laws of physics. If you read the details on
exactly what vmotion does (hint - it doesn't actually take the node offline
- vmware just "buffers" the pending network requests among other things),
you can see how this can TOTALLY fuck up Riak clusters.

Anyway so this is the world we have to live in and we have to provide
something that resembles a backup they can DR from. Our normal course of
action is to tell them to contact Basho for RiakDS and go multi-site. SAN
based snapshots largely meet that need for them.

For what it's worth, this is not just a problem with Riak and there are
legitimate use cases for wanting to have a "copy" of production data for
testing new code against. The biggest problem is once you get data IN to
riak (and other stores), it's REALLY difficult to prune it outside of
expensive "walk all the things", an external index of some kind or
resorting to application-level business logic tooling.

I'm not making a judgement call. Trade-offs are a thing, but it's definitely
an issue. At this point I'm considering resorting to a post-commit hook
machination of some kind.


On Wed, Oct 2, 2013 at 6:02 PM, Jeremiah Peschka  wrote:

> Responses inline.
>
> TL;DR - I actually agree with John, SANs make management of storage
> stupidly easy, but you pay more money for it. Make the right decision for
> your org, but make sure you can monitor and backup that decision. The SAN
> isn't a magic box. And  a Drobo b1200i [2] is definitely not a SAN.
>
> ---
> Jeremiah Peschka - Founder, Brent Ozar Unlimited
> MCITP: SQL Server 2008, MVP
> Cloudera Certified Developer for Apache Hadoop
>
>
> On Wed, Oct 2, 2013 at 2:12 PM, John E. Vincent <
> lusis.org+riak-us...@gmail.com> wrote:
>
>> I'm going to take a competing view here.
>>
>> SAN is a bit overloaded of a term at this point. Nothing precludes a SAN
>> from being performant or having SSDs. Yes the cost is overkill for fiber
>> but iSCSI is much more realistic. Alternately you can even do ATAoE.
>>
>
> Agreed. You can buy a glorified direct attached storage device with a few
> ethernet ports in it, but vendors will call it a SAN.
>
>
>>
>> From a hardware perspective, if I have 5 pizza boxes as riak nodes, I can
>> only fit so many disks in them. Meanwhile I can add another shelf to my SAN
>> and expand as needed.
>>
>
> We have the ability to cram 16x 960GB SSDs into the front of a Dell R720
> for about $550 per drive... no SAN vendor can beat you on price for that.
> SAN storage is an order of magnitude more expensive, but...
>
>
>> Additionally backup of a SAN is MUCH easier than backup of a riak node
>> itself. It's a snapshot and you're done. Mind you nothing precludes you
>> from doing LVM snapshots in the OS but you still need to get the data OFF
>> that system for it to be truly backed up.
>>
>
> The products worth of being called a SAN offer you fantastic features like
> application aware volume snapshots, multi-site async and synchronous block
> level synchronization, and all kinds of amazing features that mean you
> never need to think about your storage beyond "HEY THERE, MAGIC BOX, I NEED
> 500GB OF SPACE!"
>
>
>>
>> I love riak and other distributed stores but backing t

Re: redundant writers

2011-02-15 Thread John E. Vincent
Les,

This is pretty much Dynamo 101 territory at this point and one of the
tradeoffs with a distributed model.

If you aren't familiar with Dynamo, Andy Gross (from Basho) gives an
AWESOME walkthrough in an episode of TheNoSQLTapes:

http://nosqltapes.com/video/understanding-dynamo-with-andy-gross

Concurrent writes really do behave the same way they would during a
network partition event. You can just stick your riak cluster behind a
load balancer and go to town.
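If you do turn `allow_mult` on and let colliding writes become siblings, the client-side resolution the folks below are describing boils down to a merge function applied on read. A hypothetical sketch, where the record shape with a `seen_at` timestamp is an assumption, not anything Riak prescribes:

```python
# Sibling resolution sketch: given the sibling values handed back for one
# key, deterministically pick (or merge) a winner on read.

def resolve_siblings(siblings):
    """Merge colliding copies of 'the same' record.

    Each copy is assumed to be a dict carrying a 'seen_at' timestamp.
    For redundant writers producing theoretically identical data,
    last-write-wins is usually an acceptable policy.
    """
    if not siblings:
        raise ValueError("no values for key")
    return max(siblings, key=lambda v: v["seen_at"])
```

For truly identical feeds the siblings should compare equal anyway; the merge function only matters on the rare occasions they diverge.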

On Tue, Feb 15, 2011 at 3:38 PM, Les Mikesell  wrote:
> In this scenario, the behavior I would want would be to only see one copy
> for each unique key, but to always have one even if one of the feeds or
> writers fails or the cluster is partitioned, then re-joined. This sounds
> good in theory, but what about the details? Is there likely to be a big
> performance hit from the normally-colliding writes?  Will it take twice the
> disk space, the eventually clean up the duplicates? Would it be reasonable
> to do this to riak-search with something like news stories?
>
>
> On 2/15/2011 2:02 PM, Alexander Sicular wrote:
>>
>> What about siblings? http://wiki.basho.com/REST-API.html (seach for
>> sibling)
>>
>> On Tue, Feb 15, 2011 at 14:57, Dan Reverri  wrote:
>>>
>>> Riak maintains a single value per key and provides mechanisms (vector
>>> clocks) to detect/resolve conflicting values. In the proposed use case
>>> the
>>> multiple copies would overwrite each other and Riak, by default, would
>>> return a single value for a requested key.
>>> Behind the scenes Riak determines the appropriate value per key using
>>> vector
>>> clocks. More information about vector clocks is available here:
>>> http://blog.basho.com/2010/01/29/why-vector-clocks-are-easy/
>>> http://blog.basho.com/2010/04/05/why-vector-clocks-are-hard/
>>> Thanks,
>>> Dan
>>> Daniel Reverri
>>> Developer Advocate
>>> Basho Technologies, Inc.
>>> d...@basho.com
>>>
>>>
>>> On Tue, Feb 15, 2011 at 10:11 AM, Les Mikesell
>>> wrote:

 Is riak suitable as a very reliable store where you have multiple feeds
 of
 streaming data that are at least theoretically identical?  That is, can
 you
 count on writing multiple copies with the same keys at the same time to
 do
 something reasonable regardless of cluster partitioning?  And is this a
 common usage scenario?

 --
  Les Mikesell
   lesmikes...@gmail.com



Re: EC2 and node names

2011-05-03 Thread John E. Vincent
One option, if you are using a config management tool like Chef, is to
use the node name in Chef as the vm.args name and add entries to the
hosts file mapping those names to current IPs.
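Concretely, the approach looks something like this; the hostnames and IPs are invented, and Chef (or whatever tool) would re-template the hosts entries whenever an instance's IP changes:

```
## /etc/riak/vm.args
-name riak@riak01.internal

## /etc/hosts -- managed by config management, remapped on IP change
10.0.1.11    riak01.internal
10.0.1.12    riak02.internal
```

Because the Erlang node name stays `riak@riak01.internal`, the ring never has to be re-IP'd; only the hosts file changes.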

On Tue, May 3, 2011 at 9:24 AM, Grant Schofield  wrote:
>
> On May 2, 2011, at 7:58 PM, Jeff Pollard wrote:
>
> I was reviewing the Riak Operations webinar, and it was mentioned that the
> preferred vm.args -name for EC2 environments should be "riak@hostname"
> because you don't have to "rename data or do anything weird" like you would
> if your nodes were named "riak@ip.address" (approximately 40:05 in the
> video).
> I was looking for some elaboration on this tip, namely:
>
> What is meant by "rename data or do anything weird"
>
> When you bring a cluster together using data copied from a different set of
> nodes you need to re-ip the first node you plan to start but you also have
> to change the ring manually on that node so when the subsequent nodes join
> everything works properly.
>
> Is "hostname" in riak@hostname a public DNS host that you configure in your
> DNS to map to the EC2 public hostname (ec2-50-18-...)?
>
> You can use DNS (public or private) or host file entries that reference the
> private IP of the node if you choose to use hostname.
>
> Does anyone have any best practices around vm.args -name in EC2
> environments?
>
> We haven't outlined any best practices ourselves, but I tend to believe
> using a hostname that you can change the IP for via DNS or a hosts file is a
> more flexible way of approaching the problem.
> Grant
>
> Thanks!


Re: Riak in AWS

2011-09-07 Thread John E. Vincent
If you're comfortable with Chef, you can use the riak cookbook
to get things up and running.

On Wed, Sep 7, 2011 at 10:37 PM, Jeremiah Peschka
 wrote:
> Does anyone have a CloudFormation template (or other script) for getting a 
> Riak cluster up and running in AWS?
> ---
> Jeremiah Peschka - Founder, Brent Ozar PLF, LLC
> Microsoft SQL Server MVP
>
>


Re: is there easy way to transfer mongo collections to riak?

2012-02-23 Thread John E. Vincent
2012/2/23 Will Moss :
> Hey Dominik,
>
> To answer the question in your subject line, no. To quote one of the guys I
> work with, "You're the bitch of data." That said, I'm extremely happy we did
> it. Riak does what it says it's going to do when it says it will and doesn't
> seem to really ever go down. I can't say the same for Mongo.
>
> We did this for some of the data we have in Mongo (and are planning to move
> the rest in the near future) and it certainly wasn't painless. What we did
> that worked reasonably well was:
> 1. Change all our mongo query patterns so that we were effectively using it
> a key-value store (i.e. no secondary indexes or other complex queries).
> 2. Write code that read an existing record from Mongo and wrote it to Riak.
> The actual schema stayed about the same, but we decided to go with protocol
> buffers in Riak since they are much more efficient at representing data. We
> mapped a single collection to a single bucket in Riak.
> 3. Change all the queries to Riak instead of Mongo.
> 4. Then, when we were ready to actually kick off the migration, we migrated
> users as they logged in. So, if they had not been migrated, we delayed their
> login slightly while we copied the necessary data out of mongo and into
> Riak.
>
>
> Will

What Will said is a nice practice in general for migrating between
data storage systems. My general recommendation is:

- Dark launch new code paths that concurrently writes data to both systems
- Optionally dark launch reads against the new subsystem as well but
only log for comparison reasons. No presentation changes.
- Validate data in new subsystem
- When you're happy that things are looking good, you come to a bit of a fork:

- migrate the user wholesale on login
or
- run in an incremental fashion on reading a subset of data (old user
messages are copied into Riak only when accessed).

The latter requires a few more iterations, and there's probably a
slight performance hit from talking to multiple datastores. However,
the alternative is to take a big maintenance window and risk a big
migration that may or may not work. By going incremental with dark
launches you make it mostly transparent to the user and can work out
any latent bugs.
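The dual-write dark launch described above can be sketched roughly like this; the store objects are hypothetical stand-ins for real MySQL/Riak clients:

```python
# Dark-launch dual writer: the old store stays authoritative while the new
# store receives shadow traffic whose failures and mismatches are only logged.

import logging

log = logging.getLogger("migration")


class DualWriter:
    """Writes to the old store (source of truth) and the new store (dark)."""

    def __init__(self, old_store, new_store):
        self.old = old_store
        self.new = new_store

    def put(self, key, value):
        self.old.put(key, value)           # source of truth
        try:
            self.new.put(key, value)       # dark write; must never hurt users
        except Exception:
            log.exception("dark write failed for %s", key)

    def get(self, key):
        value = self.old.get(key)          # still served from the old store
        try:
            shadow = self.new.get(key)     # dark read, compared and logged only
            if shadow != value:
                log.warning("shadow mismatch for %s", key)
        except Exception:
            log.exception("dark read failed for %s", key)
        return value
```

Once the mismatch log stays quiet for long enough, you flip reads to the new store and retire the old one.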

I've done this personally at a few places. The last time was migrating
blobs from Voldemort into S3 over the period of a month using a
similar approach. We'll probably do something similar at the new gig
when we migrate into Riak from MySQL.

- John
http://about.me/lusis



Re: native erlang client on non-Riak node?

2012-04-05 Thread John E. Vincent
ElasticSearch has this feature as well.

When using the Java transport client (as opposed to the HTTP/JSON
interface), you actually become a cluster member as a non-data node.
You can also add any number of nodes to the cluster as non-data nodes
that then participate in general processing.

http://www.elasticsearch.org/guide/reference/modules/node.html
http://www.elasticsearch.org/guide/reference/java-api/client.html

On Thu, Apr 5, 2012 at 3:52 PM, Alexander Sicular  wrote:
> I think that being able to have a native erlang interface to a riak cluster
> would be a "Good Thing." I can imagine some compute applications that could
> benefit from having access to data at the speed of network but not actually
> be a member of the storage pool.
>
> Just watched Rich Hickey discussing similar concepts employed in his new
> system, Datomic, here, http://www.infoq.com/interviews/hickey-datomic .
> Worth the 30min.
>
> -Alexander Sicular
>
> @siculars
>
> On Apr 5, 2012, at 3:30 PM, Ryan Zezeski wrote:
>
> Mike,
>
> The native Erlang client was never designed to be used in this fashion, as
> you are finding out first hand.  You are correct in that some operations
> assume they are running on the Riak VM, e.g. mapred_dynamic_inputs_stream.
>
> Normally I would say not to establish a Erlang dist connection to the Riak
> cluster but given you need a work around for the performance issue it seems
> to be your only good choice ATM.  You can use `rpc:call` to invoke the
> mapred queries.
>
> -Z
>
> On Wed, Apr 4, 2012 at 2:40 PM, Michael Radford  wrote:
>>
>> In case anyone else is wondering, the answer is no, the native client
>> isn't fully usable from a non-Riak node.
>>
>> I started a hidden node and connected to one of my Riak nodes using
>> riak:client_connect. The returned client is able to successfully
>> execute Client:get(Bucket, Key) (and probably other simple things like
>> put/delete), but it fails on mapred with a {modfun, ...} input.
>>
>>
>> The problem is that the native client tries to run the input function
>> on the local node, rather than the node the client points to, which of
>> course won't work unless the local node is participating in the Riak
>> cluster.
>>
>> Mike
>>
>> On Tue, Apr 3, 2012 at 9:43 AM, Michael Radford  wrote:
>> > Is it OK to use Riak's native erlang client from an erlang node that's
>> > connected to a Riak cluster via normal erlang distribution
>> > (net_adm:ping etc.), but isn't participating in the Riak cluster?  I'm
>> > wondering if e.g. Riak makes any assumptions about nodes listed in
>> > nodes(), or the nodes of processes running client code, that would
>> > break if a node isn't running Riak.
>> >
>> > I'm trying to figure out a fallback strategy in case I can't resolve
>> > performance issues I'm seeing with the protocol buffers client (see
>> > other thread about slow searches).  The scariest option would be to
>> > run my application on the same nodes as Riak, which seems like it
>> > would complicate operations too much.  Or if
>> > connected-but-not-participating nodes aren't a good idea, maybe my
>> > application nodes could each start a slave node with -hidden, which
>> > would then connect to Riak and act as a proxy?
>> >
>> > Mike
>>
>> ___
>> riak-users mailing list
>> riak-users@lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

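For the archives, here is a minimal sketch of the workaround discussed above: connect a non-participating (e.g. hidden) node, use the returned client for simple operations, and push any mapred with {modfun, ...} inputs onto the Riak node itself via rpc so the input function resolves there. This assumes the classic internal riak:client_connect/1 API; the node name, my_mod:my_input_fun, and the riak_kv_mrc_pipe entry point are illustrative assumptions, not tested against a live cluster.

```erlang
%% Sketch only -- assumes the internal riak:client_connect/1 API and a
%% reachable Riak node named 'riak@riak01' (both illustrative).
connect_and_query() ->
    Node = 'riak@riak01',
    pong = net_adm:ping(Node),
    {ok, Client} = riak:client_connect(Node),
    %% Simple operations route through the remote vnodes, so they work
    %% from a connected-but-not-participating node:
    {ok, _Obj} = Client:get(<<"bucket">>, <<"key">>),
    %% {modfun, ...} mapred inputs are evaluated on the *local* node, so
    %% push the whole mapred call onto the Riak node instead
    %% (riak_kv_mrc_pipe:mapred/2 is an assumed internal entry point):
    rpc:call(Node, riak_kv_mrc_pipe, mapred,
             [{modfun, my_mod, my_input_fun, []}, []]).
```

The rpc:call sidesteps the problem Mike describes because the input function is then looked up and run on a node that is actually participating in the cluster.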
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Misunderstanding cardinality issues ?

2012-04-18 Thread John E. Vincent
On Wed, Apr 18, 2012 at 3:42 PM, Adrien Mogenet
 wrote:
> Hi there,
>
> From this Riak Search's
> page (http://wiki.basho.com/MapReduce-Search-2i-Comparison.html) :
>
> Poor Use Case:
>
> Searching for common (low cardinality) terms in documents
>
> I agree that searching for common terms in Riak Search is a really poor use
> case (and not only in RS), but shouldn't we speak about "high cardinality" ?
>
> As far as I know, cardinality is a mathematical concept that defines the
> "size" of a set. Did I misunderstand it ?
>
> By the way, what is a "great cardinality" according to Riak Search ? 10 ?
> 100 ? 1000 ? ...
>

In database terms it's not the SIZE so much as the count of distinct
values.  Does that help? As for "great cardinality", I have no idea
offhand what that would be.

>
> --
> Adrien
>
>

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Help with modeling and gotchas

2010-09-30 Thread John E. Vincent
Hey all,

I was wondering if someone could take a poke at the following issue I
opened for myself on a side project:

http://github.com/lusis/vogeler/issues/#issue/14

It's kind of terse, but essentially I wanted someone to vet and
comment on the Riak portion. I'm looking for recommendations,
especially from someone who's modeled the same data in different NoSQL
engines.

Don't get too hung up on the functionality I'm abstracting. It's a
use-case-specific ORM I'm having to implement, and I'm getting really
buggered by having to even consider modifying the original model
simply because MongoDB doesn't like dots in key names.

Thanks a ton.

John E. Vincent
twitter/@lusis

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com