Re: Cluster start and 2i query

2014-03-05 Thread Ciprian Manea
Hi Daniel,

One possible configuration would be:

+ fronting the Riak cluster with HAProxy (or a hardware load balancer)
+ when the Riak server boots up, block the Riak API ports (using iptables)
+ also at boot, spawn a riak-admin wait-for-service riak_kv process [0]
+ once riak-admin exits, drop the Riak API-related rules from iptables

Handling failures and retries gracefully in the application code should
help as well.
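
A minimal sketch of such a boot script, assuming the default HTTP (8098) and
protocol buffers (8087) listeners and a placeholder node name (adjust both to
your own ports and firewall rules):

#!/bin/sh
# block the client-facing ports until riak_kv is up
iptables -I INPUT -p tcp --dport 8098 -j REJECT
iptables -I INPUT -p tcp --dport 8087 -j REJECT

riak start
# blocks until the riak_kv service is running on this node
riak-admin wait-for-service riak_kv riak@192.168.1.10

# KV is up, re-open the client-facing ports
iptables -D INPUT -p tcp --dport 8098 -j REJECT
iptables -D INPUT -p tcp --dport 8087 -j REJECT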


[0]
http://docs.basho.com/riak/latest/ops/running/tools/riak-admin/#wait-for-service


Regards,
Ciprian


On Wed, Mar 5, 2014 at 12:58 PM, Daniel Iwan  wrote:

> Any ideas regarding that?
>
> Thanks
> Daniel
>
>
>
> --
> View this message in context:
> http://riak-users.197444.n3.nabble.com/Cluster-start-and-2i-query-tp4030557p4030610.html
> Sent from the Riak Users mailing list archive at Nabble.com.
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Cluster start and 2i query

2014-03-07 Thread Ciprian Manea
Hi Daniel,

Secondary index queries need at least 1/n_val of the primary partitions to be
available before they can run successfully; Riak will return
{error,insufficient_vnodes_available} while the required primary partitions
are still coming up.

I would suggest defensive programming (retrying the 2i queries on error) as
a way to mitigate this.
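
For example, a crude retry loop around the HTTP 2i endpoint could look like
this (the bucket name, index name, value and the default HTTP port 8098 are
all placeholders here):

for i in 1 2 3 4 5; do
    if curl -sf "http://127.0.0.1:8098/buckets/mybucket/index/field1_bin/val1"; then
        break
    fi
    echo "2i query failed, retrying in 2s..." >&2
    sleep 2
done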


Thanks,
Ciprian


On Wed, Mar 5, 2014 at 11:06 PM, Daniel Iwan  wrote:

> Thanks Ciprian
>
> We already have wait-for-service in our script and it looks like it's not a
> sufficient condition to satisfy secondary index query.
> How long application should wait before starting querying Riak using 2i?
> Should we do riak-admin transfers to make sure there are no vnode transfers
> happening?
>
> I'm trying to figure it out what {error,insufficient_vnodes_available}
> means
> in terms of 2i query.
> Does it mean not all primary partitions are up?
>
> Regards
> Daniel
>
>
>
> --
> View this message in context:
> http://riak-users.197444.n3.nabble.com/Cluster-start-and-2i-query-tp4030557p4030614.html
> Sent from the Riak Users mailing list archive at Nabble.com.
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Partitions placement

2014-03-14 Thread Ciprian Manea
Hi Daniel,

"A Little Riak Book" covers the logic behind partition allocation in an
overly simplified way.

Riak distributes partitions (vnodes) to physical nodes in a pseudo-random
fashion, resulting in allocations like the one you described. These
allocations are less optimal when the number of riak nodes is small, hence we
(strongly) recommend 5+ nodes for production use.

Storing 3 copies of the data on 3 different servers sounds trivial to do, but
it is not that easy to scale once the number of servers grows. To cope with
this, Riak introduces an "overlay": data is first placed in "partitions"
(their number is always a power of 2), which are then distributed to
different server nodes. As powers of 2 are not divisible by 3, this approach
has a drawback at lower scale: some nodes will hold a few extra partitions
(which were not intended to be stored there).

If you know you are not going to need an n_val greater than 3 in your
buckets, one way to hint this to Riak and get a better distribution of
partitions to nodes is to configure [0] target_n_val to 3.
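
For example, the relevant app.config fragment would look something like this
(keep your other riak_core settings in place):

{riak_core, [
    %% ... existing riak_core settings ...
    {target_n_val, 3}
]},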


[0]
http://docs.basho.com/riak/latest/ops/advanced/configs/configuration-files/


Regards,
Ciprian


On Fri, Mar 14, 2014 at 12:09 AM, Daniel Iwan  wrote:

> Below is an output of my Riak cluster. 3 physical nodes. Ring size 128.
> As far as I can tell when Riak installed fresh it is always place
> partitions
> in the same way on a ring as long as number of vnodes and servers is the
> same.
>
> All presentations including "A Little Riak Book' show pretty picture of
> ring
> and nodes claiming partitions in a  sequential fashion. That's clearly not
> a
> case.
> Output below shows that node2 is picked as favourite, which means replicas
> of certain keys will definitely be on the same hardware. Partitions are
> split 44 + 42 + 42. Why not 43+43+42?
>
> Another thing, why the algorithm selects nodes in 'random' non-sequential
> fashion? When the cluster gets created and nodes 2 & 3 are joined to node
> 1,
> it's a clear situation. Partitions are empty so vnodes could be assigned in
> a way so there's no consecutive partitions on the same hw.
> My issue is that in my case if node2 goes down and I'm storing some data
> with N=2 I will definitely not be able access certain keys and more
> surprisingly all 2i will no longer work for the buckets with N=2 due to
> {error,insufficient_vnodes_available}. That is all 2i's for those buckets.
>
> I understand that when new nodes are attached Riak tries to avoid
> reshuffling everything and just moves certain partitions, and at that point
> you may end up with copies on the same physical nodes. But even then Riak
> should make best effort and try not to put consecutive partitions on the
> same server. If it has to move it anyway it could as well put it on any
> other machine but the one that holds partition with preceding and following
> index.
> I also understand Riak does not guarantee that replicas are on distinct
> servers (why? it should, at least for N=2 and N=3 if possible)
>
> I appreciate minimum recommended setup is 5 nodes and I should be storing
> with N=3 minimum.
> But I just find it confusing when presentations show something that is not
> even remotely close to reality.
>
> Just to be clear I have nothing against Riak, I think it's great though bit
> disappointing that there are no stronger conditions about replica placement
> here.
>
> I'm probably missing something and simplifying too much. Any clarification
> appreciated.
>
> Daniel
>
>
> riak@10.173.240.1)2>
> (riak@10.173.240.1)2> {ok, Ring} = riak_core_ring_manager:get_my_ring().
> {ok,
>  {chstate_v2,'riak@10.173.240.1',
>   [{'riak@10.173.240.1',{303,63561952927}},
>{'riak@10.173.240.2',{31,63561952907}},
>{'riak@10.173.240.3',{25,63561952907}}],
>   {128,
>[{0,'riak@10.173.240.1'},
> {11417981541647679048466287755595961091061972992,
>  'riak@10.173.240.2'},
> {22835963083295358096932575511191922182123945984,
>  'riak@10.173.240.2'},
> {34253944624943037145398863266787883273185918976,
>  'riak@10.173.240.3'},
> {45671926166590716193865151022383844364247891968,
>  'riak@10.173.240.1'},
> {57089907708238395242331438777979805455309864960,
>  'riak@10.173.240.2'},
> {68507889249886074290797726533575766546371837952,
>  'riak@10.173.240.2'},
> {79925870791533753339264014289171727637433810944,
>  'riak@10.173.240.3'},
> {91343852333181432387730302044767688728495783936,
>  'riak@10.173.240.1'},
> {102761833874829111436196589800363649819557756928,
>  'riak@10.173.240.2'},
> {114179815416476790484662877555959610910619729920,
>  'riak@10.173.240.2'},
> {12559779695812446953312916531172001681702912,
>  'riak@10.173.240.3'},
> {137015778499772148581595453067151533092743675904,
>  'riak@10.173.240.1'},
> {148433760041419827630061740822747494183805648896,
>  'riak@10.173.240.2'},
> {159851741583067506678528028578343455274867621888,
>  'riak@10.173.240.2'},
> {1712697231

Re: RiakError: timeout

2014-03-18 Thread Ciprian Manea
Hi Massimiliano,

As a first step, I would recommend setting pb_backlog to 64 or 128 in your
app.config.
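
A sketch of how that could look in app.config, assuming the 1.4-series layout
where the protocol buffers settings live in the riak_api section (older
releases keep them under riak_kv):

{riak_api, [
    %% ... existing riak_api settings ...
    {pb_backlog, 128}
]},

The node needs a restart for this setting to take effect.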

How are you distributing the load from your python clients to the Riak
cluster? Is every python client connecting directly to one Riak node or do
you have a pool of Riak servers configured in each client?
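
If you do not yet have a load balancer in front of the cluster, a minimal
HAProxy sketch for the protocol buffers port could look like this (node names
and addresses are placeholders):

listen riak_pb
    bind *:8087
    mode tcp
    balance leastconn
    timeout connect 5s
    timeout client  60s
    timeout server  60s
    server riak1 10.0.0.1:8087 check
    server riak2 10.0.0.2:8087 check
    server riak3 10.0.0.3:8087 check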


Thanks,
Ciprian


On Tue, Mar 18, 2014 at 8:49 AM, Massimiliano Ciancio <
massimili...@ciancio.net> wrote:

> I tried to raise each param in sysctl.con (now the configuration is a
> bit strange with all that 9s :-)
> But I still get many "RiakError: timeout" :(
> Have you any suggestion, please?
> Thanks in advance
> Massimiliano
>
> fs.file-max = 99
> vm.swappiness = 0
> net.ipv4.tcp_max_syn_backlog = 99
> net.core.somaxconn = 99
> net.ipv4.tcp_timestamps = 0
> net.ipv4.tcp_sack = 1
> net.ipv4.tcp_window_scaling = 1
> net.ipv4.tcp_fin_timeout = 15
> net.ipv4.tcp_keepalive_intvl = 30
> net.ipv4.tcp_tw_reuse = 1
> net.ipv4.tcp_tw_recycle = 1
> vm.max_map_count = 3200
> net.core.rmem_default = 8388608
> net.core.rmem_max = 8388608
> net.core.wmem_default = 8388608
> net.core.wmem_max = 8388608
> net.core.netdev_max_backlog = 99
>
> 2014-03-17 21:15 GMT+01:00 Michael Dillon :
> > On the server which is sending write requests to the Riak cluster, you
> may
> > have run into some network limits, possibly max sockets or one of the
> > network buffer settings. I would try to tune your kernel for a high
> level of
> > network traffic and try again. Or just split your load across more than
> one
> > server.
> >
> > If you are going to have very high loads of requests going to a Riak
> cluster
> > it is a good idea to put a load balancer in front of it so that you
> spread
> > the requests across nodes in the cluster. Riak's clustering only
> distributes
> > work AFTER Riak receives the requests. If you sent all your Riak
> requests to
> > just one member of the cluster, then you can potentially create an
> incoming
> > network bottleneck on that server.
> >
> >
> >
> > On Mon, Mar 17, 2014 at 11:34 AM, Massimiliano Ciancio
> >  wrote:
> >>
> >> Hi list,
> >> I'm in troubles...
> >> I'm getting many timeout errors from Riak (see traceback at end of
> mail).
> >> I'm using a 5 node Debian cluster. Riak version is 1.4.8. Riak Python
> >> client is installed with 'pip install riak' and is up to date.
> >> The errors come from different processes on the same machine trying to
> >> write intensively on Riak, using each one their own connection.
> >> The errors start after some time the processes are running.
> >> What can I check?
> >> Thanks in advance
> >> Massimiliano
> >>
> >>
> >> Traceback (most recent call last):
> >>   ...
> >>   File "/usr/local/lib/python2.7/dist-packages/riak/bucket.py", line
> >> 206, in get
> >> return obj.reload(r=r, pr=pr, timeout=timeout)
> >>   File "/usr/local/lib/python2.7/dist-packages/riak/riak_object.py",
> >> line 307, in reload
> >> self.client.get(self, r=r, pr=pr, timeout=timeout)
> >>   File
> "/usr/local/lib/python2.7/dist-packages/riak/client/transport.py",
> >> line 127, in wrapper
> >> return self._with_retries(pool, thunk)
> >>   File
> "/usr/local/lib/python2.7/dist-packages/riak/client/transport.py",
> >> line 69, in _with_retries
> >> return fn(transport)
> >>   File
> "/usr/local/lib/python2.7/dist-packages/riak/client/transport.py",
> >> line 125, in thunk
> >> return fn(self, transport, *args, **kwargs)
> >>   File
> "/usr/local/lib/python2.7/dist-packages/riak/client/operations.py",
> >> line 333, in get
> >> return transport.get(robj, r=r, pr=pr, timeout=timeout)
> >>   File
> >>
> "/usr/local/lib/python2.7/dist-packages/riak/transports/pbc/transport.py",
> >> line 146, in get
> >> MSG_CODE_GET_RESP)
> >>   File
> >>
> "/usr/local/lib/python2.7/dist-packages/riak/transports/pbc/connection.py",
> >> line 43, in _request
> >> return self._recv_msg(expect)
> >>   File
> >>
> "/usr/local/lib/python2.7/dist-packages/riak/transports/pbc/connection.py",
> >> line 55, in _recv_msg
> >> raise RiakError(err.errmsg)
> >> RiakError: 'timeout'
> >>
> >> ___
> >> riak-users mailing list
> >> riak-users@lists.basho.com
> >> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> >
> >
> >
> >
> > --
> > PageFreezer.com
> > #200 - 311 Water Street
> > Vancouver,  BC  V6B 1B8
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Partitions placement

2014-03-18 Thread Ciprian Manea
Hi Daniel,

target_n_val can be changed at any time and will take effect on the next run
of the claim algorithm [0] (which usually runs whenever you add or remove
nodes from the cluster via the riak-admin command).
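
For reference, membership changes go through the staged plan/commit workflow,
so a typical sequence looks like this (node name is a placeholder):

riak-admin cluster join riak@node4.example.com
riak-admin cluster plan      # review the proposed ownership changes
riak-admin cluster commit    # commit them; the claim algorithm runs here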

With default settings, Riak is able to replicate data to distinct nodes in
all clusters with more than 3 nodes.

[0]
https://github.com/basho/riak_core/blob/develop/src/riak_core_claim.erl#L429


Thanks,
Ciprian


On Mon, Mar 17, 2014 at 5:50 PM, Daniel Iwan  wrote:

> Hi Ciprian
>
> Thanks for reply
> I'm assuming 'overlay' you are talking about are vnodes?
> When creating cluster and joining 2 nodes to first node (3-node cluster)
> there should be possible distributing partitions to guarantee 3 copies are
> on distinct machines. Simple sequential vnode assignment would do.
> Then I guess it's a matter of calculating distances between indexes within
> each of nodes. Partitions that do not meet that criteria should be moved
> when scaling up.
>
> Scaling up in my opinion should be easier as more nodes in the cluster,
> sequential partitions are more spread horizontally so probability that a
> server will hold sequential partitions decreases.
>
> With 3 servers it should be guaranteed that at least 2 copies are on
> distinct servers. On 5 servers should be guaranteed 3 copies are on
> distinct
> servers. etc.
>
> Does Riak give such guarantees?
>
> Can target_n_val be changed later. What are the implications? Is there a
> description what algorithm will be used for partition placement?
>
> Cheers
> Daniel
>
>
>
>
>
>
>
>
> --
> View this message in context:
> http://riak-users.197444.n3.nabble.com/Partitions-placement-tp4030664p4030679.html
> Sent from the Riak Users mailing list archive at Nabble.com.
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: change ring claimant

2014-04-23 Thread Ciprian Manea
Hi Daniil,

You should mark the claimant node as down.

Run the following command on another node:

riak-admin down riak@
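
A sketch of the full sequence, using the node name from your paste (the
failed claimant must stay stopped while you do this):

# run on a healthy node: mark the stopped claimant as down
riak-admin down riak@10.3.12.80

# then stage the removal again, review it and commit
riak-admin cluster force-remove riak@10.3.12.80
riak-admin cluster plan
riak-admin cluster commit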


Regards,
Ciprian


On Wed, Apr 23, 2014 at 12:13 PM, Daniil Churikov  wrote:

> Hello riak users.
>
> We have a riak-1.3.2 cluster with 3 nodes. One of this node was physically
> shut down, and now any cluster operations fails with reason:
>
> # riak-admin cluster force-remove riak@10.3.12.80
> Attempting to restart script through sudo -H -u riak
> Remove failed, see log for details
>
> 2014-04-18 08:52:38.639 [error]
> <0.18341.8>@riak_core_console:stage_remove:234 Remove failed
> exit:{{nodedown,'riak@10.3.12.80'},{gen_server,call,[{riak_core_claimant,'
> riak@10.3.12.80'},{stage,'riak@10.3.12.80',remove},infinity]}}
>
> ==> error.log <==
> 2014-04-18 08:52:38.639 [error]
> <0.18341.8>@riak_core_console:stage_remove:234 Remove failed
> exit:{{nodedown,'riak@10.3.12.80'},{gen_server,call,[{riak_core_claimant,'
> riak@10.3.12.80'},{stage,'riak@10.3.12.80',remove},infinity]}}
>
> As I understood this node was a claimant node and all cluster operations
> must be done by this node. The question is, how should I change claimant in
> order to fix cluster operations?
>
>
>
> --
> View this message in context:
> http://riak-users.197444.n3.nabble.com/change-ring-claimant-tp4031018.html
> Sent from the Riak Users mailing list archive at Nabble.com.
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: riak handoffs stalled

2014-07-14 Thread Ciprian Manea
Hi Leonid,

Which Riak version are you running?

Have you committed* the cluster plan after issuing the cluster force-remove
commands?

What is the output of $ riak-admin transfer-limit, run from one of your
riak nodes?


*Do not commit the plan yet if you have not already done so.
Please run a riak-admin cluster plan instead and attach its output here.


Thanks,
Ciprian


On Mon, Jul 14, 2014 at 2:41 PM, Леонид Рябоштан <
leonid.riabosh...@twiket.com> wrote:

> Hello, guys,
>
> It seems like we ran into emergency. I wonder if there can be any help on
> that.
>
> Everything that happened below was because we were trying to rebalace
> space used by nodes that we running out of space.
>
> Cluster is 7 machines now, member_status looks like:
> Attempting to restart script through sudo -u riak
> = Membership
> ==
> Status RingPendingNode
>
> ---
> valid  15.6% 20.3%'riak@192.168.135.180'
> valid   0.0%  0.0%'riak@192.168.152.90'
> valid   0.0%  0.0%'riak@192.168.153.182'
> valid  26.6% 23.4%'riak@192.168.164.133'
> valid  27.3% 21.1%'riak@192.168.177.36'
> valid   8.6% 15.6%'riak@192.168.194.138'
> valid  21.9% 19.5%'riak@192.168.194.149'
>
> ---
> Valid:7 / Leaving:0 / Exiting:0 / Joining:0 / Down:0
>
> 2 nodes with 0 Ring was made to force leave the cluster, they have plenty
> of data on them which is now seems to be not accessible. Handoffs are stuck
> it seems. Node 'riak@192.168.152.90'(is in same situation as '
> riak@192.168.153.182') tries to handoff partitions to '
> riak@192.168.164.133' but fails for unknown reason after huge
> timeouts(from 5 to 40 minutes). Partition it's trying to move is about 10Gb
> in size. It grows slowly on target node, but probably it's just usual
> writes from normal operation. It doesn't get any smaller on source node.
>
> I wonder is there any way to let cluster know that we want those nodes to
> be actually members of source node and there's no actual need to transfer
> them? How to redo cluster ownership balance? Revert this force-leave stuff.
>
> Thank you,
> Leonid
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: riak handoffs stalled

2014-07-14 Thread Ciprian Manea
Hi Leonid,

Let's try increasing the handoff_timeout and handoff_receive_timeout settings
and see if that solves your problem.

Could you please paste the code below into a $ riak attach session:

riak_core_util:rpc_every_member_ann(application,set_env,[riak_core,
handoff_timeout, 540],infinity).
riak_core_util:rpc_every_member_ann(application,set_env,[riak_core,
handoff_receive_timeout, 540],infinity).

You should be able to exit back at the shell prompt by pressing ^D

Could you please archive/compress and send me directly by email:

+ the ring directory (including its content) from one of your riak nodes
+ recent log files (console.log, error.log, crash.log if any), same node


Thanks,
Ciprian


On Mon, Jul 14, 2014 at 3:33 PM, Леонид Рябоштан <
leonid.riabosh...@twiket.com> wrote:

> Hello,
>
> riak version is 1.1.4-1. We set transfer limit in config made it equal to
> 4.
>
> I don't think we have riak-admin transfer-limit or riak-admin cluster plan.
>
> The problem is that damn nodes can't pass partition between each other,
> probably because they're too big. Each 5k files(leveldb backend) and
> weights 10GB each. There're no problems with smaller partitions. We can't
> find anything usefull on handoff fail in riak or system logs. Seems like
> ulimit and erlang ports are way higher, we increased it 4 times today.
>
> It begins like:
> 2014-07-14 12:22:45.518 UTC [info]
> <0.10544.0>@riak_core_handoff_sender:start_fold:83 Starting handoff of
> partition riak_kv_vnode 68507889249886074290797726533575766546371837952
> from 'riak@192.168.153.182' to 'riak@192.168.164.133'
>
> And ends like:
> 2014-07-14 08:43:28.829 UTC [error]
> <0.2264.0>@riak_core_handoff_sender:start_fold:152 Handoff of partition
> riak_kv_vnode 68507889249886074290797726533575766546371837952 from '
> riak@192.168.153.182' to 'riak@192.168.164.133' FAILED after sending
> 1318000 objects in 1455.15 seconds: closed
> 2014-07-14 10:40:18.294 UTC [error]
> <0.11555.0>@riak_core_handoff_sender:start_fold:152 Handoff of partition
> riak_kv_vnode 68507889249886074290797726533575766546371837952 from '
> riak@192.168.153.182' to 'riak@192.168.164.133' FAILED after sending
> 911000 objects in 2734.48 seconds: closed
> 2014-07-14 09:43:43.197 UTC [error]
> <0.26922.2>@riak_core_handoff_sender:start_fold:152 Handoff of partition
> riak_kv_vnode 68507889249886074290797726533575766546371837952 from '
> riak@192.168.153.182' to 'riak@192.168.164.133' FAILED after sending
> 32000 objects in 963.06 seconds: timeout
>
> Maybe we need to check something else on target node? Actually it always
> runs in GC problems:
> 2014-07-14 12:30:03.579 UTC [info]
> <0.99.0>@riak_core_sysmon_handler:handle_event:85 monitor long_gc <0.468.0>
> [{initial_call,{riak_kv_js_vm,init,1}},{almost_current_function,{xmerl_ucs,expand_utf8_1,3}},{message_queue_len,0}]
> [{timeout,118},{old_heap_block_size,0},{heap_block_size,196418},{mbuf_size,0},{stack_size,45},{old_heap_size,0},{heap_size,136165}]
> 2014-07-14 12:30:44.386 UTC [info]
> <0.99.0>@riak_core_sysmon_handler:handle_event:85 monitor long_gc <0.713.0>
> [{initial_call,{riak_core_vnode,init,1}},{almost_current_function,{gen_fsm,loop,7}},{message_queue_len,0}]
> [{timeout,126},{old_heap_block_size,0},{heap_block_size,1597},{mbuf_size,0},{stack_size,38},{old_heap_size,0},{heap_size,658}]
>
> Probably we have some CPU issues here, but node is not under load
> currently.
>
> Thank you,
> Leonid
>
>
> 2014-07-14 16:11 GMT+04:00 Ciprian Manea :
>
> Hi Leonid,
>>
>> Which Riak version are you running?
>>
>> Have you committed* the cluster plan after issuing the cluster
>> force-remove  commands?
>>
>> What is the output of $ riak-admin transfer-limit, ran from one of your
>> riak nodes?
>>
>>
>> *Do not run this command yet if you have not done it already.
>> Please run a riak-admin cluster plan and attach its output here.
>>
>>
>> Thanks,
>> Ciprian
>>
>>
>> On Mon, Jul 14, 2014 at 2:41 PM, Леонид Рябоштан <
>> leonid.riabosh...@twiket.com> wrote:
>>
>>> Hello, guys,
>>>
>>> It seems like we ran into emergency. I wonder if there can be any help
>>> on that.
>>>
>>> Everything that happened below was because we were trying to rebalace
>>> space used by nodes that we running out of space.
>>>
>>> Cluster is 7 machines now, member_status looks like:
>>> Attempting to restart script through sudo -u riak
>>> = Membership

Re: Riak Search Question

2014-07-17 Thread Ciprian Manea
Hi Andrew,

It looks like a streaming search operation was timing out, unable to generate
more results:

https://github.com/basho/riak_search/blob/develop/src/riak_search_op_utils.erl#L174

This can happen if another node involved in the operation became unavailable
(due to network segmentation, a crash, or an administrative stop).


Regards,
Ciprian


On Thu, Jul 17, 2014 at 9:22 PM, Andrew Zeneski 
wrote:

> I'm seeing this in my crash log, can anyone help me understand what it
> means?
>
> 2014-07-17 17:21:20 =ERROR REPORT
> Error in process <0.11353.4454> on node 'riak@x.x.x.x' with exit value:
> {{nocatch,stream_timeout},[{riak_search_op_utils,gather_stream_results,4,[{file,"src/riak_search_op_utils.erl"},{line,174}]}]}
>
> 2014-07-17 17:21:20 =SUPERVISOR REPORT
>  Supervisor: {local,riak_pipe_builder_sup}
>  Context:child_terminated
>  Reason:
> {{nocatch,stream_timeout},[{riak_search_op_utils,gather_stream_results,4,[{file,"src/riak_search_op_utils.erl"},{line,174}]}]}
>  Offender:
> [{pid,<0.10701.4454>},{name,undefined},{mfargs,{riak_pipe_builder,start_link,undefined}},{restart_type,temporary},{shutdown,brutal_kill},{child_type,worker}]
>
> Thanks!
>
> Andrew
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Number of ports for Riak

2014-08-15 Thread Ciprian Manea
Hi Simon,

Quick answer: the more ports open, the merrier. At least 6.

As these are Erlang-specific settings, I recommend having a look at the
official Erlang FAQ answer [0].

[0] http://www.erlang.org/faq/how_do_i.html#idp27500560
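
For reference, with default settings a Riak node needs roughly the following
ports reachable; an iptables sketch (adjust the 6000-7999 range to whatever
inet_dist_listen_min/max you configure, and the rest to your app.config):

iptables -A INPUT -p tcp --dport 4369      -j ACCEPT   # epmd
iptables -A INPUT -p tcp --dport 6000:7999 -j ACCEPT   # Erlang distribution range
iptables -A INPUT -p tcp --dport 8099      -j ACCEPT   # handoff_port
iptables -A INPUT -p tcp --dport 8087      -j ACCEPT   # protocol buffers clients
iptables -A INPUT -p tcp --dport 8098      -j ACCEPT   # HTTP clients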


Regards,
Ciprian


On Fri, Aug 15, 2014 at 3:15 PM, Simon Hartley <
simon.hart...@williamhill.com> wrote:

>  Hi,
>
>
>
> In the security documentation here:
>
>
>
> http://docs.basho.com/riak/latest/ops/advanced/security/
>
>
>
> Instructions are given for how to limit the port ranges used for internode
> communications. E.g.
>
>
>
> { kernel, [
>
> {inet_dist_listen_min, 6000},
>
> {inet_dist_listen_max, 7999}
>
>   ]},
>
>
>
> Are there any recommendations on how many ports should be allowed. I’m
> assume this is related to the number of nodes in the cluster, and possible
> some details of the clients.
>
>
>
> Can anyone provide guidance on this?
>
>
>
> Thanks,
>
>
>
> Simon.
>  Confidentiality: The contents of this e-mail and any attachments
> transmitted with it are intended to be confidential to the intended
> recipient; and may be privileged or otherwise protected from disclosure. If
> you are not an intended recipient of this e-mail, do not duplicate or
> redistribute it by any means. Please delete it and any attachments and
> notify the sender that you have received it in error. This e-mail is sent
> by a William Hill PLC group company. The William Hill group companies
> include, among others, William Hill PLC (registered number 4212563),
> William Hill Organization Limited (registered number 278208), William Hill
> US HoldCo Inc, WHG (International) Limited (registered number 99191) and
> WHG Trading Limited (registered number 101439). Each of William Hill PLC,
> William Hill Organization Limited is registered in England and Wales and
> has its registered office at Greenside House, 50 Station Road, Wood Green,
> London N22 7TP. William Hill U.S. HoldCo, Inc. is 160 Greentree Drive,
> Suite 101, Dover 19904, Kent, Delaware, United States of America. Each of
> WHG (International) Limited and WHG Trading Limited is registered in
> Gibraltar and has its registered office at 6/1 Waterport Place, Gibraltar.
> Unless specifically indicated otherwise, the contents of this e-mail are
> subject to contract; and are not an official statement, and do not
> necessarily represent the views, of William Hill PLC, its subsidiaries or
> affiliated companies. Please note that neither William Hill PLC, nor its
> subsidiaries and affiliated companies can accept any responsibility for any
> viruses contained within this e-mail and it is your responsibility to scan
> any emails and their attachments. William Hill PLC, its subsidiaries and
> affiliated companies may monitor e-mail traffic data and also the content
> of e-mails for effective operation of the e-mail system, or for security,
> purposes..
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Did a force-remove of two nodes, now system is unresponsive

2014-08-18 Thread Ciprian Manea
Hi Marcel,

What is the configured ring size for this cluster?

You can slow down the transfers by running $ riak-admin transfer-limit 1 on
one of your riak nodes. iowait should decrease as well once the transfer
limit is lowered, unless one of your disks is failing or about to fail.
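
For example (a sketch; transfer-limit is available in Riak 1.4 and later):

riak-admin transfers          # show handoffs that are active or queued
riak-admin transfer-limit     # show the current per-node transfer limits
riak-admin transfer-limit 1   # lower the limit to 1 concurrent transfer per node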


Regards,
Ciprian


On Mon, Aug 18, 2014 at 9:18 PM, marcel.koopman 
wrote:

> We have a 5 node riak cluster.
> Two nodes in this cluster, had to be removed because they are no longer
> available (since a half year).
> So a force remove was done.
>
> After this, the 3 remaining nodes began to transfer all data. So we ended
> up
> with a complete unresponsive system. The iowait is blocking us now.
> So we are hoping that this will settle today, the next transaction was
> actually adding two new nodes.
>
> And yes this is production, Is there any chance we lost data?
>
>
>
> --
> View this message in context:
> http://riak-users.197444.n3.nabble.com/Did-a-force-remove-of-two-nodes-now-system-is-unresponsive-tp4031603.html
> Sent from the Riak Users mailing list archive at Nabble.com.
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Riak 2.x running with Riak Java client 1.4.x?

2014-11-10 Thread Ciprian Manea
Hi Guido,

Yes, you can run the 1.4.x java client against riak 2.0 as long as you
don't activate the newer features like security and bucket types.


Regards,
Ciprian

On Mon, Nov 10, 2014 at 2:58 PM, Guido Medina 
wrote:

> Hi,
>
> Is it possible to run the Riak Java client 1.4.x against Riak 2.x? At the
> moment we would have to do a major refactor in our application to support
> Riak Java client v2.x so I'm wondering if it is possible to first migrate
> to Riak v2.x before we start our refactor.
>
> Best regards,
>
> Guido.
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Random timeouts on Riak

2014-12-29 Thread Ciprian Manea
Hi Jason,

Are these random timeouts happening for only one key, or is it common to
more keys?

What is the CPU utilisation in the cluster when you're experiencing these
timeouts?

Can you spot anything peculiar in your servers' $ dmesg output? Any I/O
errors there?
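
For example, something like this usually surfaces disk and memory related
kernel messages:

dmesg | egrep -i 'error|fail|segfault|i/o'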


Regards,
Ciprian

On Mon, Dec 29, 2014 at 1:55 PM, Sargun Dhillon  wrote:

> Several things:
> 1) I recommend you have a 5-node cluster:
> http://basho.com/why-your-riak-cluster-should-have-at-least-five-nodes/
> 2) What version of Riak are you using?
> 3) What backend(s) are you using?
> 4) What's the size of your keyspace?
> 5) Are you actively rewriting keys, or writing keys to the cluster?
> 6) Do you know how much I/O the cluster is currently doing?
>
> On Mon, Dec 29, 2014 at 2:51 AM, Jason Ryan 
> wrote:
> > Hi,
> >
> > We are getting random timeouts from our application (>60seconds) when we
> try
> > to retrieve a key from our Riak cluster (4 nodes with a load balancer in
> > front of them). Our application just uses the standard REST API to query
> > Riak.
> >
> > We are pretty new to Riak - so would like to understand how best to debug
> > this issue? Is there any good pointers on what to start with? This is our
> > production cluster.
> >
> > Thanks,
> > Jason
> >
> >
> > This message is for the named person's use only. If you received this
> > message in error, please immediately delete it and all copies and notify
> the
> > sender. You must not, directly or indirectly, use, disclose, distribute,
> > print, or copy any part of this message if you are not the intended
> > recipient. Any views expressed in this message are those of the
> individual
> > sender and not Trustev Ltd. Trustev is registered in Ireland No. 516425
> and
> > trades from 2100 Cork Airport Business Park, Cork, Ireland.
> >
> >
> > ___
> > riak-users mailing list
> > riak-users@lists.basho.com
> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> >
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: cannot connect to DB

2015-01-12 Thread Ciprian Manea
Hi Ildar,

Please have a look at the configuration files /etc/riak/app.config and
/etc/riak/vm.args.

By default Riak binds to localhost, but you can change that using the
following snippet:

export riakIP=$(ifconfig eth0 | grep 'inet addr' | cut -d: -f2 | cut -d' '
-f1)

sudo sed -i "s/127.0.0.1/$riakIP/" /etc/riak/app.config /etc/riak/vm.args
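
After changing the bind addresses, restart the node so the new settings take
effect:

sudo riak stop
sudo riak start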


Regards,
Ciprian

On Mon, Jan 12, 2015 at 3:51 PM, Ildar Alishev 
wrote:

> Also maybe someone can help me, i have installed riak using apt-get
> install riak, i don’t understand where did it installed? How i can create
> node there?
>
>
> Thank you!
> > 12 янв. 2015 г., в 16:37, Alexander Sicular 
> написал(а):
> >
> > Hi Ildar,
> >
> > Please take a look at the docs,
> http://docs.basho.com/riak/latest/ops/building/basic-cluster-setup/ , you
> need to set up your IP address most likely.
> >
> > -Alexander
> >
> >
> > @siculars
> > http://siculars.posthaven.com
> >
> > Sent from my iRotaryPhone
> >
> >> On Jan 12, 2015, at 06:56, Ildar Alishev 
> wrote:
> >>
> >> Hello. Have started RIAK in UBUNTU64 14.04, trying to connect from
> another computer inside local network, but cannot connect. Maybe i should
> set up ipadress somewhere?
> >> ___
> >> riak-users mailing list
> >> riak-users@lists.basho.com
> >> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: riak client

2015-01-15 Thread Ciprian Manea
Hi Ildar,

We have a web GUI for Riak called Rekon [0].

While not in active development, it's a good starting point for browsing your
Riak data.

Please note that Rekon should NOT be used on a production cluster!

[0] https://github.com/basho/rekon


Regards,
Ciprian

On Thu, Jan 15, 2015 at 11:07 AM, Ildar Alishev 
wrote:

> Hello everyone.
>
>
> Would like to ask is there any riak clients like phpmyadmin for mysql
> db?
> I mean i’d like to have an easy web faced (with frontend) client is there
> any
>
>
> Than you!
> Ildar.
>
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Active Anti-Entropy detailed description

2015-01-22 Thread Ciprian Manea
Hi Simon,

Please find below some pointers regarding AAE concepts [0] and management
[1]

[0] http://docs.basho.com/riak/1.4.12/theory/concepts/aae/
[1] http://docs.basho.com/riak/1.4.12/ops/advanced/aae/
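
If you need to pause AAE on a node while you investigate, the entropy manager
can be toggled from $ riak attach; a sketch (remember to re-enable it once
you are done):

riak_kv_entropy_manager:disable().            %% stop scheduling new exchanges
riak_kv_entropy_manager:cancel_exchanges().   %% abort exchanges already running
%% and later, to turn it back on:
riak_kv_entropy_manager:enable().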


Regards,
Ciprian

On Thu, Jan 22, 2015 at 1:21 PM, Simon Hartley <
simon.hart...@williamhill.com> wrote:

>  Hi,
>
>
>
> We have some issue in a development Riak cluster. Basically we are seeing
> continuous AAE activity which we are not expecting.
>
>
>
> Can you point me at any specific and detailed documentation on the AAE
> subsystem, how it works both theoretically and practically, and how to
> interpret the log output.
>
>
>
> Thanks,
>
>
>
> *Simon Hartley*
>
> Architect
>
>
>
> Email: simon.hart...@williamhill.com
>
> Skype: *+44 (0)113 397 6747 <%2B44%20%280%29113%20397%206747>*
>
> Skype: *sijomons*
>
>
>
> *William Hill Online*, St. Johns, Merrion St. Leeds, LS2 8LQ
>
> [image: Description: Description: Description:
> cid:image002.png@01CC2FFA.24244CF0]
>
>
>  Confidentiality: The contents of this e-mail and any attachments
> transmitted with it are intended to be confidential to the intended
> recipient; and may be privileged or otherwise protected from disclosure. If
> you are not an intended recipient of this e-mail, do not duplicate or
> redistribute it by any means. Please delete it and any attachments and
> notify the sender that you have received it in error. This e-mail is sent
> by a William Hill PLC group company. The William Hill group companies
> include, among others, William Hill PLC (registered number 4212563),
> William Hill Organization Limited (registered number 278208), William Hill
> US HoldCo Inc, WHG (International) Limited (registered number 99191) and
> WHG Trading Limited (registered number 101439). Each of William Hill PLC,
> William Hill Organization Limited is registered in England and Wales and
> has its registered office at Greenside House, 50 Station Road, Wood Green,
> London N22 7TP. William Hill U.S. HoldCo, Inc. is 160 Greentree Drive,
> Suite 101, Dover 19904, Kent, Delaware, United States of America. Each of
> WHG (International) Limited and WHG Trading Limited is registered in
> Gibraltar and has its registered office at 6/1 Waterport Place, Gibraltar.
> Unless specifically indicated otherwise, the contents of this e-mail are
> subject to contract; and are not an official statement, and do not
> necessarily represent the views, of William Hill PLC, its subsidiaries or
> affiliated companies. Please note that neither William Hill PLC, nor its
> subsidiaries and affiliated companies can accept any responsibility for any
> viruses contained within this e-mail and it is your responsibility to scan
> any emails and their attachments. William Hill PLC, its subsidiaries and
> affiliated companies may monitor e-mail traffic data and also the content
> of e-mails for effective operation of the e-mail system, or for security,
> purposes..
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Simple 3 node test cluster eating all my memory

2015-02-12 Thread Ciprian Manea
Hi Simon,

Looking at this problem from another angle, a ring size of 128 is too large
for just 3 servers with 4 GB of RAM each. For instance, when dimensioning a
cluster with the LevelDB backend, we recommend our customers follow the
calculations in this spreadsheet [0].

Filling in the spreadsheet with your system's details (3 nodes, 4 GB RAM)
gives a ring size of 16 for the Riak cluster. Would you be able to recreate
the cluster with a ring size of 16 and test again?

[0]
https://docs.google.com/spreadsheet/ccc?key=0AnW_U8Qe8NdYdGk2V3Qza0VRNkxyRFNGUVVCV0c3V3c&usp=sharing#gid=0


Thanks,
Ciprian

On Thu, Feb 12, 2015 at 11:51 AM, Simon Hartley <
simon.hart...@williamhill.com> wrote:

>  Hi,
>
>
>
> We have a simple 3 node test cluster, and we are seeing this cluster
> fall-over under quite modest loads, 2-3 times a day with the node reporting
> out of memory problems.
>
>
>
> Our basic details are:
>
>
>
> · Riak 1.4.9
>
> · 3 nodes, each being:
>
> o   A virtual RedHat EL
>
> o   2 x 2.2GHz CPU
>
> o   4GB RAM
>
> · In-memory back end
>
> · 128 segment ring
>
>
>
> We have tried to limit the in-memory max memory by the following setting
> in app.config:
>
>
>
> %% Memory Config
>
> {memory_backend, [
>
>  {max_memory, 32}, %% 32 megabytes
>
>  {ttl, 86400}  %% 1 Day in seconds
>
>]},
>
>
>
> The behaviour we are seeing is a sudden and rapid increase in memory
> allocated to riak, up to 100% available RAM, and then a crash.
>
>
>
> This is happening on all 3 nodes (at different times).
>
>
>
> When looking through the console.log just prior to the out of memory /
> crash we see log entries similar to the following appearing:
>
>
>
> 2015-02-11 16:50:27.356 [info]
> <0.98.0>@riak_core_sysmon_handler:handle_event:92 monitor large_heap
> <0.238.0>
> [{name,riak_core_gossip},{initial_call,{riak_core_gossip,init,1}},{almost_current_function,{riak_core_gossip,update_gossip_vers
>
> ion,1}},{message_queue_len,78}]
> [{old_heap_block_size,0},{heap_block_size,47828850},{mbuf_size,0},{stack_size,15},{old_heap_size,0},{heap_size,13944475}]
>
>
>
> We can’t see any errors at the appropriate times in the error.log
>
>
>
> We see the following in erlang.log.1:
>
>
>
> = ALIVE Wed Feb 11 17:00:24 GMT 2015
>
> /usr/lib64/riak/lib/os_mon-2.2.9/priv/bin/memsup: Erlang has closed.
>
> Erlang has closed
>
>
>
> Crash dump was written to: /var/log/riak/erl_crash.dump
>
> eheap_alloc: Cannot allocate 4454408120 bytes of memory
>
>
>
> =
>
>
>
> Anyone any ideas?
>
>
>
> *Simon Hartley*
>
> Solutions Architect
>
>
>
> Email: simon.hart...@williamhill.com
>
> Skype: *+44 (0)113 397 6747 <%2B44%20%280%29113%20397%206747>*
>
> Skype: *sijomons*
>
>
>
> *William Hill Online*, St. Johns, Merrion St. Leeds, LS2 8LQ
>
> [image: Description: Description: Description:
> cid:image002.png@01CC2FFA.24244CF0]
>
>
>  Confidentiality: The contents of this e-mail and any attachments
> transmitted with it are intended to be confidential to the intended
> recipient; and may be privileged or otherwise protected from disclosure. If
> you are not an intended recipient of this e-mail, do not duplicate or
> redistribute it by any means. Please delete it and any attachments and
> notify the sender that you have received it in error. This e-mail is sent
> by a William Hill PLC group company. The William Hill group companies
> include, among others, William Hill PLC (registered number 4212563),
> William Hill Organization Limited (registered number 278208), William Hill
> US HoldCo Inc, WHG (International) Limited (registered number 99191) and
> WHG Trading Limited (registered number 101439). Each of William Hill PLC,
> William Hill Organization Limited is registered in England and Wales and
> has its registered office at Greenside House, 50 Station Road, Wood Green,
> London N22 7TP. William Hill U.S. HoldCo, Inc. is 160 Greentree Drive,
> Suite 101, Dover 19904, Kent, Delaware, United States of America. Each of
> WHG (International) Limited and WHG Trading Limited is registered in
> Gibraltar and has its registered office at 6/1 Waterport Place, Gibraltar.
> Unless specifically indicated otherwise, the contents of this e-mail are
> subject to contract; and are not an official statement, and do not
> necessarily represent the views, of William Hill PLC, its subsidiaries or
> affiliated companies. Please note that neither William Hill PLC, nor its
> subsidiaries and affiliated companies can accept any responsibility for any
> viruses contained within this e-mail and it is your responsibility to scan
> any emails and their attachments. William Hill PLC, its subsidiaries and
> affiliated companies may monitor e-mail traffic data and also the content
> of e-mails for effective operation of the e-mail system, or for security,
> purposes..
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/ri

Re: Simple 3 node test cluster eating all my memory

2015-02-12 Thread Ciprian Manea
Hi Simon,

The spreadsheet is referenced from the LevelDB's parameter planning [0].

A ring_size defines the number of vnodes (virtual nodes) a riak cluster
runs internally, and as each virtual node is implemented as an Erlang
process, the bigger the ring_size is, the more memory is required from the
operating system.

[0]
http://docs.basho.com/riak/1.4.12/ops/advanced/backends/leveldb/#Parameter-Planning


Regards,
Ciprian

On Thu, Feb 12, 2015 at 1:44 PM, Simon Hartley <
simon.hart...@williamhill.com> wrote:

>  Hi Ciprian,
>
>
>
> Thanks for the answer.
>
>
>
> According to Riak doc “Cluster-Capacity-Planning” (
> http://docs.basho.com/riak/1.4.9/ops/building/planning/cluster/#Ring-Size-Number-of-Partitions
> )
>
>
>
> “The default number of partitions in a Riak cluster is 64. This works for
> smaller clusters, but if you plan to grow your cluster past 5 nodes it is
> recommended you consider a larger ring size.”
>
>
>
> In our staging and production environments we have 5 nodes, and this will
> likely grow, so we chose a ring assize of 128. We like to keep the same
> config across all out environments where possible, so we replicated this in
> the 3-node test environment.
>
>
>
> There is nothing in this document to suggest a upper limit on ring size
> based on number of nodes (or capacity of individual nodes). From where in
> the Riak docs is this spreadsheet referenced?
>
>
>
> I can rebuild the cluster with ring size 16 if necessary, but can you
> explain why the current larger ring size produces the sudden memory spike
> and subsequent crash?
>
>
>
> Thanks,
>
>
> Simon.
>
>
>
> *From:* Ciprian Manea [mailto:cipr...@basho.com]
> *Sent:* 12 February 2015 11:26
> *To:* Simon Hartley
> *Cc:* riak-users@lists.basho.com
> *Subject:* Re: Simple 3 node test cluster eating all my memory
>
>
>
> Hi Simon,
>
>
>
> Looking at this problem from another angle, a ring size of 128 is too
> large for just 3 servers with 4 GB RAM each. For instance when dimensioning
> a cluster with LevelDB backend we recommend our customers to observe the
> calculations on this spreadsheet [0].
>
>
>
> Filling the above spreadsheet with your system's details (3 nodes, 4 GB
> RAM) we get a ring-size of 16 for the riak cluster. Would you be able to
> recreate the cluster with ring-size 16 and test again?
>
>
>
> [0]
> https://docs.google.com/spreadsheet/ccc?key=0AnW_U8Qe8NdYdGk2V3Qza0VRNkxyRFNGUVVCV0c3V3c&usp=sharing#gid=0
>
>
>
>
>
> Thanks,
>
> Ciprian
>
>
>
> On Thu, Feb 12, 2015 at 11:51 AM, Simon Hartley <
> simon.hart...@williamhill.com> wrote:
>
>  Hi,
>
>
>
> We have a simple 3 node test cluster, and we are seeing this cluster
> fall-over under quite modest loads, 2-3 times a day with the node reporting
> out of memory problems.
>
>
>
> Our basic details are:
>
>
>
> · Riak 1.4.9
>
> · 3 nodes, each being:
>
> o   A virtual RedHat EL
>
> o   2 x 2.2GHz CPU
>
> o   4GB RAM
>
> · In-memory back end
>
> · 128 segment ring
>
>
>
> We have tried to limit the in-memory max memory by the following setting
> in app.config:
>
>
>
> %% Memory Config
>
> {memory_backend, [
>
>  {max_memory, 32}, %% 32 megabytes
>
>  {ttl, 86400}  %% 1 Day in seconds
>
>]},
>
>
>
> The behaviour we are seeing is a sudden and rapid increase in memory
> allocated to riak, up to 100% available RAM, and then a crash.
>
>
>
> This is happening on all 3 nodes (at different times).
>
>
>
> When looking through the console.log just prior to the out of memory /
> crash we see log entries similar to the following appearing:
>
>
>
> 2015-02-11 16:50:27.356 [info]
> <0.98.0>@riak_core_sysmon_handler:handle_event:92 monitor large_heap
> <0.238.0>
> [{name,riak_core_gossip},{initial_call,{riak_core_gossip,init,1}},{almost_current_function,{riak_core_gossip,update_gossip_vers
>
> ion,1}},{message_queue_len,78}]
> [{old_heap_block_size,0},{heap_block_size,47828850},{mbuf_size,0},{stack_size,15},{old_heap_size,0},{heap_size,13944475}]
>
>
>
> We can’t see any errors at the appropriate times in the error.log
>
>
>
> We see the following in erlang.log.1:
>
>
>
> = ALIVE Wed Feb 11 17:00:24 GMT 2015
>
> /usr/lib64/riak/lib/os_mon-2.2.9/priv/bin/memsup: Erlang has closed.
>
> Erlang has closed
>
>
>
> Crash dump was written to: /var/log/riak/erl_crash.dump
>
> eheap_alloc: Cannot allocate 4454408120 bytes of memory
>
>
>
> =
&

Re: Simple 3 node test cluster eating all my memory

2015-02-13 Thread Ciprian Manea
Hi Simon,

Unfortunately the same bug [0] mentioned by Juan Luis is also present in the
release (1.4.9) you're testing against. A fix is already being worked on and
will be available in an upcoming Riak release.
[0] https://github.com/basho/riak_kv/issues/1064

Thank you,
Ciprian

On Thu, Feb 12, 2015 at 3:19 PM, Simon Hartley <
simon.hart...@williamhill.com> wrote:

>  Brilliant Thanks.
>
>
>
> Is there an equivalent document and spreadsheet (and link form one to the
> other) for the in-memory backend. Given we are not using LevelDB I’ve never
> read that part of the docs.
>
>
>
> Given we have limited the memory per in-memory storage instance (i.e. per
> vnode) to 32MB (in the max_memory setting), and that we would expect ~43
> vnodes per server (128 ring size / 3 nodes = ~43 vnodes per server), that
> gives a maximum backend memory usage of 43 * 32MB = 1376MB in normal
> operation.
>
>
>
> This is significantly less than the amount of memory we see Riak trying to
> grab. Also the spiking behaviour of the memory usage is still unexplained.
>
>
>
> How do we estimate the memory requirements of the remainder of the vnode
> system (i.e. everything but the storage component)?
>
>
>
> Thanks,
>
>
>
> Simon.
>
>
>
> *From:* Ciprian Manea [mailto:cipr...@basho.com]
> *Sent:* 12 February 2015 12:56
>
> *To:* Simon Hartley
> *Cc:* riak-users@lists.basho.com
> *Subject:* Re: Simple 3 node test cluster eating all my memory
>
>
>
> Hi Simon,
>
>
>
> The spreadsheet is referenced from the LevelDB's parameter planning [0].
>
>
>
> A ring_size defines the number of vnodes (virtual nodes) a riak cluster
> runs internally, and as each virtual node is implemented as an Erlang
> process, the bigger the ring_size is, the more memory is required from the
> operating system.
>
>
>
> [0]
> http://docs.basho.com/riak/1.4.12/ops/advanced/backends/leveldb/#Parameter-Planning
>
>
>
>
>
> Regards,
>
> Ciprian
>
>
>
> On Thu, Feb 12, 2015 at 1:44 PM, Simon Hartley <
> simon.hart...@williamhill.com> wrote:
>
>  Hi Ciprian,
>
>
>
> Thanks for the answer.
>
>
>
> According to Riak doc “Cluster-Capacity-Planning” (
> http://docs.basho.com/riak/1.4.9/ops/building/planning/cluster/#Ring-Size-Number-of-Partitions
> )
>
>
>
> “The default number of partitions in a Riak cluster is 64. This works for
> smaller clusters, but if you plan to grow your cluster past 5 nodes it is
> recommended you consider a larger ring size.”
>
>
>
> In our staging and production environments we have 5 nodes, and this will
> likely grow, so we chose a ring assize of 128. We like to keep the same
> config across all out environments where possible, so we replicated this in
> the 3-node test environment.
>
>
>
> There is nothing in this document to suggest a upper limit on ring size
> based on number of nodes (or capacity of individual nodes). From where in
> the Riak docs is this spreadsheet referenced?
>
>
>
> I can rebuild the cluster with ring size 16 if necessary, but can you
> explain why the current larger ring size produces the sudden memory spike
> and subsequent crash?
>
>
>
> Thanks,
>
>
> Simon.
>
>
>
> *From:* Ciprian Manea [mailto:cipr...@basho.com]
> *Sent:* 12 February 2015 11:26
> *To:* Simon Hartley
> *Cc:* riak-users@lists.basho.com
> *Subject:* Re: Simple 3 node test cluster eating all my memory
>
>
>
> Hi Simon,
>
>
>
> Looking at this problem from another angle, a ring size of 128 is too
> large for just 3 servers with 4 GB RAM each. For instance when dimensioning
> a cluster with LevelDB backend we recommend our customers to observe the
> calculations on this spreadsheet [0].
>
>
>
> Filling the above spreadsheet with your system's details (3 nodes, 4 GB
> RAM) we get a ring-size of 16 for the riak cluster. Would you be able to
> recreate the cluster with ring-size 16 and test again?
>
>
>
> [0]
> https://docs.google.com/spreadsheet/ccc?key=0AnW_U8Qe8NdYdGk2V3Qza0VRNkxyRFNGUVVCV0c3V3c&usp=sharing#gid=0
>
>
>
>
>
> Thanks,
>
> Ciprian
>
>
>
> On Thu, Feb 12, 2015 at 11:51 AM, Simon Hartley <
> simon.hart...@williamhill.com> wrote:
>
>  Hi,
>
>
>
> We have a simple 3 node test cluster, and we are seeing this cluster
> fall-over under quite modest loads, 2-3 times a day with the node reporting
> out of memory problems.
>
>
>
> Our basic details are:
>
>
>
> · Riak 1.4.9
>
> · 3 nodes, each being:
>
> o   A virtual RedHat EL
>
> o   2 x 2.2GHz

Re: Simple 3 node test cluster eating all my memory

2015-02-13 Thread Ciprian Manea
Hi Simon,

Would you be able to attach a riak-debug output from your performance test
cluster?

Yes, 1.4.12 is likely to share the same problem, as the fix has not yet
shipped in a released version.


Thanks,
Ciprian

On Fri, Feb 13, 2015 at 11:51 AM, Simon Hartley <
simon.hart...@williamhill.com> wrote:

>  Hi,
>
>
>
> Thanks for this information.
>
>
>
> Any ideas why our performance test cluster (5 nodes, 24GB RAM, Enterprise
> Riak 1.4.9) doesn’t seem to have the same issue?
>
>
>
> Also, if we move to 1.4.12 are we likely to experience the same problems?
>
>
>
> Thanks,
>
>
>
> Simon.
>
>
>
> *From:* Ciprian Manea [mailto:cipr...@basho.com]
> *Sent:* 13 February 2015 08:17
>
> *To:* Simon Hartley
> *Cc:* riak-users@lists.basho.com
> *Subject:* Re: Simple 3 node test cluster eating all my memory
>
>
>
> Hi Simon,
>
>
>
> Unfortunately we seem to have the same bug [0] mentioned by Juan Luis
> also in the (1.4.9) release you're testing against. A fix is already being
> worked on and will be available in an upcoming Riak release.
>
>
>
> [0] https://github.com/basho/riak_kv/issues/1064
>
>
>
> Thank you,
>
> Ciprian
>
>
>
> On Thu, Feb 12, 2015 at 3:19 PM, Simon Hartley <
> simon.hart...@williamhill.com> wrote:
>
>  Brilliant Thanks.
>
>
>
> Is there an equivalent document and spreadsheet (and link form one to the
> other) for the in-memory backend. Given we are not using LevelDB I’ve never
> read that part of the docs.
>
>
>
> Given we have limited the memory per in-memory storage instance (i.e. per
> vnode) to 32MB (in the max_memory setting), and that we would expect ~43
> vnodes per server (128 ring size / 3 nodes = ~43 vnodes per server), that
> gives a maximum backend memory usage of 43 * 32MB = 1376MB in normal
> operation.
>
>
>
> This is significantly less than the amount of memory we see Riak trying to
> grab. Also the spiking behaviour of the memory usage is still unexplained.
>
>
>
> How do we estimate the memory requirements of the remainder of the vnode
> system (i.e. everything but the storage component)?
>
>
>
> Thanks,
>
>
>
> Simon.
>
>
>
> *From:* Ciprian Manea [mailto:cipr...@basho.com]
> *Sent:* 12 February 2015 12:56
>
>
> *To:* Simon Hartley
> *Cc:* riak-users@lists.basho.com
> *Subject:* Re: Simple 3 node test cluster eating all my memory
>
>
>
> Hi Simon,
>
>
>
> The spreadsheet is referenced from the LevelDB's parameter planning [0].
>
>
>
> A ring_size defines the number of vnodes (virtual nodes) a riak cluster
> runs internally, and as each virtual node is implemented as an Erlang
> process, the bigger the ring_size is, the more memory is required from the
> operating system.
>
>
>
> [0]
> http://docs.basho.com/riak/1.4.12/ops/advanced/backends/leveldb/#Parameter-Planning
>
>
>
>
>
> Regards,
>
> Ciprian
>
>
>
> On Thu, Feb 12, 2015 at 1:44 PM, Simon Hartley <
> simon.hart...@williamhill.com> wrote:
>
>  Hi Ciprian,
>
>
>
> Thanks for the answer.
>
>
>
> According to Riak doc “Cluster-Capacity-Planning” (
> http://docs.basho.com/riak/1.4.9/ops/building/planning/cluster/#Ring-Size-Number-of-Partitions
> )
>
>
>
> “The default number of partitions in a Riak cluster is 64. This works for
> smaller clusters, but if you plan to grow your cluster past 5 nodes it is
> recommended you consider a larger ring size.”
>
>
>
> In our staging and production environments we have 5 nodes, and this will
> likely grow, so we chose a ring assize of 128. We like to keep the same
> config across all out environments where possible, so we replicated this in
> the 3-node test environment.
>
>
>
> There is nothing in this document to suggest a upper limit on ring size
> based on number of nodes (or capacity of individual nodes). From where in
> the Riak docs is this spreadsheet referenced?
>
>
>
> I can rebuild the cluster with ring size 16 if necessary, but can you
> explain why the current larger ring size produces the sudden memory spike
> and subsequent crash?
>
>
>
> Thanks,
>
>
> Simon.
>
>
>
> *From:* Ciprian Manea [mailto:cipr...@basho.com]
> *Sent:* 12 February 2015 11:26
> *To:* Simon Hartley
> *Cc:* riak-users@lists.basho.com
> *Subject:* Re: Simple 3 node test cluster eating all my memory
>
>
>
> Hi Simon,
>
>
>
> Looking at this problem from another angle, a ring size of 128 is too
> large for just 3 servers with 4 GB RAM each. For instance when dimensioning
> a cluster with LevelDB backe

Re: Riak 1.3.1 crashing with segfault

2015-02-17 Thread Ciprian Manea
Hi Daniel,

Have you investigated your server's dmesg output? Segfaults can also be
triggered by memory corruption. Please check that first.


Regards,
Ciprian

On Tue, Feb 17, 2015 at 1:00 PM, Daniel Iwan  wrote:

> We are experiencing crash of beam.smp on one of nodes in 3-node cluster
> (ring
> 128)
> Distro is Ubuntu 12.04 with 16GB of memory (almost exclusive for Riak)
>
> = Sun Feb 15 10:02:23 UTC 2015
> Erlang has closed/usr/lib/riak/lib/os_mon-2.2.9/priv/bin/memsup:
> Erlang has closed.
>
> Hi I've got following error in syslog
>
> Feb 15 10:02:23 node2 kernel: [157782.787481] beam.smp[2023]: segfault at
> 85239d0 ip 085239d0 sp 7f47e3fe6d68 error 14 in
> 61.log[7f463e32f000+140]
>
> I'm not sure which causes the other.
>
> 61.log was from leveldb folder but it ahs been deleted after restart of
> Riak I believe
>
> The last thing I could find in the console log is AAE activity:
>
>
> 2015-02-14 15:24:18.329 [info]
> <0.7258.26>@riak_kv_exchange_fsm:key_exchange:204 Repaired 3 keys during
> active anti-entropy exchange of
> {25119559391624893906625833062344003363405824,3} between
> {25119559391624893906625833062344003363405824,'riak@10.173.240.2'} and
> {262613575457896618114724618378707105094425378816,'riak@10.173.240.3'}
> 2015-02-14 15:31:03.594 [info]
> <0.10920.26>@riak_kv_exchange_fsm:key_exchange:204 Repaired 58 keys during
> active anti-entropy exchange of
> {376793390874373408599387495934666716005045108736,3} between
> {376793390874373408599387495934666716005045108736,'riak@10.173.240.2'} and
> {388211372416021087647853783690262677096107081728,'riak@10.173.240.2'}
> 2015-02-14 15:33:48.637 [info]
> <0.12367.26>@riak_kv_exchange_fsm:key_exchange:204 Repaired 37 keys during
> active anti-entropy exchange of
> {422465317040964124793252646957050560369293000704,3} between
> {422465317040964124793252646957050560369293000704,'riak@10.173.240.2'} and
> {445301280124259482890185222468242482551416946688,'riak@10.173.240.3'}
> 2015-02-14 15:34:03.454 [info]
> <0.12546.26>@riak_kv_exchange_fsm:key_exchange:204 Repaired 37 keys during
> active anti-entropy exchange of
> {422465317040964124793252646957050560369293000704,3} between
> {433883298582611803841718934712646521460354973696,'riak@10.173.240.2'} and
> {445301280124259482890185222468242482551416946688,'riak@10.173.240.3'}
> 2015-02-14 15:55:18.518 [info]
> <0.23498.26>@riak_kv_exchange_fsm:key_exchange:204 Repaired 1 keys during
> active anti-entropy exchange of
> {1061872283373234151507364761270424381468763488256,3} between
> {1061872283373234151507364761270424381468763488256,'riak@10.173.240.2'}
> and
> {1073290264914881830555831049026020342559825461248,'riak@10.173.240.3'}
> 2015-02-14 15:59:33.522 [info]
> <0.25935.26>@riak_kv_exchange_fsm:key_exchange:204 Repaired 1 keys during
> active anti-entropy exchange of
> {1187470080331358621040493926581979953470445191168,3} between
> {119061873006300088960214337575914561507164160,'riak@10.173.240.2'}
> and
> {1210306043414653979137426502093171875652569137152,'riak@10.173.240.3'}
> 2015-02-14 15:59:48.513 [info]
> <0.26044.26>@riak_kv_exchange_fsm:key_exchange:204 Repaired 1 keys during
> active anti-entropy exchange of
> {119061873006300088960214337575914561507164160,3} between
> {119061873006300088960214337575914561507164160,'riak@10.173.240.2'}
> and
> {1210306043414653979137426502093171875652569137152,'riak@10.173.240.3'}
> 2015-02-14 20:08:49.674 [info]
> <0.29386.30>@riak_kv_exchange_fsm:key_exchange:204 Repaired 5 keys during
> active anti-entropy exchange of
> {148433760041419827630061740822747494183805648896,3} between
> {148433760041419827630061740822747494183805648896,'riak@10.173.240.2'} and
> {171269723124715185726994316333939416365929594880,'riak@10.173.240.3'}
> 2015-02-14 20:09:04.516 [info]
> <0.29501.30>@riak_kv_exchange_fsm:key_exchange:204 Repaired 5 keys during
> active anti-entropy exchange of
> {148433760041419827630061740822747494183805648896,3} between
> {159851741583067506678528028578343455274867621888,'riak@10.173.240.2'} and
> {171269723124715185726994316333939416365929594880,'riak@10.173.240.3'}
>
> Is it possible that AAE is causing the problems here?
>
> Regards
> Daniel
>
>
>
> --
> View this message in context:
> http://riak-users.197444.n3.nabble.com/Riak-1-3-1-crashing-with-segfault-tp4032638.html
> Sent from the Riak Users mailing list archive at Nabble.com.
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Fwd: RIAK-CS fast track documentation not working

2015-09-17 Thread Ciprian Manea
Hi,

Please update the following configuration in /etc/riak-cs/riak-cs.conf:

listener = 10.0.2.10:8080
riak_host = 10.0.2.10:8087
stanchion_host = 10.0.2.10:8085

to read:

listener = 0.0.0.0:8080
riak_host = 10.0.2.10:8087
stanchion_host = 10.0.2.10:8085

This quick fix will have Riak CS listen on all interfaces and allow you to
continue with its installation.
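
A quick way to verify the change (assuming a default package install;
adjust the restart command to whatever your init system uses):

  sudo riak-cs stop && sudo riak-cs start
  curl -i http://localhost:8080/riak-cs/ping

With the listener on 0.0.0.0, the curl examples from the documentation
should now work against localhost as well as the node's public IP.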

I'll work with the documentation team to have this fix considered ASAP.
This issue is now tracked at https://github.com/basho/basho_docs/issues/1830


Thank you,
Ciprian

On Tue, Sep 15, 2015 at 12:26 AM, wealthiest.of.all <
wealthiest.of@gmail.com> wrote:

> Is there a different set of documentation out there for installing,
> primarily
> configuring RIAK-CS, RIAK, and Stanchion?  Basically - there are typos here
> and there, and the information is confusing. Why don't they simply provide
> the configuration files for people to download and simply ask them to
> change
> the IP address?
>
> Examples of typos:
>
>   They ask to replace everywhere the IP addresses from 127.0.0.1 to your
> own
> IP... cool, then they ask you to test things like:
>
>"curl http://localhost:8080/riak-cs/ping"; but wait, didn't you just
> say to change all the 127.0.0.1 IPs to your own??? localhost is not binding
> to 8080... so it will not work (yes, yes - maybe all they want is for
> people
> to buy their enterprise)
>
>Here is another one among the same lines:
>
>curl -XPOST http://localhost:8080/riak-cs/user \
>   -H 'Content-Type: application/json' \
>   -d '{"email":"ad...@admin.com", "name":"admin"}'
>
>Again 8080 on localhost... well - is anyone testing this stuff do
> they even care to grow their user base...?
>
>It is rather frustrating...   I also tried to install using the vagrant
> one... yes - it almost seemed perfect... since they did all the work...
> but... they forgot to mention that s3cmd assumes that authentication is
> being done with new AWS API so we need to force V2  (signature_v2 = True) -
> also they forgot to mention that HTTPS is not enabled in the vagrant
> fasttrack RIAK-CS - why make the user have to troubleshoot more?  Well -
> that's one way to get them to learn a bit or waste their time Googling for
> the answer...
>
>Despite of all of this - I haven't been able to install successfully on
> my own on real servers (single server in the worst case) - correction, I
> was
> able to install, start everything, but I can not create an admin user... so
> now I am going to copy whatever is in the vagrant files and pull them over
> to my physical systems...
>
>Yes - this is not a question - just hoping these guys can rectify the
> doc
> issues - obviously not much QA going on their documentation - the stuff
> does
> not work, that's a fact.
>
>Regards.
>
>
>
> --
> View this message in context:
> http://riak-users.197444.n3.nabble.com/RIAK-CS-fast-track-documentation-not-working-tp4033444.html
> Sent from the Riak Users mailing list archive at Nabble.com.
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Riak does not have primary partitions running ?

2015-10-14 Thread Ciprian Manea
Hi Mohamad,

It's possible that some of your nodes are down or just restarting, which
would trigger the "partition not running" message in the `riak-admin
transfers` output.
Please ensure that all riak nodes are running.
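
If it is not obvious which node is affected, these standard checks
(sketched here for a default install) should point you to it:

  riak ping                  # run on every node; each should answer "pong"
  riak-admin member-status   # all members should be listed as valid
  riak-admin ring-status     # highlights unreachable nodes and pending changes

Once every node reports up, re-run `riak-admin transfers` and the
"partition not running" entries should clear.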


Regards,
Ciprian

On Tue, Oct 13, 2015 at 5:48 PM, Mohamad Taufiq 
wrote:

> What is the right solution if I have a partition not running in my cluster
> when I execute the riak-admin transfers command?
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com