…same socket, and hence reading "length headers" out
of the middle of messages. We went digging for culprits, found the
missing finally block, then noticed this change downstream.
Will post more info if this doesn't do the trick.
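The bug shape described here (two consumers interleaving reads on one socket, so payload bytes get parsed as a "length header") is what a missing finally around a pooled connection produces. A minimal sketch of the pattern, using hypothetical types rather than the rjc internals:

    // Hypothetical length-prefixed client; shows the bug shape, not rjc code.
    import java.io.DataInputStream;
    import java.io.DataOutputStream;
    import java.net.Socket;
    import java.util.concurrent.BlockingQueue;

    class PooledClient {
        private final BlockingQueue<Socket> pool;

        PooledClient(BlockingQueue<Socket> pool) { this.pool = pool; }

        byte[] call(byte[] request) throws Exception {
            Socket s = pool.take();
            boolean clean = false;
            try {
                DataOutputStream out = new DataOutputStream(s.getOutputStream());
                out.writeInt(request.length);
                out.write(request);
                DataInputStream in = new DataInputStream(s.getInputStream());
                int len = in.readInt();       // the "length header"
                byte[] resp = new byte[len];  // a garbage len means a huge allocation
                in.readFully(resp);           // a timeout here leaves unread bytes behind
                clean = true;
                return resp;
            } finally {
                // The missing-finally bug: without this cleanup, a half-read
                // socket is reused and the next caller reads a "length header"
                // out of the middle of the previous response.
                if (clean) pool.offer(s); else s.close();
            }
        }
    }

A garbage length read this way would also be consistent with the giant byte[] allocations in the heap dump quoted below.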
On Thu, Jan 10, 2013 at 11:44 AM, Dietrich Featherston wrote:
We're seeing instances of a JVM app which talks to riak run out of
memory when riak operations rise in latency or riak becomes otherwise
unresponsive. A heap dump of the JVM at the time of the OOM shows that
91% of the 1G (active) heap is consumed by large byte[] instances. In
our case, 3 of those by…
I don't believe allow_mult is enabled. It shouldn't be, at least!
On Dec 28, 2012, at 1:23 PM, Brian Roach wrote:
> On Fri, Dec 28, 2012 at 12:34 PM, Dietrich Featherston
> wrote:
>> Primarily stores, but I did see one case of socket timeouts simply building a
>> n…
On Dec 28, 2012, at 11:57 AM, Brian Roach wrote:
> On Fri, Dec 28, 2012 at 11:37 AM, Dietrich Featherston
> wrote:
>>
>> All socket operations. It looks as though those that open a new socket are
>> especially
>> impacted. We are running 1.2.1 with the le…
> …after changing your code to use the new
> 'withoutFetch()'?
>
> Thanks,
> Brian Roach
>
> On Wed, Dec 26, 2012 at 7:28 PM, Dietrich Featherston wrote:
>> I had rolled out an upgrade to a JVM app that uses rjc 1.0.5. We had
>> upgraded to 1.0.6 to take advantage of newly added abilities to do a…
> Did you see this simply dropping in the 1.0.6 client to your existing
> application, or is this after changing your code to use the new
> 'withoutFetch()'?
1.0.6 with the inclusion of withoutFetch(). Haven't tried with just the driver
alone.
>
> Thanks,
> Brian Roach
I had rolled out an upgrade to a JVM app that uses rjc 1.0.5. We had
upgraded to 1.0.6 to take advantage of newly added abilities to do a
put without preceding it with a fetch in order to reduce operational
load on the cluster. However, after rolling out this change we
frequently see large rises in…
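For anyone finding this thread later, the store path being discussed looks roughly like the sketch below; the bucket and key names are invented, and the calls are from memory of the rjc 1.0.6 API, so verify against your client version.

    import com.basho.riak.client.IRiakClient;
    import com.basho.riak.client.RiakFactory;
    import com.basho.riak.client.bucket.Bucket;

    public class WithoutFetchExample {
        public static void main(String[] args) throws Exception {
            // Protocol Buffers client; host and port are assumptions.
            IRiakClient client = RiakFactory.pbcClient("127.0.0.1", 8087);
            Bucket bucket = client.fetchBucket("events").execute();
            // withoutFetch() (new in 1.0.6) skips the read-before-write,
            // which is the operational-load win described above. Note the
            // store then carries no vclock, so with allow_mult enabled it
            // would create siblings, which is presumably why allow_mult
            // comes up earlier in the thread.
            bucket.store("event-key", "event-body")
                  .withoutFetch()
                  .execute();
            client.shutdown();
        }
    }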
…Matthew Von-Maszewski wrote:
> Would you paste the data for one core from /proc/cpuinfo? And do you know
> the brand of controller running the SSD drives?
>
> Thank you,
> Matthew
>
>
> On Nov 6, 2012, at 7:04 PM, Dietrich Featherston wrote:
>
> Thanks for the feedback. We haven't…
> …against you in 1.2.x. I personally introduced a bug
> that hurts performance on that setting. My apologies. I recommend you
> take it below 100M until the release notes say the bug is fixed.
>
> Matthew
>
>
> On Nov 6, 2012, at 7:04 PM, Dietrich Featherston wrote:
>
>
> …Erlang have given us some suggested
> changes to our Erlang-to-leveldb interface this past weekend. That
> information could also give leveldb a throughput boost if proven valid.
> Keep you posted.
>
> But at this time I see nothing that yells "massive slowdown". I am of…
Seeing the following reported by rjc when doing a 2i scan. Has only started
happening since upgrading half of this cluster's nodes from 1.1 to 1.2.1.
Should we presume this is an incompatibility that will go away when
upgrading the remaining nodes?
com.basho.riak.client.RiakRetryFailedException:
> …heavy times where the
> throttle saves the user experience.
>
> Matthew
>
>
> On Nov 1, 2012, at 8:54 PM, Dietrich Featherston wrote:
>
> Thanks. The amortized stalls may very well describe what we are seeing. If
> I combine leveldb logs from all partitions on one of the up…
> …LOG files, combined them, then compared
> compaction activity to your graph.
>
> Write stalls are detailed here:
> http://basho.com/blog/technical/2012/10/30/leveldb-in-riak-1p2/
>
> How can I better assist you at this point?
>
> Matthew
>
>
> On Nov 1, 2012, at 8:13 PM, Dietrich Featherston wrote:
We've just gone through the process of upgrading two riak clusters from 1.1
to 1.2.1. Both are on the leveldb backend backed by RAID0'd SSDs. The
process has gone smoothly and we see that latencies as measured at the
gen_fsm level are largely unaffected.
However, we are seeing some troubling disk…
Seeing 99th percentile put latencies at around 30-40 ms with 99.9th
percentile jumping all the way up to 3-4s. This is riak 1.1 with the
eleveldb backend on a 9-node cluster, N = 2, W = 1. Lots of free iops, but
CPU is consistently burning 30-40% across all 8 cores.
Wondering if this could be caused by…
What is max_open_files set to in the eleveldb section of app.config? If
unspecified I think the limit is 20. Remember that this number is per
vnode. The process limit specified by ulimit -n must be greater than
max_open_files * num_vnodes / num_nodes, allowing room for vnode
multiplexing and fallback vnodes.
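To make the arithmetic concrete, here is an illustrative sketch; the shape matches riak's eleveldb app.config section, but every number (ring size, node count, max_open_files value) is an assumption, not something from this thread.

    %% app.config, eleveldb section (illustrative values only)
    {eleveldb, [
        {data_root, "/var/lib/riak/leveldb"},
        {max_open_files, 100}    %% per vnode, not per node
    ]},

    %% e.g. ring_creation_size 256 across 9 nodes is ~29 vnodes per node,
    %% so descriptors needed >= 100 * 256 / 9 ~= 2845, plus headroom for
    %% fallback vnodes, handoff, and the Erlang VM itself, hence something
    %% like: ulimit -n 65536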
LevelDB is a nice option when the key space will not fit in memory. Whether
or not bitcask will work for you depends on the total memory capacity of the
cluster and the N value. Recommend using the bitcask capacity planner to see
if it is a suitable backend for your hardware+data combination.
http://
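A back-of-the-envelope version of what the planner computes; the ~40-byte per-key keydir overhead is an approximation from memory, and the workload numbers are invented for illustration:

    ram_per_node ~= total_keys * n_val * (keydir_overhead + avg_key_size) / num_nodes
    e.g. 1e9 keys, n_val = 3, ~40 B overhead + 20 B keys, 9 nodes:
         1e9 * 3 * 60 B / 9 ~= 20 GB of RAM per node for the keydir alone

Bitcask keeps every key in its in-memory keydir, which is why key count and key size, not value size, dominate the estimate.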
Suggest implementing security outside of riak. The interface to applications
which use riak for storage should not be riak-dependent. In addition, it would
be wise to avoid exposing storage-level details like bucket choice in the
security model for your applications.
For more details on why riak…
You might try coordinating this activity outside of riak if at all
possible. If there is a single point of origin for these events (i.e., a
dedicated master for each partition of writes) then you could maintain
reasonable guarantees that you don't need sibling processing on the riak
end, since data is b…
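A minimal JVM-local sketch of that single-point-of-origin idea (the names and structure here are invented, not from this thread): hash each key to exactly one single-threaded writer, so writes to any given key are serialized and sibling handling becomes unnecessary.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    class PartitionedWriter {
        private final ExecutorService[] writers;

        PartitionedWriter(int partitions) {
            writers = new ExecutorService[partitions];
            for (int i = 0; i < partitions; i++)
                writers[i] = Executors.newSingleThreadExecutor();
        }

        // Every write for a given key lands on the same single-threaded
        // executor, so per-key writes are applied in order with no
        // concurrent writers: the property that makes sibling processing
        // unnecessary on the riak side.
        void write(String key, Runnable riakPut) {
            writers[Math.floorMod(key.hashCode(), writers.length)].execute(riakPut);
        }
    }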
Been seeing some busy_dist_port errors lately in our riak logs and curious
if it's anything to worry about.
Here is a specific instance of the log msg:
https://gist.github.com/881598dc29ca168dba34
More info on our setup
* 9 node cluster (dedicated)
* Ubuntu 10.04
* 32GB ram
* SSDs
* 256K block size
Hey guys,
I just wrote a new blog post debugging some issues I'm seeing with riak by
looking at the network. Lots of words and pretty pictures here:
http://blog.boundary.com/2012/04/19/hungry-kobayashi-pt1/
What seems to be happening is that cleanup tasks in our app eventually
become the primary
The haproxy approach tends to work well so long as haproxy is located on
the machine from which the riak client is establishing connections to the
riak cluster. This way the client always talks over localhost, and haproxy
is unlikely to be a failure point. So in your service tier, each machine
ru…
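In haproxy terms that looks roughly like the following; the addresses, names, and values are placeholders, not anything from this thread.

    # /etc/haproxy/haproxy.cfg on each client machine (illustrative)
    listen riak_pb
        bind 127.0.0.1:8087        # the client always dials localhost
        mode tcp
        balance roundrobin
        option tcplog
        server riak1 10.0.1.1:8087 check
        server riak2 10.0.1.2:8087 check
        server riak3 10.0.1.3:8087 check

Because haproxy runs next to the client, losing a riak node is handled by the local proxy, and losing the proxy only takes out the one machine that was using it.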
If you need CAS semantics, then coordinate that outside of riak. Any
check-then-act type of operation where atomicity is important is going to
leave some room for a data race in a system with the distribution semantics
of riak. Would suggest thinking about the problem in such a way that
handling of…
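A single-JVM analogy of the race being described (illustrative only): the check and the act are separate round trips, and another writer can slip in between them, which is exactly the window riak's distribution semantics leave open.

    import java.util.concurrent.ConcurrentHashMap;

    class CasAnalogy {
        static final ConcurrentHashMap<String, String> store = new ConcurrentHashMap<>();

        // Racy: another client can claim the key between the check and the
        // act. Against riak, the same window exists between fetch and put.
        static boolean racyClaim(String key, String owner) {
            if (!store.containsKey(key)) {  // check
                store.put(key, owner);      // act
                return true;
            }
            return false;
        }

        // Safe only because a single coordination point applies the whole
        // operation atomically: the "coordinate outside of riak" suggestion.
        static boolean atomicClaim(String key, String owner) {
            return store.putIfAbsent(key, owner) == null;
        }
    }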
> …you'll see it is a bit
> of a mess, what with the Query types and Index types and all that. I'm happy
> to keep going and finish.
>
> Cheers
>
> Russell
>
> On 30 Jan 2012, at 18:58, Dietrich Featherston wrote:
>
> Hey Russell. Any thoughts on when you'd get…
…Russell Brown wrote:
>
> On 30 Jan 2012, at 18:12, Dietrich Featherston wrote:
>
> I'm using a leveldb-backed riak 1.0.2 and looking for some suggestions to
> fetch a block of data by key range. I have control over the keys and all
> reads out of this setup will involve at minimum…
I'm using a leveldb-backed riak 1.0.2 and looking for some suggestions to
fetch a block of data by key range. I have control over the keys and all
reads out of this setup will involve at minimum a key range. It seems
that leveldb is an ideal candidate for this kind of access pattern so long
as I…
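For the archive: the eventual client-side shape for this is a range query over the special $key secondary index (2i needs the leveldb backend, which this poster already has). The sketch below is from memory of the later legacy rjc API, so verify the class names; the bucket and key bounds are invented.

    import com.basho.riak.client.IRiakClient;
    import com.basho.riak.client.RiakFactory;
    import com.basho.riak.client.bucket.Bucket;
    import com.basho.riak.client.query.indexes.KeyIndex;
    import java.util.List;

    public class KeyRangeExample {
        public static void main(String[] args) throws Exception {
            IRiakClient client = RiakFactory.httpClient("http://127.0.0.1:8098/riak");
            Bucket bucket = client.fetchBucket("metrics").execute();
            // Range over the special $key index: returns the keys in
            // [from, to]; the values still need separate fetches.
            List<String> keys = bucket.fetchIndex(KeyIndex.index)
                                      .from("20120130-0000")
                                      .to("20120130-2359")
                                      .execute();
            for (String k : keys) {
                System.out.println(k);
            }
            client.shutdown();
        }
    }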