Re: repair-2i stops with "bad argument in call to eleveldb:async_write"

2014-08-08 Thread Effenberg, Simon
Hi @list,

I sent an e-mail yesterday, but because of its size (logfile attached) it
has to be moderated. I will retry with a smaller version, but maybe some
admin can approve the mail?

Cheers
Simon

On Wed, Aug 06, 2014 at 01:08:36PM +0100, bryan hunt wrote:
> Simon,
> 
> If you want more verbose logging information, you can change the logging 
> level to debug, run `repair-2i`, and finally switch back to the normal 
> logging level:
> 
> - `riak attach`
> - `(riak@nodename)1> SetDebug = fun() -> {node(), 
> lager:set_loglevel(lager_file_backend, "/var/log/riak/console.log", debug)} 
> end.`
> - `(riak@nodename)2> rp(rpc:multicall(erlang, apply, [SetDebug,[]])).`
> (don't forget the period at the end of these statements)
> - Hit CTRL+C twice to quit from the node
> 
> You can then revert back to the normal `info` logging level by running the 
> following command via `riak attach`:
> 
> - `riak attach`
> - `(riak@nodename)1> SetInfo = fun() -> {node(), 
> lager:set_loglevel(lager_file_backend, "/var/log/riak/console.log", info)} 
> end.`
> - `(riak@nodename)2> rp(rpc:multicall(erlang, apply, [SetInfo,[]])).`
> (don't forget the period at the end of these statements)
> - Hit CTRL+C twice to quit from the node
> 
> Please also see the docs for info on `riak attach` monitoring of repairs.
> 
> http://docs.basho.com/riak/1.4.9/ops/running/recovery/repairing-partitions/#Monitoring-Repairs
> 
> Repairs can also be monitored using the `riak-admin transfers` command.
> 
> http://docs.basho.com/riak/1.4.9/ops/running/recovery/repairing-partitions/#Running-a-Repair
> 
> Best Regards,
> 
> Bryan Hunt 
> 
> Bryan Hunt - Client Services Engineer - Basho Technologies Limited - 
> Registered Office - 8 Lincoln’s Inn Fields London WC2A 3BP Reg 07970431
> 
> On 6 Aug 2014, at 12:53, Effenberg, Simon  wrote:
> 
> > Hi Engel,
> > 
> > I tried it yesterday but it was the same:
> > 
> > 2014-08-05 17:53:14.728 UTC [info] 
> > <0.24306.9>@riak_kv_2i_aae:repair_partition:257 Acquired lock on partition 
> > 548063113999088594326381812268606132370974703616
> > 2014-08-05 17:53:14.728 UTC [info] 
> > <0.24306.9>@riak_kv_2i_aae:repair_partition:259 Repairing indexes in 
> > partition 548063113999088594326381812268606132370974703616
> > 2014-08-05 17:53:14.753 UTC [info] 
> > <0.24306.9>@riak_kv_2i_aae:create_index_data_db:324 Creating temporary 
> > database of 2i data in /var/lib/riak/anti_entropy/2i/tmp_db
> > 2014-08-05 17:53:14.772 UTC [info] 
> > <0.24306.9>@riak_kv_2i_aae:create_index_data_db:361 Grabbing all index data 
> > for partition 548063113999088594326381812268606132370974703616
> > 2014-08-05 17:58:14.773 UTC [info] 
> > <0.24305.9>@riak_kv_2i_aae:next_partition:160 Finished 2i repair:
> >    Total partitions: 1
> >    Finished partitions: 1
> >    Speed: 100
> >    Total 2i items scanned: 0
> >    Total tree objects: 0
> >    Total objects fixed: 0
> > With errors:
> > Partition: 548063113999088594326381812268606132370974703616
> > Error: index_scan_timeout
> > 
> > Can't we use some erlang commands to execute parts of this manually to 
> > check where the timeout actually happens? Or at least who is timing out?
> > 
> > Cheers
> > Simon
> > 
> > On Tue, Aug 05, 2014 at 10:21:57AM -0400, Engel Sanchez wrote:
> >>   Simon:  The data scan for that partition seems to be taking more than 5
> >>   minutes to collect a batch of 1000 items, so the 2i repair process is
> >>   giving up on it before it has a chance to finish.   You can reduce the
> >>   likelihood of this happening by configuring the batch parameter to
> >>   something small.  In the riak_kv section of the configuration file, set
> >>   this:
> >>   {riak_kv, [
> >>  {aae_2i_batch_size, 10},
> >>  ...
> >>   Let us know if that allows it to finish the repair.  You should still 
> >> look
> >>   into what may be causing the slowness.  A combination of slow disks or
> >>   very large data sets might do it.
> >> 
> >>   On Fri, Aug 1, 2014 at 5:24 AM, Russell Brown 
> >>   wrote:
> >> 
> >> Hi Simon,
> >> Sorry for the delays. I'm on vacation for a couple of days. Will pick
> >> this up on Monday.
> >> 
> >> Cheers
> >> Russell
> >> On 1 Aug 2014, at 09:56, Effenberg, Simon 
> >> wrote:
> >> 
> >>> Hi Russell, @basho
> >>> 
> >>> any updates on this? We still have the issues with 2i (repair is also
> >>> still not possible), and querying the 2i indexes reproducibly returns
> >>> (for one range I tested) 3 different values.
> >>> 
> >>> I would love to provide anything you need to debug that issue.
> >>> 
> >>> Cheers
> >>> Simon
> >>> 
> >>> On Wed, Jul 30, 2014 at 09:22:56AM +, Effenberg, Simon wrote:
>  Great. Thanks Russell..
>  
>  if you need me to do something.. feel free to ask.
>  
>  Cheers
>  Simon
>  
>  On Wed, Jul 30, 2014 at 10:19:56AM +0100, Russell Brown wrote:
> > Thanks Simon,
> > 
> > I'm g

Re: repair-2i stops with "bad argument in call to eleveldb:async_write"

2014-08-08 Thread Effenberg, Simon
Hi Bryan,

thanks for this. I tried it, but to be honest I cannot see anything
specific in the logs (on the affected host).

I attached the logfile from that node. If you think it is important to
look into the logfiles on the other nodes as well, I can send them too,
but a quick look through all of them (searching for "2i" and "index")
didn't show anything unusual. The only thing that stood out was:

2014-08-07 05:44:11.298 UTC [debug] 
<0.969.0>@riak_kv_index_hashtree:handle_call:240 Updating tree: 
(vnode)=633697975561446187189878970435575840553939501056 
(preflist)={61086201247815082909294639492438391837181072,12}

and searching for errors didn't show more than you see in the attached
files:

$ for host in kriak46-{1..7} kriak47-{1..6}; do echo $host; ssh $host "grep 
'^2014-08-07 05' /var/log/riak/console.log | grep -i error" ; done
kriak46-1
2014-08-07 05:38:28.596 UTC [error] <0.8949.566> ** Node 
'c_24556_riak@10.46.109.201' not responding **
2014-08-07 05:42:36.197 UTC [error] <0.24823.566> ** Node 
'c_26945_riak@10.46.109.201' not responding **
2014-08-07 05:43:16.213 UTC [error] <0.26434.566> ** Node 
'c_27071_riak@10.46.109.201' not responding **
2014-08-07 05:48:14.284 UTC [error] <0.1697.0> gen_server <0.1697.0> terminated 
with reason: bad argument in call to eleveldb:async_write(#Ref<0.0.567.170046>, 
<<>>, 
[{put,<<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,...>>,...}],
 []) in eleveldb:write/3 line 155
2014-08-07 05:48:14.284 UTC [error] <0.1697.0> CRASH REPORT Process <0.1697.0> 
with 0 neighbours exited with reason: bad argument in call to 
eleveldb:async_write(#Ref<0.0.567.170046>, <<>>, 
[{put,<<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,...>>,...}],
 []) in eleveldb:write/3 line 155 in gen_server:terminate/6 line 747
2014-08-07 05:48:14.284 UTC [error] <0.1692.0> Supervisor 
{<0.1692.0>,poolboy_sup} had child riak_core_vnode_worker started with 
{riak_core_vnode_worker,start_link,undefined} at <0.1697.0> exit with reason 
bad argument in call to eleveldb:async_write(#Ref<0.0.567.170046>, <<>>, 
[{put,<<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,...>>,...}],
 []) in eleveldb:write/3 line 155 in context child_terminated
2014-08-07 05:50:11.390 UTC [error] <0.20983.567> ** Node 
'c_32188_riak@10.46.109.201' not responding **
kriak46-2
kriak46-3
kriak46-4
kriak46-5
kriak46-6
kriak46-7
kriak47-1
kriak47-2
kriak47-3
kriak47-4
kriak47-5
kriak47-6
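
As a side note, the per-host loop above can be narrowed so the unrelated
node-down chatter doesn't drown out the eleveldb crash. A sketch,
demonstrated here on an inline sample (on the cluster, the here-doc would
be replaced by the `ssh ... grep` pipeline from above):

```shell
# Keep only error lines, then drop the node-down noise.
grep -i 'error' <<'EOF' | grep -v 'not responding'
2014-08-07 05:38:28.596 UTC [error] <0.8949.566> ** Node 'c_24556_riak@10.46.109.201' not responding **
2014-08-07 05:48:14.284 UTC [error] <0.1697.0> gen_server <0.1697.0> terminated with reason: bad argument
EOF
```

Only the gen_server crash line survives the second filter, which makes the
eleveldb failure much easier to spot across thirteen hosts.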

You mentioned the partition repair procedure: do you think I need to try
a full repair? Could that be a way to fix it? It is quite hard to do this
on the cluster (~15 TB of data including AAE data and tombstones, and
maybe ~10 TB without tombstones and AAE data), so I don't want to start
it if it won't help.

Cheers
Simon

On Wed, Aug 06, 2014 at 01:08:36PM +0100, bryan hunt wrote:
> Simon,
> 
> If you want more verbose logging information, you can change the logging 
> level to debug, run `repair-2i`, and finally switch back to the normal 
> logging level:
> 
> - `riak attach`
> - `(riak@nodename)1> SetDebug = fun() -> {node(), 
> lager:set_loglevel(lager_file_backend, "/var/log/riak/console.log", debug)} 
> end.`
> - `(riak@nodename)2> rp(rpc:multicall(erlang, apply, [SetDebug,[]])).`
> (don't forget the period at the end of these statements)
> - Hit CTRL+C twice to quit from the node
> 
> You can then revert back to the normal `info` logging level by running the 
> following command via `riak attach`:
> 
> - `riak attach`
> - `(riak@nodename)1> SetInfo = fun() -> {node(), 
> lager:set_loglevel(lager_file_backend, "/var/log/riak/console.log", info)} 
> end.`
> - `(riak@nodename)2> rp(rpc:multicall(erlang, apply, [SetInfo,[]])).`
> (don't forget the period at the end of these statements)
> - Hit CTRL+C twice to quit from the node
> 
> Please also see the docs for info on `riak attach` monitoring of repairs.
> 
> http://docs.basho.com/riak/1.4.9/ops/running/recovery/repairing-partitions/#Monitoring-Repairs
> 
> Repairs can also be monitored using the `riak-admin transfers` command.
> 
> http://docs.basho.com/riak/1.4.9/ops/running/recovery/repairing-partitions/#Running-a-Repair
> 
> Best Regards,
> 
> Bryan Hunt 
> 
> Bryan Hunt - Client Services Engineer - Basho Technologies Limited - 
> Registered Office - 8 Lincoln’s Inn Fields London WC2A 3BP Reg 07970431
> 
2014-08-07 05:43:04.469 UTC [notice] <0.57.0>@lager_file_backend:128 Changed 
loglevel of /var/log/riak/console.log to debug
2014-08-07 05:43:14.156 UTC [info] <0.27496.566>@riak_kv_2i_aae:init:139 
Starting 2i repair at speed 100 for partitions 
[319703483166135013357056057156686910549735243776]
2014-08-07 05:43:14.156 UTC [info] 
<0.27497.566>@riak_kv_2i_aae:repair_partition:257 Acquired lock on partition 
319703483166135013357056057156686910549735243776
2014-08-07 05:43:14.156 UTC [info] 
<0.27497.5

Re: riak_core 2.0.0rc1

2014-08-08 Thread Karolis Petrauskas
Thanks for the info!

Karolis

On Thu, Aug 7, 2014 at 1:23 AM, Jordan West  wrote:
> Karolis,
>
> While there are no planned changes to riak_core for 2.0.0 at this time,
> we always suggest waiting until the final release before using it in
> production. Additionally, Riak Core 2.0.0, like many other Riak Core
> releases, may not be 100% backwards compatible. We encourage testing your
> application against the 2.0.0rc1 release. Please let us know if you find
> any issues.
>
> Cheers,
> Jordan
>
>
> On Mon, Jul 28, 2014 at 11:24 PM, Karolis Petrauskas
>  wrote:
>>
>> Hello,
>>
>> What is the status of riak_core in 2.0.0rc1? Is it reasonable to start
>> using it instead of 1.4.10 in production? Or are there big changes
>> pending until the final release?
>>
>> Karolis
>>
>> ___
>> riak-users mailing list
>> riak-users@lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Optimistic Locking with Riak Search / is _version_ required in schema.xml?

2014-08-08 Thread David James
Thanks for the detailed answers!


On Thu, Aug 7, 2014 at 5:57 PM, Ryan Zezeski  wrote:

>
> On Aug 7, 2014, at 5:46 PM, David James  wrote:
>
>
>- Is _version_ required?
>
>
> It should not be required: the documentation says it is only needed for
> real-time GET, which Riak Search (Yokozuna) disables since Riak KV provides
> the get/put implementation.
>
>
>- I see SolrCloud mentioned in some documentation (see below). Does
>Riak Search use it?
>
>
> RS does not make use of SolrCloud at all.  It uses Solr’s Distributed
> Search but that is something that existed well before SolrCloud.  All
> routing and replica administration is handled by Riak.  Each Solr instance
> (one per node) has no awareness of the other nodes except for the explicit
> distributed queries sent by Riak.
>
>
>- How does Riak Search handle optimistic locking?
>
>
> It doesn’t use Solr’s optimistic locking at all.  All key-value semantics
> come from Riak itself.  RS simply indexes an object’s values.
>
>
>
> See this comment on the default_schema.xml on Github:
>
> 
> 
>
>
> https://raw.githubusercontent.com/basho/yokozuna/develop/priv/default_schema.xml
>
>
> Yes, I wrote that TODO.  It is one of many that found its way into 2.0.0
> :).  You should run fine without this field if you create a custom schema.
>
>
>
> P.S. Per https://wiki.apache.org/solr/SchemaXml
>
>- _version_ (Solr 4.0): This field is used for optimistic locking in
>SolrCloud and it enables Real Time Get. If you remove it you must
>also remove the transaction logging from solrconfig.xml; see Real
>Time Get.
>
>
> Just to reiterate what I said above, RS disables the transaction logging
> and thus there is no real time get.  There is no reason for it since that
> is what Riak itself provides.
>
> -Z
>
>
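
Putting Ryan's answer into practice: a custom schema simply omits the
`_version_` field that the default schema carries. The sketch below is
illustrative only: the schema name and application field are made up, and
the required `_yz_*` fields from Yokozuna's `default_schema.xml` still
have to be copied in verbatim (elided here):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical custom schema without the _version_ field. -->
<schema name="my_schema" version="1.5">
  <fields>
    <!-- application fields (illustrative) -->
    <field name="name_s" type="string" indexed="true" stored="true"/>
    <!-- copy the required _yz_* fields from default_schema.xml here -->
  </fields>
  <uniqueKey>_yz_id</uniqueKey>
  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
  </types>
</schema>
```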