Re: [ceph-users] tunable question

2017-10-03 Thread lists

Hi,

What would make the decision easier: if we knew that we could easily 
revert the

> "ceph osd crush tunables optimal"
once it has begun rebalancing data?

Meaning: if we notice that the impact is too high, or that it will take 
too long, we could simply say again

> "ceph osd crush tunables hammer"
and the cluster would calm down again?

MJ

On 2-10-2017 9:41, Manuel Lausch wrote:

Hi,

We have similar issues.
After upgrading from hammer to jewel, the tunable "chooseleaf_stable"
was introduced. If we activate it, nearly all data will be moved. The
cluster has 2400 OSDs on 40 nodes over two datacenters and is filled
with 2.5 PB of data.

We tried to enable it, but the backfill traffic is too high to be
handled without impacting other services on the network.

Does someone know if it is necessary to enable this tunable? And could
it be a problem in the future if we want to upgrade to newer versions
without it enabled?

Regards,
Manuel Lausch


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] tunable question

2017-10-03 Thread lists

Thanks Jake, for your extensive reply. :-)

MJ

On 3-10-2017 15:21, Jake Young wrote:


On Tue, Oct 3, 2017 at 8:38 AM lists <li...@merit.unu.edu> wrote:


Hi,

What would make the decision easier: if we knew that we could easily
revert the
  > "ceph osd crush tunables optimal"
once it has begun rebalancing data?

Meaning: if we notice that the impact is too high, or that it will take too long,
we could simply say again
  > "ceph osd crush tunables hammer"
and the cluster would calm down again?


Yes you can revert the tunables back; but it will then move all the data 
back where it was, so be prepared for that.
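
For reference, the currently active tunables profile can be checked
before and after such a change with:

  # show which tunables/profile the cluster is currently using
  ceph osd crush show-tunables

  # switch profiles (each switch triggers rebalancing)
  ceph osd crush tunables optimal
  ceph osd crush tunables hammer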


Verify you have the following values in ceph.conf. Note that these are 
the defaults in Jewel, so if they aren’t defined, you’re probably good:

osd_max_backfills=1
osd_recovery_threads=1

You can try to set these at runtime (using injectargs) if you notice a 
large impact on your client performance:

osd_recovery_op_priority=1
osd_recovery_max_active=1
osd_recovery_threads=1
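
If needed, these can be pushed to all OSDs at runtime with injectargs;
note that runtime changes are lost on daemon restart unless they are also
added to ceph.conf, and some options (such as osd_recovery_threads) may
only take effect after a restart. A minimal sketch:

  ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1 --osd_recovery_op_priority 1'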

I recall this tunables change from when we went from hammer to jewel last 
year. It took over 24 hours to rebalance 122TB on our 110-OSD cluster.


Jake



MJ

On 2-10-2017 9:41, Manuel Lausch wrote:
 > Hi,
 >
 > We have similar issues.
 > After upgrading from hammer to jewel, the tunable "chooseleaf_stable"
 > was introduced. If we activate it, nearly all data will be moved. The
 > cluster has 2400 OSDs on 40 nodes over two datacenters and is filled
 > with 2.5 PB of data.
 >
 > We tried to enable it, but the backfill traffic is too high to be
 > handled without impacting other services on the network.
 >
 > Does someone know if it is necessary to enable this tunable? And could
 > it be a problem in the future if we want to upgrade to newer versions
 > without it enabled?
 >
 > Regards,
 > Manuel Lausch
 >
___
ceph-users mailing list
ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com>
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] why sudden (and brief) HEALTH_ERR

2017-10-03 Thread lists

Hi,

Yesterday I chowned our /var/lib/ceph to ceph, to completely finalize our 
jewel migration, and noticed something interesting.

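For reference, the usual per-node sequence for this step looks roughly
like the following (exact service handling depends on the distribution;
the noout flag is visible in the output below):

  ceph osd set noout
  # stop the OSDs on the node, then:
  chown -R ceph:ceph /var/lib/ceph
  # start the OSDs again, and once everything is back up:
  ceph osd unset noout
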

After I brought back up the OSDs I just chowned, the system had some 
recovery to do. During that recovery, the system went to HEALTH_ERR for 
a short moment:


See below, for consecutive ceph -s outputs:


root@pm2:~# ceph -s
cluster 1397f1dc-7d94-43ea-ab12-8f8792eee9c1
 health HEALTH_WARN
1025 pgs degraded
1 pgs recovering
60 pgs recovery_wait
307 pgs stuck unclean
964 pgs undersized
recovery 2477548/8384034 objects degraded (29.551%)
6/24 in osds are down
noout flag(s) set
 monmap e3: 3 mons at 
{0=10.10.89.1:6789/0,1=10.10.89.2:6789/0,2=10.10.89.3:6789/0}
election epoch 256, quorum 0,1,2 0,1,2
 osdmap e10222: 24 osds: 18 up, 24 in; 964 remapped pgs
flags noout,sortbitwise,require_jewel_osds
  pgmap v36531103: 1088 pgs, 2 pools, 10703 GB data, 2729 kobjects
32723 GB used, 56657 GB / 89380 GB avail
2477548/8384034 objects degraded (29.551%)
 964 active+undersized+degraded
  63 active+clean
  60 active+recovery_wait+degraded
   1 active+recovering+degraded
recovery io 63410 kB/s, 15 objects/s
  client io 4348 kB/s wr, 0 op/s rd, 630 op/s wr
root@pm2:~# ceph -s
cluster 1397f1dc-7d94-43ea-ab12-8f8792eee9c1
 health HEALTH_WARN
942 pgs degraded
1 pgs recovering
118 pgs recovery_wait
297 pgs stuck unclean
823 pgs undersized
recovery 2104751/8384079 objects degraded (25.104%)
6/24 in osds are down
noout flag(s) set
 monmap e3: 3 mons at 
{0=10.10.89.1:6789/0,1=10.10.89.2:6789/0,2=10.10.89.3:6789/0}
election epoch 256, quorum 0,1,2 0,1,2
 osdmap e10224: 24 osds: 18 up, 24 in; 823 remapped pgs
flags noout,sortbitwise,require_jewel_osds
  pgmap v36531118: 1088 pgs, 2 pools, 10703 GB data, 2729 kobjects
32723 GB used, 56657 GB / 89380 GB avail
2104751/8384079 objects degraded (25.104%)
 823 active+undersized+degraded
 146 active+clean
 118 active+recovery_wait+degraded
   1 active+recovering+degraded
recovery io 61945 kB/s, 16 objects/s
  client io 2718 B/s rd, 5997 kB/s wr, 0 op/s rd, 638 op/s wr
root@pm2:~# ceph -s
cluster 1397f1dc-7d94-43ea-ab12-8f8792eee9c1
 health HEALTH_ERR
2 pgs are stuck inactive for more than 300 seconds
761 pgs degraded
2 pgs recovering
181 pgs recovery_wait
2 pgs stuck inactive
273 pgs stuck unclean
543 pgs undersized
recovery 1394085/8384166 objects degraded (16.628%)
4/24 in osds are down
noout flag(s) set
 monmap e3: 3 mons at 
{0=10.10.89.1:6789/0,1=10.10.89.2:6789/0,2=10.10.89.3:6789/0}
election epoch 256, quorum 0,1,2 0,1,2
 osdmap e10230: 24 osds: 20 up, 24 in; 543 remapped pgs
flags noout,sortbitwise,require_jewel_osds
  pgmap v36531146: 1088 pgs, 2 pools, 10703 GB data, 2729 kobjects
32724 GB used, 56656 GB / 89380 GB avail
1394085/8384166 objects degraded (16.628%)
 543 active+undersized+degraded
 310 active+clean
 181 active+recovery_wait+degraded
  26 active+degraded
  13 active
   9 activating+degraded
   4 activating
   2 active+recovering+degraded
recovery io 133 MB/s, 37 objects/s
  client io 64936 B/s rd, 9935 kB/s wr, 0 op/s rd, 942 op/s wr
root@pm2:~# ceph -s
cluster 1397f1dc-7d94-43ea-ab12-8f8792eee9c1
 health HEALTH_WARN
725 pgs degraded
27 pgs peering
2 pgs recovering
207 pgs recovery_wait
269 pgs stuck unclean
516 pgs undersized
recovery 1325870/8384202 objects degraded (15.814%)
3/24 in osds are down
noout flag(s) set
 monmap e3: 3 mons at 
{0=10.10.89.1:6789/0,1=10.10.89.2:6789/0,2=10.10.89.3:6789/0}
election epoch 256, quorum 0,1,2 0,1,2
 osdmap e10233: 24 osds: 21 up, 24 in; 418 remapped pgs
flags noout,sortbitwise,require_jewel_osds
  pgmap v36531161: 1088 pgs, 2 pools, 10703 GB data, 2729 kobjects
32724 GB used, 56656 GB / 89380 GB avail
1325870/8384202 objects degraded (15.814%)
 516 active+undersized+degraded
 336 active+clean
 207 active+recovery_wait+degraded
  27 peering
   2 active+recovering+degraded
recovery io 62886 kB/s, 15 objects/s
  client io 3586 kB/s wr, 0 op/s rd, 251 op/s wr


It was only very briefly, but why did the cluster go to HEALTH_ERR?

Re: [ceph-users] why sudden (and brief) HEALTH_ERR

2017-10-04 Thread lists

ok, thanks for the feedback Piotr and Dan!

MJ

On 4-10-2017 9:38, Dan van der Ster wrote:

Since Jewel (AFAIR), when (re)starting OSDs, pg status is reset to "never
contacted", resulting in "pgs are stuck inactive for more than 300 seconds"
being reported until osds regain connections between themselves.



Also, the last_active state isn't updated very regularly, as far as I can tell.
On our cluster I have increased this timeout

mon_pg_stuck_threshold = 1800

(Which helps suppress these bogus HEALTH_ERR's)
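
For reference, that can be made persistent in ceph.conf or injected at
runtime, roughly like this:

  # ceph.conf
  [mon]
  mon_pg_stuck_threshold = 1800

  # or at runtime
  ceph tell mon.* injectargs '--mon_pg_stuck_threshold 1800'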


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] slow requests on a specific osd

2018-01-15 Thread lists

Hi,

On our three-node, 24-OSD ceph 10.2.10 cluster, we have started seeing 
slow requests on a specific OSD during the two-hour nightly xfs_fsr 
run from 05:00 - 07:00. This started after we applied the meltdown patches.


The specific osd.10 also has the highest space utilization of all OSDs 
cluster-wide, with 45%, while the others are mostly around 40%. All OSDs 
are the same 4TB platters with journal on ssd, all with weight 1.


SMART info for osd.10 shows nothing interesting, I think:


Current Drive Temperature: 27 C
Drive Trip Temperature:    60 C

Manufactured in week 04 of year 2016
Specified cycle count over device lifetime:  1
Accumulated start-stop cycles:  53
Specified load-unload count over device lifetime:  30
Accumulated load-unload cycles:  697
Elements in grown defect list: 0

Vendor (Seagate) cache information
  Blocks sent to initiator = 1933129649
  Blocks received from initiator = 869206640
  Blocks read from cache and sent to initiator = 2149311508
  Number of read and write commands whose size <= segment size = 676356809
  Number of read and write commands whose size > segment size = 12734900

Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 13625.88
  number of minutes until next internal SMART test = 8


Now my question:
Could it be that osd.10 just happens to contain some data chunks that 
are heavily needed by the VMs around that time, and that the added load 
of an xfs_fsr is simply too much for it to handle?


In that case, how about reweighting that osd.10 to "0", waiting until all 
data has moved off osd.10, and then setting it back to "1"? Would this 
result in *exactly* the same situation as before, or would it at least 
cause the data to be spread better across the other OSDs?


(with the idea that better data spread across OSDs brings also better 
distribution of load between the OSDs)
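
For what it's worth, the fill level and PG count per OSD can be checked
with ceph osd df tree, and jewel also ships reweight-by-utilization for
exactly this purpose (command names as in 10.2.x; the threshold below is
just an example value):

  ceph osd df tree

  # dry-run first, then apply
  ceph osd test-reweight-by-utilization 110
  ceph osd reweight-by-utilization 110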


Or other ideas to check out?

MJ
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] slow requests on a specific osd

2018-01-15 Thread lists

Hi Wes,

On 15-1-2018 20:32, Wes Dillingham wrote:
I don't hear a lot of people discuss using xfs_fsr on OSDs, and going over 
the mailing list history it seems to have been brought up very 
infrequently and never as a suggestion for regular maintenance. Perhaps 
it's not needed.

True, it's just something we've always done on all our xfs filesystems, 
to keep them speedy and snappy. I've disabled it, and the slow requests 
no longer happen.


Perhaps I'll keep it disabled.

But on this last question, about data distribution across OSDs:


In that case, how about reweighting that osd.10 to "0", waiting until
all data has moved off osd.10, and then setting it back to "1"?
Would this result in *exactly* the same situation as before, or
would it at least cause the data to be spread better across
the other OSDs?


Would it work like that? Or would setting it back to "1" put the same 
data back on this OSD that we started with?


Thanks for your comments,
MJ
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] slow requests on a specific osd

2018-01-15 Thread lists

Hi Wes,

On 15-1-2018 20:57, Wes Dillingham wrote:
My understanding is that the exact same objects would move back to the 
OSD if the weight went 1 -> 0 -> 1, given the same cluster state and same 
object names; CRUSH is deterministic, so that would be the almost certain 
result.




Ok, thanks! So this would be a useless exercise. :-|

Thanks very much for your feedback, Wes!

MJ
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rados bench output question

2016-09-06 Thread lists

Hi all,

We're pretty new to ceph, but loving it so far.

We have a three-node cluster, four 4TB OSDs per node, journal (5GB) on 
SSD, 10G ethernet cluster network, 64GB ram on the nodes, total 12 OSDs.


We noticed the following output when using ceph bench:


root@ceph1:~# rados bench -p scbench 600 write --no-cleanup
Maintaining 16 concurrent writes of 4194304 bytes for up to 600 seconds or 0 
objects
Object prefix: benchmark_data_pm1_36584
sec Cur ops  started  finished  avg MB/s  cur MB/s  last lat  avg lat
0  0  0  0  0  0  -  0
1  16  124  108  431.899  432  0.138315  0.139077
2  16  237  221  441.928  452  0.169759  0.140138
3  16  351  335  446.598  456  0.105837  0.139844
4  16  466  450  449.938  460  0.140141  0.139716
5  16  569  553  442.337  412  0.025245  0.139328
6  16  634  618  411.943  260 0.0302609  0.147129
7  16  692  676  386.233  232  1.01843  0.15158
8  16  721  705  352.455  116 0.0224958  0.159924
9  16  721  705  313.293  0  -  0.159924

+-- notice the drop to zero for MB/s

10  16  764  748  299.163  86 0.0629263  0.20961
11  16  869  853  310.144  420 0.0805086  0.204707
12  16  986  970  323.295  468  0.175718  0.196822
13  16  1100  1084  333.5  456  0.171172  0.19105
14  16  1153  1137  324.819  212 0.0468416  0.188643
15  16  1225  1209  322.363  288 0.0421159  0.195791
16  16  1236  1220  304.964  44  1.28629  0.195499
17  16  1236  1220  287.025  0  -  0.195499
18  16  1236  1220  271.079  0  -  0.195499

+-- notice again the drop to zero for MB/s

19  16  1324  1308  275.336  117.333  0.148679  0.231708
20  16  1436  1420  283.967  448  0.120878  0.224367
21  16  1552  1536  292.538  464  0.173587  0.218141
22  16  1662  1646  299.238  440  0.141544  0.212946
23  16  1720  1704  296.314  232 0.0273257  0.211416
24  16  1729  1713  285.467  36 0.0215821  0.211308
25  16  1729  1713  274.048  0  -  0.211308
26  16  1729  1713  263.508  0  -  0.211308

+-- notice again the drop to zero for MB/s

27  16  1787  1771  262.34  77. 0.0338129  0.241103
28  16  1836  1820  259.97  196  0.183042  0.245665
29  16  1949  1933  266.59  452  0.129397  0.239445
30  16  2058  2042  272.235  436  0.165108  0.234447
31  16  2159  2143  276.484  404 0.0466259  0.229704
32  16  2189  2173  271.594  120 0.0206958  0.231772


So at regular intervals, the "cur MB/s" appears to drop to zero. If 
meanwhile we ALSO run iperf between two nodes, we can tell that the 
network is functioning perfectly: while ceph bench goes to zero, iperf 
continues at max speed. (10G ethernet)


So it seems there is something slowing down ceph at 'regular' intervals. 
Is this normal, and expected, or not? In which case: What do we need to 
look at?


During the 0 MB/sec, there is NO increased cpu usage: it is usually 
around 15 - 20% for the four ceph-osd processes.


Do we have an issue..? And if yes: Anyone with a suggestions where to 
look at?


Some more details:
- ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)
- Linux ceph2 4.4.15-1-pve #1 SMP Thu Jul 28 10:54:13 CEST 2016 x86_64 
GNU/Linux


Thanks in advance, and best regards from the netherlands,

MJ
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rados bench output question

2016-09-06 Thread lists

Hi Christian,

Thanks for your reply.


What SSD model (be precise)?

Samsung 480GB PM863 SSD


Only one SSD?

Yes. With a 5GB partition based journal for each osd.


During the 0 MB/sec, there is NO increased cpu usage: it is usually
around 15 - 20% for the four ceph-osd processes.


Watch your node(s) with atop or iostat.

Ok, I will do.
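
A minimal way to do that during a bench run (assuming sysstat is
installed) is extended per-device statistics at a short interval,
watching await and %util on the journal SSD and the HDDs:

  iostat -xm 2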


Do we have an issue..? And if yes: Anyone with a suggestions where to
look at?


You will find that either your journal SSD is overwhelmed (a single
SSD peaking around 500MB/s wouldn't be that surprising), or that your
HDDs can't scribble away at more than the speed above, which is the more
likely reason. Or even a combination of both.

Ceph needs to flush data to the OSDs eventually (and that is usually more
or less immediately with default parameters), so for a sustained,
sequential write test you're looking at the speed of your HDDs.
And that will be spiky of sorts, due to FS journals, seeks for other
writes (replicas), etc.
But would we expect the MB/sec to drop to ZERO, during journal-to-osd 
flushes?
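
(For what it's worth, the flush behaviour in question is governed by the
filestore sync settings; the hammer-era defaults are roughly the
following, so the journal is flushed to the OSD filesystem at least every
few seconds:)

  filestore_min_sync_interval = 0.01
  filestore_max_sync_interval = 5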


Thanks for the quick feedback, and I'll dive into atop and iostat next.

Regards,
MJ
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Adding additional disks to the production cluster without performance impacts on the existing

2018-06-12 Thread lists

Hi Pardhiv,

Thanks for sharing!

MJ

On 11-6-2018 22:30, Pardhiv Karri wrote:

Hi MJ,

Here are the links to the script and config file. Modify the config file 
as you wish; values in the config file can be modified while the script 
execution is in progress. The script can be run from any monitor or data 
node. We tested it and the script works in our cluster. Test it in your 
lab before using it in production.


Script Name: osd_crush_reweight.py
Config File Name: rebalance_config.ini

Script: https://jpst.it/1gwrk

Config File: https://jpst.it/1gwsh

--Pardhiv Karri


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] clock skew

2017-04-04 Thread lists

Hi John, list,

On 1-4-2017 16:18, John Petrini wrote:

Just ntp.


Just to follow up on this: we have yet experienced a clock skew since we 
started using chrony. Just three days ago, I know, but still...


Perhaps you should try it too, and report if it (seems to) work better 
for you as well.


But again, just three days, could be I cheer too early.

MJ
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] clock skew

2017-04-06 Thread lists

Hi Dan,


did you mean "we have not yet..."?

Yes! That's what I meant.

Chrony does much better a job than NTP, at least here :-)

MJ
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rbd: add failed: (34) Numerical result out of range

2014-06-09 Thread lists+ceph

I was building a small test cluster and noticed a difference with trying
to rbd map depending on whether the cluster was built using fedora or
CentOS.

When I used CentOS osds, and tried to rbd map from arch linux or fedora,
I would get "rbd: add failed: (34) Numerical result out of range".  It
seemed to happen when the tool was writing to /sys/bus/rbd/add_single_major.

If I rebuild the osds using fedora (20 in this case), everything
works fine.

In each scenario, I used ceph-0.80.1 on all the boxes.

Is that expected?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] What exactly is the kernel rbd on osd issue?

2014-06-12 Thread lists+ceph
I remember reading somewhere that the kernel ceph clients (rbd/fs) could
not run on the same host as the OSD.  I tried finding where I saw that,
and could only come up with some irc chat logs.

The issue stated there is that there can be some kind of deadlock.  Is
this true, and if so, would you have to run a totally different kernel
in a vm, or would some form of namespacing be enough to avoid it?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] radosgw issues

2014-06-15 Thread lists+ceph
I've just tried setting up the radosgw on centos6 according to 
http://ceph.com/docs/master/radosgw/config/


There didn't seem to be an init script in the rpm I installed, so I 
copied the one from here:

https://raw.githubusercontent.com/ceph/ceph/31b0823deb53a8300856db3c104c0e16d05e79f7/src/init-radosgw.sysv
That launches one process.

While I can run the admin commands just fine to create users etc., 
making a simple wget request to the domain I set up returns a 500 due to 
a timeout.  Every request I make results in another radosgw process 
being created, which seems to start even more processes itself.  I only 
have to make a few requests to have about 60 radosgw processes.


I am at a bit of a loss to tell what is going on.  I've included what I 
assume to be the error below.  I see some failures to acquire locks, and 
it complaining that another process has already created the unix socket, 
but I don't know how relevant those are.  If anyone could point me in the 
right direction, I would appreciate it.




2014-06-15 20:21:03.814081 7f8c94cf9700  1 -- 10.30.83.29:0/1028955 --> 
10.30.85.60:6806/18981 -- ping v1 -- ?+0 0x7f8c98068e50 con 0x1e73630
2014-06-15 20:21:03.814097 7f8c94cf9700  1 -- 10.30.83.29:0/1028955 --> 
10.30.85.60:6813/20066 -- ping v1 -- ?+0 0x7f8c980693e0 con 0x1e76460
2014-06-15 20:21:03.814108 7f8c94cf9700  1 -- 10.30.83.29:0/1028955 --> 
10.30.85.60:6810/19519 -- ping v1 -- ?+0 0x7f8c98069600 con 0x1e79b90

2014-06-15 20:21:03.815286 7fd779f84820  0 framework: fastcgi
2014-06-15 20:21:03.815305 7fd779f84820  0 starting handler: fastcgi
2014-06-15 20:21:03.816726 7fd74e6f8700  1 -- 10.30.83.29:0/1008789 --> 
10.30.85.60:6800/18433 -- osd_op(client.6329.0:24  [pgls start_epoch 0] 
7.0 ack+read e290) v4 -- ?+0 0x7fd75800a630 con 0x1505210
2014-06-15 20:21:03.817584 7fd770adb700  1 -- 10.30.83.29:0/1008789 <== 
osd.1 10.30.85.60:6800/18433 9  osd_op_reply(24  [pgls start_epoch 
0] v0'0 uv0 ondisk = 1) v6  167+0+44 (1244889235 0 139081063) 
0x7fd754000ce0 con 0x1505210
2014-06-15 20:21:03.817665 7fd770adb700  1 -- 10.30.83.29:0/1008789 --> 
10.30.85.60:6806/18981 -- osd_op(client.6329.0:25  [pgls start_epoch 0] 
7.1 ack+read e290) v4 -- ?+0 0x7fd75800aae0 con 0x15043c0
2014-06-15 20:21:03.819356 7fd770adb700  1 -- 10.30.83.29:0/1008789 <== 
osd.2 10.30.85.60:6806/18981 5  osd_op_reply(25  [pgls start_epoch 
0] v0'0 uv0 ondisk = 1) v6  167+0+44 (3347405639 0 139081063) 
0x1512950 con 0x15043c0
2014-06-15 20:21:03.819509 7fd770adb700  1 -- 10.30.83.29:0/1008789 --> 
10.30.85.61:6800/28605 -- osd_op(client.6329.0:26  [pgls start_epoch 0] 
7.2 ack+read e290) v4 -- ?+0 0x7fd75800c3e0 con 0x7fd75800ac80

2014-06-15 20:21:03.819635 7fd74fafa700  2 garbage collection: start
2014-06-15 20:21:03.819798 7fd74fafa700  1 -- 10.30.83.29:0/1008789 --> 
10.30.85.60:6806/18981 -- osd_op(client.6329.0:27 gc.21 [call lock.lock] 
6.6dc01772 ondisk+write e290) v4 -- ?+0 0x7fd7600022b0 con 0x15043c0
2014-06-15 20:21:03.823164 7fd770adb700  1 -- 10.30.83.29:0/1008789 <== 
osd.2 10.30.85.60:6806/18981 6  osd_op_reply(27 gc.21 [call] v0'0 
uv0 ondisk = -16 ((16) Device or resource busy)) v6  172+0+0 
(3774749926 0 0) 0x1512950 con 0x15043c0
2014-06-15 20:21:03.823309 7fd74fafa700  0 RGWGC::process() failed to 
acquire lock on gc.21
2014-06-15 20:21:03.823457 7fd74fafa700  1 -- 10.30.83.29:0/1008789 --> 
10.30.85.60:6810/19519 -- osd_op(client.6329.0:28 gc.22 [call lock.lock] 
6.97748d0d ondisk+write e290) v4 -- ?+0 0x7fd760002bc0 con 0x150b280
2014-06-15 20:21:03.821327 7fd74d2f6700 -1 common/Thread.cc: In function 
'void Thread::create(size_t)' thread 7fd74d2f6700 time 2014-06-15 
20:21:03.819948

common/Thread.cc: 110: FAILED assert(ret == 0)

 ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)
 1: (Thread::create(unsigned long)+0x8a) [0x7fd77900363a]
 2: (ThreadPool::start_threads()+0x12e) [0x7fd778fe770e]
 3: (ThreadPool::start()+0x7a) [0x7fd778feaa9a]
 4: (RGWFCGXProcess::run()+0x195) [0x4ae305]
 5: /usr/bin/radosgw() [0x4b3bbe]
 6: (()+0x79d1) [0x7fd7772929d1]
 7: (clone()+0x6d) [0x7fd776fdfb5d]
 NOTE: a copy of the executable, or `objdump -rdS ` is 
needed to interpret this.


--- begin dump of recent events ---
  -125> 2014-06-15 20:21:03.205715 7fd779f84820  5 asok(0x14c3070) 
register_command perfcounters_dump hook 0x14c4a80
  -124> 2014-06-15 20:21:03.205761 7fd779f84820  5 asok(0x14c3070) 
register_command 1 hook 0x14c4a80
  -123> 2014-06-15 20:21:03.205766 7fd779f84820  5 asok(0x14c3070) 
register_command perf dump hook 0x14c4a80
  -122> 2014-06-15 20:21:03.205773 7fd779f84820  5 asok(0x14c3070) 
register_command perfcounters_schema hook 0x14c4a80
  -121> 2014-06-15 20:21:03.205780 7fd779f84820  5 asok(0x14c3070) 
register_command 2 hook 0x14c4a80
  -120> 2014-06-15 20:21:03.205784 7fd779f84820  5 asok(0x14c3070) 
register_command perf schema hook 0x14c4a80
  -119> 2014-06-15 20:21:03.205787 7fd779f84820  5 asok(0x14c3070) 
register_command confi

Re: [ceph-users] radosgw issues

2014-06-16 Thread lists+ceph

On 2014-06-17 07:30, John Wilkins wrote:

You followed this installation guide:
http://ceph.com/docs/master/install/install-ceph-gateway/

And then you followed this configuration guide,
http://ceph.com/docs/master/radosgw/config/, and then you executed:

sudo /etc/init.d/ceph-radosgw start
And there was no ceph-radosgw script? We need to verify that first,
and file a bug if we're not getting an init script in CentOS packages.



I took a look again, and the package I had installed seemed to have come 
from epel, and did not contain the init script.  I started from scratch 
with a minimal install of centos6 that hadn't been used for anything 
else.  The package from the ceph repo does indeed have the init script.


Unfortunately, I'm still running into the same issue.  I removed all the 
rgw pools, started ceph-radosgw, and it recreated a few of them:

.rgw.root
.rgw.control
.rgw
.rgw.gc
.users.uid

Manually creating the rest of them has no effect.  It complains about 
acquiring locks and listing objects:
2014-06-17 00:31:45.150494 7f86ec450820  0 ceph version 0.80.1 
(a38fe1169b6d2ac98b427334c12d7cf81f809b74), process radosgw, pid 1704
2014-06-17 00:31:45.150556 7f86ec450820 -1 WARNING: libcurl doesn't 
support curl_multi_wait()
2014-06-17 00:31:45.150590 7f86ec450820 -1 WARNING: cross zone / region 
transfer performance may be affected

2014-06-17 00:32:02.469894 7f86ec450820  0 framework: fastcgi
2014-06-17 00:32:02.469958 7f86ec450820  0 starting handler: fastcgi
2014-06-17 00:32:13.455904 7f86b700 -1 failed to list objects 
pool_iterate returned r=-2
2014-06-17 00:32:13.455918 7f86b700  0 ERROR: lists_keys_next(): 
ret=-2
2014-06-17 00:32:13.455924 7f86b700  0 ERROR: sync_all_users() 
returned ret=-2
2014-06-17 00:32:13.611812 7f86d95f9700  0 RGWGC::process() failed to 
acquire lock on gc.16
2014-06-17 00:32:14.105180 7f86d95f9700  0 RGWGC::process() failed to 
acquire lock on gc.0




If I make a request, the server eventually fills up with so many radosgw 
processes that the apache user can no longer fork any new processes.


This is an strace from apache:

read(13, "GET / HTTP/1.1\r\nUser-Agent: Wget/1.15 (linux-gnu)\r\nAccept: 
*/*\r\nHost: gateway.ceph.chc.tlocal\r\nConnection: Keep-Alive\r\n\r\n", 
8000) = 121
stat("/s3gw.fcgi", 0x7fffcc662580)  = -1 ENOENT (No such file or 
directory)
stat("/var/www/html/s3gw.fcgi", {st_mode=S_IFREG|0755, st_size=81, ...}) 
= 0
open("/var/www/html/.htaccess", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such 
file or directory)
open("/var/www/html/s3gw.fcgi/.htaccess", O_RDONLY|O_CLOEXEC) = -1 
ENOTDIR (Not a directory)

open("/var/www/html/s3gw.fcgi", O_RDONLY|O_CLOEXEC) = 14
fcntl(14, F_GETFD)  = 0x1 (flags FD_CLOEXEC)
fcntl(14, F_SETFD, FD_CLOEXEC)  = 0
read(14, "#!/bin/sh\nexec /usr/bin/radosgw -c /etc/ceph/ceph.conf -n 
client.radosgw.gateway\n", 4096) = 81
stat("/var/www/html/s3gw.fcgi", {st_mode=S_IFREG|0755, st_size=81, ...}) 
= 0

brk(0x7fab146ce000) = 0x7fab146ce000
write(2, "[Mon Jun 16 22:26:03 2014] [warn] FastCGI: 10.30.85.51 GET 
http://gateway.ceph.chc.tlocal/ auth \n", 97) = 97
stat("/var/run/mod_fastcgi/dynamic/2a13e94a006b7f947a721cf995159615", 
{st_mode=S_IFSOCK|0600, st_size=0, ...}) = 0

socket(PF_FILE, SOCK_STREAM, 0) = 15
connect(15, {sa_family=AF_FILE, 
path="/var/run/mod_fastcgi/dynamic/2a13e94a006b7f947a721cf995159615"}, 
63) = 0

fcntl(15, F_GETFL)  = 0x2 (flags O_RDWR)
fcntl(15, F_SETFL, O_RDWR|O_NONBLOCK)   = 0
select(16, [15], [15], NULL, {3, 99784}) = 1 (out [15], left {3, 99781})
write(15, 
"\1\1\0\1\0\10\0\0\0\1\0\0\0\0\0\0\1\4\0\1\0\r\0\0\n\1SCRIPT_URL/\1\4\0\1\0+\0\0\n\37SCRIPT_URIhttp://gateway.ceph.chc.tlocal/\1\4\0\1\0\24\0\0\22\0HTTP_AUTHORIZATION\1\4\0\1\0&\0\0\17\25HTTP_USER_AGENTWget/1.15 
(linux-gnu)\1\4\0\1\0\20\0\0\v\3HTTP_ACCEPT*/*\1\4\0\1\0\"\0\0\t\27HTTP_HOSTgateway.ceph.chc.tlocal\1\4\0\1\0\33\0\0\17\nHTTP_CONNECTIONKee"..., 
841) = 841

select(16, [15], [], NULL, {3, 99624})  = 0 (Timeout)
write(12, "T /var/www/html/s3gw.fcgi 0 0*", 30) = 30
select(16, [15], [], NULL, {2, 996562}) = 0 (Timeout)
write(12, "T /var/www/html/s3gw.fcgi 0 0*", 30) = 30
select(16, [15], [], NULL, {2, 996992}) = 0 (Timeout)
write(12, "T /var/www/html/s3gw.fcgi 0 0*", 30) = 30
select(16, [15], [], NULL, {2, 996700}^C 
...


This continues until it times out.

and in /var/log/ceph/client.radosgw.gateway.log this repeats as all the 
other processes start




2014-06-16 22:27:28.411653 7f84742fb820  0 ceph version 0.80.1 
(a38fe1169b6d2ac98b427334c12d7cf81f809b74), process radosgw, pid 12225
2014-06-16 22:27:28.411668 7f84742fb820 -1 WARNING: libcurl doesn't 
support curl_multi_wait()
2014-06-16 22:27:28.411672 7f84742fb820 -1 WARNING: cross zone / region 
transfer performance may be affected
2014-06-16 22:27:28.420286 7f84742fb820 -1 asok(0x8f2fe0) 
AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed 
to bind the UNIX domain socket 

Re: [ceph-users] radosgw issues

2014-06-30 Thread lists+ceph

On 2014-06-16 13:16, lists+c...@deksai.com wrote:

I've just tried setting up the radosgw on centos6 according to
http://ceph.com/docs/master/radosgw/config/



While I can run the admin commands just fine to create users etc.,
making a simple wget request to the domain I set up returns a 500 due
to a timeout.  Every request I make results in another radosgw process
being created, which seems to start even more processes itself.  I
only have to make a few requests to have about 60 radosgw processes.



Guess I'll try again.  I gave this another shot, following the 
documentation, and still end up with basically a fork bomb rather than 
the nice ListAllMyBucketsResult output that the docs say I should get.  
Everything else about the cluster works fine, and I see others talking 
about the gateway as if it just worked, so I'm led to believe that I'm 
probably doing something stupid.  Has anybody else run into the 
situation where apache times out while fastcgi just launches more and 
more processes?


The init script launches a process, and the webserver seems to launch 
the same thing, so I'm not clear on what should be happening here.  
Either way, I get nothing back when making a simple GET request to the 
domain.


If anybody has suggestions, even if they are "You nincompoop!  Everybody 
knows that you need to do such and such", that would be helpful.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw issues

2014-07-08 Thread lists+ceph




Guess I'll try again.  I gave this another shot, following the
documentation, and still end up with basically a fork bomb rather than
the nice ListAllMyBucketsResult output that the docs say I should get.
 Everything else about the cluster works fine, and I see others
talking about the gateway as if it just worked, so I'm led to believe
that I'm probably doing something stupid.


For the benefit of anyone that was sitting on the edge of their seat 
waiting for me to figure this out, I found that indeed I had done 
something stupid.  Somehow I managed to miss the warning highlighted in 
red, set apart by itself and labeled "Important" in the documentation.


I had not turned off FastCgiWrapper in /etc/httpd/conf.d/fastcgi.conf.  
Fixing that made everything work just fine.
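
For anyone hitting the same thing, the relevant line in
/etc/httpd/conf.d/fastcgi.conf is simply:

  FastCgiWrapper Off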

Thanks to all who offered help off list!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph User Teething Problems

2015-03-04 Thread Datatone Lists
I have been following ceph for a long time. I have yet to put it into
service, and I keep coming back as btrfs improves and ceph reaches
higher version numbers.

I am now trying ceph 0.93 and kernel 4.0-rc1.

Q1) Is it still considered that btrfs is not robust enough, and that
xfs should be used instead? [I am trying with btrfs].

I followed the manual deployment instructions on the web site 
(http://ceph.com/docs/master/install/manual-deployment/) and I managed
to get a monitor and several osds running and apparently working. The
instructions fizzle out without explaining how to set up mds. I went
back to mkcephfs and got things set up that way. The mds starts.

[Please don't mention ceph-deploy]

The first thing that I noticed is that (whether I set up mon and osds
by following the manual deployment, or using mkcephfs), the correct
default pools were not created.

bash-4.3# ceph osd lspools
0 rbd,
bash-4.3# 

 I get only 'rbd' created automatically. I deleted this pool, and
 re-created data, metadata and rbd manually. When doing this, I had to
 juggle with the pg_num in order to avoid the 'too many PGs per OSD'
 warning. I have three osds running at the moment, but intend to add to
 these when I have some experience of things working reliably. I am
 puzzled, because I seem to have to set the pg_num for the pool to a
 number that makes (N-pools x pg_num)/N-osds come to the right kind of
 number. So this implies that I can't really expand a set of pools by
 adding osds at a later date.

Q2) Is there any obvious reason why my default pools are not getting
created automatically as expected?

Q3) Can pg-num be modified for a pool later? (If the number of osds is 
increased dramatically).

Finally, when I try to mount cephfs, I get a mount 5 error.

"A mount 5 error typically occurs if a MDS server is laggy or if it
crashed. Ensure at least one MDS is up and running, and the cluster is
active + healthy".

My mds is running, but its log is not terribly active:

2015-03-04 17:47:43.177349 7f42da2c47c0  0 ceph version 0.93 
(bebf8e9a830d998eeaab55f86bb256d4360dd3c4), process ceph-mds, pid 4110
2015-03-04 17:47:43.182716 7f42da2c47c0 -1 mds.-1.0 log_to_monitors 
{default=true}

(This is all there is in the log).

I think that a key indicator of the problem must be this from the
monitor log:

2015-03-04 16:53:20.715132 7f3cd0014700  1
mon.ceph-mon-00@0(leader).mds e1 warning, MDS mds.?
[2001:8b0::5fb3::1fff::9054]:6800/4036 up but filesystem
disabled

(I have obscured sections of my IP address above)

Q4) Can you give me an idea of what is wrong that causes the mds to not
play properly?

I think that there are some typos on the manual deployment pages, for
example:

ceph-osd id={osd-num}

This is not right. As far as I am aware it should be:

ceph-osd -i {osd-num}

An observation. In principle, setting things up manually is not all
that complicated, provided that clear and unambiguous instructions are
provided. This simple piece of documentation is very important. My view
is that the existing manual deployment instructions get a bit confused
and confusing when they get to the osd setup, and the mds setup is
completely absent.

For someone who knows, it would be a fairly simple and fairly quick 
operation to review and revise this part of the documentation. I
suspect that this part suffers from being really obvious stuff to the
well initiated. For those of us closer to the start, this forms the
ends of the threads that have to be picked up before the journey can be
made.

Very best regards,
David
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph User Teething Problems

2015-03-05 Thread Datatone Lists
...btrfs development now that the lead developer has moved from Oracle
to, I think, Facebook.

I now share the view that I think Robert LeBlanc has, that maybe btrfs
will now stand the ceph test.

Thanks, Lincoln Bryant, for confirming that I can increase the size of
pools in line with increasing osd numbers. I felt that this had to be
the case, otherwise the 'scalable' claim becomes a bit limited.
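
For reference, growing a pool's placement groups later is done roughly
like this (pg_num can only be increased, and pgp_num should be raised to
match; 256 is just an example value):

  ceph osd pool set rbd pg_num 256
  ceph osd pool set rbd pgp_num 256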

Returning from these digressions to my own experience; I set up my
cephfs file system as illuminated by John Spray. I mounted it and
started to rsync a multi-terabyte filesystem to it. This is my test: if
cephfs handles this without grinding to a snail's pace or failing, I
will be ready to start committing my data to it. My osd disk lights
started to flash and flicker and a comforting sound of drive activity
issued forth. I checked the osd logs, and to my dismay, there were
crash reports in them all. However, a closer look revealed that I am
getting the "too many open files" messages that precede the failures.

I can see that this is not an osd failure, but a resource limit issue.

I completely acknowledge that I must now RTFM, but I will ask whether
anybody can give any guidance, based on experience, with respect to
this issue.
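
In case it helps, the usual knobs for this are the file-descriptor limit
of the shell or init system that starts the daemons (ulimit -n) and the
'max open files' option in ceph.conf; the value below is just an example:

  [global]
  max open files = 131072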

Thank you again for all for the previous prompt and invaluable advice
and information.

David


On Wed, 4 Mar 2015 20:27:51 +
Datatone Lists  wrote:

> I have been following ceph for a long time. I have yet to put it into
> service, and I keep coming back as btrfs improves and ceph reaches
> higher version numbers.
> 
> I am now trying ceph 0.93 and kernel 4.0-rc1.
> 
> Q1) Is it still considered that btrfs is not robust enough, and that
> xfs should be used instead? [I am trying with btrfs].
> 
> I followed the manual deployment instructions on the web site 
> (http://ceph.com/docs/master/install/manual-deployment/) and I managed
> to get a monitor and several osds running and apparently working. The
> instructions fizzle out without explaining how to set up mds. I went
> back to mkcephfs and got things set up that way. The mds starts.
> 
> [Please don't mention ceph-deploy]
> 
> The first thing that I noticed is that (whether I set up mon and osds
> by following the manual deployment, or using mkcephfs), the correct
> default pools were not created.
> 
> bash-4.3# ceph osd lspools
> 0 rbd,
> bash-4.3# 
> 
>  I get only 'rbd' created automatically. I deleted this pool, and
>  re-created data, metadata and rbd manually. When doing this, I had to
>  juggle with the pg- num in order to avoid the 'too many pgs for osd'.
>  I have three osds running at the moment, but intend to add to these
>  when I have some experience of things working reliably. I am puzzled,
>  because I seem to have to set the pg-num for the pool to a number
> that makes (N-pools x pg-num)/N-osds come to the right kind of
> number. So this implies that I can't really expand a set of pools by
> adding osds at a later date. 
> 
> Q2) Is there any obvious reason why my default pools are not getting
> created automatically as expected?
> 
> Q3) Can pg-num be modified for a pool later? (If the number of osds
> is increased dramatically).
> 
> Finally, when I try to mount cephfs, I get a mount 5 error.
> 
> "A mount 5 error typically occurs if a MDS server is laggy or if it
> crashed. Ensure at least one MDS is up and running, and the cluster is
> active + healthy".
> 
> My mds is running, but its log is not terribly active:
> 
> 2015-03-04 17:47:43.177349 7f42da2c47c0  0 ceph version 0.93 
> (bebf8e9a830d998eeaab55f86bb256d4360dd3c4), process ceph-mds, pid 4110
> 2015-03-04 17:47:43.182716 7f42da2c47c0 -1 mds.-1.0 log_to_monitors 
> {default=true}
> 
> (This is all there is in the log).
> 
> I think that a key indicator of the problem must be this from the
> monitor log:
> 
> 2015-03-04 16:53:20.715132 7f3cd0014700  1
> mon.ceph-mon-00@0(leader).mds e1 warning, MDS mds.?
> [2001:8b0::5fb3::1fff::9054]:6800/4036 up but filesystem
> disabled
> 
> (I have added the '' sections to obscure my ip address)
> 
> Q4) Can you give me an idea of what is wrong that causes the mds to
> not play properly?
> 
> I think that there are some typos on the manual deployment pages, for
> example:
> 
> ceph-osd id={osd-num}
> 
> This is not right. As far as I am aware it should be:
> 
> ceph-osd -i {osd-num}
> 
> An observation. In principle, setting things up manually is not all
> that complicated, provided that clear and unambiguous instructions are
> provided. This simple piece of documentation is very important. My
> view is that the existing manual deployment instructions gets a bit
> confused and confusing when it gets to th

[ceph-users] Replacing a failed OSD disk drive (or replace XFS with BTRFS)

2015-03-21 Thread Datatone Lists
I have been experimenting with Ceph, and have some OSDs with drives
containing XFS filesystems which I want to change to BTRFS.
(I started with BTRFS, then started again from scratch with XFS
[currently recommended] in order to eliminate that as a potential cause
of some issues; now, with further experience, I want to go back to
BTRFS, but I have data in my cluster and I don't want to scrap it).

This is exactly equivalent to the case in which I have an OSD with a
drive that I see is starting to error. I would in that case need to
replace the drive and recreate the Ceph structures on it.

So, I mark the OSD out, and the cluster automatically eliminates its
notion of data stored on the OSD and creates copies of the affected PGs
elsewhere to make the cluster healthy again.

All of the disk replacement instructions that I see then tell me to
then follow an OSD removal process:

"This procedure removes an OSD from a cluster map, removes its
authentication key, removes the OSD from the OSD map, and removes the
OSD from the ceph.conf file".

This seems to me to be too heavy-handed. I'm worried about doing this
and then effectively adding a new OSD where I have the same id number
as the OSD that I apparently unnecessarily removed.

I don't actually want to remove the OSD. The OSD is fine, I just want
to replace the disk drive that it uses.

This suggests that I really want to take the OSD out, allow the cluster
to get healthy again, then (replace the disk, if this is due to
failure) create a new BTRFS/XFS filesystem, remount the drive, then
recreate the Ceph structures on the disk to be compatible with the old
disk and the original OSD that it was attached to.

The OSD then gets marked back in, the cluster says "hello again, we
missed you, but its good to see you back, here are some PGs ...".

What I'm saying is that I really don't want to destroy the OSD, I want
to refresh it with a new disk/filesystem and put it back to work.

Is there some fundamental reason why this can't be done? If not, how
should I do it?
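
One possible approach, sketched only and untested here (it assumes the
OSD keeps its entry in the CRUSH map and its auth key, and follows the
manual-deployment style commands; osd.12 and /dev/sdX are placeholders):

  ceph osd out 12
  # wait for recovery/HEALTH_OK, stop the osd.12 daemon, replace the disk
  mkfs.btrfs /dev/sdX
  mount /dev/sdX /var/lib/ceph/osd/ceph-12
  ceph-osd -i 12 --mkfs --mkjournal
  # restore the existing key if the keyring was lost with the old disk
  ceph auth get osd.12 -o /var/lib/ceph/osd/ceph-12/keyring
  # start osd.12 again, then:
  ceph osd in 12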

Best regards,
David

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rados rm objects, still appear in rados ls

2018-09-28 Thread Frank (lists)

Hi,

On my cluster I tried to clear all objects from a pool. I used the 
command "rados -p bench ls | xargs rados -p bench rm". (rados -p bench 
cleanup doesn't clean everything, because there was a lot of other 
testing going on here).


Now 'rados -p bench ls' returns a list of objects which don't exist: 
[root@ceph01 yum.repos.d]# rados -p bench stat 
benchmark_data_ceph01.example.com_1805226_object32453
 error stat-ing 
bench/benchmark_data_ceph01.example.com_1805226_object32453: (2) No such 
file or directory


I've tried to scrub and deep-scrub the PG the object is in, but the problem 
persists. What causes this?


I use Centos 7.5 with mimic 13.2.2


regards,

Frank de Bot

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Expected performane with Ceph iSCSI gateway

2018-05-28 Thread Frank (lists)

Hi,

In a test cluster (3 nodes, 24 OSDs) I'm testing the ceph iscsi gateway 
(with http://docs.ceph.com/docs/master/rbd/iscsi-targets/). For the client 
I used a separate server; everything runs CentOS 7.5. The iscsi gateways 
are located on 2 of the existing nodes in the cluster.


How does iscsi perform compared to krbd? I've already done some 
benchmarking, but it didn't perform anywhere near what krbd is doing. krbd 
easily saturates the public network; iscsi reaches about 75% of that. 
During a benchmark, tcmu-runner runs at a load of 50 to 75% on the 
(owner) target.
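
For a like-for-like comparison, one option is a large-block sequential
write with fio against the multipath device and against a krbd mapping
(device names below are examples):

  fio --name=seqwrite --ioengine=libaio --direct=1 --rw=write --bs=4M \
      --iodepth=16 --size=10G --filename=/dev/mapper/mpatha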



Regards,

Frank de Bot
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Frequent slow requests

2018-06-14 Thread Frank (lists)

Hi,

On a small cluster (3 nodes) I frequently have slow requests. When 
dumping the inflight ops from the hanging OSD, it seems it doesn't get a 
'response' for one of the subops. The events always look like:


    "events": [
    {
    "time": "2018-06-14 07:10:07.256196",
    "event": "initiated"
    },
    {
    "time": "2018-06-14 07:10:07.256671",
    "event": "queued_for_pg"
    },
    {
    "time": "2018-06-14 07:10:07.256745",
    "event": "reached_pg"
    },
    {
    "time": "2018-06-14 07:10:07.256826",
    "event": "started"
    },
    {
    "time": "2018-06-14 07:10:07.256924",
    "event": "waiting for subops from 18,20"
    },
    {
    "time": "2018-06-14 07:10:07.263769",
    "event": "op_commit"
    },
    {
    "time": "2018-06-14 07:10:07.263775",
    "event": "op_applied"
    },
    {
    "time": "2018-06-14 07:10:07.269989",
    "event": "sub_op_commit_rec from 18"
    }
             ]

The OSD ids are not the same. Looking at osd.20, the OSD process runs and 
it accepts requests ('ceph tell osd.20 bench' runs fine). When I restart 
the process for that OSD, the request is completed.
I could not find any pattern in which OSD is to blame (it is always 
another one), nor in which of the servers; it also differs.


The cluster runs CentOS 7.5 with 'ceph version 12.2.5 
(cad919881333ac92274171586c827e01f554a70a) luminous (stable)'. It's just 
a test cluster with very little activity. What could be the cause of a 
(replica) OSD not replying?


Regards,

Frank de Bot

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] FreeBSD Initiator with Ceph iscsi

2018-06-28 Thread Frank (lists)

Jason Dillaman wrote:
Conceptually, I would assume it should just work if configured 
correctly w/ multipath (to properly configure the ALUA settings on the 
LUNs). I don't run FreeBSD, but any particular issue you are seeing?


When logged in to both targets,  the following message floods the log

WARNING: 192.168.5.109 (iqn.2018-06.lan.x.iscsi-gw:ceph-igw): underflow 
mismatch: target indicates 0, we calculated 512

(da1:iscsi6:0:0:0): READ(10). CDB: 28 00 0c 7f ff ff 00 00 01 00
(da1:iscsi6:0:0:0): CAM status: SCSI Status Error
(da1:iscsi6:0:0:0): SCSI status: Check Condition
(da1:iscsi6:0:0:0): SCSI sense: NOT READY asc:4,b (Logical unit not 
accessible, target port in standby state)

(da1:iscsi6:0:0:0): Error 6, Unretryable error
(da1:iscsi6:0:0:0): Invalidating pack

For both sessions the message are the same (besides numbering of devices)

When trying to read from either of the devices (da1 and da2 in my case), 
FreeBSD gives the error 'Device not configured'. When using gmultipath 
(manually created, because FreeBSD is not able to write a label to either 
of the devices), the created multipath is not functional because it 
marks both devices as FAIL.






On Tue, Jun 26, 2018 at 6:06 PM Frank de Bot (lists)
<li...@searchy.net> wrote:


Hi,

In my test setup I have a ceph iscsi gateway (configured as in
http://docs.ceph.com/docs/luminous/rbd/iscsi-overview/ )

I would like to use this with a FreeBSD (11.1) initiator, but I fail to
make a working setup in FreeBSD. Is it known if the FreeBSD initiator
(with gmultipath) can work with this gateway setup?


Regards,

Frank
___
ceph-users mailing list
ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com>
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Jason


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] radula - radosgw(s3) cli tool

2015-09-08 Thread Andrew Bibby (lists)
Hey cephers,
Just wanted to briefly announce the release of a radosgw CLI tool that solves 
some of our team's minor annoyances. Called radula, a nod to the patron animal, 
this utility acts a lot like s3cmd with some tweaks to meet the expectations of 
our researchers.
 
https://pypi.python.org/pypi/radula
https://github.com/bibby/radula
 
I've seen a lot of boto wrappers, and yup - it's just another one. But, it 
could still have value for users, so we put it out there.
 
Here's a quick at its features:
- When a user is granted read access to a bucket, they're not given read access 
to any of the existing keys. radula applies bucket ACL changes to existing 
keys, and can synchronize anomalies. New keys are issued a copy of the bucket's 
ACL. Permissions are also kept from duplicating like they can on AWS and rados.
 
- Unless they are tiny, uploads are always multi-parted and multi-threaded. The 
file can then be checksum verified to have uploaded correctly.

- CLI and importable python module
 
- Typical s3cmd-like commands (mb, rb, lb, etc) leaning directly on boto; no 
clever rewrites.
 
We hope someone finds it useful.
Have a good rest of the week!
 
- Andrew Bibby
- DevOps, NantOmics, LLC

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rados rm objects, still appear in rados ls

2018-09-28 Thread Frank de Bot (lists)
John Spray wrote:
> On Fri, Sep 28, 2018 at 2:25 PM Frank (lists)  wrote:
>>
>> Hi,
>>
>> On my cluster I tried to clear all objects from a pool. I used the
>> command "rados -p bench ls | xargs rados -p bench rm". (rados -p bench
>> cleanup doesn't clean everything, because there was a lot of other
>> testing going on here).
>>
>> Now 'rados -p bench ls' returns a list of objects, which don't exists:
>> [root@ceph01 yum.repos.d]# rados -p bench stat
>> benchmark_data_ceph01.example.com_1805226_object32453
>>   error stat-ing
>> bench/benchmark_data_ceph01.example.com_1805226_object32453: (2) No such
>> file or directory
>>
>> I've tried scrub and deepscrub the pg the object is in, but the problem
>> persists. What causes this?
> 
> Are you perhaps using a cache tier pool?

The pool had 2 snaps. After removing those, the ls command returned no
'non-existing' objects. I expected that ls would only return objects of
the current contents; I did not specify -s for working with snaps of the
pool.
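
For reference, pool snapshots can be listed and removed with:

  rados -p bench lssnap
  rados -p bench rmsnap <snapname>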

> 
> John
> 
>>
>> I use Centos 7.5 with mimic 13.2.2
>>
>>
>> regards,
>>
>> Frank de Bot
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Frequent slow requests

2018-06-19 Thread Frank de Bot (lists)
Frank (lists) wrote:
> Hi,
> 
> On a small cluster (3 nodes) I frequently have slow requests. When
> dumping the inflight ops from the hanging OSD, it seems it doesn't get a
> 'response' for one of the subops. The events always look like:
> 

I've done some further testing; all slow requests are blocked by OSDs on
a single host. How can I debug this problem further? I can't find any
errors or other strange things on the host with the OSDs that are
seemingly not sending a response to an op.
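
Some standard things to look at on the suspect host while a request hangs
(osd.20 is just the example from the earlier dump):

  # inspect blocked and recent ops via the admin socket
  ceph daemon osd.20 dump_ops_in_flight
  ceph daemon osd.20 dump_historic_ops

  # temporarily raise logging on that OSD
  ceph tell osd.20 injectargs '--debug_osd 10 --debug_ms 1'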


Regards,

Frank de Bot

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] FreeBSD Initiator with Ceph iscsi

2018-06-26 Thread Frank de Bot (lists)
Hi,

In my test setup I have a ceph iscsi gateway (configured as in
http://docs.ceph.com/docs/luminous/rbd/iscsi-overview/ )

I would like to use this with a FreeBSD (11.1) initiator, but I fail to
make a working setup in FreeBSD. Is it known if the FreeBSD initiator
(with gmultipath) can work with this gateway setup?


Regards,

Frank
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] FreeBSD Initiator with Ceph iscsi

2018-06-30 Thread Frank de Bot (lists)
I've crossposted the problem to the freebsd-stable mailing list. There is
no ALUA support on the initiator side. Two options were suggested for
multipathing:

1. Export your LUNs via two (or more) different paths (for example
   via two different target portal IP addresses), on the initiator
   side set up both iSCSI sessions in the usual way (like without
   multipathing), add kern.iscsi.fail_on_disconnection=1 to
   /etc/sysctl.conf, and set up gmultipath on top of LUNs reachable
   via those sessions

2. Set up the target so it redirects (sends "Target moved temporarily"
   login responses) to the target portal it considers active.  Then
   set up the initiator (single session) to either one; the target
   will "bounce it" to the right place.  You don't need gmultipath
   in this case, because from the initiator point of view there's only
   one iSCSI session at any time.

Would either of those 2 options be possible to configure with the ceph
iscsi gateway solution?


Regards,

Frank

Jason Dillaman wrote:
> Conceptually, I would assume it should just work if configured correctly
> w/ multipath (to properly configure the ALUA settings on the LUNs). I
> don't run FreeBSD, but any particular issue you are seeing?
> 
> On Tue, Jun 26, 2018 at 6:06 PM Frank de Bot (lists) <li...@searchy.net> wrote:
> 
> Hi,
> 
> In my test setup I have a ceph iscsi gateway (configured as in
> http://docs.ceph.com/docs/luminous/rbd/iscsi-overview/ )
> 
> I would like to use this with a FreeBSD (11.1) initiator, but I fail to
> make a working setup in FreeBSD. Is it known if the FreeBSD initiator
> (with gmultipath) can work with this gateway setup?
> 
> 
> Regards,
> 
> Frank
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com>
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
> 
> -- 
> Jason

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com