Re: [ceph-users] MDS damaged

2017-10-26 Thread Ronny Aasen

if you were following this page:
http://docs.ceph.com/docs/jewel/rados/troubleshooting/troubleshooting-pg/


then there are normally hours of troubleshooting behind the following
paragraph, before finally admitting defeat and marking the object as lost:


"It is possible that there are other locations where the object can 
exist that are not listed. For example, if a ceph-osd is stopped and 
taken out of the cluster, the cluster fully recovers, and due to some 
future set of failures ends up with an unfound object, it won’t consider 
the long-departed ceph-osd as a potential location to consider. (This 
scenario, however, is unlikely.)"
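For reference, the non-destructive steps that page walks you through look
roughly like this (a sketch only; 1.37c stands in for the pg in question, and
list_missing may be called list_unfound on newer releases):

# ceph health detail
# ceph pg 1.37c query
# ceph pg 1.37c list_missing
# ceph pg map 1.37c

and then checking whether any down or previously removed osd might still hold
a copy of the listed objects, before ever reaching for mark_unfound_lost.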



Also, this warning is important regarding the loss of objects:
"Use this with caution, as it may confuse applications that expected the 
object to exist."


The MDS is definitely such an application. I think RGW would be the only
application where losing an object could be acceptable, depending on what
used the object storage.  RBD and CephFS will have issues of varying
degree. One could argue that the mark-unfound-lost command should have a
--yes-i-mean-it type of warning, especially if the pool application is
cephfs or rbd.



This is of course a bit late now that the object is marked as lost, but
for your future reference: since you had an inconsistent pg, most likely
you had one corrupt object and one or more good copies on some osd, and
the methods written about in
http://ceph.com/geen-categorie/ceph-manually-repair-object/ might have
recovered that object for you.
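From memory, the gist of that post is (treat this as a sketch; the pg id and
steps are examples, not exact commands from the article): locate the
inconsistent pg and the bad replica, remove only that copy, and let a repair
restore it from the good replicas.

# ceph health detail                  (find the inconsistent pg, e.g. 1.37c)
# rados list-inconsistent-obj 1.37c   (jewel and later, needs a recent deep-scrub)

Then stop the osd holding the bad copy, flush its journal, move the object's
file out of that osd's pg directory, start the osd again, and finally:

# ceph pg repair 1.37c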


kind regards
Ronny Aasen



On 26. okt. 2017 04:38, dani...@igb.illinois.edu wrote:

Hi Ronny,

 From the documentation, I thought this was the proper way to resolve the
issue.

Dan


On 24. okt. 2017 19:14, Daniel Davidson wrote:

Our ceph system is having a problem.

A few days a go we had a pg that was marked as inconsistent, and today I
fixed it with a:

#ceph pg repair 1.37c

then a file was stuck as missing so I did a:

#ceph pg 1.37c mark_unfound_lost delete
pg has 1 objects unfound and apparently lost marking


Sorry, I cannot assist on the corrupt mds part; I have no experience in
that area.

But I felt this escalated a bit quickly. Since this is an "I accept losing the
object" type of command, the consequences are quite ugly, depending on
what the missing object was for.  Did you do much troubleshooting before
jumping to this command, so you were certain there were no other
non-data-loss options?

kind regards
Ronny Aasen




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] crush optimize does not work

2017-10-26 Thread Stefan Priebe - Profihost AG
Hello,

While trying to optimize a Ceph cluster running Jewel, I get the
following output:
2017-10-26 10:43:27,615 argv = optimize --crushmap
/home/spriebe/ceph.report --out-path /home/spriebe/optimized.crush
--pool 5 --pool=5 --choose-args=5 --replication-count=3 --pg-num=4096
--pgp-num=4096 --rule=data --out-version=j --no-positions
2017-10-26 10:43:27,646 root optimizing
2017-10-26 10:43:29,329 root already optimized
2017-10-26 10:43:29,337 cloud1-1475 optimizing
2017-10-26 10:43:29,348 cloud1-1474 optimizing
2017-10-26 10:43:29,353 cloud1-1473 optimizing
2017-10-26 10:43:29,361 cloud1-1467 optimizing
2017-10-26 10:43:30,118 cloud1-1473 already optimized
2017-10-26 10:43:30,126 cloud1-1472 optimizing
2017-10-26 10:43:30,177 cloud1-1474 already optimized
2017-10-26 10:43:30,178 cloud1-1467 already optimized
2017-10-26 10:43:30,185 cloud1-1471 optimizing
2017-10-26 10:43:30,193 cloud1-1470 optimizing
2017-10-26 10:43:30,301 cloud1-1475 already optimized
2017-10-26 10:43:30,310 cloud1-1469 optimizing
2017-10-26 10:43:30,855 cloud1-1472 already optimized
2017-10-26 10:43:30,864 cloud1-1468 optimizing
2017-10-26 10:43:31,020 cloud1-1470 already optimized
2017-10-26 10:43:31,075 cloud1-1471 already optimized
2017-10-26 10:43:31,079 cloud1-1469 already optimized
2017-10-26 10:43:31,460 cloud1-1468 already optimized


But the cluster is heavily imbalanced if you look at the AVAIL GB column.

ID WEIGHT  REWEIGHT SIZE   USE    AVAIL  %USE  VAR  PGS
31 1.0  1.0   931G   756G   174G 81.24 1.24 204
32 1.0  1.0   931G   750G   180G 80.58 1.23 202
33 1.0  1.0   931G   774G   157G 83.13 1.27 209
34 1.0  1.0   931G   743G   187G 79.89 1.22 200
48 1.0  1.0   931G   780G   150G 83.79 1.28 210
49 1.0  1.0   931G   778G   152G 83.64 1.27 209
50 0.7  1.0   819G   631G   188G 77.00 1.17 170
21 1.0  0.95001   931G   729G   201G 78.32 1.19 197
22 1.0  1.0   931G   756G   174G 81.28 1.24 204
23 1.0  1.0   931G   778G   152G 83.57 1.27 210
24 1.0  0.95001   931G   760G   171G 81.63 1.24 205
46 1.0  1.0   931G   756G   174G 81.30 1.24 203
47 1.0  1.0   931G   768G   162G 82.57 1.26 207
55 0.7  1.0   819G   602G   217G 73.46 1.12 162
11 1.0  1.0   931G   628G   302G 67.47 1.03 169
12 1.0  1.0   931G   630G   300G 67.68 1.03 170
13 1.0  1.0   931G   629G   301G 67.63 1.03 170
14 1.0  1.0   931G   644G   286G 69.20 1.05 173
45 1.0  1.0   931G   635G   295G 68.23 1.04 171
56 0.7  1.0   819G   509G   309G 62.18 0.95 137
40 1.72670  1.0  1768G   774G   994G 43.78 0.67 209
51 1.0  0.95001   931G   784G   146G 84.26 1.28 212
52 1.0  1.0   931G   768G   162G 82.52 1.26 207
53 1.0  1.0   931G   752G   178G 80.78 1.23 203
54 1.0  1.0   931G   750G   180G 80.58 1.23 202
38 1.0  1.0   931G   768G   162G 82.58 1.26 207
39 1.0  1.0   931G   768G   162G 82.51 1.26 207
57 0.7  1.0   819G   616G   203G 75.17 1.15 167
43 1.0  1.0   931G   636G   294G 68.41 1.04 171
44 1.0  1.0   931G   630G   300G 67.70 1.03 170
36 1.0  1.0   931G   626G   304G 67.33 1.03 170
37 1.0  1.0   931G   637G   293G 68.43 1.04 171
58 0.7  1.0   819G   508G   311G 62.03 0.95 137
41 0.87000  1.0   888G   549G   339G 61.79 0.94 148
42 1.72670  1.0  1768G   382G  1385G 21.65 0.33 103
65 1.72670  1.0  1768G   383G  1384G 21.69 0.33 103
 1 1.0  1.0   931G   760G   170G 81.70 1.24 205
 2 1.0  1.0   931G   768G   162G 82.57 1.26 207
 3 1.0  0.95001   931G   732G   198G 78.71 1.20 197
30 1.0  1.0   931G   771G   159G 82.84 1.26 209
35 1.0  1.0   931G   751G   179G 80.70 1.23 202
59 0.7  1.0   819G   599G   220G 73.08 1.11 162
 0 0.84999  1.0   874G   662G   211G 75.81 1.16 179
 4 1.0  1.0   931G   633G   297G 68.07 1.04 171
 5 1.0  1.0   931G   632G   298G 67.94 1.04 171
60 0.7  1.0   819G   509G   310G 62.12 0.95 137
29 0.87000  1.0   888G   553G   335G 62.29 0.95 149
 7 0.87000  1.0   888G   554G   333G 62.44 0.95 149
28 0.87000  1.0   888G   551G   336G 62.09 0.95 149
64 1.73000  1.0  1768G  1088G   679G 61.57 0.94 293
 6 1.72670  1.0  1768G   472G  1295G 26.71 0.41 127
66 1.72670  1.0  1768G   475G  1292G 26.88 0.41 128
 8 1.0  1.0   931G   625G   305G 67.14 1.02 169
 9 1.0  1.0   931G   631G   299G 67.82 1.03 170
61 0.7  1.0   819G   507G   312G 61.90 0.94 137
15 0.87000  1.0   888G   550G   338G 61.89 0.94 148
26 0.87000  1.0   888G   551G   337G 62.04 0.95 149
27 0.84999  1.0   874G   541G   333G 61.90 0.94 146
63 1.73000  1.0  1768G  1101G   666G 62.31 0.95 297
10 1.72670  1.0  1768G   483G  1284G 27.34 0.42 131
67 1.72670  1.0  1768G   482G  1285G 27.31 0.42 130
16 1.0  1.0   931G   786G   144G 84.46 1.29 212
17 1.0  1.0   931G   783G   147G 84.12 1.28 211
18 1.0  1.00
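(For what it is worth, a quick way to spot the utilization outliers that the
optimizer is apparently ignoring is to sort the same data; a sketch, the
column number may differ between releases:

# ceph osd df | sort -nk7

which sorts the body of the output by the %USE column.)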

Re: [ceph-users] s3 bucket permishions

2017-10-26 Thread Abhishek Lekshmanan
nigel davies  writes:

> I am fallowing a guide at the mo.
>
> But I believe it's RWG users

We have support for AWS-like bucket policies:
http://docs.ceph.com/docs/master/radosgw/bucketpolicy/

Some amount of permissions can also be controlled by ACLs.
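For example, a minimal policy granting a second user read access could look
roughly like this (a sketch only; the user name, bucket name and the
empty-tenant principal format are assumptions, see the doc above for the
exact syntax your version supports):

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"AWS": ["arn:aws:iam:::user/user_b"]},
    "Action": ["s3:ListBucket", "s3:GetObject"],
    "Resource": ["arn:aws:s3:::bk_a", "arn:aws:s3:::bk_a/*"]
  }]
}

applied as the bucket owner with something like:

$ s3cmd setpolicy policy.json s3://bk_a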
>
> On 25 Oct 2017 5:29 pm, "David Turner"  wrote:
>
>> Are you talking about RGW buckets with limited permissions for cephx
>> authentication? Or RGW buckets with limited permissions for RGW users?
>>
>> On Wed, Oct 25, 2017 at 12:16 PM nigel davies  wrote:
>>
>>> Hay All
>>>
>>> is it possible to set permissions to buckets
>>>
>>> for example if i have 2 users  (user_a and user_b) and 2 buckets (bk_a
>>> and bk_b)
>>>
>>> i want to set permissions, so user a can only see bk_a and user b can
>>> only see bk_b

This is the default case: a bucket created by user_a is only accessible
to user_a (i.e. the bucket owner) and not to anyone else.
>>>
>>>
>>> I have been looking at cant see what i am after.
>>>
>>> Any advise would be welcome
>>>

-- 
Abhishek Lekshmanan
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nürnberg)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] s3 bucket permishions

2017-10-26 Thread nigel davies
Thanks, I spotted this. When I run the example I get:
ERROR: S3 error: 400 (InvalidArgument)

I found that bucket link will link my buckets to different users (which is
what I am kind of after).

But I would also like to make sure that, if a new user is added, they have no
access to any buckets until I allow them.


Sorry, I am new to all this and trying to get my head around it
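For reference, the bucket link command I was experimenting with is roughly
(bucket and uid are placeholders, and note it reassigns bucket ownership
rather than granting an extra user access):

# radosgw-admin bucket link --bucket=bk_a --uid=user_b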

On Thu, Oct 26, 2017 at 9:54 AM, Abhishek Lekshmanan 
wrote:

> nigel davies  writes:
>
> > I am fallowing a guide at the mo.
> >
> > But I believe it's RWG users
>
> We have support for aws like bucket policies,
> http://docs.ceph.com/docs/master/radosgw/bucketpolicy/
>
> Some amount of permissions can also be controlled by acls
> >
> > On 25 Oct 2017 5:29 pm, "David Turner"  wrote:
> >
> >> Are you talking about RGW buckets with limited permissions for cephx
> >> authentication? Or RGW buckets with limited permissions for RGW users?
> >>
> >> On Wed, Oct 25, 2017 at 12:16 PM nigel davies 
> wrote:
> >>
> >>> Hay All
> >>>
> >>> is it possible to set permissions to buckets
> >>>
> >>> for example if i have 2 users  (user_a and user_b) and 2 buckets (bk_a
> >>> and bk_b)
> >>>
> >>> i want to set permissions, so user a can only see bk_a and user b can
> >>> only see bk_b
>
> This is the default case, a bucket created by user_a is only accessible
> to user_a (ie. the bucket owner) and not anyone else
> >>>
> >>>
> >>> I have been looking at cant see what i am after.
> >>>
> >>> Any advise would be welcome
> >>>
>
> --
> Abhishek Lekshmanan
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
> HRB 21284 (AG Nürnberg)
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Lots of reads on default.rgw.usage pool

2017-10-26 Thread Mark Schouten
Setting rgw_enable_usage_log is not even helping. I still get a lot of reads,
caused by the calls shown in my previous email.
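For anyone trying the same: disabling the usage log amounts to roughly this in
ceph.conf (the section name is an example, it depends on how your rgw
instances are named), followed by a restart of the radosgw daemons:

[client.rgw.gateway]
    rgw enable usage log = false

The accumulated usage data itself can be inspected and pruned with
radosgw-admin usage show and radosgw-admin usage trim (optionally with
--uid / --start-date / --end-date).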


Met vriendelijke groeten,

-- 
Kerio Operator in de Cloud? https://www.kerioindecloud.nl/
Mark Schouten  | Tuxis Internet Engineering
KvK: 61527076 | http://www.tuxis.nl/
T: 0318 200208 | i...@tuxis.nl



 Van:   Mark Schouten  
 Aan:
 Verzonden:   24-10-2017 12:32 
 Onderwerp:   Re: [ceph-users] Lots of reads on default.rgw.usage pool 

Stracing the radosgw-process, I see a lot of the following:



[pid 12364] sendmsg(23, {msg_name(0)=NULL, 
msg_iov(5)=[{"\7{\340\r\0\0\0\0\0P\200\16\0\0\0\0\0*\0?\0\10\0\331\0\0\0\0\0\0\0M"...,
 54}, 
{"\1\1\22\0\0\0\1\10\0\0\0\0\0\0\0\0\0\0\0\377\377\377\377\377\20\226\206\351\v3\0\0"...,
 217}, {"rgwuser_usage_log_read", 22}, 
{"\1\0011\0\0\0\\320Y\0\0\0\0\200\16\371Y\0\0\0\0\25\0\0\0DB0339"..., 55}, 
{"\305\234\203\332\0\0\0\0K~\356z\4\266\305\272\27hTx\5", 21}], 
msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 369

Does anybody know where this is coming from?


Met vriendelijke groeten,

-- 
Kerio Operator in de Cloud? https://www.kerioindecloud.nl/
Mark Schouten  | Tuxis Internet Engineering
KvK: 61527076 | http://www.tuxis.nl/
T: 0318 200208 | i...@tuxis.nl



 Van:   Mark Schouten  
 Aan:
 Verzonden:   24-10-2017 12:11 
 Onderwerp:   [ceph-users] Lots of reads on default.rgw.usage pool 

Hi,


Since I upgraded to Luminous last week, I see a lot of read activity on the
default.rgw.usage pool (see attached image). I think it has something to do
with the rgw daemons, since restarting them slows the reads down for a while.
It might also have to do with tenants and the fact that dynamic bucket
sharding isn't working for me [1].


So this morning I disabled dynamic bucket sharding via
'rgw_dynamic_resharding = false', but that doesn't seem to help. Maybe bucket
resharding is still trying to run because of the entry in 'radosgw-admin
reshard list' that I cannot delete?
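(A sketch of the commands involved, in case someone else hits this; whether
the cancel subcommand is available depends on your luminous point release:

# radosgw-admin reshard list
# radosgw-admin reshard cancel --bucket=<bucket-name>
)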


[1]: 
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-October/021774.html


Met vriendelijke groeten,

-- 
Kerio Operator in de Cloud? https://www.kerioindecloud.nl/
Mark Schouten  | Tuxis Internet Engineering
KvK: 61527076 | http://www.tuxis.nl/
T: 0318 200208 | i...@tuxis.nl

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] MDS damaged

2017-10-26 Thread Daniel Davidson
I increased the logging of the mds to try and get some more 
information.  I think the relevant lines are:


2017-10-26 05:03:17.661683 7f1c598a6700  0 mds.0.cache.dir(607) _fetched 
missing object for [dir 607 ~mds0/stray7/ [2,head] auth v=108918871 
cv=0/0 ap=1+0+0 state=1610645632 f(v1 m2017-10-25 14:56:13.140995 299=
299+0) n(v1 rc2017-10-25 14:56:13.140995 b191590453903 299=299+0) 
hs=0+11,ss=0+0 dirty=11 | child=1 sticky=1 dirty=1 waiter=1 authpin=1 
0x7f1c71e9f300]
2017-10-26 05:03:17.661708 7f1c598a6700 -1 log_channel(cluster) log 
[ERR] : dir 607 object missing on disk; some files may be lost 
(~mds0/stray7)
2017-10-26 05:03:17.661711 7f1c598a6700 -1 mds.0.damage notify_dirfrag 
Damage to fragment * of ino 607 is fatal because it is a system 
directory for this rank
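For completeness, the damage the mds has registered can also be listed over
the admin socket, roughly like this (ceph-0 stands in for the daemon name):

# ceph daemon mds.ceph-0 damage ls
# ceph daemon mds.ceph-0 damage rm <damage-id>

though damage rm only clears the table entry, it does not repair anything.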


I would be grateful for any help in repair,

Dan

On 10/25/2017 04:17 PM, Daniel Davidson wrote:
A bit more news: I made the ceph-0 mds shut down, started the mds on 
ceph-1 and then told it the mds was repaired.  Everything ran great 
for about 5 hours and now it has crashed again.  Same error:


2017-10-25 15:13:07.344093 mon.0 [INF] fsmap e121828: 1/1/1 up 
{0=ceph-1=up:active}
2017-10-25 15:13:07.383445 mds.0 [ERR] dir 607 object missing on disk; 
some files may be lost (~mds0/stray7)
2017-10-25 15:13:07.480785 mon.0 [INF] osdmap e35296: 32 osds: 32 up, 
32 in
2017-10-25 15:13:07.530337 mon.0 [INF] pgmap v28449919: 1536 pgs: 1536 
active+clean; 793 TB data, 1592 TB used, 1026 TB / 2619 TB avail; 5894 
kB/s rd, 26 op/s
2017-10-25 15:13:08.473363 mon.0 [INF] mds.0 
172.16.31.2:6802/3109594408 down:damaged

2017-10-25 15:13:08.473487 mon.0 [INF] fsmap e121829: 0/1/1 up, 1 damaged

If I:
#  rados -p igbhome_data rmomapkey 100. stray7_head
# ceph mds repaired 0

then I get:
2017-10-25 16:11:52.219916 mds.0 [ERR] dir 607 object missing on disk; 
some files may be lost (~mds0/stray7)
2017-10-25 16:11:52.307975 mon.0 [INF] osdmap e35322: 32 osds: 32 up, 
32 in
2017-10-25 16:11:52.357904 mon.0 [INF] pgmap v28450567: 1536 pgs: 1536 
active+clean; 793 TB data, 1592 TB used, 1026 TB / 2619 TB avail; 
11773 kB/s rd, 5262 kB/s wr, 26 op/s
2017-10-25 16:11:53.325331 mon.0 [INF] mds.0 
172.16.31.2:6802/2716803172 down:damaged

2017-10-25 16:11:53.325424 mon.0 [INF] fsmap e121882: 0/1/1 up, 1 damaged
2017-10-25 16:11:53.475087 mon.0 [INF] pgmap v28450568: 1536 pgs: 1536 
active+clean; 793 TB data, 1592 TB used, 1026 TB / 2619 TB avail; 
39677 kB/s rd, 47236 B/s wr, 54 op/s
2017-10-25 16:11:54.590232 mon.0 [INF] pgmap v28450569: 1536 pgs: 1536 
active+clean; 793 TB data, 1592 TB used, 1026 TB / 2619 TB avail; 
28105 kB/s rd, 3786 kB/s wr, 43 op/s
2017-10-25 16:11:55.719476 mon.0 [INF] pgmap v28450570: 1536 pgs: 1536 
active+clean; 793 TB data, 1592 TB used, 1026 TB / 2619 TB avail; 
26284 kB/s rd, 3678 kB/s wr, 357 op/s
2017-10-25 16:11:56.830623 mon.0 [INF] pgmap v28450571: 1536 pgs: 1536 
active+clean; 793 TB data, 1592 TB used, 1026 TB / 2619 TB avail; 
37249 kB/s rd, 5476 B/s wr, 358 op/s
2017-10-25 16:11:57.965330 mon.0 [INF] pgmap v28450572: 1536 pgs: 1536 
active+clean; 793 TB data, 1592 TB used, 1026 TB / 2619 TB avail; 
60769 kB/s rd, 53485 B/s wr, 41 op/s
2017-10-25 16:11:58.033787 mon.0 [INF] mds.? 
172.16.31.2:6802/2725942008 up:boot
2017-10-25 16:11:58.033876 mon.0 [INF] fsmap e121883: 0/1/1 up, 1 
up:standby, 1 damaged


Dan

On 10/25/2017 11:30 AM, Daniel Davidson wrote:

The system is down again saying it is missing the same stray7 again.

2017-10-25 11:24:29.736774 mds.0 [WRN] failed to reconnect caps for 
missing inodes:

2017-10-25 11:24:29.736779 mds.0 [WRN]  ino 100147160e6
2017-10-25 11:24:29.753665 mds.0 [ERR] dir 607 object missing on 
disk; some files may be lost (~mds0/stray7)



Dan

On 10/25/2017 08:54 AM, Daniel Davidson wrote:

Thanks for the information.

I did:
# ceph daemon mds.ceph-0 scrub_path / repair recursive

Saw in the logs it finished

# ceph daemon mds.ceph-0 flush journal

Saw in the logs it finished

#ceph mds fail 0
#ceph mds repaired 0

And it went back to missing stray7 again.  I added that back like we 
did earlier and the system is back on line again, but the metadata 
errors still exist.


Dan

On 10/25/2017 07:50 AM, John Spray wrote:

Commands that start with "ceph daemon" take mds.<name> rather than a rank
(notes on terminology here:
http://docs.ceph.com/docs/master/cephfs/standby/).  The name is how
you would refer to the daemon from systemd; it's often set by default to
the hostname where the daemon is running.

John

On Wed, Oct 25, 2017 at 2:30 PM, Daniel Davidson
 wrote:
I do have a problem with running the commands you mentioned to repair
the mds:

# ceph daemon mds.0 scrub_path
admin_socket: exception getting command descriptions: [Errno 2] No 
such file

or directory
admin_socket: exception getting command descriptions: [Errno 2] No 
such file

or directory

Any idea why that is not working?

Dan



On 10/25/2017 06:45 AM, Daniel Davidson wrote:

John, thank you so much.  After doing the 

Re: [ceph-users] MDS damaged

2017-10-26 Thread Daniel Davidson
And at the risk of bombing the mailing list, I can also see that the 
stray7_head omapkey is not being recreated:

rados -p igbhome_data listomapkeys 100.
stray0_head
stray1_head
stray2_head
stray3_head
stray4_head
stray5_head
stray6_head
stray8_head
stray9_head



On 10/26/2017 05:08 AM, Daniel Davidson wrote:
I increased the logging of the mds to try and get some more 
information.  I think the relevant lines are:


2017-10-26 05:03:17.661683 7f1c598a6700  0 mds.0.cache.dir(607) 
_fetched missing object for [dir 607 ~mds0/stray7/ [2,head] auth 
v=108918871 cv=0/0 ap=1+0+0 state=1610645632 f(v1 m2017-10-25 
14:56:13.140995 299=
299+0) n(v1 rc2017-10-25 14:56:13.140995 b191590453903 299=299+0) 
hs=0+11,ss=0+0 dirty=11 | child=1 sticky=1 dirty=1 waiter=1 authpin=1 
0x7f1c71e9f300]
2017-10-26 05:03:17.661708 7f1c598a6700 -1 log_channel(cluster) log 
[ERR] : dir 607 object missing on disk; some files may be lost 
(~mds0/stray7)
2017-10-26 05:03:17.661711 7f1c598a6700 -1 mds.0.damage notify_dirfrag 
Damage to fragment * of ino 607 is fatal because it is a system 
directory for this rank


I would be grateful for any help in repair,

Dan

On 10/25/2017 04:17 PM, Daniel Davidson wrote:
A bit more news: I made the ceph-0 mds shut down, started the mds on 
ceph-1 and then told it the mds was repaired.  Everything ran great 
for about 5 hours and now it has crashed again.  Same error:


2017-10-25 15:13:07.344093 mon.0 [INF] fsmap e121828: 1/1/1 up 
{0=ceph-1=up:active}
2017-10-25 15:13:07.383445 mds.0 [ERR] dir 607 object missing on 
disk; some files may be lost (~mds0/stray7)
2017-10-25 15:13:07.480785 mon.0 [INF] osdmap e35296: 32 osds: 32 up, 
32 in
2017-10-25 15:13:07.530337 mon.0 [INF] pgmap v28449919: 1536 pgs: 
1536 active+clean; 793 TB data, 1592 TB used, 1026 TB / 2619 TB 
avail; 5894 kB/s rd, 26 op/s
2017-10-25 15:13:08.473363 mon.0 [INF] mds.0 
172.16.31.2:6802/3109594408 down:damaged
2017-10-25 15:13:08.473487 mon.0 [INF] fsmap e121829: 0/1/1 up, 1 
damaged


If I:
#  rados -p igbhome_data rmomapkey 100. stray7_head
# ceph mds repaired 0

then I get:
2017-10-25 16:11:52.219916 mds.0 [ERR] dir 607 object missing on 
disk; some files may be lost (~mds0/stray7)
2017-10-25 16:11:52.307975 mon.0 [INF] osdmap e35322: 32 osds: 32 up, 
32 in
2017-10-25 16:11:52.357904 mon.0 [INF] pgmap v28450567: 1536 pgs: 
1536 active+clean; 793 TB data, 1592 TB used, 1026 TB / 2619 TB 
avail; 11773 kB/s rd, 5262 kB/s wr, 26 op/s
2017-10-25 16:11:53.325331 mon.0 [INF] mds.0 
172.16.31.2:6802/2716803172 down:damaged
2017-10-25 16:11:53.325424 mon.0 [INF] fsmap e121882: 0/1/1 up, 1 
damaged
2017-10-25 16:11:53.475087 mon.0 [INF] pgmap v28450568: 1536 pgs: 
1536 active+clean; 793 TB data, 1592 TB used, 1026 TB / 2619 TB 
avail; 39677 kB/s rd, 47236 B/s wr, 54 op/s
2017-10-25 16:11:54.590232 mon.0 [INF] pgmap v28450569: 1536 pgs: 
1536 active+clean; 793 TB data, 1592 TB used, 1026 TB / 2619 TB 
avail; 28105 kB/s rd, 3786 kB/s wr, 43 op/s
2017-10-25 16:11:55.719476 mon.0 [INF] pgmap v28450570: 1536 pgs: 
1536 active+clean; 793 TB data, 1592 TB used, 1026 TB / 2619 TB 
avail; 26284 kB/s rd, 3678 kB/s wr, 357 op/s
2017-10-25 16:11:56.830623 mon.0 [INF] pgmap v28450571: 1536 pgs: 
1536 active+clean; 793 TB data, 1592 TB used, 1026 TB / 2619 TB 
avail; 37249 kB/s rd, 5476 B/s wr, 358 op/s
2017-10-25 16:11:57.965330 mon.0 [INF] pgmap v28450572: 1536 pgs: 
1536 active+clean; 793 TB data, 1592 TB used, 1026 TB / 2619 TB 
avail; 60769 kB/s rd, 53485 B/s wr, 41 op/s
2017-10-25 16:11:58.033787 mon.0 [INF] mds.? 
172.16.31.2:6802/2725942008 up:boot
2017-10-25 16:11:58.033876 mon.0 [INF] fsmap e121883: 0/1/1 up, 1 
up:standby, 1 damaged


Dan

On 10/25/2017 11:30 AM, Daniel Davidson wrote:

The system is down again saying it is missing the same stray7 again.

2017-10-25 11:24:29.736774 mds.0 [WRN] failed to reconnect caps for 
missing inodes:

2017-10-25 11:24:29.736779 mds.0 [WRN]  ino 100147160e6
2017-10-25 11:24:29.753665 mds.0 [ERR] dir 607 object missing on 
disk; some files may be lost (~mds0/stray7)



Dan

On 10/25/2017 08:54 AM, Daniel Davidson wrote:

Thanks for the information.

I did:
# ceph daemon mds.ceph-0 scrub_path / repair recursive

Saw in the logs it finished

# ceph daemon mds.ceph-0 flush journal

Saw in the logs it finished

#ceph mds fail 0
#ceph mds repaired 0

And it went back to missing stray7 again.  I added that back like 
we did earlier and the system is back on line again, but the 
metadata errors still exist.


Dan

On 10/25/2017 07:50 AM, John Spray wrote:

Commands that start "ceph daemon" take mds. rather than a rank
(notes on terminology here:
http://docs.ceph.com/docs/master/cephfs/standby/).  The name is how
you would refer to the daemon from systemd, it's often set to the
hostname where the daemon is running by default.

John

On Wed, Oct 25, 2017 at 2:30 PM, Daniel Davidson
 wrote:
I do have a problem with running the commands you mentioned to 
repair the

mds:

# ceph daemon mds.0 scrub

Re: [ceph-users] ceph zstd not for bluestor due to performance reasons

2017-10-26 Thread Sage Weil
On Thu, 26 Oct 2017, Stefan Priebe - Profihost AG wrote:
> Hi Sage,
> 
> Am 25.10.2017 um 21:54 schrieb Sage Weil:
> > On Wed, 25 Oct 2017, Stefan Priebe - Profihost AG wrote:
> >> Hello,
> >>
> >> in the Luminous release notes it is stated that zstd is not recommended
> >> for BlueStore due to performance reasons. I'm wondering why, since btrfs
> >> states that zstd is as fast as lz4 but compresses as well as zlib.
> >>
> >> Why is zlib then supported by BlueStore? And why do btrfs / Facebook
> >> behave differently?
> >>
> >> "BlueStore supports inline compression using zlib, snappy, or LZ4. (Ceph
> >> also supports zstd for RGW compression but zstd is not recommended for
> >> BlueStore for performance reasons.)"
> > 
> > zstd will work but in our testing the performance wasn't great for 
> > bluestore in particular.  The problem was that for each compression run 
> > there is a relatively high start-up cost initializing the zstd 
> > context/state (IIRC a memset of a huge memory buffer) that dominated the 
> > execution time... primarily because bluestore is generally compressing 
> > pretty small chunks of data at a time, not big buffers or streams.
> > 
> > Take a look at unittest_compression timings on compressing 16KB buffers 
> > (smaller than bluestore needs usually, but illustrative of the problem):
> > 
> > [ RUN  ] Compressor/CompressorTest.compress_16384/0
> > [plugin zlib (zlib/isal)]
> > [   OK ] Compressor/CompressorTest.compress_16384/0 (294 ms)
> > [ RUN  ] Compressor/CompressorTest.compress_16384/1
> > [plugin zlib (zlib/noisal)]
> > [   OK ] Compressor/CompressorTest.compress_16384/1 (1755 ms)
> > [ RUN  ] Compressor/CompressorTest.compress_16384/2
> > [plugin snappy (snappy)]
> > [   OK ] Compressor/CompressorTest.compress_16384/2 (169 ms)
> > [ RUN  ] Compressor/CompressorTest.compress_16384/3
> > [plugin zstd (zstd)]
> > [   OK ] Compressor/CompressorTest.compress_16384/3 (4528 ms)
> > 
> > It's an order of magnitude slower than zlib or snappy, which probably 
> > isn't acceptable--even if it is a bit smaller.
> > 
> > We just updated to a newer zstd the other day but I haven't been paying 
> > attention to the zstd code changes.  When I was working on this the plugin 
> > was initially also misusing the zstd API, but it was also pointed out 
> > that the size of the memset is dependent on the compression level.  
> > Maybe a different (default) choice there would help.
> > 
> > https://github.com/facebook/zstd/issues/408#issuecomment-252163241
> 
> Thanks for the fast reply. Btrfs uses a default compression level of 3,
> but I think this is the default anyway.
> 
> Does the zstd plugin of Ceph already use the mentioned
> ZSTD_resetCStream instead of creating and initializing a new one every time?

Hmm, it doesn't:


https://github.com/ceph/ceph/blob/master/src/compressor/zstd/ZstdCompressor.h#L29

but perhaps that was because it didn't make a difference?  Might be worth 
revisiting.
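For reference, the reuse pattern from that zstd issue would look roughly like
the sketch below. This is not the actual Ceph plugin code; it assumes the
streaming API of zstd 1.x (ZSTD_createCStream, ZSTD_resetCStream, etc.) and a
single-threaded caller.

  #include <zstd.h>
  #include <stdexcept>
  #include <string>

  // Sketch: keep one ZSTD_CStream per compressor instance and reset it for
  // each buffer, instead of create/init/free on every compress() call.
  class ZstdStreamReuse {
    ZSTD_CStream* s;
  public:
    explicit ZstdStreamReuse(int level = 1) : s(ZSTD_createCStream()) {
      ZSTD_initCStream(s, level);        // pay the context setup cost once
    }
    ~ZstdStreamReuse() { ZSTD_freeCStream(s); }

    std::string compress(const void* src, size_t len) {
      ZSTD_resetCStream(s, len);         // cheap per-buffer reset
      std::string out(ZSTD_compressBound(len), '\0');
      ZSTD_inBuffer  in{src, len, 0};
      ZSTD_outBuffer ob{&out[0], out.size(), 0};
      if (ZSTD_isError(ZSTD_compressStream(s, &ob, &in)))
        throw std::runtime_error("ZSTD_compressStream failed");
      if (ZSTD_isError(ZSTD_endStream(s, &ob)))
        throw std::runtime_error("ZSTD_endStream failed");
      out.resize(ob.pos);                // ob.pos = compressed size
      return out;
    }
  };

Whether the reset actually avoids the large memset presumably still depends on
the compression level, per the issue linked above.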

> So if performance matters ceph would recommend snappy?

Yep!
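(For anyone wanting to act on that: the compressor can be chosen per pool or
globally, roughly like this, with "mypool" as a placeholder:

# ceph osd pool set mypool compression_algorithm snappy
# ceph osd pool set mypool compression_mode aggressive

or bluestore_compression_algorithm = snappy in ceph.conf.)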

sage
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph zstd not for bluestor due to performance reasons

2017-10-26 Thread Sage Weil
On Thu, 26 Oct 2017, Haomai Wang wrote:
> in our test, lz4 is better than snappy

Let's switch the default then?

sage
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] MDS damaged

2017-10-26 Thread John Spray
On Thu, Oct 26, 2017 at 12:40 PM, Daniel Davidson
 wrote:
> And at the risk of bombing the mailing list, I can also see that the
> stray7_head omapkey is not being recreated:
> rados -p igbhome_data listomapkeys 100.
> stray0_head
> stray1_head
> stray2_head
> stray3_head
> stray4_head
> stray5_head
> stray6_head
> stray8_head
> stray9_head

So if it's staying up for a little while, I'd issue a "ceph daemon
mds.<name> flush journal" to ensure it writes back the recreated stray
directory.  It's normal that the newly created dir (and its linkage)
don't appear in the backing store right away (they're just created in
memory+journal at mds startup).

John

>
>
>
>
> On 10/26/2017 05:08 AM, Daniel Davidson wrote:
>>
>> I increased the logging of the mds to try and get some more information.
>> I think the relevant lines are:
>>
>> 2017-10-26 05:03:17.661683 7f1c598a6700  0 mds.0.cache.dir(607) _fetched
>> missing object for [dir 607 ~mds0/stray7/ [2,head] auth v=108918871 cv=0/0
>> ap=1+0+0 state=1610645632 f(v1 m2017-10-25 14:56:13.140995 299=
>> 299+0) n(v1 rc2017-10-25 14:56:13.140995 b191590453903 299=299+0)
>> hs=0+11,ss=0+0 dirty=11 | child=1 sticky=1 dirty=1 waiter=1 authpin=1
>> 0x7f1c71e9f300]
>> 2017-10-26 05:03:17.661708 7f1c598a6700 -1 log_channel(cluster) log [ERR]
>> : dir 607 object missing on disk; some files may be lost (~mds0/stray7)
>> 2017-10-26 05:03:17.661711 7f1c598a6700 -1 mds.0.damage notify_dirfrag
>> Damage to fragment * of ino 607 is fatal because it is a system directory
>> for this rank
>>
>> I would be grateful for any help in repair,
>>
>> Dan
>>
>> On 10/25/2017 04:17 PM, Daniel Davidson wrote:
>>>
>>> A bit more news: I made the ceph-0 mds shut down, started the mds on
>>> ceph-1 and then told it the mds was repaired.  Everything ran great for
>>> about 5 hours and now it has crashed again.  Same error:
>>>
>>> 2017-10-25 15:13:07.344093 mon.0 [INF] fsmap e121828: 1/1/1 up
>>> {0=ceph-1=up:active}
>>> 2017-10-25 15:13:07.383445 mds.0 [ERR] dir 607 object missing on disk;
>>> some files may be lost (~mds0/stray7)
>>> 2017-10-25 15:13:07.480785 mon.0 [INF] osdmap e35296: 32 osds: 32 up, 32
>>> in
>>> 2017-10-25 15:13:07.530337 mon.0 [INF] pgmap v28449919: 1536 pgs: 1536
>>> active+clean; 793 TB data, 1592 TB used, 1026 TB / 2619 TB avail; 5894 kB/s
>>> rd, 26 op/s
>>> 2017-10-25 15:13:08.473363 mon.0 [INF] mds.0 172.16.31.2:6802/3109594408
>>> down:damaged
>>> 2017-10-25 15:13:08.473487 mon.0 [INF] fsmap e121829: 0/1/1 up, 1 damaged
>>>
>>> If I:
>>> #  rados -p igbhome_data rmomapkey 100. stray7_head
>>> # ceph mds repaired 0
>>>
>>> then I get:
>>> 2017-10-25 16:11:52.219916 mds.0 [ERR] dir 607 object missing on disk;
>>> some files may be lost (~mds0/stray7)
>>> 2017-10-25 16:11:52.307975 mon.0 [INF] osdmap e35322: 32 osds: 32 up, 32
>>> in
>>> 2017-10-25 16:11:52.357904 mon.0 [INF] pgmap v28450567: 1536 pgs: 1536
>>> active+clean; 793 TB data, 1592 TB used, 1026 TB / 2619 TB avail; 11773 kB/s
>>> rd, 5262 kB/s wr, 26 op/s
>>> 2017-10-25 16:11:53.325331 mon.0 [INF] mds.0 172.16.31.2:6802/2716803172
>>> down:damaged
>>> 2017-10-25 16:11:53.325424 mon.0 [INF] fsmap e121882: 0/1/1 up, 1 damaged
>>> 2017-10-25 16:11:53.475087 mon.0 [INF] pgmap v28450568: 1536 pgs: 1536
>>> active+clean; 793 TB data, 1592 TB used, 1026 TB / 2619 TB avail; 39677 kB/s
>>> rd, 47236 B/s wr, 54 op/s
>>> 2017-10-25 16:11:54.590232 mon.0 [INF] pgmap v28450569: 1536 pgs: 1536
>>> active+clean; 793 TB data, 1592 TB used, 1026 TB / 2619 TB avail; 28105 kB/s
>>> rd, 3786 kB/s wr, 43 op/s
>>> 2017-10-25 16:11:55.719476 mon.0 [INF] pgmap v28450570: 1536 pgs: 1536
>>> active+clean; 793 TB data, 1592 TB used, 1026 TB / 2619 TB avail; 26284 kB/s
>>> rd, 3678 kB/s wr, 357 op/s
>>> 2017-10-25 16:11:56.830623 mon.0 [INF] pgmap v28450571: 1536 pgs: 1536
>>> active+clean; 793 TB data, 1592 TB used, 1026 TB / 2619 TB avail; 37249 kB/s
>>> rd, 5476 B/s wr, 358 op/s
>>> 2017-10-25 16:11:57.965330 mon.0 [INF] pgmap v28450572: 1536 pgs: 1536
>>> active+clean; 793 TB data, 1592 TB used, 1026 TB / 2619 TB avail; 60769 kB/s
>>> rd, 53485 B/s wr, 41 op/s
>>> 2017-10-25 16:11:58.033787 mon.0 [INF] mds.? 172.16.31.2:6802/2725942008
>>> up:boot
>>> 2017-10-25 16:11:58.033876 mon.0 [INF] fsmap e121883: 0/1/1 up, 1
>>> up:standby, 1 damaged
>>>
>>> Dan
>>>
>>> On 10/25/2017 11:30 AM, Daniel Davidson wrote:

 The system is down again saying it is missing the same stray7 again.

 2017-10-25 11:24:29.736774 mds.0 [WRN] failed to reconnect caps for
 missing inodes:
 2017-10-25 11:24:29.736779 mds.0 [WRN]  ino 100147160e6
 2017-10-25 11:24:29.753665 mds.0 [ERR] dir 607 object missing on disk;
 some files may be lost (~mds0/stray7)


 Dan

 On 10/25/2017 08:54 AM, Daniel Davidson wrote:
>
> Thanks for the information.
>
> I did:
> # ceph daemon mds.ceph-0 scrub_path / repair recursive
>
> Saw in the logs it finished
>
> # ceph daemon mds.ceph

[ceph-users] Ceph Tech Talk Cancelled

2017-10-26 Thread Leonardo Vaz
Hey Cephers,

Sorry for the short notice, but the Ceph Tech Talk for October
(scheduled for today) has been canceled.

Kindest regards,

Leo

-- 
Leonardo Vaz
Ceph Community Manager
Open Source and Standards Team
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph Developers Monthly - November

2017-10-26 Thread Leonardo Vaz
Hey Cephers,

This is just a friendly reminder that the next Ceph Developer Monthly
meeting is coming up:

 http://wiki.ceph.com/Planning

If you have work that you're doing that is feature work, significant
backports, or anything you would like to discuss with the core team,
please add it to the following page:

 http://wiki.ceph.com/CDM_01-NOV-2017

If you have questions or comments, please let us know.

Kindest regards,

Leo

-- 
Leonardo Vaz
Ceph Community Manager
Open Source and Standards Team
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to increase the size of requests written to a ceph image

2017-10-26 Thread Russell Glaue
On Wed, Oct 25, 2017 at 7:09 PM, Maged Mokhtar  wrote:

> It depends on what stage you are in:
> in production, probably the best thing is to setup a monitoring tool
> (collectd/grahite/prometheus/grafana) to monitor both ceph stats as well
> as resource load. This will, among other things, show you if you have
> slowing disks.
>
I am monitoring Ceph performance with ceph-dash (
http://cephdash.crapworks.de/), that is why I knew to look into the slow
writes issue. And I am using Monitorix (http://www.monitorix.org/) to
monitor system resources, including Disk I/O.

However, though I can monitor individual disk performance at the system
level, it seems Ceph does not tax any disk more than the worst disk. So in
my monitoring charts, all disks have the same performance.
All four nodes are base-lining at 50 writes/sec during the cluster's normal
load, with the non-problem hosts spiking up to 150, and the problem host
only spikes up to 100.
But during the window of time I took the problem host OSDs down to run the
bench tests, the OSDs on the other nodes increased to 300-500 writes/sec.
Otherwise, the chart looks the same for all disks on all ceph nodes/hosts.

Before production you should first make sure your SSDs are suitable for
> Ceph, either by being recommend by other Ceph users or you test them
> yourself for sync writes performance using fio tool as outlined earlier.
> Then after you build your cluster you can use rados and/or rbd bencmark
> tests to benchmark your cluster and find bottlenecks using
> atop/sar/collectl which will help you tune your cluster.
>
All 36 OSDs are: Crucial_CT960M500SSD1

Rados bench tests were done at the beginning. The speed was much faster
than it is now. I cannot recall the test results, someone else on my team
ran them. Recently, I had thought the slow disk problem was a configuration
issue with Ceph - before I posted here. Now we are hoping it may be
resolved with a firmware update. (If it is firmware related, rebooting the
problem node may temporarily resolve this)


> Though you did see better improvements, your cluster with 27 SSDs should
> give much higher numbers than 3k iops. If you are running rados bench while
> you have other client ios, then obviously the reported number by the tool
> will be less than what the cluster is actually giving...which you can find
> out via ceph status command, it will print the total cluster throughput and
> iops. If the total is still low i would recommend running the fio raw disk
> test, maybe the disks are not suitable. When you removed your 9 bad disk
> from 36 and your performance doubled, you still had 2 other disk slowing
> you..meaning near 100% busy ? It makes me feel the disk type used is not
> good. For these near 100% busy disks can you also measure their raw disk
> iops at that load (i am not sure atop shows this, if not use
> sar/sysstat/iostat/collectl).
>
I ran another bench test today with all 36 OSDs up. The overall performance
was improved slightly compared to the original tests. Only 3 OSDs on the
problem host were increasing to 101% disk busy.
The iops reported from ceph status during this bench test ranged from 1.6k
to 3.3k, the test yielding 4k iops.

Yes, the two other OSDs/disks that were the bottleneck were at 101% disk
busy. The other OSD disks on the same host were sailing along at like
50-60% busy.

All 36 OSD disks are exactly the same disk. They were all purchased at the
same time. All were installed at the same time.
I cannot believe it is a problem with the disk model. A failed/bad disk is
perhaps possible. But the disk model itself cannot be the problem based
on what I am seeing. If I am seeing bad performance on all disks on one
ceph node/host, but not on another ceph node with these same disks, it has
to be some other factor. This is why I am now guessing a firmware upgrade
is needed.

Also, as I alluded to earlier, I took down all 9 OSDs in the problem host
yesterday to run the bench test.
Today, with those 9 OSDs back online, I reran the bench test, and I see 2-3
OSD disks at 101% busy on the problem host, while the other disks are lower
than 80%. So, for whatever reason, shutting down the OSDs and starting them
back up allowed many (not all) of the OSDs' performance to improve on the
problem host.
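(For the raw per-disk view during a bench run, something along these lines
shows write iops and busy% per device; device names and interval are examples:

# iostat -xmt 5 /dev/sdb /dev/sdc

watching mainly w/s and %util while the rados bench is running.)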


Maged
>
> On 2017-10-25 23:44, Russell Glaue wrote:
>
> Thanks to all.
> I took the OSDs down in the problem host, without shutting down the
> machine.
> As predicted, our MB/s about doubled.
> Using this bench/atop procedure, I found two other OSDs on another host
> that are the next bottlenecks.
>
> Is this the only good way to really test the performance of the drives as
> OSDs? Is there any other way?
>
> While running the bench on all 36 OSDs, the 9 problem OSDs stuck out. But
> two new problem OSDs I just discovered in this recent test of 27 OSDs did
> not stick out at all. Because ceph bench distributes the load making only
> the very worst denominators show up in atop. So ceph is a slow a

[ceph-users] Install Ceph on Fedora 26

2017-10-26 Thread GiangCoi Mr
Hi all
I am installing Ceph Luminous on Fedora 26. I installed Ceph Luminous
successfully, but when I install the ceph mon it errors: it doesn't find
client.admin.keyring. How can I fix it? Thanks so much.

Regard, 
GiangLT

Sent from my iPhone
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Install Ceph on Fedora 26

2017-10-26 Thread Alan Johnson
If using defaults try 
 chmod +r /etc/ceph/ceph.client.admin.keyring
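(If the keyring was never generated in the first place, the usual ceph-deploy
sequence that creates and distributes it is roughly, with ceph-node1 as an
example node:

# ceph-deploy new ceph-node1
# ceph-deploy install --no-adjust-repos ceph-node1
# ceph-deploy mon create-initial
# ceph-deploy admin ceph-node1

mon create-initial is the step that gathers the keyrings, and ceph-deploy
admin pushes ceph.conf and the admin keyring to /etc/ceph on the node.)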

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
GiangCoi Mr
Sent: Thursday, October 26, 2017 11:09 AM
To: ceph-us...@ceph.com
Subject: [ceph-users] Install Ceph on Fedora 26

Hi all
I am installing ceph luminous on fedora 26, I installed ceph luminous success 
but when I install ceph mon, it’s error: it doesn’t find client.admin.keyring. 
How I can fix it, Thank so much

Regard, 
GiangLT

Sent from my iPhone
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Install Ceph on Fedora 26

2017-10-26 Thread GiangCoi Mr
Dear Alan Johnson
I installed with the command: ceph-deploy install ceph-node1 --no-adjust-repos.
When the install succeeded, I ran the command: ceph-deploy mon ceph-node1, and
it errored because it didn't find the file ceph.client.admin.keyring. So how do
I set permissions for this file?

Sent from my iPhone

> On Oct 26, 2017, at 10:18 PM, Alan Johnson  wrote:
> 
> If using defaults try 
> chmod +r /etc/ceph/ceph.client.admin.keyring
> 
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
> GiangCoi Mr
> Sent: Thursday, October 26, 2017 11:09 AM
> To: ceph-us...@ceph.com
> Subject: [ceph-users] Install Ceph on Fedora 26
> 
> Hi all
> I am installing ceph luminous on fedora 26, I installed ceph luminous success 
> but when I install ceph mon, it’s error: it doesn’t find 
> client.admin.keyring. How I can fix it, Thank so much
> 
> Regard, 
> GiangLT
> 
> Sent from my iPhone
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Install Ceph on Fedora 26

2017-10-26 Thread Denes Dolhay

Hi,

Did you create a cluster first?

ceph-deploy new {initial-monitor-node(s)}

Cheers,
Denes.

On 10/26/2017 05:25 PM, GiangCoi Mr wrote:

Dear Alan Johnson
I install with command: ceph-deploy install ceph-node1 —no-adjust-repos. When 
install success, I run command: ceph-deploy mon ceph-node1, it’s error because 
it didn’t find file ceph.client.admin.keyring. So how I make permission for 
this file?

Sent from my iPhone


On Oct 26, 2017, at 10:18 PM, Alan Johnson  wrote:

If using defaults try
chmod +r /etc/ceph/ceph.client.admin.keyring

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
GiangCoi Mr
Sent: Thursday, October 26, 2017 11:09 AM
To: ceph-us...@ceph.com
Subject: [ceph-users] Install Ceph on Fedora 26

Hi all
I am installing ceph luminous on fedora 26, I installed ceph luminous success 
but when I install ceph mon, it’s error: it doesn’t find client.admin.keyring. 
How I can fix it, Thank so much

Regard,
GiangLT

Sent from my iPhone


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Install Ceph on Fedora 26

2017-10-26 Thread GiangCoi Mr
Hi Denes.
I created with command: ceph-deploy new ceph-node1

Sent from my iPhone

> On Oct 26, 2017, at 10:34 PM, Denes Dolhay  wrote:
> 
> Hi,
> Did you to create a cluster first?
> 
> ceph-deploy new {initial-monitor-node(s)}
> 
> Cheers,
> Denes.
>> On 10/26/2017 05:25 PM, GiangCoi Mr wrote:
>> Dear Alan Johnson
>> I install with command: ceph-deploy install ceph-node1 —no-adjust-repos. 
>> When install success, I run command: ceph-deploy mon ceph-node1, it’s error 
>> because it didn’t find file ceph.client.admin.keyring. So how I make 
>> permission for this file?
>> 
>> Sent from my iPhone
>> 
>>> On Oct 26, 2017, at 10:18 PM, Alan Johnson  wrote:
>>> 
>>> If using defaults try 
>>> chmod +r /etc/ceph/ceph.client.admin.keyring
>>> 
>>> -Original Message-
>>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
>>> GiangCoi Mr
>>> Sent: Thursday, October 26, 2017 11:09 AM
>>> To: ceph-us...@ceph.com
>>> Subject: [ceph-users] Install Ceph on Fedora 26
>>> 
>>> Hi all
>>> I am installing ceph luminous on fedora 26, I installed ceph luminous 
>>> success but when I install ceph mon, it’s error: it doesn’t find 
>>> client.admin.keyring. How I can fix it, Thank so much
>>> 
>>> Regard, 
>>> GiangLT
>>> 
>>> Sent from my iPhone
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Install Ceph on Fedora 26

2017-10-26 Thread Denes Dolhay

Hi,

If you ssh to ceph-node1, what are the rights, owner, group, content of 
/etc/ceph/ceph.client.admin.keyring ?


[you should mask out the key, just show us that it is there]


On 10/26/2017 05:41 PM, GiangCoi Mr wrote:

Hi Denes.
I created with command: ceph-deploy new ceph-node1

Sent from my iPhone

On Oct 26, 2017, at 10:34 PM, Denes Dolhay > wrote:



Hi,

Did you to create a cluster first?

ceph-deploy  new  {initial-monitor-node(s)} Cheers, Denes.
On 10/26/2017 05:25 PM, GiangCoi Mr wrote:

Dear Alan Johnson
I install with command: ceph-deploy install ceph-node1 —no-adjust-repos. When 
install success, I run command: ceph-deploy mon ceph-node1, it’s error because 
it didn’t find file ceph.client.admin.keyring. So how I make permission for 
this file?

Sent from my iPhone


On Oct 26, 2017, at 10:18 PM, Alan Johnson  wrote:

If using defaults try
chmod +r /etc/ceph/ceph.client.admin.keyring

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
GiangCoi Mr
Sent: Thursday, October 26, 2017 11:09 AM
To:ceph-us...@ceph.com
Subject: [ceph-users] Install Ceph on Fedora 26

Hi all
I am installing ceph luminous on fedora 26, I installed ceph luminous success 
but when I install ceph mon, it’s error: it doesn’t find client.admin.keyring. 
How I can fix it, Thank so much

Regard,
GiangLT

Sent from my iPhone


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] MDS damaged

2017-10-26 Thread Daniel Davidson
Thanks John.  It has been up for a few hours now, and I am slowly adding
more workload to it over time, just so I can see what is going on better.


I was wondering, since this object is used to delete data, if there was 
a chance that deleting data from the system could cause it to be used 
and then make the system go down?


thanks again for all of your help,

Dan

On 10/26/2017 09:23 AM, John Spray wrote:

On Thu, Oct 26, 2017 at 12:40 PM, Daniel Davidson
 wrote:

And at the risk of bombing the mailing list, I can also see that the
stray7_head omapkey is not being recreated:
rados -p igbhome_data listomapkeys 100.
stray0_head
stray1_head
stray2_head
stray3_head
stray4_head
stray5_head
stray6_head
stray8_head
stray9_head

So if it's staying up for a little while, I'd issue a "ceph daemon
mds. flush journal" to ensure it writes back the recreated stray
directory.  It's normal that the newly created dir (and its linkage)
don't appear in the backing store right away (they're just created in
memory+journal at mds startup).

John





On 10/26/2017 05:08 AM, Daniel Davidson wrote:

I increased the logging of the mds to try and get some more information.
I think the relevant lines are:

2017-10-26 05:03:17.661683 7f1c598a6700  0 mds.0.cache.dir(607) _fetched
missing object for [dir 607 ~mds0/stray7/ [2,head] auth v=108918871 cv=0/0
ap=1+0+0 state=1610645632 f(v1 m2017-10-25 14:56:13.140995 299=
299+0) n(v1 rc2017-10-25 14:56:13.140995 b191590453903 299=299+0)
hs=0+11,ss=0+0 dirty=11 | child=1 sticky=1 dirty=1 waiter=1 authpin=1
0x7f1c71e9f300]
2017-10-26 05:03:17.661708 7f1c598a6700 -1 log_channel(cluster) log [ERR]
: dir 607 object missing on disk; some files may be lost (~mds0/stray7)
2017-10-26 05:03:17.661711 7f1c598a6700 -1 mds.0.damage notify_dirfrag
Damage to fragment * of ino 607 is fatal because it is a system directory
for this rank

I would be grateful for any help in repair,

Dan

On 10/25/2017 04:17 PM, Daniel Davidson wrote:

A bit more news: I made the ceph-0 mds shut down, started the mds on
ceph-1 and then told it the mds was repaired.  Everything ran great for
about 5 hours and now it has crashed again.  Same error:

2017-10-25 15:13:07.344093 mon.0 [INF] fsmap e121828: 1/1/1 up
{0=ceph-1=up:active}
2017-10-25 15:13:07.383445 mds.0 [ERR] dir 607 object missing on disk;
some files may be lost (~mds0/stray7)
2017-10-25 15:13:07.480785 mon.0 [INF] osdmap e35296: 32 osds: 32 up, 32
in
2017-10-25 15:13:07.530337 mon.0 [INF] pgmap v28449919: 1536 pgs: 1536
active+clean; 793 TB data, 1592 TB used, 1026 TB / 2619 TB avail; 5894 kB/s
rd, 26 op/s
2017-10-25 15:13:08.473363 mon.0 [INF] mds.0 172.16.31.2:6802/3109594408
down:damaged
2017-10-25 15:13:08.473487 mon.0 [INF] fsmap e121829: 0/1/1 up, 1 damaged

If I:
#  rados -p igbhome_data rmomapkey 100. stray7_head
# ceph mds repaired 0

then I get:
2017-10-25 16:11:52.219916 mds.0 [ERR] dir 607 object missing on disk;
some files may be lost (~mds0/stray7)
2017-10-25 16:11:52.307975 mon.0 [INF] osdmap e35322: 32 osds: 32 up, 32
in
2017-10-25 16:11:52.357904 mon.0 [INF] pgmap v28450567: 1536 pgs: 1536
active+clean; 793 TB data, 1592 TB used, 1026 TB / 2619 TB avail; 11773 kB/s
rd, 5262 kB/s wr, 26 op/s
2017-10-25 16:11:53.325331 mon.0 [INF] mds.0 172.16.31.2:6802/2716803172
down:damaged
2017-10-25 16:11:53.325424 mon.0 [INF] fsmap e121882: 0/1/1 up, 1 damaged
2017-10-25 16:11:53.475087 mon.0 [INF] pgmap v28450568: 1536 pgs: 1536
active+clean; 793 TB data, 1592 TB used, 1026 TB / 2619 TB avail; 39677 kB/s
rd, 47236 B/s wr, 54 op/s
2017-10-25 16:11:54.590232 mon.0 [INF] pgmap v28450569: 1536 pgs: 1536
active+clean; 793 TB data, 1592 TB used, 1026 TB / 2619 TB avail; 28105 kB/s
rd, 3786 kB/s wr, 43 op/s
2017-10-25 16:11:55.719476 mon.0 [INF] pgmap v28450570: 1536 pgs: 1536
active+clean; 793 TB data, 1592 TB used, 1026 TB / 2619 TB avail; 26284 kB/s
rd, 3678 kB/s wr, 357 op/s
2017-10-25 16:11:56.830623 mon.0 [INF] pgmap v28450571: 1536 pgs: 1536
active+clean; 793 TB data, 1592 TB used, 1026 TB / 2619 TB avail; 37249 kB/s
rd, 5476 B/s wr, 358 op/s
2017-10-25 16:11:57.965330 mon.0 [INF] pgmap v28450572: 1536 pgs: 1536
active+clean; 793 TB data, 1592 TB used, 1026 TB / 2619 TB avail; 60769 kB/s
rd, 53485 B/s wr, 41 op/s
2017-10-25 16:11:58.033787 mon.0 [INF] mds.? 172.16.31.2:6802/2725942008
up:boot
2017-10-25 16:11:58.033876 mon.0 [INF] fsmap e121883: 0/1/1 up, 1
up:standby, 1 damaged

Dan

On 10/25/2017 11:30 AM, Daniel Davidson wrote:

The system is down again saying it is missing the same stray7 again.

2017-10-25 11:24:29.736774 mds.0 [WRN] failed to reconnect caps for
missing inodes:
2017-10-25 11:24:29.736779 mds.0 [WRN]  ino 100147160e6
2017-10-25 11:24:29.753665 mds.0 [ERR] dir 607 object missing on disk;
some files may be lost (~mds0/stray7)


Dan

On 10/25/2017 08:54 AM, Daniel Davidson wrote:

Thanks for the information.

I did:
# ceph daemon mds.ceph-0 scrub_path / repair recursive

Saw in the logs it finished

# ceph daemon mds.ceph-

Re: [ceph-users] s3 bucket permishions

2017-10-26 Thread nigel davies
Thanks all for offering input.

I believe I worked it out :D you can set permissions using s3cmd
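Something along these lines, going by the s3cmd docs (bucket, permission and
user id are placeholders, and a bucket policy file can be applied the same
way):

$ s3cmd setacl s3://bk_a --acl-grant=read:<user_b-canonical-id>
$ s3cmd setpolicy policy.json s3://bk_a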

On Thu, Oct 26, 2017 at 10:20 AM, nigel davies  wrote:

> Thanks i spotted this, when i run the example i get
> ERROR: S3 error: 400 (InvalidArgument)
>
> I found that bucket link will, link my buckets to different users (what is
> what i am kind of after)
>
>
> But i also link to make sure, if an new user was added. they have no
> access to any buckets until i allow them
>
>
> sorry i am new to all this and trying to get my head around it
>
> On Thu, Oct 26, 2017 at 9:54 AM, Abhishek Lekshmanan 
> wrote:
>
>> nigel davies  writes:
>>
>> > I am fallowing a guide at the mo.
>> >
>> > But I believe it's RWG users
>>
>> We have support for aws like bucket policies,
>> http://docs.ceph.com/docs/master/radosgw/bucketpolicy/
>>
>> Some amount of permissions can also be controlled by acls
>> >
>> > On 25 Oct 2017 5:29 pm, "David Turner"  wrote:
>> >
>> >> Are you talking about RGW buckets with limited permissions for cephx
>> >> authentication? Or RGW buckets with limited permissions for RGW users?
>> >>
>> >> On Wed, Oct 25, 2017 at 12:16 PM nigel davies 
>> wrote:
>> >>
>> >>> Hay All
>> >>>
>> >>> is it possible to set permissions to buckets
>> >>>
>> >>> for example if i have 2 users  (user_a and user_b) and 2 buckets (bk_a
>> >>> and bk_b)
>> >>>
>> >>> i want to set permissions, so user a can only see bk_a and user b can
>> >>> only see bk_b
>>
>> This is the default case, a bucket created by user_a is only accessible
>> to user_a (ie. the bucket owner) and not anyone else
>> >>>
>> >>>
>> >>> I have been looking at cant see what i am after.
>> >>>
>> >>> Any advise would be welcome
>> >>>
>>
>> --
>> Abhishek Lekshmanan
>> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
>> HRB 21284 (AG Nürnberg)
>>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to increase the size of requests written to a ceph image

2017-10-26 Thread Maged Mokhtar
I hope the firmware update fixes things for you.
Regarding monitoring: if your tool is able to record disk busy%, iops and
throughput, then you do not need to run atop.

I still highly recommend you run the fio SSD test for sync writes:
https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
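The test from that post boils down to something along these lines (a sketch;
sdX must be a spare/blank device, since this writes to it directly):

# fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k \
      --numjobs=1 --iodepth=1 --runtime=60 --time_based \
      --group_reporting --name=journal-test

A journal-suitable SSD should sustain thousands of such sync-write iops.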

The other important factor for SSDs is they should have commercial grade
endurance/DWPD 

In the absence of other load, if you stress your cluster using a rados 4k
benchmark (I recommended 4k since this was the block size you were getting
when doing the RAID comparison in your initial post), your load will be
dominated by iops performance. You should easily be seeing a couple of
thousand iops at the raw disk level; at the cluster level with 30 disks, you
should be roughly approaching 30 x the actual raw disk iops for 4k reads and
about 5 x for writes (due to replicas and journal seeks). If you were using
fast SSDs (10k+ iops per disk), you would start hitting other bottlenecks
like cpu%, but your case is far from this. In your case, to get decent
cluster iops performance you should be aiming for a couple of thousand iops
at the raw disk level and a busy% below 90% during the rados 4k test.
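To put rough numbers on that (an illustrative calculation only, assuming each
SSD sustains about 2,000 sync-write iops): reads would scale to roughly
30 x 2,000 = 60,000 iops across 30 disks, while writes land closer to
5 x 2,000 = 10,000 iops, since with 3 replicas and co-located journals each
client write turns into roughly 6 disk writes (30 / 6 = 5).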

Maged 

On 2017-10-26 16:44, Russell Glaue wrote:

> On Wed, Oct 25, 2017 at 7:09 PM, Maged Mokhtar  wrote:
> 
>> It depends on what stage you are in: 
>> in production, probably the best thing is to setup a monitoring tool 
>> (collectd/grahite/prometheus/grafana) to monitor both ceph stats as well as 
>> resource load. This will, among other things, show you if you have slowing 
>> disks.
> 
> I am monitoring Ceph performance with ceph-dash 
> (http://cephdash.crapworks.de/), that is why I knew to look into the slow 
> writes issue. And I am using Monitorix (http://www.monitorix.org/) to monitor 
> system resources, including Disk I/O. 
> 
> However, though I can monitor individual disk performance at the system 
> level, it seems Ceph does not tax any disk more than the worst disk. So in my 
> monitoring charts, all disks have the same performance. 
> All four nodes are base-lining at 50 writes/sec during the cluster's normal 
> load, with the non-problem hosts spiking up to 150, and the problem host only 
> spikes up to 100.  
> But during the window of time I took the problem host OSDs down to run the 
> bench tests, the OSDs on the other nodes increased to 300-500 writes/sec. 
> Otherwise, the chart looks the same for all disks on all ceph nodes/hosts. 
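
One quick way to see per-OSD differences from the Ceph side, rather than from
the disk charts, is something like:

ceph osd perf       # per-OSD commit/apply latency in ms
ceph osd df tree    # maps osds to hosts

An OSD whose latency stays consistently higher than its peers during the test
usually points at the slow disk (a sketch; the exact columns vary a little
between releases).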
> 
>> Before production you should first make sure your SSDs are suitable for 
>> Ceph, either by being recommend by other Ceph users or you test them 
>> yourself for sync writes performance using fio tool as outlined earlier. 
>> Then after you build your cluster you can use rados and/or rbd benchmark 
>> tests to benchmark your cluster and find bottlenecks using atop/sar/collectl 
>> which will help you tune your cluster.
> 
> All 36 OSDs are: Crucial_CT960M500SSD1 
> 
> Rados bench tests were done at the beginning. The speed was much faster than 
> it is now. I cannot recall the test results, someone else on my team ran 
> them. Recently, I had thought the slow disk problem was a configuration issue 
> with Ceph - before I posted here. Now we are hoping it may be resolved with a 
> firmware update. (If it is firmware related, rebooting the problem node may 
> temporarily resolve this) 
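
If it does come down to firmware, the installed revision can be checked per
drive with something like (device name is an example):

smartctl -i /dev/sdX | grep -i firmware

Comparing that output across the four hosts would quickly show whether the
problem host is running something different.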
> 
>> Though you did see better improvements, your cluster with 27 SSDs should 
>> give much higher numbers than 3k iops. If you are running rados bench while 
>> you have other client ios, then obviously the reported number by the tool 
>> will be less than what the cluster is actually giving...which you can find 
>> out via ceph status command, it will print the total cluster throughput and 
>> iops. If the total is still low i would recommend running the fio raw disk 
>> test, maybe the disks are not suitable. When you removed your 9 bad disks 
>> from 36 and your performance doubled, you still had 2 other disks slowing 
>> you, meaning near 100% busy? It makes me feel the disk type used is not 
>> good. For these near-100%-busy disks, can you also measure their raw disk 
>> iops at that load (I am not sure atop shows this; if not, use 
>> sar/sysstat/iostat/collectl).
> 
> I ran another bench test today with all 36 OSDs up. The overall performance 
> was improved slightly compared to the original tests. Only 3 OSDs on the 
> problem host were increasing to 101% disk busy. 
> The iops reported from ceph status during this bench test ranged from 1.6k to 
> 3.3k, the test yielding 4k iops. 
> 
> Yes, the two other OSDs/disks that were the bottleneck were at 101% disk 
> busy. The other OSD disks on the same host were sailing along at like 50-60% 
> busy. 
> 
> All 36 OSD disks are exactly the same disk. They were all purchased at the 
> same time. All were installed at the same time. 
> I cannot believe it is a problem with the disk model. A failed

Re: [ceph-users] How to increase the size of requests written to a ceph image

2017-10-26 Thread Gerhard W. Recher
It would be nice to see your output of:

rados bench -p rbd 60 write --no-cleanup -t 56 -b 4096  -o 1M

Total time run: 60.005452
Total writes made:  438295
Write size: 4096
Object size:    1048576
Bandwidth (MB/sec): 28.5322
Stddev Bandwidth:   0.514721
Max bandwidth (MB/sec): 29.5781
Min bandwidth (MB/sec): 27.1328
Average IOPS:   7304
Stddev IOPS:    131
Max IOPS:   7572
Min IOPS:   6946
Average Latency(s): 0.00766615
Stddev Latency(s):  0.00276119
Max latency(s): 0.0481837
Min latency(s): 0.000474167


In real life, a block size of only 4096 bytes is not really common, I think :)
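
For comparison, the same run with a more typical 4M block size would be
something like (keeping the thread count from above):

rados bench -p rbd 60 write --no-cleanup -t 56 -b 4194304

Since --no-cleanup leaves the benchmark objects behind, they can be removed
afterwards with something along the lines of:

rados -p rbd cleanup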

Regards

Gerhard W. Recher

net4sec UG (haftungsbeschränkt)
Leitenweg 6
86929 Penzing

+49 171 4802507
On 26.10.2017 19:01, Maged Mokhtar wrote:
>
>
> I hope the firmware update fixes things for you.
> Regarding monitoring: if your tool is able to record disk busy%, iops and
> throughput, then you do not need to run atop.
>
> I still highly recommend you run the fio SSD test for sync writes:
> https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
> 
>
> The other important factor for SSDs is that they should have commercial-grade
> endurance (DWPD).
>
> In the absence of other load, if you stress your cluster using a rados 4k
> benchmark (I recommended 4k since this was the block size you were getting
> when doing the RAID comparison in your initial post), your load will be
> dominated by iops performance. You should easily be seeing a couple of
> thousand iops at the raw disk level; on a cluster level with 30 disks, you
> should be roughly approaching 30 x the actual raw disk iops for 4k reads and
> about 5 x for writes, due to replicas and journal seeks (with 3 replicas and
> a collocated journal, each client write turns into roughly 6 disk writes, so
> 30/6 gives about 5 x). If you were using fast SSDs (10k+ iops per disk), you
> would start hitting other bottlenecks like cpu%, but your case is far from
> this. In your case, to get decent cluster iops you should be aiming for a
> couple of thousand iops at the raw disk level and a busy% below 90% during
> the rados 4k test.
>
>  
>
> Maged
>
> On 2017-10-26 16:44, Russell Glaue wrote:
>
>>
>> On Wed, Oct 25, 2017 at 7:09 PM, Maged Mokhtar wrote:
>>
>> It depends on what stage you are in:
>> in production, probably the best thing is to setup a monitoring
>> tool (collectd/grahite/prometheus/grafana) to monitor both ceph
>> stats as well as resource load. This will, among other things,
>> show you if you have slowing disks.
>>
>> I am monitoring Ceph performance with ceph-dash
>> (http://cephdash.crapworks.de/), that is why I knew to look into the
>> slow writes issue. And I am using Monitorix
>> (http://www.monitorix.org/) to monitor system resources, including
>> Disk I/O.
>>  
>> However, though I can monitor individual disk performance at the
>> system level, it seems Ceph does not tax any disk more than the worst
>> disk. So in my monitoring charts, all disks have the same performance.
>> All four nodes are base-lining at 50 writes/sec during the cluster's
>> normal load, with the non-problem hosts spiking up to 150, and the
>> problem host only spikes up to 100. 
>> But during the window of time I took the problem host OSDs down to
>> run the bench tests, the OSDs on the other nodes increased to 300-500
>> writes/sec. Otherwise, the chart looks the same for all disks on all
>> ceph nodes/hosts.
>>  
>>
>> Before production you should first make sure your SSDs are
>> suitable for Ceph, either by being recommend by other Ceph users
>> or you test them yourself for sync writes performance using fio
>> tool as outlined earlier. Then after you build your cluster you
>> can use rados and/or rbd benchmark tests to benchmark your cluster
>> and find bottlenecks using atop/sar/collectl which will help you
>> tune your cluster.
>>
>> All 36 OSDs are: Crucial_CT960M500SSD1
>>  
>> Rados bench tests were done at the beginning. The speed was much
>> faster than it is now. I cannot recall the test results, someone else
>> on my team ran them. Recently, I had thought the slow disk problem
>> was a configuration issue with Ceph - before I posted here. Now we
>> are hoping it may be resolved with a firmware update. (If it is
>> firmware related, rebooting the problem node may temporarily resolve
>> this)
>>  
>>
>> Though you did see better improvements, your cluster with 27 SSDs
>> should give much higher numbers than 3k iops. If you are running
>> rados bench while you have other client ios, then obviously the
>> reported number by the tool will be less than what the cluster is
>> actually giving...which you can find out via ceph status command,
>> it will print the total cluster throughput and iops. If the total
>> is still low i would reco

Re: [ceph-users] Hammer to Jewel Upgrade - Extreme OSD Boot Time

2017-10-26 Thread Chris Jones
The long-running functionality appears to be related to clear_temp_objects() in 
OSD.cc, called from init().


What is this functionality intended to do? Is it required to be run on every 
OSD startup? Any configuration settings that would help speed this up?
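
One way to see where the time goes is to pull the relevant timestamps out of
the OSD log (a sketch, assuming the default log path and the osd id from the
log excerpt below):

grep -E 'adjusting msgr requires for osds|load_pgs' /var/log/ceph/ceph-osd.191.log

In that excerpt, the gap between those two lines is roughly five and a half
hours.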


--
Christopher J. Jones


From: Chris Jones
Sent: Wednesday, October 25, 2017 12:52:13 PM
To: ceph-users@lists.ceph.com
Subject: Hammer to Jewel Upgrade - Extreme OSD Boot Time


After upgrading from Ceph Hammer to Jewel, we are experiencing extremely long 
OSD boot durations.

This long boot time is a huge concern for us, and we are looking for insight 
into how we can speed it up.

In Hammer, OSD boot time was approx 3 minutes. After upgrading to Jewel, boot 
time is between 1 and 3 HOURS.

This was not surprising during initial boot after the upgrade, however we are 
seeing this occur each time an OSD process is restarted.

This is using ZFS. We added the following configuration to ceph.conf as part of 
the upgrade to overcome some filesystem startup issues per the recommendations 
at the following url:

https://github.com/zfsonlinux/zfs/issues/4913

Added ceph.conf configuration:
filestore_max_inline_xattrs = 10
filestore_max_inline_xattr_size = 65536
filestore_max_xattr_value_size = 65536


Example OSD log (note the long gap after the line containing "osd.191 119292 
crush map has features 281819681652736, adjusting msgr requires for osds"):

2017-10-24 18:01:18.410249 7f1333d08700  1 leveldb: Generated table #524178: 
158056 keys, 1502244 bytes
2017-10-24 18:01:18.805235 7f1333d08700  1 leveldb: Generated table #524179: 
266429 keys, 2129196 bytes
2017-10-24 18:01:19.254798 7f1333d08700  1 leveldb: Generated table #524180: 
197068 keys, 2128820 bytes
2017-10-24 18:01:20.070109 7f1333d08700  1 leveldb: Generated table #524181: 
192675 keys, 2129122 bytes
2017-10-24 18:01:20.947818 7f1333d08700  1 leveldb: Generated table #524182: 
196806 keys, 2128945 bytes
2017-10-24 18:01:21.183475 7f1333d08700  1 leveldb: Generated table #524183: 
63421 keys, 828081 bytes
2017-10-24 18:01:21.477197 7f1333d08700  1 leveldb: Generated table #524184: 
173331 keys, 1348407 bytes
2017-10-24 18:01:21.477226 7f1333d08700  1 leveldb: Compacted 1@2 + 12@3 files 
=> 19838392 bytes
2017-10-24 18:01:21.509952 7f1333d08700  1 leveldb: compacted to: files[ 0 1 66 
551 788 0 0 ]
2017-10-24 18:01:21.512235 7f1333d08700  1 leveldb: Delete type=2 #523994
2017-10-24 18:01:23.142853 7f1349d93800  0 filestore(/osd/191) mount: enabling 
WRITEAHEAD journal mode: checkpoint is not enabled
2017-10-24 18:01:27.927823 7f1349d93800  0  cls/hello/cls_hello.cc:305: 
loading cls_hello
2017-10-24 18:01:27.933105 7f1349d93800  0  cls/cephfs/cls_cephfs.cc:202: 
loading cephfs_size_scan
2017-10-24 18:01:27.960283 7f1349d93800  0 osd.191 119292 crush map has 
features 281544803745792, adjusting msgr requires for clients
2017-10-24 18:01:27.960309 7f1349d93800  0 osd.191 119292 crush map has 
features 281819681652736 was 8705, adjusting msgr requires for mons
2017-10-24 18:01:27.960316 7f1349d93800  0 osd.191 119292 crush map has 
features 281819681652736, adjusting msgr requires for osds
2017-10-24 23:28:09.694213 7f1349d93800  0 osd.191 119292 load_pgs
2017-10-24 23:28:14.757449 7f1333d08700  1 leveldb: Compacting 1@1 + 13@2 files
2017-10-24 23:28:15.002381 7f1333d08700  1 leveldb: Generated table #524185: 
17970 keys, 2128900 bytes
2017-10-24 23:28:15.198899 7f1333d08700  1 leveldb: Generated table #524186: 
22386 keys, 2128610 bytes
2017-10-24 23:28:15.337819 7f1333d08700  1 leveldb: Generated table #524187: 
3890 keys, 371799 bytes
2017-10-24 23:28:15.693433 7f1333d08700  1 leveldb: Generated table #524188: 
21984 keys, 2128947 bytes
2017-10-24 23:28:15.874955 7f1333d08700  1 leveldb: Generated table #524189: 
9565 keys, 1207375 bytes
2017-10-24 23:28:16.253599 7f1333d08700  1 leveldb: Generated table #524190: 
21999 keys, 2129625 bytes
2017-10-24 23:28:16.576250 7f1333d08700  1 leveldb: Generated table #524191: 
21544 keys, 2128033 bytes


Strace on an OSD process during startup reveals what appears to be parsing of 
objects and calling getxattr.

The bulk of the time is spent on parsing the objects and performing the 
getxattr system calls... for example:

(Full lines truncated intentionally for brevity).
[pid 3068964] 
getxattr("/osd/174/current/20.6a4s7_head/default.7385.13\...(omitted)
[pid 3068964] 
getxattr("/osd/174/current/20.6a4s7_head/default.7385.5\...(omitted)
[pid 3068964] 
getxattr("/osd/174/current/20.6a4s7_head/default.7385.5\...(omitted)
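
For anyone wanting to see what those calls are actually reading, the xattrs
that filestore keeps on an object file can be dumped directly (a sketch;
substitute one of the full object paths from the strace output, and note that
getfattr comes from the attr package):

getfattr -d /osd/174/current/20.6a4s7_head/<object-file>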

Cluster details:
- 9 hosts (32 cores, 256 GB RAM, Ubuntu 14.04 3.16.0-77-generic, 72 6TB SAS2 
drives per host, collocated journals)
- Pre-upgrade: Hammer (ceph version 0.94.6)
- Post-upgrade: Jewel (ceph version 10.2.9)
- object storage use only
- erasure coded (k=7, m=2) .rgw.buckets pool (8192 pgs)
- failure domain of host
- cluster is currentl

Re: [ceph-users] How to increase the size of requests written to a ceph image

2017-10-26 Thread Christian Wuerdig
Hm, not necessarily directly related to your performance problem,
however: These SSDs have a listed endurance of 72TB total data written
- over a 5 year period that's 40GB a day or approx 0.04 DWPD. Given
that you run the journal for each OSD on the same disk, that's
effectively at most 0.02 DWPD (about 20GB per day per disk). I don't
know many who'd run a cluster on disks like those. Also it means these
are pure consumer drives which have a habit of exhibiting random
performance at times (based on unquantified anecdotal personal
experience with other consumer model SSDs). I wouldn't touch these
with a long stick for anything but small toy-test clusters.
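
If you want to see how much of that endurance has already been used, the SMART
wear counters can be read per drive, e.g. (a sketch; attribute names vary by
model, on these Crucial drives look for the percent-lifetime-used and
total-LBAs-written attributes):

smartctl -A /dev/sdX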

On Fri, Oct 27, 2017 at 3:44 AM, Russell Glaue  wrote:
>
> On Wed, Oct 25, 2017 at 7:09 PM, Maged Mokhtar  wrote:
>>
>> It depends on what stage you are in:
>> in production, probably the best thing is to setup a monitoring tool
>> (collectd/grahite/prometheus/grafana) to monitor both ceph stats as well as
>> resource load. This will, among other things, show you if you have slowing
>> disks.
>
> I am monitoring Ceph performance with ceph-dash
> (http://cephdash.crapworks.de/), that is why I knew to look into the slow
> writes issue. And I am using Monitorix (http://www.monitorix.org/) to
> monitor system resources, including Disk I/O.
>
> However, though I can monitor individual disk performance at the system
> level, it seems Ceph does not tax any disk more than the worst disk. So in
> my monitoring charts, all disks have the same performance.
> All four nodes are base-lining at 50 writes/sec during the cluster's normal
> load, with the non-problem hosts spiking up to 150, and the problem host
> only spikes up to 100.
> But during the window of time I took the problem host OSDs down to run the
> bench tests, the OSDs on the other nodes increased to 300-500 writes/sec.
> Otherwise, the chart looks the same for all disks on all ceph nodes/hosts.
>
>> Before production you should first make sure your SSDs are suitable for
>> Ceph, either by being recommend by other Ceph users or you test them
>> yourself for sync writes performance using fio tool as outlined earlier.
>> Then after you build your cluster you can use rados and/or rbd benchmark
>> tests to benchmark your cluster and find bottlenecks using atop/sar/collectl
>> which will help you tune your cluster.
>
> All 36 OSDs are: Crucial_CT960M500SSD1
>
> Rados bench tests were done at the beginning. The speed was much faster than
> it is now. I cannot recall the test results, someone else on my team ran
> them. Recently, I had thought the slow disk problem was a configuration
> issue with Ceph - before I posted here. Now we are hoping it may be resolved
> with a firmware update. (If it is firmware related, rebooting the problem
> node may temporarily resolve this)
>
>>
>> Though you did see better improvements, your cluster with 27 SSDs should
>> give much higher numbers than 3k iops. If you are running rados bench while
>> you have other client ios, then obviously the reported number by the tool
>> will be less than what the cluster is actually giving...which you can find
>> out via ceph status command, it will print the total cluster throughput and
>> iops. If the total is still low i would recommend running the fio raw disk
>> test, maybe the disks are not suitable. When you removed your 9 bad disks
>> from 36 and your performance doubled, you still had 2 other disks slowing
>> you, meaning near 100% busy? It makes me feel the disk type used is not
>> good. For these near-100%-busy disks, can you also measure their raw disk
>> iops at that load (I am not sure atop shows this; if not, use
>> sar/sysstat/iostat/collectl).
>
> I ran another bench test today with all 36 OSDs up. The overall performance
> was improved slightly compared to the original tests. Only 3 OSDs on the
> problem host were increasing to 101% disk busy.
> The iops reported from ceph status during this bench test ranged from 1.6k
> to 3.3k, the test yielding 4k iops.
>
> Yes, the two other OSDs/disks that were the bottleneck were at 101% disk
> busy. The other OSD disks on the same host were sailing along at like 50-60%
> busy.
>
> All 36 OSD disks are exactly the same disk. They were all purchased at the
> same time. All were installed at the same time.
> I cannot believe it is a problem with the disk model. A failed/bad disk,
> perhaps is possible. But the disk model itself cannot be the problem based
> on what I am seeing. If I am seeing bad performance on all disks on one ceph
> node/host, but not on another ceph node with these same disks, it has to be
> some other factor. This is why I am now guessing a firmware upgrade is
> needed.
>
> Also, as I eluded to here earlier. I took down all 9 OSDs in the problem
> host yesterday to run the bench test.
> Today, with those 9 OSDs back online, I rerun the bench test, I am see 2-3
> OSD disks with 101% busy on the problem host, and the other disks are lower
> than 80%. So, for whatever reaso

Re: [ceph-users] Install Ceph on Fedora 26

2017-10-26 Thread GiangCoi Mr
Hi Denes Dolhay,
This is the error I get when I run the command: ceph-deploy mon create-initial

[ceph_deploy.mon][INFO  ] mon.ceph-node1 monitor has reached quorum!
[ceph_deploy.mon][INFO  ] all initial monitors are running and have formed
quorum
[ceph_deploy.mon][INFO  ] Running gatherkeys...
[ceph_deploy.gatherkeys][DEBUG ] Checking ceph-node1 for
/etc/ceph/ceph.client.admin.keyring
[ceph-node1][DEBUG ] connected to host: ceph-node1
[ceph-node1][DEBUG ] detect platform information from remote host
[ceph-node1][DEBUG ] detect machine type
[ceph-node1][DEBUG ] fetch remote file

[ceph_deploy.gatherkeys][WARNIN] Unable to find
/etc/ceph/ceph.client.admin.keyring on ceph-node1
[ceph_deploy][ERROR ] KeyNotFoundError: Could not find keyring file:
/etc/ceph/ceph.client.admin.keyring on host ceph-node1

It cannot find /etc/ceph/ceph.client.admin.keyring, and the directory I run
ceph-deploy from only has 3 files: ceph.conf, ceph-deploy-ceph.log,
ceph.mon.keyring
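
Should I simply re-run the key gathering step by hand, something like the
following, or does the keyring have to be created on the monitor node first?

ceph-deploy gatherkeys ceph-node1
ssh ceph-node1 'ls -l /etc/ceph'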

Regards,
GiangLT



2017-10-26 22:47 GMT+07:00 Denes Dolhay :

> Hi,
>
> If you ssh to ceph-node1, what are the rights, owner, group, content of
> /etc/ceph/ceph.client.admin.keyring ?
>
> [you should mask out the key, just show us that it is there]
>
> On 10/26/2017 05:41 PM, GiangCoi Mr wrote:
>
> Hi Denes.
> I created with command: ceph-deploy new ceph-node1
>
> Sent from my iPhone
>
> On Oct 26, 2017, at 10:34 PM, Denes Dolhay  wrote:
>
> Hi,
>
> Did you to create a cluster first?
>
> ceph-deploy new {initial-monitor-node(s)}
>
> Cheers,
> Denes.
>
> On 10/26/2017 05:25 PM, GiangCoi Mr wrote:
>
> Dear Alan Johnson,
> I installed with the command: ceph-deploy install ceph-node1 --no-adjust-repos.
> When the install succeeded, I ran: ceph-deploy mon ceph-node1, and it errored 
> because it didn't find the file ceph.client.admin.keyring. So how do I set the 
> permissions for this file?
>
> Sent from my iPhone
>
>
> On Oct 26, 2017, at 10:18 PM, Alan Johnson wrote:
>
> If using defaults try
> chmod +r /etc/ceph/ceph.client.admin.keyring
>
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of GiangCoi Mr
> Sent: Thursday, October 26, 2017 11:09 AM
> To: ceph-us...@ceph.com
> Subject: [ceph-users] Install Ceph on Fedora 26
>
> Hi all,
> I am installing Ceph Luminous on Fedora 26. The Ceph Luminous install succeeded, 
> but when I install the ceph mon it errors: it doesn't find 
> client.admin.keyring. How can I fix it? Thanks so much.
>
> Regard,
> GiangLT
>
> Sent from my iPhone
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com