Re: [ceph-users] Minimize data lost with PG incomplete

2017-01-31 Thread José M. Martín
# ceph -s
cluster 29a91870-2ed2-40dc-969e-07b22f37928b
 health HEALTH_ERR
clock skew detected on mon.loki04
155 pgs are stuck inactive for more than 300 seconds
7 pgs backfill_toofull
1028 pgs backfill_wait
48 pgs backfilling
892 pgs degraded
20 pgs down
153 pgs incomplete
2 pgs peering
155 pgs stuck inactive
1077 pgs stuck unclean
892 pgs undersized
1471 requests are blocked > 32 sec
recovery 3195781/36460868 objects degraded (8.765%)
recovery 5079026/36460868 objects misplaced (13.930%)
mds0: Behind on trimming (175/30)
noscrub,nodeep-scrub flag(s) set
Monitor clock skew detected
 monmap e5: 5 mons at
{loki01=192.168.3.151:6789/0,loki02=192.168.3.152:6789/0,loki03=192.168.3.153:6789/0,loki04=192.168.3.154:6789/0,loki05=192.168.3.155:6789/0}
election epoch 4028, quorum 0,1,2,3,4
loki01,loki02,loki03,loki04,loki05
  fsmap e95494: 1/1/1 up {0=zeus2=up:active}, 1 up:standby
 osdmap e275373: 42 osds: 42 up, 42 in; 1077 remapped pgs
flags noscrub,nodeep-scrub
  pgmap v36642778: 4872 pgs, 4 pools, 24801 GB data, 17087 kobjects
45892 GB used, 34024 GB / 79916 GB avail
3195781/36460868 objects degraded (8.765%)
5079026/36460868 objects misplaced (13.930%)
3640 active+clean
 838 active+undersized+degraded+remapped+wait_backfill
 184 active+remapped+wait_backfill
 134 incomplete
  48 active+undersized+degraded+remapped+backfilling
  19 down+incomplete
   6
active+undersized+degraded+remapped+wait_backfill+backfill_toofull
   1 active+remapped+backfill_toofull
   1 peering
   1 down+peering
recovery io 93909 kB/s, 10 keys/s, 67 objects/s



# ceph osd tree
ID  WEIGHT   TYPE NAME   UP/DOWN REWEIGHT PRIMARY-AFFINITY
 -1 77.22777 root default 
 -9 27.14778 rack sala1   
 -2  5.41974 host loki01  
 14  0.90329 osd.14   up  1.0  1.0
 15  0.90329 osd.15   up  1.0  1.0
 16  0.90329 osd.16   up  1.0  1.0
 17  0.90329 osd.17   up  1.0  1.0
 18  0.90329 osd.18   up  1.0  1.0
 25  0.90329 osd.25   up  1.0  1.0
 -4  3.61316 host loki03  
  0  0.90329 osd.0up  1.0  1.0
  2  0.90329 osd.2up  1.0  1.0
 20  0.90329 osd.20   up  1.0  1.0
 24  0.90329 osd.24   up  1.0  1.0
 -3  9.05714 host loki02  
  1  0.90300 osd.1up  0.90002  1.0
 31  2.72198 osd.31   up  1.0  1.0
 29  0.90329 osd.29   up  1.0  1.0
 30  0.90329 osd.30   up  1.0  1.0
 33  0.90329 osd.33   up  1.0  1.0
 32  2.72229 osd.32   up  1.0  1.0
 -5  9.05774 host loki04  
  3  0.90329 osd.3up  1.0  1.0
 19  0.90329 osd.19   up  1.0  1.0
 21  0.90329 osd.21   up  1.0  1.0
 22  0.90329 osd.22   up  1.0  1.0
 23  2.72229 osd.23   up  1.0  1.0
 28  2.72229 osd.28   up  1.0  1.0
-10 24.61000 rack sala2.2 
 -6 24.61000 host loki05  
  5  2.73000 osd.5up  1.0  1.0
  6  2.73000 osd.6up  1.0  1.0
  9  2.73000 osd.9up  1.0  1.0
 10  2.73000 osd.10   up  1.0  1.0
 11  2.73000 osd.11   up  1.0  1.0
 12  2.73000 osd.12   up  1.0  1.0
 13  2.73000 osd.13   up  1.0  1.0
  4  2.73000 osd.4up  1.0  1.0
  8  2.73000 osd.8up  1.0  1.0
  7  0.03999 osd.7up  1.0  1.0
-12 25.46999 rack sala2.1 
-11 25.46999 host loki06  
 34  2.73000 osd.34   up  1.0  1.0
 35  2.73000  

Re: [ceph-users] Minimize data lost with PG incomplete

2017-01-31 Thread Henrik Korkuc
I am not sure about "incomplete" part out of my head, but you can try 
setting min_size to 1 for pools to reactivate some PGs, if they are
down/inactive due to missing replicas.
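For reference, that would be something like this per affected pool (the pool
name below is just a placeholder, not from your cluster):

  # lower the minimum number of replicas needed to serve I/O
  ceph osd pool set <pool-name> min_size 1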


On 17-01-31 10:24, José M. Martín wrote:

# ceph -s
 cluster 29a91870-2ed2-40dc-969e-07b22f37928b
  health HEALTH_ERR
 clock skew detected on mon.loki04
 155 pgs are stuck inactive for more than 300 seconds
 7 pgs backfill_toofull
 1028 pgs backfill_wait
 48 pgs backfilling
 892 pgs degraded
 20 pgs down
 153 pgs incomplete
 2 pgs peering
 155 pgs stuck inactive
 1077 pgs stuck unclean
 892 pgs undersized
 1471 requests are blocked > 32 sec
 recovery 3195781/36460868 objects degraded (8.765%)
 recovery 5079026/36460868 objects misplaced (13.930%)
 mds0: Behind on trimming (175/30)
 noscrub,nodeep-scrub flag(s) set
 Monitor clock skew detected
  monmap e5: 5 mons at
{loki01=192.168.3.151:6789/0,loki02=192.168.3.152:6789/0,loki03=192.168.3.153:6789/0,loki04=192.168.3.154:6789/0,loki05=192.168.3.155:6789/0}
 election epoch 4028, quorum 0,1,2,3,4
loki01,loki02,loki03,loki04,loki05
   fsmap e95494: 1/1/1 up {0=zeus2=up:active}, 1 up:standby
  osdmap e275373: 42 osds: 42 up, 42 in; 1077 remapped pgs
 flags noscrub,nodeep-scrub
   pgmap v36642778: 4872 pgs, 4 pools, 24801 GB data, 17087 kobjects
 45892 GB used, 34024 GB / 79916 GB avail
 3195781/36460868 objects degraded (8.765%)
 5079026/36460868 objects misplaced (13.930%)
 3640 active+clean
  838 active+undersized+degraded+remapped+wait_backfill
  184 active+remapped+wait_backfill
  134 incomplete
   48 active+undersized+degraded+remapped+backfilling
   19 down+incomplete
6
active+undersized+degraded+remapped+wait_backfill+backfill_toofull
1 active+remapped+backfill_toofull
1 peering
1 down+peering
recovery io 93909 kB/s, 10 keys/s, 67 objects/s



# ceph osd tree
ID  WEIGHT   TYPE NAME   UP/DOWN REWEIGHT PRIMARY-AFFINITY
  -1 77.22777 root default
  -9 27.14778 rack sala1
  -2  5.41974 host loki01
  14  0.90329 osd.14   up  1.0  1.0
  15  0.90329 osd.15   up  1.0  1.0
  16  0.90329 osd.16   up  1.0  1.0
  17  0.90329 osd.17   up  1.0  1.0
  18  0.90329 osd.18   up  1.0  1.0
  25  0.90329 osd.25   up  1.0  1.0
  -4  3.61316 host loki03
   0  0.90329 osd.0up  1.0  1.0
   2  0.90329 osd.2up  1.0  1.0
  20  0.90329 osd.20   up  1.0  1.0
  24  0.90329 osd.24   up  1.0  1.0
  -3  9.05714 host loki02
   1  0.90300 osd.1up  0.90002  1.0
  31  2.72198 osd.31   up  1.0  1.0
  29  0.90329 osd.29   up  1.0  1.0
  30  0.90329 osd.30   up  1.0  1.0
  33  0.90329 osd.33   up  1.0  1.0
  32  2.72229 osd.32   up  1.0  1.0
  -5  9.05774 host loki04
   3  0.90329 osd.3up  1.0  1.0
  19  0.90329 osd.19   up  1.0  1.0
  21  0.90329 osd.21   up  1.0  1.0
  22  0.90329 osd.22   up  1.0  1.0
  23  2.72229 osd.23   up  1.0  1.0
  28  2.72229 osd.28   up  1.0  1.0
-10 24.61000 rack sala2.2
  -6 24.61000 host loki05
   5  2.73000 osd.5up  1.0  1.0
   6  2.73000 osd.6up  1.0  1.0
   9  2.73000 osd.9up  1.0  1.0
  10  2.73000 osd.10   up  1.0  1.0
  11  2.73000 osd.11   up  1.0  1.0
  12  2.73000 osd.12   up  1.0  1.0
  13  2.73000 osd.13   up  1.0  1.0
   4  2.73000 osd.4up  1.0  1.0
   8  2.73000 osd.8up  1.0  1.0
   7  0.03999 osd.7up  1.0  1.0
-12 25.46999 rack sala2.1
-11 25.46999 host loki06
  34  2.73000 osd.34   up  1.0  1.0
  35  2.73000 osd.35   up  1.0  1.0
  36  2.

Re: [ceph-users] Bluestore: v11.2.0 peering not happening when OSD is down

2017-01-31 Thread Muthusamy Muthiah
Hi Greg,

Now we could see the same problem exists for kraken-filestore also.
Attached the requested osdmap and crushmap.

OSD.1 was stopped in this following procedure and OSD map for a PG is
displayed.

ceph osd dump | grep cdvr_ec
2017-01-31 08:39:44.827079 7f323d66c700 -1 WARNING: the following dangerous
and experimental features are enabled: bluestore,rocksdb
2017-01-31 08:39:44.848901 7f323d66c700 -1 WARNING: the following dangerous
and experimental features are enabled: bluestore,rocksdb
pool 2 'cdvr_ec' erasure size 4 min_size 4 crush_ruleset 1 object_hash
rjenkins pg_num 1024 pgp_num 1024 last_change 234 flags hashpspool
stripe_width 4128

[root@ca-cn2 ~]# ceph osd getmap -o /tmp/osdmap


[root@ca-cn2 ~]# osdmaptool --pool 2 --test-map-object object1 /tmp/osdmap
osdmaptool: osdmap file '/tmp/osdmap'
 object 'object1' -> 2.2bc -> [20,47,1,36]

[root@ca-cn2 ~]# ceph osd map cdvr_ec object1
osdmap e402 pool 'cdvr_ec' (2) object 'object1' -> pg 2.bac5debc (2.2bc) ->
up ([20,47,1,36], p20) acting ([20,47,1,36], p20)

[root@ca-cn2 ~]# systemctl stop ceph-osd@1.service

[root@ca-cn2 ~]# ceph osd getmap -o /tmp/osdmap1


[root@ca-cn2 ~]# osdmaptool --pool 2 --test-map-object object1 /tmp/osdmap1
osdmaptool: osdmap file '/tmp/osdmap1'
 object 'object1' -> 2.2bc -> [20,47,2147483647,36]


[root@ca-cn2 ~]# ceph osd map cdvr_ec object1
osdmap e406 pool 'cdvr_ec' (2) object 'object1' -> pg 2.bac5debc (2.2bc) ->
up ([20,47,39,36], p20) acting ([20,47,NONE,36], p20)


[root@ca-cn2 ~]# ceph osd tree
2017-01-31 08:42:19.606876 7f4ed856a700 -1 WARNING: the following dangerous
and experimental features are enabled: bluestore,rocksdb
2017-01-31 08:42:19.628358 7f4ed856a700 -1 WARNING: the following dangerous
and experimental features are enabled: bluestore,rocksdb
ID WEIGHTTYPE NAME   UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 327.47314 root default
-2  65.49463 host ca-cn4
 3   5.45789 osd.3up  1.0  1.0
 5   5.45789 osd.5up  1.0  1.0
10   5.45789 osd.10   up  1.0  1.0
16   5.45789 osd.16   up  1.0  1.0
21   5.45789 osd.21   up  1.0  1.0
27   5.45789 osd.27   up  1.0  1.0
30   5.45789 osd.30   up  1.0  1.0
35   5.45789 osd.35   up  1.0  1.0
42   5.45789 osd.42   up  1.0  1.0
47   5.45789 osd.47   up  1.0  1.0
51   5.45789 osd.51   up  1.0  1.0
53   5.45789 osd.53   up  1.0  1.0
-3  65.49463 host ca-cn3
 2   5.45789 osd.2up  1.0  1.0
 6   5.45789 osd.6up  1.0  1.0
11   5.45789 osd.11   up  1.0  1.0
15   5.45789 osd.15   up  1.0  1.0
20   5.45789 osd.20   up  1.0  1.0
25   5.45789 osd.25   up  1.0  1.0
29   5.45789 osd.29   up  1.0  1.0
33   5.45789 osd.33   up  1.0  1.0
38   5.45789 osd.38   up  1.0  1.0
40   5.45789 osd.40   up  1.0  1.0
45   5.45789 osd.45   up  1.0  1.0
49   5.45789 osd.49   up  1.0  1.0
-4  65.49463 host ca-cn5
 0   5.45789 osd.0up  1.0  1.0
 7   5.45789 osd.7up  1.0  1.0
12   5.45789 osd.12   up  1.0  1.0
17   5.45789 osd.17   up  1.0  1.0
23   5.45789 osd.23   up  1.0  1.0
26   5.45789 osd.26   up  1.0  1.0
32   5.45789 osd.32   up  1.0  1.0
34   5.45789 osd.34   up  1.0  1.0
41   5.45789 osd.41   up  1.0  1.0
46   5.45789 osd.46   up  1.0  1.0
52   5.45789 osd.52   up  1.0  1.0
56   5.45789 osd.56   up  1.0  1.0
-5  65.49463 host ca-cn1
 4   5.45789 osd.4up  1.0  1.0
 9   5.45789 osd.9up  1.0  1.0
14   5.45789 osd.14   up  1.0  1.0
19   5.45789 osd.19   up  1.0  1.0
24   5.45789 osd.24   up  1.0  1.0
36   5.45789 osd.36   up  1.0  1.0
43   5.45789 osd.43   up  1.0  1.0
50   5.45789 osd.50   up  1.0  1.0
55   5.45789 osd.55   up  1.0  1.0
57   5.45789 osd.57   up  1.0  1.0
58   5.45789 osd.58   up  1.0  1.0
59   5.45789 osd.59   up  1.0

[ceph-users] mon.mon01 store is getting too big! 18119 MB >= 15360 MB -- 94% avail

2017-01-31 Thread Martin Palma
Hi all,

our cluster is currently performing a big expansion and is in recovery
mode (we doubled in size and osd# from 600 TB to 1,2 TB).

Now we get the following message from our monitor nodes:

mon.mon01 store is getting too big! 18119 MB >= 15360 MB -- 94% avail

Reading [0] it says that it is normal in a state of active data
rebalance and after it is finished it will be compacted.

Should we wait until the recovery is finished or should we perform
"ceph tell mon.{id} compact" now during recovery?

Best,
Martin

[0] https://access.redhat.com/solutions/1982273
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mon.mon01 store is getting too big! 18119 MB >= 15360 MB -- 94% avail

2017-01-31 Thread Wido den Hollander

> On 31 January 2017 at 10:22, Martin Palma wrote:
> 
> 
> Hi all,
> 
> our cluster is currently performing a big expansion and is in recovery
> mode (we doubled in size and osd# from 600 TB to 1,2 TB).
> 

Yes, that is to be expected. When not all PGs are active+clean the MONs will 
not trim their datastore.

> Now we get the following message from our monitor nodes:
> 
> mon.mon01 store is getting too big! 18119 MB >= 15360 MB -- 94% avail
> 
> Reading [0] it says that it is normal in a state of active data
> rebalance and after it is finished it will be compacted.
> 
> Should we wait until the recovery is finished or should we perform
> "ceph tell mon.{id} compact" now during recovery?
> 

Mainly wait and make sure there is enough disk space. You can try a compact, 
but that can take the mon offline temp.

Just make sure you have enough diskspace :)
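For reference, a quick way to check the store size and free space on a mon
(assuming the default data path and that the mon id is "mon01" as in your
warning), plus the compaction command mentioned above:

  du -sh /var/lib/ceph/mon/ceph-mon01/store.db   # size of the mon store
  df -h /var/lib/ceph/mon                        # free space on that filesystem
  ceph tell mon.mon01 compact                    # may take the mon offline briefly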

Wido

> Best,
> Martin
> 
> [0] https://access.redhat.com/solutions/1982273
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Minimize data lost with PG incomplete

2017-01-31 Thread José M. Martín
Already min_size = 1

Thanks,
Jose M. Martín

On 31/01/17 at 09:44, Henrik Korkuc wrote:
> I am not sure about "incomplete" part out of my head, but you can try
> setting min_size to 1 for pools to reactivate some PGs, if they are
> down/inactive due to missing replicas.
>
> On 17-01-31 10:24, José M. Martín wrote:
>> # ceph -s
>>  cluster 29a91870-2ed2-40dc-969e-07b22f37928b
>>   health HEALTH_ERR
>>  clock skew detected on mon.loki04
>>  155 pgs are stuck inactive for more than 300 seconds
>>  7 pgs backfill_toofull
>>  1028 pgs backfill_wait
>>  48 pgs backfilling
>>  892 pgs degraded
>>  20 pgs down
>>  153 pgs incomplete
>>  2 pgs peering
>>  155 pgs stuck inactive
>>  1077 pgs stuck unclean
>>  892 pgs undersized
>>  1471 requests are blocked > 32 sec
>>  recovery 3195781/36460868 objects degraded (8.765%)
>>  recovery 5079026/36460868 objects misplaced (13.930%)
>>  mds0: Behind on trimming (175/30)
>>  noscrub,nodeep-scrub flag(s) set
>>  Monitor clock skew detected
>>   monmap e5: 5 mons at
>> {loki01=192.168.3.151:6789/0,loki02=192.168.3.152:6789/0,loki03=192.168.3.153:6789/0,loki04=192.168.3.154:6789/0,loki05=192.168.3.155:6789/0}
>>
>>  election epoch 4028, quorum 0,1,2,3,4
>> loki01,loki02,loki03,loki04,loki05
>>fsmap e95494: 1/1/1 up {0=zeus2=up:active}, 1 up:standby
>>   osdmap e275373: 42 osds: 42 up, 42 in; 1077 remapped pgs
>>  flags noscrub,nodeep-scrub
>>pgmap v36642778: 4872 pgs, 4 pools, 24801 GB data, 17087 kobjects
>>  45892 GB used, 34024 GB / 79916 GB avail
>>  3195781/36460868 objects degraded (8.765%)
>>  5079026/36460868 objects misplaced (13.930%)
>>  3640 active+clean
>>   838 active+undersized+degraded+remapped+wait_backfill
>>   184 active+remapped+wait_backfill
>>   134 incomplete
>>48 active+undersized+degraded+remapped+backfilling
>>19 down+incomplete
>> 6
>> active+undersized+degraded+remapped+wait_backfill+backfill_toofull
>> 1 active+remapped+backfill_toofull
>> 1 peering
>> 1 down+peering
>> recovery io 93909 kB/s, 10 keys/s, 67 objects/s
>>
>>
>>
>> # ceph osd tree
>> ID  WEIGHT   TYPE NAME   UP/DOWN REWEIGHT PRIMARY-AFFINITY
>>   -1 77.22777 root default
>>   -9 27.14778 rack sala1
>>   -2  5.41974 host loki01
>>   14  0.90329 osd.14   up  1.0  1.0
>>   15  0.90329 osd.15   up  1.0  1.0
>>   16  0.90329 osd.16   up  1.0  1.0
>>   17  0.90329 osd.17   up  1.0  1.0
>>   18  0.90329 osd.18   up  1.0  1.0
>>   25  0.90329 osd.25   up  1.0  1.0
>>   -4  3.61316 host loki03
>>0  0.90329 osd.0up  1.0  1.0
>>2  0.90329 osd.2up  1.0  1.0
>>   20  0.90329 osd.20   up  1.0  1.0
>>   24  0.90329 osd.24   up  1.0  1.0
>>   -3  9.05714 host loki02
>>1  0.90300 osd.1up  0.90002  1.0
>>   31  2.72198 osd.31   up  1.0  1.0
>>   29  0.90329 osd.29   up  1.0  1.0
>>   30  0.90329 osd.30   up  1.0  1.0
>>   33  0.90329 osd.33   up  1.0  1.0
>>   32  2.72229 osd.32   up  1.0  1.0
>>   -5  9.05774 host loki04
>>3  0.90329 osd.3up  1.0  1.0
>>   19  0.90329 osd.19   up  1.0  1.0
>>   21  0.90329 osd.21   up  1.0  1.0
>>   22  0.90329 osd.22   up  1.0  1.0
>>   23  2.72229 osd.23   up  1.0  1.0
>>   28  2.72229 osd.28   up  1.0  1.0
>> -10 24.61000 rack sala2.2
>>   -6 24.61000 host loki05
>>5  2.73000 osd.5up  1.0  1.0
>>6  2.73000 osd.6up  1.0  1.0
>>9  2.73000 osd.9up  1.0  1.0
>>   10  2.73000 osd.10   up  1.0  1.0
>>   11  2.73000 osd.11   up  1.0  1.0
>>   12  2.73000 osd.12   up  1.0  1.0
>>   13  2.73000 osd.13   up  1.0  1.0
>>4  2.73000 osd.4up  1.0

Re: [ceph-users] Minimize data lost with PG incomplete

2017-01-31 Thread Maxime Guyot
Hi José,

Too late, but you could have updated the CRUSHmap *before* moving the disks. 
Something like: “ceph osd crush set osd.0 0.90329 root=default rack=sala2.2  
host=loki05” would move the osd.0 to loki05 and would trigger the appropriate 
PG movements before any physical move. Then the physical move is done as usual: 
set noout, stop the osd, physically move it, start the osd again, unset noout.

It’s a way to trigger the data movement overnight (maybe with a cron) and do 
the physical move at your own convenience in the morning.
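For completeness, the move sequence itself would look roughly like this
(osd.0 and loki05 follow the example above; adjust the ids to your case):

  ceph osd set noout
  systemctl stop ceph-osd@0      # on the old host
  # physically move the disk to loki05, then on loki05:
  systemctl start ceph-osd@0
  ceph osd unset noout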

Cheers, 
Maxime 

On 31/01/17 10:35, "ceph-users on behalf of José M. Martín" 
 wrote:

Already min_size = 1

Thanks,
Jose M. Martín

On 31/01/17 at 09:44, Henrik Korkuc wrote:
> I am not sure about "incomplete" part out of my head, but you can try
> setting min_size to 1 for pools to reactivate some PGs, if they are
> down/inactive due to missing replicas.
>
> On 17-01-31 10:24, José M. Martín wrote:
>> # ceph -s
>>  cluster 29a91870-2ed2-40dc-969e-07b22f37928b
>>   health HEALTH_ERR
>>  clock skew detected on mon.loki04
>>  155 pgs are stuck inactive for more than 300 seconds
>>  7 pgs backfill_toofull
>>  1028 pgs backfill_wait
>>  48 pgs backfilling
>>  892 pgs degraded
>>  20 pgs down
>>  153 pgs incomplete
>>  2 pgs peering
>>  155 pgs stuck inactive
>>  1077 pgs stuck unclean
>>  892 pgs undersized
>>  1471 requests are blocked > 32 sec
>>  recovery 3195781/36460868 objects degraded (8.765%)
>>  recovery 5079026/36460868 objects misplaced (13.930%)
>>  mds0: Behind on trimming (175/30)
>>  noscrub,nodeep-scrub flag(s) set
>>  Monitor clock skew detected
>>   monmap e5: 5 mons at
>> 
{loki01=192.168.3.151:6789/0,loki02=192.168.3.152:6789/0,loki03=192.168.3.153:6789/0,loki04=192.168.3.154:6789/0,loki05=192.168.3.155:6789/0}
>>
>>  election epoch 4028, quorum 0,1,2,3,4
>> loki01,loki02,loki03,loki04,loki05
>>fsmap e95494: 1/1/1 up {0=zeus2=up:active}, 1 up:standby
>>   osdmap e275373: 42 osds: 42 up, 42 in; 1077 remapped pgs
>>  flags noscrub,nodeep-scrub
>>pgmap v36642778: 4872 pgs, 4 pools, 24801 GB data, 17087 kobjects
>>  45892 GB used, 34024 GB / 79916 GB avail
>>  3195781/36460868 objects degraded (8.765%)
>>  5079026/36460868 objects misplaced (13.930%)
>>  3640 active+clean
>>   838 active+undersized+degraded+remapped+wait_backfill
>>   184 active+remapped+wait_backfill
>>   134 incomplete
>>48 active+undersized+degraded+remapped+backfilling
>>19 down+incomplete
>> 6
>> active+undersized+degraded+remapped+wait_backfill+backfill_toofull
>> 1 active+remapped+backfill_toofull
>> 1 peering
>> 1 down+peering
>> recovery io 93909 kB/s, 10 keys/s, 67 objects/s
>>
>>
>>
>> # ceph osd tree
>> ID  WEIGHT   TYPE NAME   UP/DOWN REWEIGHT PRIMARY-AFFINITY
>>   -1 77.22777 root default
>>   -9 27.14778 rack sala1
>>   -2  5.41974 host loki01
>>   14  0.90329 osd.14   up  1.0  1.0
>>   15  0.90329 osd.15   up  1.0  1.0
>>   16  0.90329 osd.16   up  1.0  1.0
>>   17  0.90329 osd.17   up  1.0  1.0
>>   18  0.90329 osd.18   up  1.0  1.0
>>   25  0.90329 osd.25   up  1.0  1.0
>>   -4  3.61316 host loki03
>>0  0.90329 osd.0up  1.0  1.0
>>2  0.90329 osd.2up  1.0  1.0
>>   20  0.90329 osd.20   up  1.0  1.0
>>   24  0.90329 osd.24   up  1.0  1.0
>>   -3  9.05714 host loki02
>>1  0.90300 osd.1up  0.90002  1.0
>>   31  2.72198 osd.31   up  1.0  1.0
>>   29  0.90329 osd.29   up  1.0  1.0
>>   30  0.90329 osd.30   up  1.0  1.0
>>   33  0.90329 osd.33   up  1.0  1.0
>>   32  2.72229 osd.32   up  1.0  1.0
>>   -5  9.05774 host loki04
>>3  0.90329 osd.3up  1.0  1.0
>>   19  0.90329  

Re: [ceph-users] mon.mon01 store is getting too big! 18119 MB >= 15360 MB -- 94% avail

2017-01-31 Thread Martin Palma
Hi Wido,

thank you for the clarification. We will wait until recovery is over
we have plenty of space on the mons :-)

Best,
Martin

On Tue, Jan 31, 2017 at 10:35 AM, Wido den Hollander  wrote:
>
>> On 31 January 2017 at 10:22, Martin Palma wrote:
>>
>>
>> Hi all,
>>
>> our cluster is currently performing a big expansion and is in recovery
>> mode (we doubled in size and osd# from 600 TB to 1,2 TB).
>>
>
> Yes, that is to be expected. When not all PGs are active+clean the MONs will 
> not trim their datastore.
>
>> Now we get the following message from our monitor nodes:
>>
>> mon.mon01 store is getting too big! 18119 MB >= 15360 MB -- 94% avail
>>
>> Reading [0] it says that it is normal in a state of active data
>> rebalance and after it is finished it will be compacted.
>>
>> Should we wait until the recovery is finished or should we perform
>> "ceph tell mon.{id} compact" now during recovery?
>>
>
> Mainly wait and make sure there is enough disk space. You can try a compact, 
> but that can take the mon offline temp.
>
> Just make sure you have enough diskspace :)
>
> Wido
>
>> Best,
>> Martin
>>
>> [0] https://access.redhat.com/solutions/1982273
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Minimize data lost with PG incomplete

2017-01-31 Thread José M. Martín
Thanks.
I just realized I still keep some of the original OSDs. If they contain some of
the incomplete PGs, would it be possible to add them into the new disks?
Maybe following these steps? http://ceph.com/community/incomplete-pgs-oh-my/
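(For reference, the core of that article is exporting the PG from the old,
stopped OSD with ceph-objectstore-tool and importing it into an OSD that
should currently hold it. The pg id, osd ids and paths below are only
placeholders, and backing up the old OSD first is a good idea:)

  # on the host with the old disk mounted, with that OSD stopped
  ceph-objectstore-tool --op export --pgid <pg.id> \
      --data-path /var/lib/ceph/osd/ceph-<old-id> \
      --journal-path /var/lib/ceph/osd/ceph-<old-id>/journal \
      --file /tmp/<pg.id>.export

  # on the destination host, with the receiving OSD stopped
  ceph-objectstore-tool --op import \
      --data-path /var/lib/ceph/osd/ceph-<new-id> \
      --journal-path /var/lib/ceph/osd/ceph-<new-id>/journal \
      --file /tmp/<pg.id>.export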

On 31/01/17 at 10:44, Maxime Guyot wrote:
> Hi José,
>
> Too late, but you could have updated the CRUSHmap *before* moving the disks. 
> Something like: “ceph osd crush set osd.0 0.90329 root=default rack=sala2.2  
> host=loki05” would move the osd.0 to loki05 and would trigger the appropriate 
> PG movements before any physical move. Then the physical move is done as 
> usual: set noout, stop the osd, physically move it, start the osd again, unset noout.
>
> It’s a way to trigger the data movement overnight (maybe with a cron) and do 
> the physical move at your own convenience in the morning.
>
> Cheers, 
> Maxime 
>
> On 31/01/17 10:35, "ceph-users on behalf of José M. Martín" 
>  wrote:
>
> Already min_size = 1
> 
> Thanks,
> Jose M. Martín
> 
> On 31/01/17 at 09:44, Henrik Korkuc wrote:
> > I am not sure about "incomplete" part out of my head, but you can try
> > setting min_size to 1 for pools to reactivate some PGs, if they are
> > down/inactive due to missing replicas.
> >
> > On 17-01-31 10:24, José M. Martín wrote:
> >> # ceph -s
> >>  cluster 29a91870-2ed2-40dc-969e-07b22f37928b
> >>   health HEALTH_ERR
> >>  clock skew detected on mon.loki04
> >>  155 pgs are stuck inactive for more than 300 seconds
> >>  7 pgs backfill_toofull
> >>  1028 pgs backfill_wait
> >>  48 pgs backfilling
> >>  892 pgs degraded
> >>  20 pgs down
> >>  153 pgs incomplete
> >>  2 pgs peering
> >>  155 pgs stuck inactive
> >>  1077 pgs stuck unclean
> >>  892 pgs undersized
> >>  1471 requests are blocked > 32 sec
> >>  recovery 3195781/36460868 objects degraded (8.765%)
> >>  recovery 5079026/36460868 objects misplaced (13.930%)
> >>  mds0: Behind on trimming (175/30)
> >>  noscrub,nodeep-scrub flag(s) set
> >>  Monitor clock skew detected
> >>   monmap e5: 5 mons at
> >> 
> {loki01=192.168.3.151:6789/0,loki02=192.168.3.152:6789/0,loki03=192.168.3.153:6789/0,loki04=192.168.3.154:6789/0,loki05=192.168.3.155:6789/0}
> >>
> >>  election epoch 4028, quorum 0,1,2,3,4
> >> loki01,loki02,loki03,loki04,loki05
> >>fsmap e95494: 1/1/1 up {0=zeus2=up:active}, 1 up:standby
> >>   osdmap e275373: 42 osds: 42 up, 42 in; 1077 remapped pgs
> >>  flags noscrub,nodeep-scrub
> >>pgmap v36642778: 4872 pgs, 4 pools, 24801 GB data, 17087 
> kobjects
> >>  45892 GB used, 34024 GB / 79916 GB avail
> >>  3195781/36460868 objects degraded (8.765%)
> >>  5079026/36460868 objects misplaced (13.930%)
> >>  3640 active+clean
> >>   838 active+undersized+degraded+remapped+wait_backfill
> >>   184 active+remapped+wait_backfill
> >>   134 incomplete
> >>48 active+undersized+degraded+remapped+backfilling
> >>19 down+incomplete
> >> 6
> >> active+undersized+degraded+remapped+wait_backfill+backfill_toofull
> >> 1 active+remapped+backfill_toofull
> >> 1 peering
> >> 1 down+peering
> >> recovery io 93909 kB/s, 10 keys/s, 67 objects/s
> >>
> >>
> >>
> >> # ceph osd tree
> >> ID  WEIGHT   TYPE NAME   UP/DOWN REWEIGHT PRIMARY-AFFINITY
> >>   -1 77.22777 root default
> >>   -9 27.14778 rack sala1
> >>   -2  5.41974 host loki01
> >>   14  0.90329 osd.14   up  1.0  1.0
> >>   15  0.90329 osd.15   up  1.0  1.0
> >>   16  0.90329 osd.16   up  1.0  1.0
> >>   17  0.90329 osd.17   up  1.0  1.0
> >>   18  0.90329 osd.18   up  1.0  1.0
> >>   25  0.90329 osd.25   up  1.0  1.0
> >>   -4  3.61316 host loki03
> >>0  0.90329 osd.0up  1.0  1.0
> >>2  0.90329 osd.2up  1.0  1.0
> >>   20  0.90329 osd.20   up  1.0  1.0
> >>   24  0.90329 osd.24   up  1.0  1.0
> >>   -3  9.05714 host loki02
> >>1  0.90300 osd.1up  0.90002  1.0
> >>   31  2.72198 osd.31   u

Re: [ceph-users] Minimize data lost with PG incomplete

2017-01-31 Thread José M. Martín
Any idea of how I could recover files from the filesystem mount?
Doing a cp, it hangs when it finds a damaged file/folder. I would be happy
just getting the undamaged files.

Thanks

On 31/01/17 at 11:19, José M. Martín wrote:
> Thanks.
> I just realized I still keep some of the original OSDs. If they contain some of
> the incomplete PGs, would it be possible to add them into the new disks?
> Maybe following these steps? http://ceph.com/community/incomplete-pgs-oh-my/
>
> On 31/01/17 at 10:44, Maxime Guyot wrote:
>> Hi José,
>>
>> Too late, but you could have updated the CRUSHmap *before* moving the disks. 
>> Something like: “ceph osd crush set osd.0 0.90329 root=default rack=sala2.2  
>> host=loki05” would move the osd.0 to loki05 and would trigger the 
>> appropriate PG movements before any physical move. Then the physical move is 
>> done as usual: set noout, stop the osd, physically move it, start the osd
>> again, unset noout.
>>
>> It’s a way to trigger the data movement overnight (maybe with a cron) and do 
>> the physical move at your own convenience in the morning.
>>
>> Cheers, 
>> Maxime 
>>
>> On 31/01/17 10:35, "ceph-users on behalf of José M. Martín" 
>>  
>> wrote:
>>
>> Already min_size = 1
>> 
>> Thanks,
>> Jose M. Martín
>> 
>> On 31/01/17 at 09:44, Henrik Korkuc wrote:
>> > I am not sure about "incomplete" part out of my head, but you can try
>> > setting min_size to 1 for pools to reactivate some PGs, if they are
>> > down/inactive due to missing replicas.
>> >
>> > On 17-01-31 10:24, José M. Martín wrote:
>> >> # ceph -s
>> >>  cluster 29a91870-2ed2-40dc-969e-07b22f37928b
>> >>   health HEALTH_ERR
>> >>  clock skew detected on mon.loki04
>> >>  155 pgs are stuck inactive for more than 300 seconds
>> >>  7 pgs backfill_toofull
>> >>  1028 pgs backfill_wait
>> >>  48 pgs backfilling
>> >>  892 pgs degraded
>> >>  20 pgs down
>> >>  153 pgs incomplete
>> >>  2 pgs peering
>> >>  155 pgs stuck inactive
>> >>  1077 pgs stuck unclean
>> >>  892 pgs undersized
>> >>  1471 requests are blocked > 32 sec
>> >>  recovery 3195781/36460868 objects degraded (8.765%)
>> >>  recovery 5079026/36460868 objects misplaced (13.930%)
>> >>  mds0: Behind on trimming (175/30)
>> >>  noscrub,nodeep-scrub flag(s) set
>> >>  Monitor clock skew detected
>> >>   monmap e5: 5 mons at
>> >> 
>> {loki01=192.168.3.151:6789/0,loki02=192.168.3.152:6789/0,loki03=192.168.3.153:6789/0,loki04=192.168.3.154:6789/0,loki05=192.168.3.155:6789/0}
>> >>
>> >>  election epoch 4028, quorum 0,1,2,3,4
>> >> loki01,loki02,loki03,loki04,loki05
>> >>fsmap e95494: 1/1/1 up {0=zeus2=up:active}, 1 up:standby
>> >>   osdmap e275373: 42 osds: 42 up, 42 in; 1077 remapped pgs
>> >>  flags noscrub,nodeep-scrub
>> >>pgmap v36642778: 4872 pgs, 4 pools, 24801 GB data, 17087 
>> kobjects
>> >>  45892 GB used, 34024 GB / 79916 GB avail
>> >>  3195781/36460868 objects degraded (8.765%)
>> >>  5079026/36460868 objects misplaced (13.930%)
>> >>  3640 active+clean
>> >>   838 
>> active+undersized+degraded+remapped+wait_backfill
>> >>   184 active+remapped+wait_backfill
>> >>   134 incomplete
>> >>48 active+undersized+degraded+remapped+backfilling
>> >>19 down+incomplete
>> >> 6
>> >> active+undersized+degraded+remapped+wait_backfill+backfill_toofull
>> >> 1 active+remapped+backfill_toofull
>> >> 1 peering
>> >> 1 down+peering
>> >> recovery io 93909 kB/s, 10 keys/s, 67 objects/s
>> >>
>> >>
>> >>
>> >> # ceph osd tree
>> >> ID  WEIGHT   TYPE NAME   UP/DOWN REWEIGHT PRIMARY-AFFINITY
>> >>   -1 77.22777 root default
>> >>   -9 27.14778 rack sala1
>> >>   -2  5.41974 host loki01
>> >>   14  0.90329 osd.14   up  1.0  1.0
>> >>   15  0.90329 osd.15   up  1.0  1.0
>> >>   16  0.90329 osd.16   up  1.0  1.0
>> >>   17  0.90329 osd.17   up  1.0  1.0
>> >>   18  0.90329 osd.18   up  1.0  1.0
>> >>   25  0.90329 osd.25   up  1.0  1.0
>> >>   -4  3.61316 host loki03
>> >>0  0.90329 osd.0up  1.0  1.0
>> >>2  0.90329 osd.2up  1.0  

[ceph-users] rsync service download.ceph.com partially broken

2017-01-31 Thread Björn Lässig
Hi cephers,

for some time now I have been getting errors while rsyncing from the ceph download server:

download.ceph.com:

rsync: send_files failed to open "/debian-jewel/db/lockfile" (in ceph): 
Permission denied (13)
"/debian-jewel/pool/main/c/ceph/.ceph-fuse-dbg_10.1.0-1~bpo80+1_amd64.deb.h0JvHM"
 (in ceph): Permission denied (13)
rsync: send_files failed to open 
"/debian-jewel/pool/main/c/ceph/.ceph-test-dbg_10.1.0-1trusty_amd64.deb.06D0AZ" 
(in ceph): Permission denied (13)

on eu.ceph.com there are some other files broken:

rsync: send_files failed to open 
"/debian-jewel/pool/main/c/ceph/.ceph-test-dbg_10.2.4-1trusty_amd64.deb.BnXWIa" 
(in ceph): Permission denied (13)
rsync: send_files failed to open 
"/debian-jewel/pool/main/c/ceph/.ceph-test-dbg_10.2.4-1xenial_amd64.deb.5Xhv3J" 
(in ceph): Permission denied (13)

Who is in charge of fixing this?

Thanks,
Björn
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bluestore: v11.2.0 peering not happening when OSD is down

2017-01-31 Thread Muthusamy Muthiah
Hi Greg,

Following are the test outcomes for an EC profile (n = k + m):



1.   Kraken filestore and bluestore with m=1: recovery does not start.

2.   Jewel filestore and bluestore with m=1: recovery happens.

3.   Kraken bluestore, all default configuration, m=1: no recovery.

4.   Kraken bluestore with m=2: recovery happens when one OSD is down
and also when 2 OSDs fail.



So, the issue seems to be in the ceph-kraken release. Your views…



Thanks,

Muthu



On 31 January 2017 at 14:18, Muthusamy Muthiah 
wrote:

> Hi Greg,
>
> Now we could see the same problem exists for kraken-filestore also.
> Attached the requested osdmap and crushmap.
>
> OSD.1 was stopped in this following procedure and OSD map for a PG is
> displayed.
>
> ceph osd dump | grep cdvr_ec
> 2017-01-31 08:39:44.827079 7f323d66c700 -1 WARNING: the following
> dangerous and experimental features are enabled: bluestore,rocksdb
> 2017-01-31 08:39:44.848901 7f323d66c700 -1 WARNING: the following
> dangerous and experimental features are enabled: bluestore,rocksdb
> pool 2 'cdvr_ec' erasure size 4 min_size 4 crush_ruleset 1 object_hash
> rjenkins pg_num 1024 pgp_num 1024 last_change 234 flags hashpspool
> stripe_width 4128
>
> [root@ca-cn2 ~]# ceph osd getmap -o /tmp/osdmap
>
>
> [root@ca-cn2 ~]# osdmaptool --pool 2 --test-map-object object1 /tmp/osdmap
> osdmaptool: osdmap file '/tmp/osdmap'
>  object 'object1' -> 2.2bc -> [20,47,1,36]
>
> [root@ca-cn2 ~]# ceph osd map cdvr_ec object1
> osdmap e402 pool 'cdvr_ec' (2) object 'object1' -> pg 2.bac5debc (2.2bc)
> -> up ([20,47,1,36], p20) acting ([20,47,1,36], p20)
>
> [root@ca-cn2 ~]# systemctl stop ceph-osd@1.service
>
> [root@ca-cn2 ~]# ceph osd getmap -o /tmp/osdmap1
>
>
> [root@ca-cn2 ~]# osdmaptool --pool 2 --test-map-object object1
> /tmp/osdmap1
> osdmaptool: osdmap file '/tmp/osdmap1'
>  object 'object1' -> 2.2bc -> [20,47,2147483647,36]
>
>
> [root@ca-cn2 ~]# ceph osd map cdvr_ec object1
> osdmap e406 pool 'cdvr_ec' (2) object 'object1' -> pg 2.bac5debc (2.2bc)
> -> up ([20,47,39,36], p20) acting ([20,47,NONE,36], p20)
>
>
> [root@ca-cn2 ~]# ceph osd tree
> 2017-01-31 08:42:19.606876 7f4ed856a700 -1 WARNING: the following
> dangerous and experimental features are enabled: bluestore,rocksdb
> 2017-01-31 08:42:19.628358 7f4ed856a700 -1 WARNING: the following
> dangerous and experimental features are enabled: bluestore,rocksdb
> ID WEIGHTTYPE NAME   UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 327.47314 root default
> -2  65.49463 host ca-cn4
>  3   5.45789 osd.3up  1.0  1.0
>  5   5.45789 osd.5up  1.0  1.0
> 10   5.45789 osd.10   up  1.0  1.0
> 16   5.45789 osd.16   up  1.0  1.0
> 21   5.45789 osd.21   up  1.0  1.0
> 27   5.45789 osd.27   up  1.0  1.0
> 30   5.45789 osd.30   up  1.0  1.0
> 35   5.45789 osd.35   up  1.0  1.0
> 42   5.45789 osd.42   up  1.0  1.0
> 47   5.45789 osd.47   up  1.0  1.0
> 51   5.45789 osd.51   up  1.0  1.0
> 53   5.45789 osd.53   up  1.0  1.0
> -3  65.49463 host ca-cn3
>  2   5.45789 osd.2up  1.0  1.0
>  6   5.45789 osd.6up  1.0  1.0
> 11   5.45789 osd.11   up  1.0  1.0
> 15   5.45789 osd.15   up  1.0  1.0
> 20   5.45789 osd.20   up  1.0  1.0
> 25   5.45789 osd.25   up  1.0  1.0
> 29   5.45789 osd.29   up  1.0  1.0
> 33   5.45789 osd.33   up  1.0  1.0
> 38   5.45789 osd.38   up  1.0  1.0
> 40   5.45789 osd.40   up  1.0  1.0
> 45   5.45789 osd.45   up  1.0  1.0
> 49   5.45789 osd.49   up  1.0  1.0
> -4  65.49463 host ca-cn5
>  0   5.45789 osd.0up  1.0  1.0
>  7   5.45789 osd.7up  1.0  1.0
> 12   5.45789 osd.12   up  1.0  1.0
> 17   5.45789 osd.17   up  1.0  1.0
> 23   5.45789 osd.23   up  1.0  1.0
> 26   5.45789 osd.26   up  1.0  1.0
> 32   5.45789 osd.32   up  1.0  1.0
> 34   5.45789 osd.34   up  1.0  1.0
> 41   5.45789 osd.41   up  1.0  1.0
> 46   5.45789 osd.46   up  1.0  1.0
> 52   5.45789 osd.52   up  1.0  1.0
> 56   5.45789 osd.56   up  1.0  1.0
> -5  65.49463 host ca-cn1
>  4   5.45789 osd.4up  1.0

Re: [ceph-users] [Ceph-mirrors] rsync service download.ceph.com partially broken

2017-01-31 Thread Wido den Hollander

> On 31 January 2017 at 13:46, Björn Lässig wrote:
> 
> 
> Hi cephers,
> 
> for some time now I have been getting errors while rsyncing from the ceph download server:
> 
> download.ceph.com:
> 
> rsync: send_files failed to open "/debian-jewel/db/lockfile" (in ceph): 
> Permission denied (13)
> "/debian-jewel/pool/main/c/ceph/.ceph-fuse-dbg_10.1.0-1~bpo80+1_amd64.deb.h0JvHM"
>  (in ceph): Permission denied (13)
> rsync: send_files failed to open 
> "/debian-jewel/pool/main/c/ceph/.ceph-test-dbg_10.1.0-1trusty_amd64.deb.06D0AZ"
>  (in ceph): Permission denied (13)
> 
> on eu.ceph.com there are some other files broken:
> 
> rsync: send_files failed to open 
> "/debian-jewel/pool/main/c/ceph/.ceph-test-dbg_10.2.4-1trusty_amd64.deb.BnXWIa"
>  (in ceph): Permission denied (13)
> rsync: send_files failed to open 
> "/debian-jewel/pool/main/c/ceph/.ceph-test-dbg_10.2.4-1xenial_amd64.deb.5Xhv3J"
>  (in ceph): Permission denied (13)
> 
> Who is in charge of fixing this?
> 

People at Red Hat maintain this. This seems to pop up now and then. These 
files seem to be owned by root and are left over from an rsync somewhere on 
download.ceph.com.

Wido

> Thanks,
> Björn
> ___
> Ceph-mirrors mailing list
> ceph-mirr...@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-mirrors-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mon.mon01 store is getting too big! 18119 MB >= 15360 MB -- 94% avail

2017-01-31 Thread David Turner
If you do have a large enough drive on all of your mons (and always intend to 
do so) you can increase the mon store warning threshold in the config file so 
that it no longer warns at 15360 MB.
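For reference, the option in question should be mon_data_size_warn (specified
in bytes; the default corresponds to the 15360 MB in the warning). Something
like this in ceph.conf on the mons would raise it to roughly 30 GB:

  [mon]
  # 30 GiB in bytes; pick a value that matches your actual disk headroom
  mon data size warn = 32212254720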



David Turner | Cloud Operations Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2760 | Mobile: 385.224.2943



If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.




From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Wido den 
Hollander [w...@42on.com]
Sent: Tuesday, January 31, 2017 2:35 AM
To: Martin Palma; CEPH list
Subject: Re: [ceph-users] mon.mon01 store is getting too big! 18119 MB >= 15360 
MB -- 94% avail

> On 31 January 2017 at 10:22, Martin Palma wrote:
>
>
> Hi all,
>
> our cluster is currently performing a big expansion and is in recovery
> mode (we doubled in size and osd# from 600 TB to 1,2 TB).
>

Yes, that is to be expected. When not all PGs are active+clean the MONs will 
not trim their datastore.

> Now we get the following message from our monitor nodes:
>
> mon.mon01 store is getting too big! 18119 MB >= 15360 MB -- 94% avail
>
> Reading [0] it says that it is normal in a state of active data
> rebalance and after it is finished it will be compacted.
>
> Should we wait until the recovery is finished or should we perform
> "ceph tell mon.{id} compact" now during recovery?
>

Mainly wait and make sure there is enough disk space. You can try a compact, 
but that can take the mon offline temp.

Just make sure you have enough diskspace :)

Wido

> Best,
> Martin
>
> [0] https://access.redhat.com/solutions/1982273
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Unique object IDs and crush on object striping

2017-01-31 Thread Ukko
Hi,

Two quickies:

1) How does Ceph handle unique object IDs without any
central information about the object names?

2) How CRUSH is used in case of splitting an object in
stripes?

Thanks!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Import Ceph RBD snapshot

2017-01-31 Thread pierrepalussiere
Hello,

I just wonder if there is a way to import a Ceph RBD snapshot that I have
previously exported, but without recovering the current image state?
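(For context, by "exported" I mean the plain export/import pair along the
lines of the commands below; pool, image and snapshot names are placeholders:)

  rbd export mypool/myimage@mysnap /backup/myimage-mysnap.img
  rbd import /backup/myimage-mysnap.img mypool/myimage-restored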

Thanks in advance.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mon.mon01 store is getting too big! 18119 MB >= 15360 MB -- 94% avail

2017-01-31 Thread Joao Eduardo Luis

On 01/31/2017 03:35 PM, David Turner wrote:

If you do have a large enough drive on all of your mons (and always
intend to do so) you can increase the mon store warning threshold in the
config file so that it no longer warns at 15360 MB.


And if you so decide to go that route, please be aware that the monitors 
are known to misbehave if their store grows too much.


Those warnings have been put in place to let the admin know that action 
may be needed, hopefully in time to avoid abhorrent behaviour.


  -Joao


From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Wido
den Hollander [w...@42on.com]
Sent: Tuesday, January 31, 2017 2:35 AM
To: Martin Palma; CEPH list
Subject: Re: [ceph-users] mon.mon01 store is getting too big! 18119 MB >= 15360 MB -- 94% avail



On 31 January 2017 at 10:22, Martin Palma wrote:


Hi all,

our cluster is currently performing a big expansion and is in recovery
mode (we doubled in size and osd# from 600 TB to 1,2 TB).



Yes, that is to be expected. When not all PGs are active+clean the MONs
will not trim their datastore.


Now we get the following message from our monitor nodes:

mon.mon01 store is getting too big! 18119 MB >= 15360 MB -- 94% avail

Reading [0] it says that it is normal in a state of active data
rebalance and after it is finished it will be compacted.

Should we wait until the recovery is finished or should we perform
"ceph tell mon.{id} compact" now during recovery?



Mainly wait and make sure there is enough disk space. You can try a
compact, but that can take the mon offline temp.

Just make sure you have enough diskspace :)

Wido


Best,
Martin

[0] https://access.redhat.com/solutions/1982273
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bluestore: v11.2.0 peering not happening when OSD is down

2017-01-31 Thread Muthusamy Muthiah
Hi Greg,

the problem is in kraken: when a pool is created with an EC profile, min_size
equals the erasure size.

For 3+1 profile , following is the pool status ,
pool 2 'cdvr_ec' erasure size 4 min_size 4 crush_ruleset 1 object_hash
rjenkins pg_num 1024 pgp_num 1024 last_change 234 flags hashpspool
stripe_width 4128

For 4+1 profile:
pool 5 'cdvr_ec' erasure size 5 min_size 5 crush_ruleset 1 object_hash
rjenkins pg_num 4096 pgp_num 4096

For 3+2 profile :
pool 3 'cdvr_ec' erasure size 5 min_size 4 crush_ruleset 1 object_hash
rjenkins pg_num 1024 pgp_num 1024 last_change 412 flags hashpspool
stripe_width 4128

Whereas on the Jewel release, for EC 4+1:
pool 30 'cdvr_ec' *erasure size 5 min_size 4* crush_ruleset 1 object_hash
rjenkins pg_num 4096 pgp_num 4096

Trying to modify min_size and verify the status.
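For reference, checking and changing it on the 4+1 pool above would be
(k = 4 in that profile):

  ceph osd pool get cdvr_ec min_size
  ceph osd pool set cdvr_ec min_size 4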

Is there any reason behind this change in ceph kraken, or is it a bug?

Thanks,
Muthu




On 31 January 2017 at 18:17, Muthusamy Muthiah 
wrote:

> Hi Greg,
>
> Following are the test outcomes on EC profile ( n = k + m)
>
>
>
> 1.   Kraken filestore and bluestore with m=1: recovery does not start.
>
> 2.   Jewel filestore and bluestore with m=1: recovery happens.
>
> 3.   Kraken bluestore, all default configuration, m=1: no recovery.
>
> 4.   Kraken bluestore with m=2: recovery happens when one OSD is down
> and also when 2 OSDs fail.
>
>
>
> So, the issue seems to be on ceph-kraken release. Your views…
>
>
>
> Thanks,
>
> Muthu
>
>
>
> On 31 January 2017 at 14:18, Muthusamy Muthiah <
> muthiah.muthus...@gmail.com> wrote:
>
>> Hi Greg,
>>
>> Now we could see the same problem exists for kraken-filestore also.
>> Attached the requested osdmap and crushmap.
>>
>> OSD.1 was stopped in this following procedure and OSD map for a PG is
>> displayed.
>>
>> ceph osd dump | grep cdvr_ec
>> 2017-01-31 08:39:44.827079 7f323d66c700 -1 WARNING: the following
>> dangerous and experimental features are enabled: bluestore,rocksdb
>> 2017-01-31 08:39:44.848901 7f323d66c700 -1 WARNING: the following
>> dangerous and experimental features are enabled: bluestore,rocksdb
>> pool 2 'cdvr_ec' erasure size 4 min_size 4 crush_ruleset 1 object_hash
>> rjenkins pg_num 1024 pgp_num 1024 last_change 234 flags hashpspool
>> stripe_width 4128
>>
>> [root@ca-cn2 ~]# ceph osd getmap -o /tmp/osdmap
>>
>>
>> [root@ca-cn2 ~]# osdmaptool --pool 2 --test-map-object object1
>> /tmp/osdmap
>> osdmaptool: osdmap file '/tmp/osdmap'
>>  object 'object1' -> 2.2bc -> [20,47,1,36]
>>
>> [root@ca-cn2 ~]# ceph osd map cdvr_ec object1
>> osdmap e402 pool 'cdvr_ec' (2) object 'object1' -> pg 2.bac5debc (2.2bc)
>> -> up ([20,47,1,36], p20) acting ([20,47,1,36], p20)
>>
>> [root@ca-cn2 ~]# systemctl stop ceph-osd@1.service
>>
>> [root@ca-cn2 ~]# ceph osd getmap -o /tmp/osdmap1
>>
>>
>> [root@ca-cn2 ~]# osdmaptool --pool 2 --test-map-object object1
>> /tmp/osdmap1
>> osdmaptool: osdmap file '/tmp/osdmap1'
>>  object 'object1' -> 2.2bc -> [20,47,2147483647,36]
>>
>>
>> [root@ca-cn2 ~]# ceph osd map cdvr_ec object1
>> osdmap e406 pool 'cdvr_ec' (2) object 'object1' -> pg 2.bac5debc (2.2bc)
>> -> up ([20,47,39,36], p20) acting ([20,47,NONE,36], p20)
>>
>>
>> [root@ca-cn2 ~]# ceph osd tree
>> 2017-01-31 08:42:19.606876 7f4ed856a700 -1 WARNING: the following
>> dangerous and experimental features are enabled: bluestore,rocksdb
>> 2017-01-31 08:42:19.628358 7f4ed856a700 -1 WARNING: the following
>> dangerous and experimental features are enabled: bluestore,rocksdb
>> ID WEIGHTTYPE NAME   UP/DOWN REWEIGHT PRIMARY-AFFINITY
>> -1 327.47314 root default
>> -2  65.49463 host ca-cn4
>>  3   5.45789 osd.3up  1.0  1.0
>>  5   5.45789 osd.5up  1.0  1.0
>> 10   5.45789 osd.10   up  1.0  1.0
>> 16   5.45789 osd.16   up  1.0  1.0
>> 21   5.45789 osd.21   up  1.0  1.0
>> 27   5.45789 osd.27   up  1.0  1.0
>> 30   5.45789 osd.30   up  1.0  1.0
>> 35   5.45789 osd.35   up  1.0  1.0
>> 42   5.45789 osd.42   up  1.0  1.0
>> 47   5.45789 osd.47   up  1.0  1.0
>> 51   5.45789 osd.51   up  1.0  1.0
>> 53   5.45789 osd.53   up  1.0  1.0
>> -3  65.49463 host ca-cn3
>>  2   5.45789 osd.2up  1.0  1.0
>>  6   5.45789 osd.6up  1.0  1.0
>> 11   5.45789 osd.11   up  1.0  1.0
>> 15   5.45789 osd.15   up  1.0  1.0
>> 20   5.45789 osd.20   up  1.0  1.0
>> 25   5.45789 osd.25   up  1.0  1.0
>> 29   5.45789 osd.29   up  1.0  1.0
>> 33   5.45789 osd.33   up  1.0  1.0
>> 38   5.45789 osd.38   up  1.00

Re: [ceph-users] Unique object IDs and crush on object striping

2017-01-31 Thread Brian Andrus
On Tue, Jan 31, 2017 at 7:42 AM, Ukko  wrote:

> Hi,
>
> Two quickies:
>
> 1) How does Ceph handle unique object IDs without any
> central information about the object names?
>

That's where CRUSH comes in. It maps an object name to a unique placement
group ID based on the available placement groups. Some of my favorite
explanations of data placement comes from the core Ceph developers. [1] [2]

2) How CRUSH is used in case of splitting an object in
> stripes?
>

The splitting/striping of data actually occurs at a layer above CRUSH. The
clients handle that and calculate object placement with CRUSH based on
unique object names.
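You can watch that mapping happen from any client with the placement
calculator built into the CLI; for example (pool and object names are
arbitrary, assuming an 'rbd' pool exists):

  ceph osd map rbd some-object-name
  # shows the PG the object name hashes to and the OSD set CRUSH selects
  # for that PG, with no central lookup table involved

For striping, RBD and CephFS simply generate a series of uniquely named RADOS
objects (an image/file prefix plus a stripe index), and each of those names is
placed independently in exactly the same way.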


> Thanks!
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
[1] https://youtu.be/05spXfLKKVU?t=9m14s
[2] https://youtu.be/lG6eeUNw9iI?t=18m49s

-- 
Brian Andrus
Cloud Systems Engineer
DreamHost, LLC
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] No space left on device on directory with > 1000000 files

2017-01-31 Thread Jorge Garcia
I'm running into a problem on a really large directory of over a million 
files (don't ask, my users are clueless). Anyway, I'm trying to use 
Ceph as backup storage for their filesystem. As I rsync the directory, 
it started giving me a "No space left on device" for this directory, 
even though the ceph filesystem is at 66%, and no individual OSD is 
fuller than 82%. If I go to the directory and try to do a "touch foo", 
it gives me the same "No space left on device", but if I go to the 
parent directory and try to copy a file there, it is fine. So I must be 
running into some per-directory limit. Any ideas of what I can do to fix 
this problem? This is Ceph 10.2.5.


Thanks!

Jorge

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bluestore: v11.2.0 peering not happening when OSD is down

2017-01-31 Thread Gregory Farnum
On Tue, Jan 31, 2017 at 9:06 AM, Muthusamy Muthiah
 wrote:
> Hi Greg,
>
> the problem is in kraken,  when a pool is created with EC profile , min_size
> equals erasure size.
>
> For 3+1 profile , following is the pool status ,
> pool 2 'cdvr_ec' erasure size 4 min_size 4 crush_ruleset 1 object_hash
> rjenkins pg_num 1024 pgp_num 1024 last_change 234 flags hashpspool
> stripe_width 4128
>
> For 4+1 profile:
> pool 5 'cdvr_ec' erasure size 5 min_size 5 crush_ruleset 1 object_hash
> rjenkins pg_num 4096 pgp_num 4096
>
> For 3+2 profile :
> pool 3 'cdvr_ec' erasure size 5 min_size 4 crush_ruleset 1 object_hash
> rjenkins pg_num 1024 pgp_num 1024 last_change 412 flags hashpspool
> stripe_width 4128
>
> Where as on Jewel release for EC 4+1:
> pool 30 'cdvr_ec' erasure size 5 min_size 4 crush_ruleset 1 object_hash
> rjenkins pg_num 4096 pgp_num 4096
>
> Trying to modify min_size and verify the status.
>
> Is there any reason behind this change in ceph kraken  or a bug.

The change was made on purpose because running with k replicas on a
k+m pool is a bad idea. However, it definitely should have recovered
the missing shard and then gone active, which doesn't appear to have
happened in this case.

It looks like we just screwed up and don't let EC pools do recovery on
min size. You can restore the old behavior by setting min_size equal
to k and we'll be fixing this for the next release. (In general, k+1
pools are not a good idea, which is why we didn't catch this in
testing.)
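(For anyone reproducing this, a fresh k=4, m=2 profile and pool can be created
along these lines; the profile and pool names are arbitrary, and in practice
you would also choose a failure domain for the profile:)

  ceph osd erasure-code-profile set ec_k4m2 k=4 m=2
  ceph osd pool create cdvr_ec_k4m2 1024 1024 erasure ec_k4m2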
-Greg

>
> Thanks,
> Muthu
>
>
>
>
> On 31 January 2017 at 18:17, Muthusamy Muthiah 
> wrote:
>>
>> Hi Greg,
>>
>> Following are the test outcomes on EC profile ( n = k + m)
>>
>>
>>
>> 1.   Kraken filestore and bluestore with m=1: recovery does not start.
>>
>> 2.   Jewel filestore and bluestore with m=1: recovery happens.
>>
>> 3.   Kraken bluestore, all default configuration, m=1: no recovery.
>>
>> 4.   Kraken bluestore with m=2: recovery happens when one OSD is down
>> and also when 2 OSDs fail.
>>
>>
>>
>> So, the issue seems to be on ceph-kraken release. Your views…
>>
>>
>>
>> Thanks,
>>
>> Muthu
>>
>>
>>
>>
>> On 31 January 2017 at 14:18, Muthusamy Muthiah
>>  wrote:
>>>
>>> Hi Greg,
>>>
>>> Now we could see the same problem exists for kraken-filestore also.
>>> Attached the requested osdmap and crushmap.
>>>
>>> OSD.1 was stopped in this following procedure and OSD map for a PG is
>>> displayed.
>>>
>>> ceph osd dump | grep cdvr_ec
>>> 2017-01-31 08:39:44.827079 7f323d66c700 -1 WARNING: the following
>>> dangerous and experimental features are enabled: bluestore,rocksdb
>>> 2017-01-31 08:39:44.848901 7f323d66c700 -1 WARNING: the following
>>> dangerous and experimental features are enabled: bluestore,rocksdb
>>> pool 2 'cdvr_ec' erasure size 4 min_size 4 crush_ruleset 1 object_hash
>>> rjenkins pg_num 1024 pgp_num 1024 last_change 234 flags hashpspool
>>> stripe_width 4128
>>>
>>> [root@ca-cn2 ~]# ceph osd getmap -o /tmp/osdmap
>>>
>>>
>>> [root@ca-cn2 ~]# osdmaptool --pool 2 --test-map-object object1
>>> /tmp/osdmap
>>> osdmaptool: osdmap file '/tmp/osdmap'
>>>  object 'object1' -> 2.2bc -> [20,47,1,36]
>>>
>>> [root@ca-cn2 ~]# ceph osd map cdvr_ec object1
>>> osdmap e402 pool 'cdvr_ec' (2) object 'object1' -> pg 2.bac5debc (2.2bc)
>>> -> up ([20,47,1,36], p20) acting ([20,47,1,36], p20)
>>>
>>> [root@ca-cn2 ~]# systemctl stop ceph-osd@1.service
>>>
>>> [root@ca-cn2 ~]# ceph osd getmap -o /tmp/osdmap1
>>>
>>>
>>> [root@ca-cn2 ~]# osdmaptool --pool 2 --test-map-object object1
>>> /tmp/osdmap1
>>> osdmaptool: osdmap file '/tmp/osdmap1'
>>>  object 'object1' -> 2.2bc -> [20,47,2147483647,36]
>>>
>>>
>>> [root@ca-cn2 ~]# ceph osd map cdvr_ec object1
>>> osdmap e406 pool 'cdvr_ec' (2) object 'object1' -> pg 2.bac5debc (2.2bc)
>>> -> up ([20,47,39,36], p20) acting ([20,47,NONE,36], p20)
>>>
>>>
>>> [root@ca-cn2 ~]# ceph osd tree
>>> 2017-01-31 08:42:19.606876 7f4ed856a700 -1 WARNING: the following
>>> dangerous and experimental features are enabled: bluestore,rocksdb
>>> 2017-01-31 08:42:19.628358 7f4ed856a700 -1 WARNING: the following
>>> dangerous and experimental features are enabled: bluestore,rocksdb
>>> ID WEIGHTTYPE NAME   UP/DOWN REWEIGHT PRIMARY-AFFINITY
>>> -1 327.47314 root default
>>> -2  65.49463 host ca-cn4
>>>  3   5.45789 osd.3up  1.0  1.0
>>>  5   5.45789 osd.5up  1.0  1.0
>>> 10   5.45789 osd.10   up  1.0  1.0
>>> 16   5.45789 osd.16   up  1.0  1.0
>>> 21   5.45789 osd.21   up  1.0  1.0
>>> 27   5.45789 osd.27   up  1.0  1.0
>>> 30   5.45789 osd.30   up  1.0  1.0
>>> 35   5.45789 osd.35   up  1.0  1.0
>>> 42   5.45789 osd.42   up  1.0  1.0
>>> 47   5.45789 osd.47   up  1.0  1.0
>>> 51   5.45789 osd.51

Re: [ceph-users] No space left on device on directory with > 1000000 files

2017-01-31 Thread John Spray
On Tue, Jan 31, 2017 at 6:29 PM, Jorge Garcia  wrote:
> I'm running into a problem on a really large directory of over a million
> files (don't ask, my users are clueless). Anyway, I'm trying to use Ceph
> as backup storage for their filesystem. As I rsync the directory, it started
> giving me a "No space left on device" for this directory, even though the
> ceph filesystem is at 66%, and no individual OSD is fuller than 82%. If I go
> to the directory and try to do a "touch foo", it gives me the same "No space
> left on device", but if I go to the parent directory and try to copy a file
> there, it is fine. So I must be running into some per-directory limit. Any
> ideas of what I can do to fix this problem? This is Ceph 10.2.5.

Your choices are:
 A) Lift the limit on individual dirfrags (mds_bal_fragment_size_max);
this may help if you only need a little more slack, but it is there
for a reason, and if you set it way higher you will risk hitting
painful issues with oversized writes and reads to OSDs.  It would be
wise to do your own experiments on a separate filesystem to see how
far you can push it (see the config sketch after this list).
 B) Enable directory fragmentation; while we aren't switching it on by
default until Luminous, it has historically not been very buggy.
 C) Put the monster directory into e.g. a local filesystem on an RBD
volume for now, and move it back into CephFS when directory
fragmentation is officially non-experimental.
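For option A, a minimal sketch of where that knob lives, assuming you set it
on the MDS nodes (the default is 100000 entries per fragment, if I remember
correctly); the 200000 below is only an illustration and the caveats above
still apply:

  # ceph.conf on the MDS hosts
  [mds]
  mds bal fragment size max = 200000

  # or at runtime via the admin socket on the active MDS
  ceph daemon mds.<name> config set mds_bal_fragment_size_max 200000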

John


>
> Thanks!
>
> Jorge
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mon.mon01 store is getting too big! 18119 MB >= 15360 MB -- 94% avail

2017-01-31 Thread Shinobu Kinjo
On Wed, Feb 1, 2017 at 1:51 AM, Joao Eduardo Luis  wrote:
> On 01/31/2017 03:35 PM, David Turner wrote:
>>
>> If you do have a large enough drive on all of your mons (and always
>> intend to do so) you can increase the mon store warning threshold in the
>> config file so that it no longer warns at 15360 MB.
>
>
> And if you so decide to go that route, please be aware that the monitors are
> known to misbehave if their store grows too much.

Would you please elaborate on what *misbehave* means? Do you have any
pointers to tell us more specifically?

>
> Those warnings have been put in place to let the admin know that action may
> be needed, hopefully in time to avoid abhorrent behaviour.
>
>   -Joao
>
>
>> From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Wido
>> den Hollander [w...@42on.com]
>> Sent: Tuesday, January 31, 2017 2:35 AM
>> To: Martin Palma; CEPH list
>> Subject: Re: [ceph-users] mon.mon01 store is getting too big! 18119 MB >= 15360 MB -- 94% avail
>>
>>
>>> On 31 January 2017 at 10:22, Martin Palma wrote:
>>>
>>>
>>> Hi all,
>>>
>>> our cluster is currently performing a big expansion and is in recovery
>>> mode (we doubled in size and osd# from 600 TB to 1,2 TB).
>>>
>>
>> Yes, that is to be expected. When not all PGs are active+clean the MONs
>> will not trim their datastore.
>>
>>> Now we get the following message from our monitor nodes:
>>>
>>> mon.mon01 store is getting too big! 18119 MB >= 15360 MB -- 94% avail
>>>
>>> Reading [0] it says that it is normal in a state of active data
>>> rebalance and after it is finished it will be compacted.
>>>
>>> Should we wait until the recovery is finished or should we perform
>>> "ceph tell mon.{id} compact" now during recovery?
>>>
>>
>> Mainly wait and make sure there is enough disk space. You can try a
>> compact, but that can take the mon offline temp.
>>
>> Just make sure you have enough diskspace :)
>>
>> Wido
>>
>>> Best,
>>> Martin
>>>
>>> [0] https://access.redhat.com/solutions/1982273
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bluestore: v11.2.0 peering not happening when OSD is down

2017-01-31 Thread Shinobu Kinjo
On Wed, Feb 1, 2017 at 3:38 AM, Gregory Farnum  wrote:
> On Tue, Jan 31, 2017 at 9:06 AM, Muthusamy Muthiah
>  wrote:
>> Hi Greg,
>>
>> the problem is in kraken,  when a pool is created with EC profile , min_size
>> equals erasure size.
>>
>> For 3+1 profile , following is the pool status ,
>> pool 2 'cdvr_ec' erasure size 4 min_size 4 crush_ruleset 1 object_hash
>> rjenkins pg_num 1024 pgp_num 1024 last_change 234 flags hashpspool
>> stripe_width 4128
>>
>> For 4+1 profile:
>> pool 5 'cdvr_ec' erasure size 5 min_size 5 crush_ruleset 1 object_hash
>> rjenkins pg_num 4096 pgp_num 4096
>>
>> For 3+2 profile :
>> pool 3 'cdvr_ec' erasure size 5 min_size 4 crush_ruleset 1 object_hash
>> rjenkins pg_num 1024 pgp_num 1024 last_change 412 flags hashpspool
>> stripe_width 4128
>>
>> Where as on Jewel release for EC 4+1:
>> pool 30 'cdvr_ec' erasure size 5 min_size 4 crush_ruleset 1 object_hash
>> rjenkins pg_num 4096 pgp_num 4096
>>
>> Trying to modify min_size and verify the status.
>>
>> Is there any reason behind this change in ceph kraken  or a bug.
>
> The change was made on purpose because running with k replicas on a
> k+m pool is a bad idea. However, it definitely should have recovered
> the missing shard and then gone active, which doesn't appear to have

Yeah, that might be true.

> happened in this case.
>
> It looks like we just screwed up and don't let EC pools do recovery on
> min size. You can restore the old behavior by setting min_size equal
> to k and we'll be fixing this for the next release. (In general, k+1

If that's true, we should stop users from setting up this ratio of data /
coding chunks, or at least there should be a reasonable warning.

Should this be a feature request?

> pools are not a good idea, which is why we didn't catch this in
> testing.)
> -Greg
>
>>
>> Thanks,
>> Muthu
>>
>>
>>
>>
>> On 31 January 2017 at 18:17, Muthusamy Muthiah 
>> wrote:
>>>
>>> Hi Greg,
>>>
>>> Following are the test outcomes on EC profile ( n = k + m)
>>>
>>>
>>>
>>> 1.   Kraken filestore and bluestore with m=1 , recovery does not start
>>> .
>>>
>>> 2.   Jewel filestore and bluestore with m=1 , recovery happens .
>>>
>>> 3.   Kraken bluestore all default configuration and m=1, no recovery.
>>>
>>> 4.   Kraken bluestore with m=2 , recovery happens when one OSD is down
>>> and for 2 OSD fails.
>>>
>>>
>>>
>>> So, the issue seems to be on ceph-kraken release. Your views…
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Muthu
>>>
>>>
>>>
>>>
>>> On 31 January 2017 at 14:18, Muthusamy Muthiah
>>>  wrote:

 Hi Greg,

 Now we could see the same problem exists for kraken-filestore also.
 Attached the requested osdmap and crushmap.

 OSD.1 was stopped in this following procedure and OSD map for a PG is
 displayed.

 ceph osd dump | grep cdvr_ec
 2017-01-31 08:39:44.827079 7f323d66c700 -1 WARNING: the following
 dangerous and experimental features are enabled: bluestore,rocksdb
 2017-01-31 08:39:44.848901 7f323d66c700 -1 WARNING: the following
 dangerous and experimental features are enabled: bluestore,rocksdb
 pool 2 'cdvr_ec' erasure size 4 min_size 4 crush_ruleset 1 object_hash
 rjenkins pg_num 1024 pgp_num 1024 last_change 234 flags hashpspool
 stripe_width 4128

 [root@ca-cn2 ~]# ceph osd getmap -o /tmp/osdmap


 [root@ca-cn2 ~]# osdmaptool --pool 2 --test-map-object object1
 /tmp/osdmap
 osdmaptool: osdmap file '/tmp/osdmap'
  object 'object1' -> 2.2bc -> [20,47,1,36]

 [root@ca-cn2 ~]# ceph osd map cdvr_ec object1
 osdmap e402 pool 'cdvr_ec' (2) object 'object1' -> pg 2.bac5debc (2.2bc)
 -> up ([20,47,1,36], p20) acting ([20,47,1,36], p20)

 [root@ca-cn2 ~]# systemctl stop ceph-osd@1.service

 [root@ca-cn2 ~]# ceph osd getmap -o /tmp/osdmap1


 [root@ca-cn2 ~]# osdmaptool --pool 2 --test-map-object object1
 /tmp/osdmap1
 osdmaptool: osdmap file '/tmp/osdmap1'
  object 'object1' -> 2.2bc -> [20,47,2147483647,36]


 [root@ca-cn2 ~]# ceph osd map cdvr_ec object1
 osdmap e406 pool 'cdvr_ec' (2) object 'object1' -> pg 2.bac5debc (2.2bc)
 -> up ([20,47,39,36], p20) acting ([20,47,NONE,36], p20)


 [root@ca-cn2 ~]# ceph osd tree
 2017-01-31 08:42:19.606876 7f4ed856a700 -1 WARNING: the following
 dangerous and experimental features are enabled: bluestore,rocksdb
 2017-01-31 08:42:19.628358 7f4ed856a700 -1 WARNING: the following
 dangerous and experimental features are enabled: bluestore,rocksdb
 ID WEIGHT    TYPE NAME   UP/DOWN REWEIGHT PRIMARY-AFFINITY
 -1 327.47314 root default
 -2  65.49463 host ca-cn4
  3   5.45789 osd.3up  1.0  1.0
  5   5.45789 osd.5up  1.0  1.0
 10   5.45789 osd.10   up  1.0  1.0
 16   5.45789 osd.16   up  1.0  1.0
 21   5.45789 osd.21   up

Re: [ceph-users] mon.mon01 store is getting too big! 18119 MB >= 15360 MB -- 94% avail

2017-01-31 Thread Joao Eduardo Luis

On 01/31/2017 07:12 PM, Shinobu Kinjo wrote:

On Wed, Feb 1, 2017 at 1:51 AM, Joao Eduardo Luis  wrote:

On 01/31/2017 03:35 PM, David Turner wrote:


If you do have a large enough drive on all of your mons (and always
intend to do so) you can increase the mon store warning threshold in the
config file so that it no longer warns at 15360 MB.



And if you so decide to go that route, please be aware that the monitors are
known to misbehave if their store grows too much.


Would you please elaborate on what *misbehave* means? Do you have any
pointers that explain it more specifically?


In particular, when using leveldb, stalls while reading from or writing to 
the store - typically leveldb is compacting when this happens. This 
causes all sorts of timeouts to be triggered, but the really annoying 
one is the lease timeout, which tends to result in a flapping quorum.


Also, being unable to sync monitors: again, stalls on leveldb lead to 
timeouts being triggered and the sync restarting.


Once upon a time, this *may* have also translated into large memory 
consumption. A direct relation was never proven though, and the behaviour 
went away as ceph became smarter and distros updated their libs.


  -Joao





Those warnings have been put in place to let the admin know that action may
be needed, hopefully in time to avoid abhorrent behaviour.

  -Joao



From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Wido
den Hollander [w...@42on.com]
Sent: Tuesday, January 31, 2017 2:35 AM
To: Martin Palma; CEPH list
Subject: Re: [ceph-users] mon.mon01 store is getting too big! 18119 MB >= 15360 MB -- 94% avail




On 31 January 2017 at 10:22, Martin Palma wrote:


Hi all,

our cluster is currently performing a big expansion and is in recovery
mode (we doubled in size and osd#, going from 600 TB to 1.2 PB).



Yes, that is to be expected. When not all PGs are active+clean the MONs
will not trim their datastore.


Now we get the following message from our monitor nodes:

mon.mon01 store is getting too big! 18119 MB >= 15360 MB -- 94% avail

Reading [0] it says that it is normal in a state of active data
rebalance and after it is finished it will be compacted.

Should we wait until the recovery is finished or should we perform
"ceph tell mon.{id} compact" now during recovery?



Mainly wait and make sure there is enough disk space. You can try a
compact, but that can take the mon offline temp.

Just make sure you have enough diskspace :)

Wido


Best,
Martin

[0] https://access.redhat.com/solutions/1982273
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bluestore: v11.2.0 peering not happening when OSD is down

2017-01-31 Thread Muthusamy Muthiah
Hi Greg,

Thanks for the info; I hope this will be solved in an upcoming minor
update of kraken.
Regarding k+1, I will take your feedback to our architecture team to
increase this to k+2, and revert the pool back to a normal state.
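
For reference, a hedged sketch of the min_size change Greg suggests,
using the 4+1 'cdvr_ec' pool listed earlier in this thread as the
example (so k = 4):

ceph osd pool get cdvr_ec min_size     # kraken reports 5 here for the 4+1 pool
ceph osd pool set cdvr_ec min_size 4   # restores the old behaviour: min_size = k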

Thanks,
Muthu

On 1 February 2017 at 02:01, Shinobu Kinjo  wrote:

> On Wed, Feb 1, 2017 at 3:38 AM, Gregory Farnum  wrote:
> > On Tue, Jan 31, 2017 at 9:06 AM, Muthusamy Muthiah
> >  wrote:
> >> Hi Greg,
> >>
> >> the problem is in kraken,  when a pool is created with EC profile ,
> min_size
> >> equals erasure size.
> >>
> >> For 3+1 profile , following is the pool status ,
> >> pool 2 'cdvr_ec' erasure size 4 min_size 4 crush_ruleset 1 object_hash
> >> rjenkins pg_num 1024 pgp_num 1024 last_change 234 flags hashpspool
> >> stripe_width 4128
> >>
> >> For 4+1 profile:
> >> pool 5 'cdvr_ec' erasure size 5 min_size 5 crush_ruleset 1 object_hash
> >> rjenkins pg_num 4096 pgp_num 4096
> >>
> >> For 3+2 profile :
> >> pool 3 'cdvr_ec' erasure size 5 min_size 4 crush_ruleset 1 object_hash
> >> rjenkins pg_num 1024 pgp_num 1024 last_change 412 flags hashpspool
> >> stripe_width 4128
> >>
> >> Where as on Jewel release for EC 4+1:
> >> pool 30 'cdvr_ec' erasure size 5 min_size 4 crush_ruleset 1 object_hash
> >> rjenkins pg_num 4096 pgp_num 4096
> >>
> >> Trying to modify min_size and verify the status.
> >>
> >> Is there any reason behind this change in ceph kraken  or a bug.
> >
> > The change was made on purpose because running with k replicas on a
> > k+m pool is a bad idea. However, it definitely should have recovered
> > the missing shard and then gone active, which doesn't appear to have
>
> Yeah, that might be true.
>
> > happened in this case.
> >
> > It looks like we just screwed up and don't let EC pools do recovery on
> > min size. You can restore the old behavior by setting min_size equal
> > to k and we'll be fixing this for the next release. (In general, k+1
>
> If that's true, we should stop users from setting up this ratio of data /
> coding chunks, or at least there should be a reasonable warning.
>
> Should this be a feature request?
>
> > pools are not a good idea, which is why we didn't catch this in
> > testing.)
> > -Greg
> >
> >>
> >> Thanks,
> >> Muthu
> >>
> >>
> >>
> >>
> >> On 31 January 2017 at 18:17, Muthusamy Muthiah <
> muthiah.muthus...@gmail.com>
> >> wrote:
> >>>
> >>> Hi Greg,
> >>>
> >>> Following are the test outcomes on EC profile ( n = k + m)
> >>>
> >>>
> >>>
> >>> 1.   Kraken filestore and bluestore with m=1 , recovery does not
> start
> >>> .
> >>>
> >>> 2.   Jewel filestore and bluestore with m=1 , recovery happens .
> >>>
> >>> 3.   Kraken bluestore all default configuration and m=1, no
> recovery.
> >>>
> >>> 4.   Kraken bluestore with m=2 , recovery happens when one OSD is
> down
> >>> and for 2 OSD fails.
> >>>
> >>>
> >>>
> >>> So, the issue seems to be on ceph-kraken release. Your views…
> >>>
> >>>
> >>>
> >>> Thanks,
> >>>
> >>> Muthu
> >>>
> >>>
> >>>
> >>>
> >>> On 31 January 2017 at 14:18, Muthusamy Muthiah
> >>>  wrote:
> 
>  Hi Greg,
> 
>  Now we could see the same problem exists for kraken-filestore also.
>  Attached the requested osdmap and crushmap.
> 
>  OSD.1 was stopped in this following procedure and OSD map for a PG is
>  displayed.
> 
>  ceph osd dump | grep cdvr_ec
>  2017-01-31 08:39:44.827079 7f323d66c700 -1 WARNING: the following
>  dangerous and experimental features are enabled: bluestore,rocksdb
>  2017-01-31 08:39:44.848901 7f323d66c700 -1 WARNING: the following
>  dangerous and experimental features are enabled: bluestore,rocksdb
>  pool 2 'cdvr_ec' erasure size 4 min_size 4 crush_ruleset 1 object_hash
>  rjenkins pg_num 1024 pgp_num 1024 last_change 234 flags hashpspool
>  stripe_width 4128
> 
>  [root@ca-cn2 ~]# ceph osd getmap -o /tmp/osdmap
> 
> 
>  [root@ca-cn2 ~]# osdmaptool --pool 2 --test-map-object object1
>  /tmp/osdmap
>  osdmaptool: osdmap file '/tmp/osdmap'
>   object 'object1' -> 2.2bc -> [20,47,1,36]
> 
>  [root@ca-cn2 ~]# ceph osd map cdvr_ec object1
>  osdmap e402 pool 'cdvr_ec' (2) object 'object1' -> pg 2.bac5debc
> (2.2bc)
>  -> up ([20,47,1,36], p20) acting ([20,47,1,36], p20)
> 
>  [root@ca-cn2 ~]# systemctl stop ceph-osd@1.service
> 
>  [root@ca-cn2 ~]# ceph osd getmap -o /tmp/osdmap1
> 
> 
>  [root@ca-cn2 ~]# osdmaptool --pool 2 --test-map-object object1
>  /tmp/osdmap1
>  osdmaptool: osdmap file '/tmp/osdmap1'
>   object 'object1' -> 2.2bc -> [20,47,2147483647,36]
> 
> 
>  [root@ca-cn2 ~]# ceph osd map cdvr_ec object1
>  osdmap e406 pool 'cdvr_ec' (2) object 'object1' -> pg 2.bac5debc
> (2.2bc)
>  -> up ([20,47,39,36], p20) acting ([20,47,NONE,36], p20)
> 
> 
>  [root@ca-cn2 ~]# ceph osd tree
>  2017-01-31 08:42:19.606876 7f4ed856a700 -1 WARNING: the following
>  dangerous and experimental featur

Re: [ceph-users] Minimize data lost with PG incomplete

2017-01-31 Thread Maxime Guyot
Hi José

If you have some of the original OSDs (not zapped or erased), then you might be 
able to simply re-add them to your cluster and end up with a happy cluster again.
If you attempt the ceph_objectstore_tool --op export & import, make sure to do it 
on a temporary OSD of weight 0, as recommended in the link provided.
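
If it helps, a rough sketch of that export/import step; the PG id, OSD
ids and paths below are only illustrative, the --journal-path arguments
assume filestore OSDs, and the binary is usually installed as
ceph-objectstore-tool:

# on the stopped OSD that still holds a good copy of the PG:
ceph-objectstore-tool --op export --pgid <pgid> \
    --data-path /var/lib/ceph/osd/ceph-0 \
    --journal-path /var/lib/ceph/osd/ceph-0/journal \
    --file /tmp/<pgid>.export
# on the stopped temporary OSD (crush weight 0):
ceph-objectstore-tool --op import \
    --data-path /var/lib/ceph/osd/ceph-50 \
    --journal-path /var/lib/ceph/osd/ceph-50/journal \
    --file /tmp/<pgid>.export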

Either way and from what I can see inthe pg dump you provided, if you restore 
osd.0, osd.3, osd.20, osd.21 and osd.22 it should be enough to bring back the 
pg that are down.

Cheers,
 
On 31/01/17 11:48, "ceph-users on behalf of José M. Martín" 
 wrote:

Any idea of how I could recover files from the filesystem mount?
When doing a cp, it hangs whenever it finds a damaged file/folder. I would
be happy just getting the undamaged files.

Thanks

On 31/01/17 at 11:19, José M. Martín wrote:
> Thanks.
> I just realized I keep some of the original OSD. If it contains some of
> the incomplete PGs , would be possible to add then into the new disks?
> Maybe following this steps? 
http://ceph.com/community/incomplete-pgs-oh-my/
>
> On 31/01/17 at 10:44, Maxime Guyot wrote:
>> Hi José,
>>
>> Too late, but you could have updated the CRUSHmap *before* moving the 
disks. Something like: “ceph osd crush set osd.0 0.90329 root=default 
rack=sala2.2 host=loki05” would move osd.0 to loki05 and would trigger the 
appropriate PG movements before any physical move. Then the physical move is 
done as usual: set noout, stop the osd, physically move it, start the osd, unset noout.
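
Spelled out, that sequence might look roughly like this (osd.0 and the
systemd unit name are just examples matching the crush command above,
and systemd-managed OSDs are assumed):

ceph osd crush set osd.0 0.90329 root=default rack=sala2.2 host=loki05
ceph osd set noout                   # keep the cluster from marking OSDs out
systemctl stop ceph-osd@0.service    # stop the OSD before pulling the disk
# ... physically move the disk to loki05 ...
systemctl start ceph-osd@0.service   # bring it back up on the new host
ceph osd unset noout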
>>
>> It’s a way to trigger the data movement overnight (maybe with a cron) 
and do the physical move at your own convenience in the morning.
>>
>> Cheers, 
>> Maxime 
>>
>> On 31/01/17 10:35, "ceph-users on behalf of José M. Martín" 
 wrote:
>>
>> Already min_size = 1
>> 
>> Thanks,
>> Jose M. Martín
>> 
>> On 31/01/17 at 09:44, Henrik Korkuc wrote:
>> > I am not sure about "incomplete" part out of my head, but you can 
try
>> > setting min_size to 1 for pools to reactivate some PGs, if they are
>> > down/inactive due to missing replicas.
>> >
>> > On 17-01-31 10:24, José M. Martín wrote:
>> >> # ceph -s
>> >>  cluster 29a91870-2ed2-40dc-969e-07b22f37928b
>> >>   health HEALTH_ERR
>> >>  clock skew detected on mon.loki04
>> >>  155 pgs are stuck inactive for more than 300 seconds
>> >>  7 pgs backfill_toofull
>> >>  1028 pgs backfill_wait
>> >>  48 pgs backfilling
>> >>  892 pgs degraded
>> >>  20 pgs down
>> >>  153 pgs incomplete
>> >>  2 pgs peering
>> >>  155 pgs stuck inactive
>> >>  1077 pgs stuck unclean
>> >>  892 pgs undersized
>> >>  1471 requests are blocked > 32 sec
>> >>  recovery 3195781/36460868 objects degraded (8.765%)
>> >>  recovery 5079026/36460868 objects misplaced (13.930%)
>> >>  mds0: Behind on trimming (175/30)
>> >>  noscrub,nodeep-scrub flag(s) set
>> >>  Monitor clock skew detected
>> >>   monmap e5: 5 mons at
>> >> 
{loki01=192.168.3.151:6789/0,loki02=192.168.3.152:6789/0,loki03=192.168.3.153:6789/0,loki04=192.168.3.154:6789/0,loki05=192.168.3.155:6789/0}
>> >>
>> >>  election epoch 4028, quorum 0,1,2,3,4
>> >> loki01,loki02,loki03,loki04,loki05
>> >>fsmap e95494: 1/1/1 up {0=zeus2=up:active}, 1 up:standby
>> >>   osdmap e275373: 42 osds: 42 up, 42 in; 1077 remapped pgs
>> >>  flags noscrub,nodeep-scrub
>> >>pgmap v36642778: 4872 pgs, 4 pools, 24801 GB data, 17087 
kobjects
>> >>  45892 GB used, 34024 GB / 79916 GB avail
>> >>  3195781/36460868 objects degraded (8.765%)
>> >>  5079026/36460868 objects misplaced (13.930%)
>> >>  3640 active+clean
>> >>   838 
active+undersized+degraded+remapped+wait_backfill
>> >>   184 active+remapped+wait_backfill
>> >>   134 incomplete
>> >>48 
active+undersized+degraded+remapped+backfilling
>> >>19 down+incomplete
>> >> 6
>> >> active+undersized+degraded+remapped+wait_backfill+backfill_toofull
>> >> 1 active+remapped+backfill_toofull
>> >> 1 peering
>> >> 1 down+peering
>> >> recovery io 93909 kB/s, 10 keys/s, 67 objects/s
>> >>
>> >>
>> >>
>> >> #