Re: [ceph-users] Can't start osd - one OSD is always down.

2014-10-25 Thread Ta Ba Tuan

I am sending some related log entries
(osd.21 cannot be started):

 -8705> 2014-10-25 14:41:04.345727 7f12bac2f700  5 *osd.21* pg_epoch: 
102843 pg[*6.5e1*( v 102843'11832159 (102377'11822991,102843'11832159] 
lb c4951de1/rbd_data.3955c5cdbb2ea.000405f0/head//6 
local-les=101780 n=4719 ec=164 les/c 102841/102838 102840/102840/102477) 
[40,0,21]/[40,0,60] r=-1 lpr=102840 pi=31832-102839/230 luod=0'0 
crt=102843'11832157 lcod 102843'11832158 active+remapped] *exit 
Started/ReplicaActive/RepNotRecovering* 0.000170 1 0.000296


 -1637> 2014-10-25 14:41:14.326580 7f12bac2f700  5 *osd.21* pg_epoch: 
102843 pg[*2.23b*( v 102839'91984 (91680'88526,102839'91984] 
local-les=102841 n=85 ec=25000 les/c 102841/102838 102840/102840/102656) 
[90,21,120] r=1 lpr=102840 pi=100114-102839/50 luod=0'0 crt=102839'91984 
active] *enter Started/ReplicaActive/RepNotRecovering*


  -437> 2014-10-25 14:41:15.042174 7f12ba42e700  5 *osd.21 *pg_epoch: 
102843 pg[*27.239(* v 102808'38419 (81621'35409,102808'38419] 
local-les=102841 n=23 ec=25085 les/c 102841/102838 102840/102840/102656) 
[90,21,120] r=1 lpr=102840 pi=100252-102839/53 luod=0'0 crt=102808'38419 
active] *enter Started/ReplicaActive/RepNotRecovering*


Thanks!
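A minimal debugging sketch (assuming the default cluster name and an admin keyring in the usual location; these commands only read state or raise log verbosity):

ceph health detail                              # list degraded/recovering pgs and any unfound objects
ceph pg 6.5e1 query                             # inspect peering/recovery state of one of the pgs named above
ceph-osd -i 21 -f --debug-osd 20 --debug-ms 1   # run osd.21 in the foreground with verbose logging to see why it stops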


On 10/25/2014 11:26 AM, Ta Ba Tuan wrote:

Hi Craig, Thanks for replying.
When I started that OSD, the log from "ceph -w" warned that pgs 7.9d8,
23.596, 23.9c6, and 23.63 can't recover, as in the pasted log.


Those pgs are in the "active+degraded" state.
#ceph pg map 7.9d8
osdmap e102808 pg 7.9d8 (7.9d8) -> up [93,49] acting [93,49]
(When I start osd.21, pg 7.9d8 and the three remaining pgs change to
"active+recovering".) osd.21 still goes down after the following logs:



2014-10-25 10:57:48.415920 osd.21 [WRN] slow request 30.835731 seconds old, received at 2014-10-25 10:57:17.580013: MOSDPGPush(*7.9d8* 102803 [PushOp(e13589d8/rbd_data.4b843b2ae8944a.0c00/head//6, version: 102798'7794851, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(e13589d8/rbd_data.4b843b2ae8944a.0c00/head//6@102798'7794851, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 currently no flag points reached

2014-10-25 10:57:48.415927 osd.21 [WRN] slow request 30.275588 seconds old, received at 2014-10-25 10:57:18.140156: MOSDPGPush(*23.596* 102803 [PushOp(4ca76d96/rbd_data.5dd32f2ae8944a.0385/head//24, version: 102798'295732, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(4ca76d96/rbd_data.5dd32f2ae8944a.0385/head//24@102798'295732, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 currently no flag points reached

2014-10-25 10:57:48.415910 osd.21 [WRN] slow request 30.860696 seconds old, received at 2014-10-25 10:57:17.555048: MOSDPGPush(*23.9c6* 102803 [PushOp(efdde9c6/rbd_data.5b64062ae8944a.0b15/head//24, version: 102798'66056, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(efdde9c6/rbd_data.5b64062ae8944a.0b15/head//24@102798'66056, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 currently no flag points reached

2014-10-25 10:57:58.418847 osd.21 [WRN] 26 slow requests, 1 included below; oldest blocked for > 54.967456 secs
2014-10-25 10:57:58.418859 osd.21 [WRN] slow request 30.967294 seconds old, received at 2014-10-25 10:57:27.451488: MOSDPGPush(*23.63c* 102803 [PushOp(40e4b63c/rbd_data.57ed612ae8944a.0c00/head//24, version: 102748'145637, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(40e4b63c/rbd_data.57ed612ae8944a.0c00/head//24@102748'145637, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:

Re: [ceph-users] recovery process stops

2014-10-25 Thread Harald Rößler
Does anyone have an idea how to resolve the situation?
Thanks for any advice.

Kind Regards
Harald Rößler
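A read-only diagnostic sketch (assuming admin access on a monitor node) that may help narrow down why the remapped pgs never finish recovery:

ceph health detail            # which pgs are stuck, and why
ceph pg dump_stuck unclean    # pgs that are not active+clean, with their up/acting sets
ceph pg 3.22 query            # recovery_state and backfill progress of one remapped pg
ceph osd tree                 # confirm all hosts/OSDs are up, in, and weighted as expected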


> Am 23.10.2014 um 18:56 schrieb Harald Rößler :
>
> @Wido: sorry, I don't understand 100% what you mean; I generated some output 
> which may help.
>
>
> Ok the pool:
>
> pool 3 'bcf' rep size 3 min_size 1 crush_ruleset 0 object_hash rjenkins 
> pg_num 832 pgp_num 832 last_change 8000 owner 0
>
>
> All remapped pgs have a pg_temp entry:
>
> pg_temp 3.1 [14,20,0]
> pg_temp 3.c [1,7,23]
> pg_temp 3.22 [15,21,23]
>
>
>
> 3.22  429  0  2  0  1654296576  0  0  active+remapped  2014-10-23 03:25:03.180505  8608'363836897  8608'377970131  [15,21]  [15,21,23]  3578'354650024  2014-10-16 04:06:39.133104  3578'354650024  2014-10-16 04:06:39.133104
>
> the crush rules.
>
> # rules
> rule data {
>ruleset 0
>type replicated
>min_size 1
>max_size 10
>step take default
>step chooseleaf firstn 0 type host
>step emit
> }
> rule metadata {
>ruleset 1
>type replicated
>min_size 1
>max_size 10
>step take default
>step chooseleaf firstn 0 type host
>step emit
> }
> rule rbd {
>ruleset 2
>type replicated
>min_size 1
>max_size 10
>step take default
>step chooseleaf firstn 0 type host
>step emit
> }
>
>
> ceph pg 3.22 query
>
>
>
>
> { "state": "active+remapped",
>  "epoch": 8608,
>  "up": [
>15,
>21],
>  "acting": [
>15,
>21,
>23],
>  "info": { "pgid": "3.22",
>  "last_update": "8608'363845313",
>  "last_complete": "8608'363845313",
>  "log_tail": "8608'363842312",
>  "last_backfill": "MAX",
>  "purged_snaps": "[1~1,3~3,8~6,f~31,42~1,44~3,48~f,58~1,5a~2]",
>  "history": { "epoch_created": 140,
>  "last_epoch_started": 8576,
>  "last_epoch_clean": 8576,
>  "last_epoch_split": 0,
>  "same_up_since": 8340,
>  "same_interval_since": 8575,
>  "same_primary_since": 7446,
>  "last_scrub": "3578'354650024",
>  "last_scrub_stamp": "2014-10-16 04:06:39.133104",
>  "last_deep_scrub": "3578'354650024",
>  "last_deep_scrub_stamp": "2014-10-16 04:06:39.133104",
>  "last_clean_scrub_stamp": "2014-10-16 04:06:39.133104"},
>  "stats": { "version": "8608'363845313",
>  "reported": "8608'377978685",
>  "state": "active+remapped",
>  "last_fresh": "2014-10-23 18:55:07.582844",
>  "last_change": "2014-10-23 03:25:03.180505",
>  "last_active": "2014-10-23 18:55:07.582844",
>  "last_clean": "2014-10-20 07:51:21.330669",
>  "last_became_active": "2013-07-14 07:20:30.173508",
>  "last_unstale": "2014-10-23 18:55:07.582844",
>  "mapping_epoch": 8370,
>  "log_start": "8608'363842312",
>  "ondisk_log_start": "8608'363842312",
>  "created": 140,
>  "last_epoch_clean": 8576,
>  "parent": "0.0",
>  "parent_split_bits": 0,
>  "last_scrub": "3578'354650024",
>  "last_scrub_stamp": "2014-10-16 04:06:39.133104",
>  "last_deep_scrub": "3578'354650024",
>  "last_deep_scrub_stamp": "2014-10-16 04:06:39.133104",
>  "last_clean_scrub_stamp": "2014-10-16 04:06:39.133104",
>  "log_size": 0,
>  "ondisk_log_size": 0,
>  "stats_invalid": "0",
>  "stat_sum": { "num_bytes": 1654296576,
>  "num_objects": 429,
>  "num_object_clones": 28,
>  "num_object_copies": 0,
>  "num_objects_missing_on_primary": 0,
>  "num_objects_degraded": 0,
>  "num_objects_unfound": 0,
>  "num_read": 8053865,
>  "num_read_kb": 124022900,
>  "num_write": 363844886,
>  "num_write_kb": 2083536824,
>  "num_scrub_errors": 0,
>  "num_shallow_scrub_errors": 0,
>  "num_deep_scrub_errors": 0,
>  "num_objects_recovered": 2777,
>  "num_bytes_recovered": 11138282496,
>  "num_keys_recovered": 0},
>  "stat_cat_sum": {},
>  "up": [
>15,
>21],
>  "acting": [
>15,
>21,
>23]},
>  "empty": 0,
>  "dne": 0,
>  "incomplete": 0,
>  "last_epoch_started": 8576},
>  "recovery_state": [
>{ "name": "Started\/Primary\/Active",
>  "enter_time": "2014-10-23 03:25:03.179759",
>  "might_have_unfound": [],
>  "recovery_progress": { "backfill_target": -1,
>  "waiting_on_backfill": 0,
>  "backfill_pos": "0\/\/0\/\/-1",
>  "backfill_info": { "begin": "0\/\/0\/\/-1",
>  "end": "0\/\/0\/\/-1",
>  "objects": []},
>  "peer_backfill_info": { "begin": "0\/\/0\/\/-1",
>  

Re: [ceph-users] Can't start osd - one OSD is always down.

2014-10-25 Thread Ta Ba Tuan
My Ceph cluster hung with "osd.21 172.30.5.2:6870/8047 879 : [ERR] 6.9d8
has 4 objects unfound and apparently lost".


After I restarted all Ceph data nodes, I still can't start osd.21; it logs
many entries about pg 6.9d8, such as:


 -440> 2014-10-25 19:28:17.468161 7fec5731d700  5 -- op tracker -- seq: 3083, time: 2014-10-25 19:28:17.468161, event: reached_pg, op: MOSDPGPush(*6.9d8* 102856 [PushOp(e8de59d8/*rbd_data.4d091f7304c844.e871/head//6*, version: 102853'7800592, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(e8de59d8/rbd_data.4d091f7304c844.e871/head//6@102853'7800592, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))])

I think some objects may be corrupted. What should I do, please?
Thanks!
--
Tuan
HaNoi-VietNam
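One hedged way to handle the unfound objects, assuming no surviving replica still holds them (mark_unfound_lost discards data, so it is a last resort):

ceph health detail                      # confirm which pg reports unfound objects
ceph pg 6.9d8 list_missing              # show the unfound objects and which OSDs might still have them
ceph pg 6.9d8 mark_unfound_lost revert  # roll the unfound objects back to their previous version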


On 10/25/2014 03:01 PM, Ta Ba Tuan wrote:

I am sending some related log entries
(osd.21 cannot be started):

 -8705> 2014-10-25 14:41:04.345727 7f12bac2f700  5 *osd.21* pg_epoch: 
102843 pg[*6.5e1*( v 102843'11832159 (102377'11822991,102843'11832159] 
lb c4951de1/rbd_data.3955c5cdbb2ea.000405f0/head//6 
local-les=101780 n=4719 ec=164 les/c 102841/102838 
102840/102840/102477) [40,0,21]/[40,0,60] r=-1 lpr=102840 
pi=31832-102839/230 luod=0'0 crt=102843'11832157 lcod 102843'11832158 
active+remapped] *exit Started/ReplicaActive/RepNotRecovering* 
0.000170 1 0.000296


 -1637> 2014-10-25 14:41:14.326580 7f12bac2f700  5 *osd.21* pg_epoch: 
102843 pg[*2.23b*( v 102839'91984 (91680'88526,102839'91984] 
local-les=102841 n=85 ec=25000 les/c 102841/102838 
102840/102840/102656) [90,21,120] r=1 lpr=102840 pi=100114-102839/50 
luod=0'0 crt=102839'91984 active] *enter 
Started/ReplicaActive/RepNotRecovering*


  -437> 2014-10-25 14:41:15.042174 7f12ba42e700  5 *osd.21 *pg_epoch: 
102843 pg[*27.239(* v 102808'38419 (81621'35409,102808'38419] 
local-les=102841 n=23 ec=25085 les/c 102841/102838 
102840/102840/102656) [90,21,120] r=1 lpr=102840 pi=100252-102839/53 
luod=0'0 crt=102808'38419 active] *enter 
Started/ReplicaActive/RepNotRecovering*


Thanks!


On 10/25/2014 11:26 AM, Ta Ba Tuan wrote:

Hi Craig, Thanks for replying.
When I started that OSD, the log from "ceph -w" warned that pgs 7.9d8,
23.596, 23.9c6, and 23.63 can't recover, as in the pasted log.


Those pgs are in the "active+degraded" state.
#ceph pg map 7.9d8
osdmap e102808 pg 7.9d8 (7.9d8) -> up [93,49] acting [93,49]
(When I start osd.21, pg 7.9d8 and the three remaining pgs change to
"active+recovering".) osd.21 still goes down after the following logs:



2014-10-25 10:57:48.415920 osd.21 [WRN] slow request 30.835731 seconds old, received at 2014-10-25 10:57:17.580013: MOSDPGPush(*7.9d8* 102803 [PushOp(e13589d8/rbd_data.4b843b2ae8944a.0c00/head//6, version: 102798'7794851, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(e13589d8/rbd_data.4b843b2ae8944a.0c00/head//6@102798'7794851, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 currently no flag points reached

2014-10-25 10:57:48.415927 osd.21 [WRN] slow request 30.275588 seconds old, received at 2014-10-25 10:57:18.140156: MOSDPGPush(*23.596* 102803 [PushOp(4ca76d96/rbd_data.5dd32f2ae8944a.0385/head//24, version: 102798'295732, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(4ca76d96/rbd_data.5dd32f2ae8944a.0385/head//24@102798'295732, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))]) v2 currently no flag points reached

2014-10-25 10:57:48.415910 osd.21 [WRN] slow request 30.860696 seconds old, received at 2014-10-25 10:57:17.555048: MOSDPGPush(*23.9c6* 102803 [PushOp(efdde9c6/rbd_data.5b64062ae8944a.0b15/head//24, version: 102798'66056, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(efdde9c6/rbd_data.5b64062ae8944a.0b15/head//24@102798'66056, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(

Re: [ceph-users] Can't start osd - one OSD is always down.

2014-10-25 Thread Ta Ba Tuan

#ceph pg *6.9d8* query
...
  "peer_info": [
{ "peer": "49",
  "pgid": "6.9d8",
  "last_update": "102889'7801917",
  "last_complete": "102889'7801917",
  "log_tail": "102377'7792649",
  "last_user_version": 7801879,
  "last_backfill": "MAX",
  "purged_snaps": 
"[1~7,9~44b,455~1f8,64f~63,6b3~3a,6ee~12f,81f~10,830~8,839~69b,ed7~7,edf~4,ee4~6f5,15da~f9,16d4~1f,16f5~7,16fd~4,1705~5

e,1764~7,1771~78,17eb~12,1800~2,1803~d,1812~3,181a~1,181c~a,1827~3b,1863~1,1865~1,1867~1,186b~e,187a~3,1881~1,1884~7,188c~1,188f~3,1894~5,189f~2,
18ab~1,18c6~1,1922~13,193d~1,1940~1,194a~1,1968~5,1975~1,1979~4,197e~4,1984~1,1987~11,199c~1,19a0~1,19a3~9,19ad~3,19b2~1,19b6~27,19de~8]",
  "history": { "epoch_created": 164,
  "last_epoch_started": 102888,
  "last_epoch_clean": 102888,
  "last_epoch_split": 0
  "parent_split_bits": 0,
  "last_scrub": "91654'7460936",
  "last_scrub_stamp": "2014-10-10 10:36:25.433016",
  "last_deep_scrub": "81667'5815892",
  "last_deep_scrub_stamp": "2014-08-29 09:44:14.012219",
  "last_clean_scrub_stamp": "2014-10-10 10:36:25.433016",
  "log_size": 9229,
  "ondisk_log_size": 9229,
  "stats_invalid": "1",
  "stat_sum": { "num_bytes": 17870536192,
  "num_objects": 4327,
  "num_object_clones": 29,
  "num_object_copies": 12981,*
**  "num_objects_missing_on_primary": 4,*
  "num_objects_degraded": 4,
  "num_objects_unfound": 0,
  "num_objects_dirty": 1092,
  "num_whiteouts": 0,
  "num_read": 4820626,
  "num_read_kb": 59073045,
  "num_write": 12748709,
  "num_write_kb": 181630845,
  "num_scrub_errors": 0,
  "num_shallow_scrub_errors": 0,
  "num_deep_scrub_errors": 0,
  "num_objects_recovered": 135847,
  "num_bytes_recovered": 562255538176,
  "num_keys_recovered": 0,
  "num_objects_omap": 0,
  "num_objects_hit_set_archive": 0},


On 10/25/2014 07:40 PM, Ta Ba Tuan wrote:
My Ceph cluster hung with "osd.21 172.30.5.2:6870/8047 879 : [ERR] 6.9d8
has 4 objects unfound and apparently lost".


After I restarted all Ceph data nodes, I still can't start osd.21; it logs
many entries about pg 6.9d8, such as:


 -440> 2014-10-25 19:28:17.468161 7fec5731d700  5 -- op tracker -- seq: 3083, time: 2014-10-25 19:28:17.468161, event: reached_pg, op: MOSDPGPush(*6.9d8* 102856 [PushOp(e8de59d8/*rbd_data.4d091f7304c844.e871/head//6*, version: 102853'7800592, data_included: [0~4194304], data_size: 4194304, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(e8de59d8/rbd_data.4d091f7304c844.e871/head//6@102853'7800592, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))])

I think some objects may be corrupted. What should I do, please?
Thanks!
--
Tuan
HaNoi-VietNam


On 10/25/2014 03:01 PM, Ta Ba Tuan wrote:

I am sending some related log entries
(osd.21 cannot be started):

 -8705> 2014-10-25 14:41:04.345727 7f12bac2f700  5 *osd.21* pg_epoch: 
102843 pg[*6.5e1*( v 102843'11832159 
(102377'11822991,102843'11832159] lb 
c4951de1/rbd_data.3955c5cdbb2ea.000405f0/head//6 
local-les=101780 n=4719 ec=164 les/c 102841/102838 
102840/102840/102477) [40,0,21]/[40,0,60] r=-1 lpr=102840 
pi=31832-102839/230 luod=0'0 crt=102843'11832157 lcod 102843'11832158 
active+remapped] *exit Started/ReplicaActive/RepNotRecovering* 
0.000170 1 0.000296


 -1637> 2014-10-25 14:41:14.326580 7f12bac2f700  5 *osd.21* pg_epoch: 
102843 pg[*2.23b*( v 102839'91984 (91680'88526,102839'91984] 
local-les=102841 n=85 ec=25000 les/c 102841/102838 
102840/102840/102656) [90,21,120] r=1 lpr=102840 pi=100114-102839/50 
luod=0'0 crt=102839'91984 active] *enter 
Started/ReplicaActive/RepNotRecovering*


  -437> 2014-10-25 14:41:15.042174 7f12ba42e700  5 *osd.21 *pg_epoch: 
102843 pg[*27.239(* v 102808'38419 (81621'35409,102808'38419] 
local-les=102841 n=23 ec=25085 les/c 102841/102838 
102840/102840/102656) [90,21,120] r=1 lpr=102840 pi=100252-102839/53 
luod=0'0 crt=102808'38419 active] *enter 
Started/ReplicaActive/RepNotRecovering*


Thanks!


On 10/25/2014 11:26 AM, Ta Ba Tuan wrote:

Hi Craig, Thanks for replying.
When I started that OSD, the log from "ceph -w" warned that pgs 7.9d8,
23.596, 23.9c6, and 23.63 can't recover, as in the pasted log.


Those pgs are in the "active+degraded" state.
#ceph pg map 7.9d8
osdmap e102808 pg 7.9d8 (7.9d

Re: [ceph-users] journals relabeled by OS, symlinks broken

2014-10-25 Thread Scott Laird
You'd be best off using /dev/disk/by-path/ or similar links; that way they
follow the disks if they're renamed again.
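A rough sketch of the repair (the OSD id and partition UUID below are placeholders; stop each OSD before changing its link):

service ceph stop osd.150                        # stop the OSD whose journal link is broken
ls -l /dev/disk/by-partuuid/                     # find the persistent name of its journal partition
ln -sf /dev/disk/by-partuuid/<journal-partuuid> /var/lib/ceph/osd/ceph-150/journal
service ceph start osd.150                       # repeat for ceph-157, ceph-164, and so on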

On Fri, Oct 24, 2014, 9:40 PM Steve Anthony  wrote:

> Hello,
>
> I was having problems with a node in my cluster (Ceph v0.80.7/Debian
> Wheezy/Kernel 3.12), so I rebooted it and the disks were relabled when
> it came back up. Now all the symlinks to the journals are broken. The
> SSDs are now sda, sdb, and sdc but the journals were sdc, sdd, and sde:
>
> root@ceph17:~# ls -l /var/lib/ceph/osd/ceph-*/journal
> lrwxrwxrwx 1 root root 9 Oct 20 16:47 /var/lib/ceph/osd/ceph-150/journal
> -> /dev/sde1
> lrwxrwxrwx 1 root root 9 Oct 20 16:53 /var/lib/ceph/osd/ceph-157/journal
> -> /dev/sdd1
> lrwxrwxrwx 1 root root 9 Oct 21 08:31 /var/lib/ceph/osd/ceph-164/journal
> -> /dev/sdc1
> lrwxrwxrwx 1 root root 9 Oct 21 16:33 /var/lib/ceph/osd/ceph-171/journal
> -> /dev/sde2
> lrwxrwxrwx 1 root root 9 Oct 22 10:50 /var/lib/ceph/osd/ceph-178/journal
> -> /dev/sdc2
> lrwxrwxrwx 1 root root 9 Oct 22 15:48 /var/lib/ceph/osd/ceph-184/journal
> -> /dev/sdd2
> lrwxrwxrwx 1 root root 9 Oct 23 10:46 /var/lib/ceph/osd/ceph-191/journal
> -> /dev/sde3
> lrwxrwxrwx 1 root root 9 Oct 23 15:22 /var/lib/ceph/osd/ceph-195/journal
> -> /dev/sdc3
> lrwxrwxrwx 1 root root 9 Oct 23 16:59 /var/lib/ceph/osd/ceph-201/journal
> -> /dev/sdd3
> lrwxrwxrwx 1 root root 9 Oct 24 21:32 /var/lib/ceph/osd/ceph-214/journal
> -> /dev/sde4
> lrwxrwxrwx 1 root root 9 Oct 24 21:33 /var/lib/ceph/osd/ceph-215/journal
> -> /dev/sdd4
>
> Any way to fix this without just removing all the OSDs and re-adding
> them? I thought about recreating the symlinks to point at the new SSD
> labels, but I figured I'd check here first. Thanks!
>
> -Steve
>
> --
> Steve Anthony
> LTS HPC Support Specialist
> Lehigh University
> sma...@lehigh.edu
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Continuous OSD crash with kv backend (firefly)

2014-10-25 Thread Andrey Korolyov
Thanks Haomai. It turns out that master's recovery is too buggy right
now (recovery speed degrades over time, non-kv OSDs drop out of the
cluster for no reason, the misplaced-object calculation is wrong, and so
on), so I am sticking with giant plus rocksdb for now. So far no major
problems have shown up.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RBD getting unmapped every time the server reboots

2014-10-25 Thread Vickey Singh
Hello Cephers, I need your advice and tips here.

*Problem statement: the Ceph RBD gets unmapped each time I reboot my server.
After every reboot I need to map and mount it manually.*

*Setup : *

Ceph Firefly 0.80.1
CentOS 6.5  , Kernel : 3.15.0-1


I have tried what is described in the blog post below, but it does not seem
to work with CentOS:

http://ceph.com/planet/mapunmap-rbd-device-on-bootshutdown/



# /etc/init.d/rbdmap start
/etc/init.d/rbdmap: line 26: log_daemon_msg: command not found
/etc/init.d/rbdmap: line 42: log_progress_msg: command not found
/etc/init.d/rbdmap: line 47: echo: write error: Invalid argument
/etc/init.d/rbdmap: line 52: log_end_msg: command not found
/etc/init.d/rbdmap: line 56: log_action_begin_msg: command not found
unable to read secretfile: No such file or directory
error reading secret file
failed to parse ceph_options
Thread::try_create(): pthread_create failed with error 13common/Thread.cc:
In function 'void Thread::create(size_t)' thread 7fb8ec4ed760 time
2014-10-26 00:01:10.180440
common/Thread.cc: 110: FAILED assert(ret == 0)
 ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)
 1: (Thread::create(unsigned long)+0x8a) [0x6ba82a]
 2: (CephContext::CephContext(unsigned int)+0xba) [0x60ef7a]
 3: (common_preinit(CephInitParameters const&, code_environment_t,
int)+0x45) [0x6e8305]
 4: (global_pre_init(std::vector
>*, std::vector >&, unsigned int,
code_environment_t, int)+0xaf) [0x5ee21f]
 5: (global_init(std::vector >*,
std::vector >&, unsigned int,
code_environment_t, int)+0x2f) [0x5eed6f]
 6: (main()+0x7f) [0x5289af]
 7: (__libc_start_main()+0xfd) [0x3efa41ed1d]
 8: ceph-fuse() [0x5287c9]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed
to interpret this.
terminate called after throwing an instance of 'ceph::FailedAssertion'
/etc/init.d/rbdmap: line 58: log_action_end_msg: command not found
#


# cat /etc/ceph/rbdmap
rbd/rbd-disk1 id=admin,secret=AQAinItT8Ip9AhAAS93FrXLrrnVp8/sQhjvTIg==
#


Many Thanks in Advance
Vicky
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD getting unmapped every time the server reboots

2014-10-25 Thread Christopher Armstrong
unable to read secretfile: No such file or directory

Looks like it's trying to mount, but your secretfile is gone.
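A possible fix, sketched on the assumption that the admin keyring still exists at /etc/ceph/ceph.client.admin.keyring (file names and the mount point below are examples):

ceph auth get-key client.admin > /etc/ceph/admin.secret   # recreate the secret file the script is looking for
chmod 600 /etc/ceph/admin.secret
rbd map rbd/rbd-disk1 --id admin --keyring /etc/ceph/ceph.client.admin.keyring
mount /dev/rbd0 /mnt/rbd-disk1                            # or /dev/rbd/rbd/rbd-disk1 if the rbd udev rules are installed

Since the stock rbdmap init script depends on Debian's LSB log_* helpers, on CentOS it may be simpler to put the map and mount commands into /etc/rc.local.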


*Chris Armstrong*
Head of Services
OpDemand / Deis.io

GitHub: https://github.com/deis/deis -- Docs: http://docs.deis.io/


On Sat, Oct 25, 2014 at 2:07 PM, Vickey Singh 
wrote:

> Hello Cephers, I need your advice and tips here.
>
> *Problem statement: the Ceph RBD gets unmapped each time I reboot my
> server. After every reboot I need to map and mount it manually.*
>
> *Setup : *
>
> Ceph Firefly 0.80.1
> CentOS 6.5  , Kernel : 3.15.0-1
>
>
> I have tried what is described in the blog post below, but it does not
> seem to work with CentOS:
>
> http://ceph.com/planet/mapunmap-rbd-device-on-bootshutdown/
>
>
>
> # /etc/init.d/rbdmap start
> /etc/init.d/rbdmap: line 26: log_daemon_msg: command not found
> /etc/init.d/rbdmap: line 42: log_progress_msg: command not found
> /etc/init.d/rbdmap: line 47: echo: write error: Invalid argument
> /etc/init.d/rbdmap: line 52: log_end_msg: command not found
> /etc/init.d/rbdmap: line 56: log_action_begin_msg: command not found
> unable to read secretfile: No such file or directory
> error reading secret file
> failed to parse ceph_options
> Thread::try_create(): pthread_create failed with error 13common/Thread.cc:
> In function 'void Thread::create(size_t)' thread 7fb8ec4ed760 time
> 2014-10-26 00:01:10.180440
> common/Thread.cc: 110: FAILED assert(ret == 0)
>  ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)
>  1: (Thread::create(unsigned long)+0x8a) [0x6ba82a]
>  2: (CephContext::CephContext(unsigned int)+0xba) [0x60ef7a]
>  3: (common_preinit(CephInitParameters const&, code_environment_t,
> int)+0x45) [0x6e8305]
>  4: (global_pre_init(std::vector
> >*, std::vector >&, unsigned int,
> code_environment_t, int)+0xaf) [0x5ee21f]
>  5: (global_init(std::vector >*,
> std::vector >&, unsigned int,
> code_environment_t, int)+0x2f) [0x5eed6f]
>  6: (main()+0x7f) [0x5289af]
>  7: (__libc_start_main()+0xfd) [0x3efa41ed1d]
>  8: ceph-fuse() [0x5287c9]
>  NOTE: a copy of the executable, or `objdump -rdS ` is needed
> to interpret this.
> terminate called after throwing an instance of 'ceph::FailedAssertion'
> /etc/init.d/rbdmap: line 58: log_action_end_msg: command not found
> #
>
>
> # cat /etc/ceph/rbdmap
> rbd/rbd-disk1 id=admin,secret=AQAinItT8Ip9AhAAS93FrXLrrnVp8/sQhjvTIg==
> #
>
>
> Many Thanks in Advance
> Vicky
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Continuous OSD crash with kv backend (firefly)

2014-10-25 Thread Haomai Wang
On Sun, Oct 26, 2014 at 3:12 AM, Andrey Korolyov  wrote:
> Thanks Haomai. It turns out that master's recovery is too buggy right
> now (recovery speed degrades over time, non-kv OSDs drop out of the
> cluster for no reason, the misplaced-object calculation is wrong, and so
> on), so I am sticking with giant plus rocksdb for now. So far no major
> problems have shown up.

Hmm, do you mean the kvstore backend has a problem with OSD recovery? I'm
eager to know how to reproduce this situation. Could you give more
detail?



-- 
Best Regards,

Wheat
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com