Re: [ceph-users] MGR Dashboard

2018-11-29 Thread Jos Collin
http://tracker.ceph.com/issues/19913

On 29/11/18 11:46 AM, Ashley Merrick wrote:
> Hey,
>
> After rebooting a server that hosts the MGR Dashboard I am now unable
> to get the dashboard module to run.
>
> Upon restarting the mgr service I see the following :
>
> ImportError: No module named ordered_dict
> Nov 29 07:13:14 ceph-m01 ceph-mgr[12486]: [29/Nov/2018:07:13:14]
> ENGINE Serving on http://:::9283
> Nov 29 07:13:14 ceph-m01 ceph-mgr[12486]: [29/Nov/2018:07:13:14]
> ENGINE Bus STARTED
>
>
> I have checked using pip install ordereddict and it states the module
> is already installed.
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] install ceph-fuse on centos5

2018-11-29 Thread Zhenshi Zhou
Hi,

I have a CentOS 5 server with kernel version 2.6.18.
Does it support mounting CephFS with ceph-fuse?

Thanks
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Degraded objects afte: ceph osd in $osd

2018-11-29 Thread Marco Gaiarin


I reply to myself.

> I've added a new node and slowly added 4 new OSDs, but in the meantime an
> OSD (not one of the new ones, and not on the node to remove) died. My situation now is:
>  root@blackpanther:~# ceph osd df tree
>  ID WEIGHT   REWEIGHT SIZE   USE   AVAIL  %USE  VAR  TYPE NAME   
>  -1 21.41985-  5586G 2511G  3074G 00 root default
>  -2  5.45996-  5586G 2371G  3214G 42.45 0.93 host capitanamerica 
>   0  1.81999  1.0  1862G  739G  1122G 39.70 0.87 osd.0   
>   1  1.81999  1.0  1862G  856G  1005G 46.00 1.00 osd.1   
>  10  0.90999  1.0   931G  381G   549G 40.95 0.89 osd.10  
>  11  0.90999  1.0   931G  394G   536G 42.35 0.92 osd.11  
>  -3  5.03996-  5586G 2615G  2970G 46.82 1.02 host vedovanera 
>   2  1.3  1.0  1862G  684G  1177G 36.78 0.80 osd.2   
>   3  1.81999  1.0  1862G 1081G   780G 58.08 1.27 osd.3   
>   4  0.90999  1.0   931G  412G   518G 44.34 0.97 osd.4   
>   5  0.90999  1.0   931G  436G   494G 46.86 1.02 osd.5   
>  -4  5.45996-   931G  583G   347G 00 host deadpool   
>   6  1.81999  1.0  1862G  898G   963G 48.26 1.05 osd.6   
>   7  1.81999  1.0  1862G  839G  1022G 45.07 0.98 osd.7   
>   8  0.909990  0 0  0 00 osd.8   
>   9  0.90999  1.0   931G  583G   347G 62.64 1.37 osd.9   
>  -5  5.45996-  5586G 2511G  3074G 44.96 0.98 host blackpanther   
>  12  1.81999  1.0  1862G  828G  1033G 44.51 0.97 osd.12  
>  13  1.81999  1.0  1862G  753G  1108G 40.47 0.88 osd.13  
>  14  0.90999  1.0   931G  382G   548G 41.11 0.90 osd.14  
>  15  0.90999  1.0   931G  546G   384G 58.66 1.28 osd.15  
> TOTAL 21413G 9819G 11594G 45.85  
>  MIN/MAX VAR: 0/1.37  STDDEV: 7.37
> 
> Perfectly healthy. But I've tried to slowly remove an OSD from
> 'vedovanera', and so I've tried with:
>   ceph osd crush reweight osd.2 
> As you can see, I've arrived at weight 1.4 (from 1.81999), but if I go
> lower than that I get:
[...]
> recovery 2/2556513 objects degraded (0.000%)

It seems that the trouble came from osd.8, which was out and down but had not
been removed from the crushmap (it still had weight 0.90999).

After removing osd.8 a massive rebalance started. After that, I can now lower
the weight of the OSDs on node vedovanera and I have no more degraded
objects.
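
For reference, the usual sequence for removing a dead OSD looks roughly like
this (osd.8 taken from the situation above; that exactly these commands were
used here is an assumption):

  ceph osd out osd.8
  systemctl stop ceph-osd@8      # on the host that carried osd.8
  ceph osd crush remove osd.8    # this is what finally drops the 0.90999 weight
  ceph auth del osd.8
  ceph osd rm osd.8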

I think I'm starting to understand how the CRUSH algorithm concretely
works. ;-)

-- 
dott. Marco Gaiarin GNUPG Key ID: 240A3D66
  Associazione ``La Nostra Famiglia''  http://www.lanostrafamiglia.it/
  Polo FVG   -   Via della Bontà, 7 - 33078   -   San Vito al Tagliamento (PN)
  marco.gaiarin(at)lanostrafamiglia.it   t +39-0434-842711   f +39-0434-842797

Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA!
  http://www.lanostrafamiglia.it/index.php/it/sostienici/5x1000
(cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] MGR Dashboard

2018-11-29 Thread Lenz Grimmer
Hi Ashley,

On 11/29/18 7:16 AM, Ashley Merrick wrote:

> After rebooting a server that hosts the MGR Dashboard I am now unable to
> get the dashboard module to run.
> 
> Upon restarting the mgr service I see the following :
> 
> ImportError: No module named ordered_dict
> Nov 29 07:13:14 ceph-m01 ceph-mgr[12486]: [29/Nov/2018:07:13:14] ENGINE
> Serving on http://:::9283
> Nov 29 07:13:14 ceph-m01 ceph-mgr[12486]: [29/Nov/2018:07:13:14] ENGINE
> Bus STARTED
> 
> I have checked using pip install ordereddict and it states the module is
> already installed.

What version of Ceph is this? What OS?

Lenz

-- 
SUSE Linux GmbH - Maxfeldstr. 5 - 90409 Nuernberg (Germany)
GF:Felix Imendörffer,Jane Smithard,Graham Norton,HRB 21284 (AG Nürnberg)



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] MGR Dashboard

2018-11-29 Thread Ashley Merrick
Hey,

Sorry missed the basic info!!

Latest Mimic 13.2.2

Ubuntu 18.04

,Ashley

On Thu, 29 Nov 2018 at 5:26 PM, Lenz Grimmer  wrote:

> Hi Ashley,
>
> On 11/29/18 7:16 AM, Ashley Merrick wrote:
>
> > After rebooting a server that hosts the MGR Dashboard I am now unable to
> > get the dashboard module to run.
> >
> > Upon restarting the mgr service I see the following :
> >
> > ImportError: No module named ordered_dict
> > Nov 29 07:13:14 ceph-m01 ceph-mgr[12486]: [29/Nov/2018:07:13:14] ENGINE
> > Serving on http://:::9283
> > Nov 29 07:13:14 ceph-m01 ceph-mgr[12486]: [29/Nov/2018:07:13:14] ENGINE
> > Bus STARTED
> >
> > I have checked using pip install ordereddict and it states the module is
> > already installed.
>
> What version of Ceph is this? What OS?
>
> Lenz
>
> --
> SUSE Linux GmbH - Maxfeldstr. 5 - 90409 Nuernberg (Germany)
> GF:Felix Imendörffer,Jane Smithard,Graham Norton,HRB 21284 (AG Nürnberg)
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] How to recover from corrupted RocksDb

2018-11-29 Thread Mario Giammarco
Hello,
I have a Ceph installation in a Proxmox cluster.
Due to a temporary hardware glitch, I now get this error on OSD startup:

-6> 2018-11-26 18:02:33.179327 7fa1d784be00  0 osd.0 1033 crush map has
> features 1009089991638532096, adjusting msgr requires for osds
>-5> 2018-11-26 18:02:34.143084 7fa1c33f9700  3 rocksdb:
> [/build/ceph-12.2.9/src/rocksdb/db/db_impl_compaction_flush.cc:1591]
> Compaction error: Corruption: block checksum mismatch
> -4> 2018-11-26 18:02:34.143123 7fa1c33f9700 4 rocksdb: (Original Log Time
> 2018/11/26-18:02:34.143021)
> [/build/ceph-12.2.9/src/rocksdb/db/compaction_job.cc:621] [default]
> compacted to: base level 1 max bytes base268435456 files[17$
>
> -3> 2018-11-26 18:02:34.143126 7fa1c33f9700 4 rocksdb: (Original Log Time
> 2018/11/26-18:02:34.143068) EVENT_LOG_v1 {"time_micros": 1543251754143044,
> "job": 3, "event": "compaction_finished", "compaction_time_micros":
>  1997048, "out$
>-2> 2018-11-26 18:02:34.143152 7fa1c33f9700  2 rocksdb:
> [/build/ceph-12.2.9/src/rocksdb/db/db_impl_compaction_flush.cc:1275]
> Waiting after background compaction error: Corruption: block checksum
> mismatch, Accumulated background err$
>-1> 2018-11-26 18:02:34.674171 7fa1c4bfc700 -1 rocksdb:
> submit_transaction error: Corruption: block checksum mismatch code = 2
> Rocksdb transaction:
> Delete( Prefix = O key =
> 0x7f7ffb6400217363'rub_3.26!='0xfffe'o')
> Put( Prefix = S key = 'nid_max' Value size = 8)
> Put( Prefix = S key = 'blobid_max' Value size = 8)
> 0> 2018-11-26 18:02:34.675641 7fa1c4bfc700 -1
> /build/ceph-12.2.9/src/os/bluestore/BlueStore.cc: In function 'void
> BlueStore::_kv_sync_thread()' thread 7fa1c4bfc700 time 2018-11-26
> 18:02:34.674193
> /build/ceph-12.2.9/src/os/bluestore/BlueStore.cc: 8717: FAILED assert(r ==
> 0)
>
> ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous
> (stable)
> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x102) [0x55ec83876092]
> 2: (BlueStore::_kv_sync_thread()+0x24b5) [0x55ec836ffb55]
> 3: (BlueStore::KVSyncThread::entry()+0xd) [0x55ec8374040d]
> 4: (()+0x7494) [0x7fa1d5027494]
> 5: (clone()+0x3f) [0x7fa1d4098acf]
>
>
I have tried to recover it using ceph-bluestore-tool fsck and repair (deep),
but it says everything is OK.
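The invocations I mean are roughly the following (the OSD data path is an
assumption, based on the osd.0 entries in the log above):

  ceph-bluestore-tool fsck --deep 1 --path /var/lib/ceph/osd/ceph-0
  ceph-bluestore-tool repair --deep 1 --path /var/lib/ceph/osd/ceph-0
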
I see that the rocksdb ldb tool needs .db files to recover, not a partition,
so I cannot use it.
I do not understand why I cannot start the OSD if ceph-bluestore-tool tells me
I have lost no data.
Can you help me?
Thanks,
Mario
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to recover from corrupted RocksDb

2018-11-29 Thread Wido den Hollander


On 11/29/18 10:28 AM, Mario Giammarco wrote:
> Hello,
> I have a ceph installation in a proxmox cluster.
> Due to a temporary hardware glitch now I get this error on osd startup
> 
> -6> 2018-11-26 18:02:33.179327 7fa1d784be00  0 osd.0 1033 crush map
> has features 1009089991638532096, adjusting msgr requires for osds 
>    -5> 2018-11-26 18:02:34.143084 7fa1c33f9700  3 rocksdb:
> [/build/ceph-12.2.9/src/rocksdb/db/db_impl_compaction_flush.cc:1591]
> Compaction error: Corruption: block checksum mismatch 
> -4> 2018-11-26 18:02:34.143123 7fa1c33f9700 4 rocksdb: (Original Log
> Time 2018/11/26-18:02:34.143021)
> [/build/ceph-12.2.9/src/rocksdb/db/compaction_job.cc:621] [default]
> compacted to: base level 1 max bytes base268435456 files[17$ 
> 
> -3> 2018-11-26 18:02:34.143126 7fa1c33f9700 4 rocksdb: (Original Log
> Time 2018/11/26-18:02:34.143068) EVENT_LOG_v1 {"time_micros":
> 1543251754143044, "job": 3, "event": "compaction_finished",
> "compaction_time_micros": 1997048, "out$ 
>    -2> 2018-11-26 18:02:34.143152 7fa1c33f9700  2 rocksdb:
> [/build/ceph-12.2.9/src/rocksdb/db/db_impl_compaction_flush.cc:1275]
> Waiting after background compaction error: Corruption: block
> checksum mismatch, Accumulated background err$ 
>    -1> 2018-11-26 18:02:34.674171 7fa1c4bfc700 -1 rocksdb:
> submit_transaction error: Corruption: block checksum mismatch code =
> 2 Rocksdb transaction: 
> Delete( Prefix = O key =
> 
> 0x7f7ffb6400217363'rub_3.26!='0xfffe'o')
>  
> Put( Prefix = S key = 'nid_max' Value size = 8) 
> Put( Prefix = S key = 'blobid_max' Value size = 8) 
> 0> 2018-11-26 18:02:34.675641 7fa1c4bfc700 -1
> /build/ceph-12.2.9/src/os/bluestore/BlueStore.cc: In function 'void
> BlueStore::_kv_sync_thread()' thread 7fa1c4bfc700 time 2018-11-26
> 18:02:34.674193 
> /build/ceph-12.2.9/src/os/bluestore/BlueStore.cc: 8717: FAILED
> assert(r == 0) 
>   
> ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217)
> luminous (stable) 
> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x102) [0x55ec83876092] 
> 2: (BlueStore::_kv_sync_thread()+0x24b5) [0x55ec836ffb55] 
> 3: (BlueStore::KVSyncThread::entry()+0xd) [0x55ec8374040d] 
> 4: (()+0x7494) [0x7fa1d5027494] 
> 5: (clone()+0x3f) [0x7fa1d4098acf]
> 
> 
> I have tried to recover it using ceph-bluestore-tool fsck and repair
> DEEP but it says it is ALL ok.
> I see that rocksd ldb tool needs .db files to recover and not a
> partition so I cannot use it.
> I do not understand why I cannot start osd if ceph-bluestore-tools says
> me I have lost no data.
> Can you help me?

Why would you try to recover an individual OSD? If all your Placement
Groups are active(+clean), just wipe the OSD and re-deploy it.
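
A rough sketch of the wipe-and-redeploy flow (the OSD id and the device are
assumptions, adjust them to your setup):

  ceph osd out osd.0
  systemctl stop ceph-osd@0
  ceph osd purge 0 --yes-i-really-mean-it   # drops it from crush, auth and the osd map
  ceph-volume lvm zap /dev/sdX --destroy    # wipe the old data
  ceph-volume lvm create --data /dev/sdX    # re-deploy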

What's the status of your PGs?

It says there is a checksum error (probably due to the hardware glitch)
so it refuses to start.

Don't try to outsmart Ceph, let backfill/recovery handle this. Trying to
manually fix this will only make things worse.

Wido

> Thanks,
> Mario
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to recover from corrupted RocksDb

2018-11-29 Thread Mario Giammarco
I have only that copy; it is a showroom system, but someone put a production
VM on it.

On Thu, 29 Nov 2018 at 10:43, Wido den Hollander wrote:

>
>
> On 11/29/18 10:28 AM, Mario Giammarco wrote:
> > Hello,
> > I have a ceph installation in a proxmox cluster.
> > Due to a temporary hardware glitch now I get this error on osd startup
> >
> > -6> 2018-11-26 18:02:33.179327 7fa1d784be00  0 osd.0 1033 crush map
> > has features 1009089991638532096, adjusting msgr requires for osds
> >-5> 2018-11-26 18:02:34.143084 7fa1c33f9700  3 rocksdb:
> > [/build/ceph-12.2.9/src/rocksdb/db/db_impl_compaction_flush.cc:1591]
> > Compaction error: Corruption: block checksum mismatch
> > -4> 2018-11-26 18:02:34.143123 7fa1c33f9700 4 rocksdb: (Original Log
> > Time 2018/11/26-18:02:34.143021)
> > [/build/ceph-12.2.9/src/rocksdb/db/compaction_job.cc:621] [default]
> > compacted to: base level 1 max bytes base268435456 files[17$
> >
> > -3> 2018-11-26 18:02:34.143126 7fa1c33f9700 4 rocksdb: (Original Log
> > Time 2018/11/26-18:02:34.143068) EVENT_LOG_v1 {"time_micros":
> > 1543251754143044, "job": 3, "event": "compaction_finished",
> > "compaction_time_micros": 1997048, "out$
> >-2> 2018-11-26 18:02:34.143152 7fa1c33f9700  2 rocksdb:
> > [/build/ceph-12.2.9/src/rocksdb/db/db_impl_compaction_flush.cc:1275]
> > Waiting after background compaction error: Corruption: block
> > checksum mismatch, Accumulated background err$
> >-1> 2018-11-26 18:02:34.674171 7fa1c4bfc700 -1 rocksdb:
> > submit_transaction error: Corruption: block checksum mismatch code =
> > 2 Rocksdb transaction:
> > Delete( Prefix = O key =
> >
>  
> 0x7f7ffb6400217363'rub_3.26!='0xfffe'o')
> > Put( Prefix = S key = 'nid_max' Value size = 8)
> > Put( Prefix = S key = 'blobid_max' Value size = 8)
> > 0> 2018-11-26 18:02:34.675641 7fa1c4bfc700 -1
> > /build/ceph-12.2.9/src/os/bluestore/BlueStore.cc: In function 'void
> > BlueStore::_kv_sync_thread()' thread 7fa1c4bfc700 time 2018-11-26
> > 18:02:34.674193
> > /build/ceph-12.2.9/src/os/bluestore/BlueStore.cc: 8717: FAILED
> > assert(r == 0)
> >
> > ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217)
> > luminous (stable)
> > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> > const*)+0x102) [0x55ec83876092]
> > 2: (BlueStore::_kv_sync_thread()+0x24b5) [0x55ec836ffb55]
> > 3: (BlueStore::KVSyncThread::entry()+0xd) [0x55ec8374040d]
> > 4: (()+0x7494) [0x7fa1d5027494]
> > 5: (clone()+0x3f) [0x7fa1d4098acf]
> >
> >
> > I have tried to recover it using ceph-bluestore-tool fsck and repair
> > DEEP but it says it is ALL ok.
> > I see that rocksd ldb tool needs .db files to recover and not a
> > partition so I cannot use it.
> > I do not understand why I cannot start osd if ceph-bluestore-tools says
> > me I have lost no data.
> > Can you help me?
>
> Why would you try to recover a individual OSD? If all your Placement
> Groups are active(+clean) just wipe the OSD and re-deploy it.
>
> What's the status of your PGs?
>
> It says there is a checksum error (probably due to the hardware glitch)
> so it refuses to start.
>
> Don't try to outsmart Ceph, let backfill/recovery handle this. Trying to
> manually fix this will only make things worse.
>
> Wido
>
> > Thanks,
> > Mario
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to recover from corrupted RocksDb

2018-11-29 Thread Wido den Hollander


On 11/29/18 10:45 AM, Mario Giammarco wrote:
> I have only that copy, it is a showroom system but someone put a
> production vm on it.
> 

I have a feeling this won't be easy to fix, and may not be fixable at all:

- Compaction error: Corruption: block checksum mismatch
- submit_transaction error: Corruption: block checksum mismatch

RocksDB got corrupted on that OSD and won't be able to start now.

I wouldn't know where to start with this OSD.

Wido

> On Thu, 29 Nov 2018 at 10:43, Wido den Hollander  wrote:
> 
> 
> 
> On 11/29/18 10:28 AM, Mario Giammarco wrote:
> > Hello,
> > I have a ceph installation in a proxmox cluster.
> > Due to a temporary hardware glitch now I get this error on osd startup
> >
> >     -6> 2018-11-26 18:02:33.179327 7fa1d784be00  0 osd.0 1033
> crush map
> >     has features 1009089991638532096, adjusting msgr requires for
> osds 
> >    -5> 2018-11-26 18:02:34.143084 7fa1c33f9700  3 rocksdb:
> >   
>  [/build/ceph-12.2.9/src/rocksdb/db/db_impl_compaction_flush.cc:1591]
> >     Compaction error: Corruption: block checksum mismatch 
> >     -4> 2018-11-26 18:02:34.143123 7fa1c33f9700 4 rocksdb:
> (Original Log
> >     Time 2018/11/26-18:02:34.143021)
> >     [/build/ceph-12.2.9/src/rocksdb/db/compaction_job.cc:621]
> [default]
> >     compacted to: base level 1 max bytes base268435456 files[17$ 
> >     
> >     -3> 2018-11-26 18:02:34.143126 7fa1c33f9700 4 rocksdb:
> (Original Log
> >     Time 2018/11/26-18:02:34.143068) EVENT_LOG_v1 {"time_micros":
> >     1543251754143044, "job": 3, "event": "compaction_finished",
> >     "compaction_time_micros": 1997048, "out$ 
> >    -2> 2018-11-26 18:02:34.143152 7fa1c33f9700  2 rocksdb:
> >   
>  [/build/ceph-12.2.9/src/rocksdb/db/db_impl_compaction_flush.cc:1275]
> >     Waiting after background compaction error: Corruption: block
> >     checksum mismatch, Accumulated background err$ 
> >    -1> 2018-11-26 18:02:34.674171 7fa1c4bfc700 -1 rocksdb:
> >     submit_transaction error: Corruption: block checksum mismatch
> code =
> >     2 Rocksdb transaction: 
> >     Delete( Prefix = O key =
> >   
>  
> 0x7f7ffb6400217363'rub_3.26!='0xfffe'o')
>  
> >     Put( Prefix = S key = 'nid_max' Value size = 8) 
> >     Put( Prefix = S key = 'blobid_max' Value size = 8) 
> >     0> 2018-11-26 18:02:34.675641 7fa1c4bfc700 -1
> >     /build/ceph-12.2.9/src/os/bluestore/BlueStore.cc: In function
> 'void
> >     BlueStore::_kv_sync_thread()' thread 7fa1c4bfc700 time 2018-11-26
> >     18:02:34.674193 
> >     /build/ceph-12.2.9/src/os/bluestore/BlueStore.cc: 8717: FAILED
> >     assert(r == 0) 
> >       
> >     ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217)
> >     luminous (stable) 
> >     1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >     const*)+0x102) [0x55ec83876092] 
> >     2: (BlueStore::_kv_sync_thread()+0x24b5) [0x55ec836ffb55] 
> >     3: (BlueStore::KVSyncThread::entry()+0xd) [0x55ec8374040d] 
> >     4: (()+0x7494) [0x7fa1d5027494] 
> >     5: (clone()+0x3f) [0x7fa1d4098acf]
> >
> >
> > I have tried to recover it using ceph-bluestore-tool fsck and repair
> > DEEP but it says it is ALL ok.
> > I see that rocksd ldb tool needs .db files to recover and not a
> > partition so I cannot use it.
> > I do not understand why I cannot start osd if ceph-bluestore-tools
> says
> > me I have lost no data.
> > Can you help me?
> 
> Why would you try to recover a individual OSD? If all your Placement
> Groups are active(+clean) just wipe the OSD and re-deploy it.
> 
> What's the status of your PGs?
> 
> It says there is a checksum error (probably due to the hardware glitch)
> so it refuses to start.
> 
> Don't try to outsmart Ceph, let backfill/recovery handle this. Trying to
> manually fix this will only make things worse.
> 
> Wido
> 
> > Thanks,
> > Mario
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com 
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] MGR Dashboard

2018-11-29 Thread Lenz Grimmer
On 11/29/18 10:28 AM, Ashley Merrick wrote:

> Sorry missed the basic info!!
> 
> Latest Mimic 13.2.2
> 
> Ubuntu 18.04

Thanks. So it worked before the reboot and did not afterwards? What
changed? Did you perform an OS update?

Would it be possible for you to paste the entire mgr log file messages
that are printed after the manager restarted? Have you tried to
explicitly enable the dashboard by running "ceph mgr module enable
dashboard"?

Lenz

-- 
SUSE Linux GmbH - Maxfeldstr. 5 - 90409 Nuernberg (Germany)
GF:Felix Imendörffer,Jane Smithard,Graham Norton,HRB 21284 (AG Nürnberg)



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] MGR Dashboard

2018-11-29 Thread Ashley Merrick
Yeah, I had a few OS updates, but nothing related directly to Ceph.

The full error log after a reboot is :

2018-11-29 11:24:22.494 7faf046a1700  1 mgr[restful] server not running: no
certificate configured
2018-11-29 11:24:22.586 7faf05ee4700 -1 log_channel(cluster) log [ERR] :
Unhandled exception from module 'dashboard' while running on mgr.ceph-m01:
No module named ordered_dict
2018-11-29 11:24:22.586 7faf05ee4700 -1 dashboard.serve:
2018-11-29 11:24:22.586 7faf05ee4700 -1 Traceback (most recent call last):
  File "/usr/lib/ceph/mgr/dashboard/module.py", line 276, in serve
mapper = generate_routes(self.url_prefix)
  File "/usr/lib/ceph/mgr/dashboard/controllers/__init__.py", line 118, in
generate_routes
ctrls = load_controllers()
  File "/usr/lib/ceph/mgr/dashboard/controllers/__init__.py", line 73, in
load_controllers
package='dashboard')
  File "/usr/lib/python2.7/importlib/__init__.py", line 37, in import_module
__import__(name)
  File "/usr/lib/ceph/mgr/dashboard/controllers/rgw.py", line 10, in

from ..services.rgw_client import RgwClient
  File "/usr/lib/ceph/mgr/dashboard/services/rgw_client.py", line 5, in

from ..awsauth import S3Auth
  File "/usr/lib/ceph/mgr/dashboard/awsauth.py", line 49, in 
from requests.auth import AuthBase
  File "/usr/lib/python2.7/dist-packages/requests/__init__.py", line 97, in

from . import utils
  File "/usr/lib/python2.7/dist-packages/requests/utils.py", line 26, in

from ._internal_utils import to_native_string
  File "/usr/lib/python2.7/dist-packages/requests/_internal_utils.py", line
11, in 
from .compat import is_py2, builtin_str, str
  File "/usr/lib/python2.7/dist-packages/requests/compat.py", line 47, in

from urllib3.packages.ordered_dict import OrderedDict
ImportError: No module named ordered_dict


I have tried "ceph mgr module enable dashboard" and it says already
enabled, I tried a disable restart and enable and get the same error above.

,Ashley

On Thu, Nov 29, 2018 at 6:23 PM Lenz Grimmer  wrote:

> On 11/29/18 10:28 AM, Ashley Merrick wrote:
>
> > Sorry missed the basic info!!
> >
> > Latest Mimic 13.2.2
> >
> > Ubuntu 18.04
>
> Thanks. So it worked before the reboot and did not afterwards? What
> changed? Did you perform an OS update?
>
> Would it be possible for you to paste the entire mgr log file messages
> that are printed after the manager restarted? Have you tried to
> explicitly enable the dashboard by running "ceph mgr module enable
> dashboard"?
>
> Lenz
>
> --
> SUSE Linux GmbH - Maxfeldstr. 5 - 90409 Nuernberg (Germany)
> GF:Felix Imendörffer,Jane Smithard,Graham Norton,HRB 21284 (AG Nürnberg)
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] MGR Dashboard

2018-11-29 Thread Ashley Merrick
Managed to fix the issue with some googling based on the error above.

There is a bug in urllib3 1.24.1 which breaks the ordered_dict module (1).

I rolled back to a working version with "pip install urllib3==1.23",
restarted the mgr service, and all is now working.
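
For anyone hitting the same thing, the steps were simply (the mgr unit name is
an assumption based on the mgr.ceph-m01 log entries above):

  pip install urllib3==1.23
  systemctl restart ceph-mgr@ceph-m01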

Thanks,
Ashley

(1)https://github.com/urllib3/urllib3/issues/1456

On Thu, Nov 29, 2018 at 6:29 PM Ashley Merrick 
wrote:

> Yeah had a few OS updates, but not related directly to CEPH.
>
> The full error log after a reboot is :
>
> 2018-11-29 11:24:22.494 7faf046a1700  1 mgr[restful] server not running:
> no certificate configured
> 2018-11-29 11:24:22.586 7faf05ee4700 -1 log_channel(cluster) log [ERR] :
> Unhandled exception from module 'dashboard' while running on mgr.ceph-m01:
> No module named ordered_dict
> 2018-11-29 11:24:22.586 7faf05ee4700 -1 dashboard.serve:
> 2018-11-29 11:24:22.586 7faf05ee4700 -1 Traceback (most recent call last):
>   File "/usr/lib/ceph/mgr/dashboard/module.py", line 276, in serve
> mapper = generate_routes(self.url_prefix)
>   File "/usr/lib/ceph/mgr/dashboard/controllers/__init__.py", line 118, in
> generate_routes
> ctrls = load_controllers()
>   File "/usr/lib/ceph/mgr/dashboard/controllers/__init__.py", line 73, in
> load_controllers
> package='dashboard')
>   File "/usr/lib/python2.7/importlib/__init__.py", line 37, in
> import_module
> __import__(name)
>   File "/usr/lib/ceph/mgr/dashboard/controllers/rgw.py", line 10, in
> 
> from ..services.rgw_client import RgwClient
>   File "/usr/lib/ceph/mgr/dashboard/services/rgw_client.py", line 5, in
> 
> from ..awsauth import S3Auth
>   File "/usr/lib/ceph/mgr/dashboard/awsauth.py", line 49, in 
> from requests.auth import AuthBase
>   File "/usr/lib/python2.7/dist-packages/requests/__init__.py", line 97,
> in 
> from . import utils
>   File "/usr/lib/python2.7/dist-packages/requests/utils.py", line 26, in
> 
> from ._internal_utils import to_native_string
>   File "/usr/lib/python2.7/dist-packages/requests/_internal_utils.py",
> line 11, in 
> from .compat import is_py2, builtin_str, str
>   File "/usr/lib/python2.7/dist-packages/requests/compat.py", line 47, in
> 
> from urllib3.packages.ordered_dict import OrderedDict
> ImportError: No module named ordered_dict
>
>
> I have tried "ceph mgr module enable dashboard" and it says already
> enabled, I tried a disable restart and enable and get the same error above.
>
> ,Ashley
>
> On Thu, Nov 29, 2018 at 6:23 PM Lenz Grimmer  wrote:
>
>> On 11/29/18 10:28 AM, Ashley Merrick wrote:
>>
>> > Sorry missed the basic info!!
>> >
>> > Latest Mimic 13.2.2
>> >
>> > Ubuntu 18.04
>>
>> Thanks. So it worked before the reboot and did not afterwards? What
>> changed? Did you perform an OS update?
>>
>> Would it be possible for you to paste the entire mgr log file messages
>> that are printed after the manager restarted? Have you tried to
>> explicitly enable the dashboard by running "ceph mgr module enable
>> dashboard"?
>>
>> Lenz
>>
>> --
>> SUSE Linux GmbH - Maxfeldstr. 5 - 90409 Nuernberg (Germany)
>> GF:Felix Imendörffer,Jane Smithard,Graham Norton,HRB 21284 (AG Nürnberg)
>>
>>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to recover from corrupted RocksDb

2018-11-29 Thread Mario Giammarco
The only strange thing is that ceph-bluestore-tool says that the repair was
done, no errors were found, and all is OK.
I wonder what that tool really does.
Mario

On Thu, 29 Nov 2018 at 11:03, Wido den Hollander wrote:

>
>
> On 11/29/18 10:45 AM, Mario Giammarco wrote:
> > I have only that copy, it is a showroom system but someone put a
> > production vm on it.
> >
>
> I have a feeling this won't be easy to fix or actually fixable:
>
> - Compaction error: Corruption: block checksum mismatch
> - submit_transaction error: Corruption: block checksum mismatch
>
> RocksDB got corrupted on that OSD and won't be able to start now.
>
> I wouldn't know where to start with this OSD.
>
> Wido
>
> > Il giorno gio 29 nov 2018 alle ore 10:43 Wido den Hollander
> > mailto:w...@42on.com>> ha scritto:
> >
> >
> >
> > On 11/29/18 10:28 AM, Mario Giammarco wrote:
> > > Hello,
> > > I have a ceph installation in a proxmox cluster.
> > > Due to a temporary hardware glitch now I get this error on osd
> startup
> > >
> > > -6> 2018-11-26 18:02:33.179327 7fa1d784be00  0 osd.0 1033
> > crush map
> > > has features 1009089991638532096, adjusting msgr requires for
> > osds
> > >-5> 2018-11-26 18:02:34.143084 7fa1c33f9700  3 rocksdb:
> > >
> >  [/build/ceph-12.2.9/src/rocksdb/db/db_impl_compaction_flush.cc:1591]
> > > Compaction error: Corruption: block checksum mismatch
> > > -4> 2018-11-26 18:02:34.143123 7fa1c33f9700 4 rocksdb:
> > (Original Log
> > > Time 2018/11/26-18:02:34.143021)
> > > [/build/ceph-12.2.9/src/rocksdb/db/compaction_job.cc:621]
> > [default]
> > > compacted to: base level 1 max bytes base268435456 files[17$
> > >
> > > -3> 2018-11-26 18:02:34.143126 7fa1c33f9700 4 rocksdb:
> > (Original Log
> > > Time 2018/11/26-18:02:34.143068) EVENT_LOG_v1 {"time_micros":
> > > 1543251754143044, "job": 3, "event": "compaction_finished",
> > > "compaction_time_micros": 1997048, "out$
> > >-2> 2018-11-26 18:02:34.143152 7fa1c33f9700  2 rocksdb:
> > >
> >  [/build/ceph-12.2.9/src/rocksdb/db/db_impl_compaction_flush.cc:1275]
> > > Waiting after background compaction error: Corruption: block
> > > checksum mismatch, Accumulated background err$
> > >-1> 2018-11-26 18:02:34.674171 7fa1c4bfc700 -1 rocksdb:
> > > submit_transaction error: Corruption: block checksum mismatch
> > code =
> > > 2 Rocksdb transaction:
> > > Delete( Prefix = O key =
> > >
> >
>   
> 0x7f7ffb6400217363'rub_3.26!='0xfffe'o')
> > > Put( Prefix = S key = 'nid_max' Value size = 8)
> > > Put( Prefix = S key = 'blobid_max' Value size = 8)
> > > 0> 2018-11-26 18:02:34.675641 7fa1c4bfc700 -1
> > > /build/ceph-12.2.9/src/os/bluestore/BlueStore.cc: In function
> > 'void
> > > BlueStore::_kv_sync_thread()' thread 7fa1c4bfc700 time
> 2018-11-26
> > > 18:02:34.674193
> > > /build/ceph-12.2.9/src/os/bluestore/BlueStore.cc: 8717: FAILED
> > > assert(r == 0)
> > >
> > > ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217)
> > > luminous (stable)
> > > 1: (ceph::__ceph_assert_fail(char const*, char const*, int,
> char
> > > const*)+0x102) [0x55ec83876092]
> > > 2: (BlueStore::_kv_sync_thread()+0x24b5) [0x55ec836ffb55]
> > > 3: (BlueStore::KVSyncThread::entry()+0xd) [0x55ec8374040d]
> > > 4: (()+0x7494) [0x7fa1d5027494]
> > > 5: (clone()+0x3f) [0x7fa1d4098acf]
> > >
> > >
> > > I have tried to recover it using ceph-bluestore-tool fsck and
> repair
> > > DEEP but it says it is ALL ok.
> > > I see that rocksd ldb tool needs .db files to recover and not a
> > > partition so I cannot use it.
> > > I do not understand why I cannot start osd if ceph-bluestore-tools
> > says
> > > me I have lost no data.
> > > Can you help me?
> >
> > Why would you try to recover a individual OSD? If all your Placement
> > Groups are active(+clean) just wipe the OSD and re-deploy it.
> >
> > What's the status of your PGs?
> >
> > It says there is a checksum error (probably due to the hardware
> glitch)
> > so it refuses to start.
> >
> > Don't try to outsmart Ceph, let backfill/recovery handle this.
> Trying to
> > manually fix this will only make things worse.
> >
> > Wido
> >
> > > Thanks,
> > > Mario
> > >
> > > ___
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com 
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > >
> >
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] MGR Dashboard

2018-11-29 Thread Lenz Grimmer
On 11/29/18 11:29 AM, Ashley Merrick wrote:

> Yeah had a few OS updates, but not related directly to CEPH.

But they seem to be the root cause of the issue you're facing. Thanks
for sharing the entire log entry.

> The full error log after a reboot is :
> 
> 2018-11-29 11:24:22.494 7faf046a1700  1 mgr[restful] server not running:
> no certificate configured
> 2018-11-29 11:24:22.586 7faf05ee4700 -1 log_channel(cluster) log [ERR] :
> Unhandled exception from module 'dashboard' while running on
> mgr.ceph-m01: No module named ordered_dict
> 2018-11-29 11:24:22.586 7faf05ee4700 -1 dashboard.serve:
> 2018-11-29 11:24:22.586 7faf05ee4700 -1 Traceback (most recent call last):
>   File "/usr/lib/ceph/mgr/dashboard/module.py", line 276, in serve
>     mapper = generate_routes(self.url_prefix)
>   File "/usr/lib/ceph/mgr/dashboard/controllers/__init__.py", line 118,
> in generate_routes
>     ctrls = load_controllers()
>   File "/usr/lib/ceph/mgr/dashboard/controllers/__init__.py", line 73,
> in load_controllers
>     package='dashboard')
>   File "/usr/lib/python2.7/importlib/__init__.py", line 37, in import_module
>     __import__(name)
>   File "/usr/lib/ceph/mgr/dashboard/controllers/rgw.py", line 10, in
> 
>     from ..services.rgw_client import RgwClient
>   File "/usr/lib/ceph/mgr/dashboard/services/rgw_client.py", line 5, in
> 
>     from ..awsauth import S3Auth
>   File "/usr/lib/ceph/mgr/dashboard/awsauth.py", line 49, in 
>     from requests.auth import AuthBase
>   File "/usr/lib/python2.7/dist-packages/requests/__init__.py", line 97,
> in 
>     from . import utils
>   File "/usr/lib/python2.7/dist-packages/requests/utils.py", line 26, in
> 
>     from ._internal_utils import to_native_string
>   File "/usr/lib/python2.7/dist-packages/requests/_internal_utils.py",
> line 11, in 
>     from .compat import is_py2, builtin_str, str
>   File "/usr/lib/python2.7/dist-packages/requests/compat.py", line 47,
> in 
>     from urllib3.packages.ordered_dict import OrderedDict
> ImportError: No module named ordered_dict
> 
> I have tried "ceph mgr module enable dashboard" and it says already
> enabled, I tried a disable restart and enable and get the same error above.

Try re-installing the following packages: "python-urllib3" and
"python-requests" via apt - somehow Python fails to import a method from
the former library.

Lenz

-- 
SUSE Linux GmbH - Maxfeldstr. 5 - 90409 Nuernberg (Germany)
GF:Felix Imendörffer,Jane Smithard,Graham Norton,HRB 21284 (AG Nürnberg)



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RGW Swift metadata dropped when S3 bucket versioning enabled

2018-11-29 Thread Yehuda Sadeh-Weinraub
On Wed, Nov 28, 2018 at 10:07 AM Maxime Guyot  wrote:
>
> Hi Florian,
>
> You assumed correctly, the "test" container (private) was created with the 
> "openstack container create test", then I am using the S3 API to 
> enable/disable object versioning on it.
> I use the following Python snippet to enable/disable S3 bucket versioning:
>
> import boto, boto.s3, boto.s3.connection
> conn = boto.connect_s3(aws_access_key_id='***',
> aws_secret_access_key='***', host='***', port=8080,
> calling_format=boto.s3.connection.OrdinaryCallingFormat())
> bucket = conn.get_bucket('test')
> bucket.configure_versioning(True) # Or False to disable S3 bucket versioning
> bucket.get_versioning_status()
>
> > Semi-related: I've seen some interesting things when mucking around with
> > a single container/bucket while switching APIs, when it comes to
> > container properties and metadata. For example, if you set a public read
> > ACL on an S3 bucket, the corresponding Swift container is also
> > publicly readable but its read ACL looks empty (i.e. private) when you
> > ask via the Swift API.
>
> This can definitely become a problem if Swift API says "private" but data is 
> actually publicly available.
> Since the doc says "S3 and Swift APIs share a common namespace, so you may 
> write data with one API and retrieve it with the other", it might be useful 
> to document this kind of limitations somewhere.

Note that Swift ACLs and S3 ACLs don't quite map perfectly to each
other. An S3 public-read ACL on a bucket doesn't mean that the data is
accessible, but rather that the bucket can be listed. In Swift, the
container ACLs are about the objects inside. I'm not sure there is an
equivalent Swift ACL that would only deal with the ability to list
objects in the container.

Yehuda
>
> Cheers,
> / Maxime
>
> On Wed, 28 Nov 2018 at 17:58 Florian Haas  wrote:
>>
>> On 27/11/2018 20:28, Maxime Guyot wrote:
>> > Hi,
>> >
>> > I'm running into an issue with the RadosGW Swift API when the S3 bucket
>> > versioning is enabled. It looks like it silently drops any metadata sent
>> > with the "X-Object-Meta-foo" header (see example below).
>> > This is observed on a Luminous 12.2.8 cluster. Is that a normal thing?
>> > Am I misconfiguring something here?
>> >
>> >
>> > With S3 bucket versioning OFF:
>> > $ openstack object set --property foo=bar test test.dat
>> > $ os object show test test.dat
>> > ++--+
>> > | Field  | Value|
>> > ++--+
>> > | account| v1   |
>> > | container  | test |
>> > | content-length | 507904   |
>> > | content-type   | binary/octet-stream  |
>> > | etag   | 03e8a398f343ade4e1e1d7c81a66e400 |
>> > | last-modified  | Tue, 27 Nov 2018 13:53:54 GMT|
>> > | object | test.dat |
>> > | properties | Foo='bar'|  <= Metadata is here
>> > ++--+
>> >
>> > With S3 bucket versioning ON:
>>
>> Can you elaborate on what exactly you're doing here to enable S3 bucket
>> versioning? Do I assume correctly that you are creating the "test"
>> container using the swift or openstack client, then sending a
>> VersioningConfiguration request against the "test" bucket, as explained
>> in
>> https://docs.aws.amazon.com/AmazonS3/latest/dev/Versioning.html#how-to-enable-disable-versioning-intro?
>>
>> > $ openstack object set --property foo=bar test test2.dat
>> > $ openstack object show test test2.dat
>> > ++--+
>> > | Field  | Value|
>> > ++--+
>> > | account| v1   |
>> > | container  | test |
>> > | content-length | 507904   |
>> > | content-type   | binary/octet-stream  |
>> > | etag   | 03e8a398f343ade4e1e1d7c81a66e400 |
>> > | last-modified  | Tue, 27 Nov 2018 13:56:50 GMT|
>> > | object | test2.dat| <= Metadata is absent
>> > ++--+
>>
>> Semi-related: I've seen some interesting things when mucking around with
>> a single container/bucket while switching APIs, when it comes to
>> container properties and metadata. For example, if you set a public read
>> ACL on an S3 bucket, the corresponding Swift container is also
>> publicly readable but its read ACL looks empty (i.e. private) when you
>> ask via the Swift API.
>>
>> Cheers,
>> Florian
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] MGR Dashboard

2018-11-29 Thread Lenz Grimmer
Hi Ashley,

On 11/29/18 11:41 AM, Ashley Merrick wrote:

> Managed to fix the issue with some googling from the error above.
> 
> There is a bug with urllib3 1.24.1 which breaks the module ordered_dict (1)

Good spotting!

> I rolled back to a working version "pip install urllib3==1.23" and
> restarted the mgr service and all is now working.

Glad to hear you got it working again. Thanks for the update!

> (1)https://github.com/urllib3/urllib3/issues/1456

Lenz

-- 
SUSE Linux GmbH - Maxfeldstr. 5 - 90409 Nuernberg (Germany)
GF:Felix Imendörffer,Jane Smithard,Graham Norton,HRB 21284 (AG Nürnberg)



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to recover from corrupted RocksDb

2018-11-29 Thread Igor Fedotov
'ceph-bluestore-tool repair' checks and repairs BlueStore metadata
consistency, not RocksDB's.

It looks like you're observing a CRC mismatch during DB compaction, which
is probably not triggered during the repair.

The good news is that BlueStore's metadata look consistent, and hence data
recovery is still potentially possible - I can't build up a working
procedure using the existing tools, though.

Let me check if one can disable DB compaction using rocksdb settings.


On 11/29/2018 1:42 PM, Mario Giammarco wrote:
The only strange thing is that ceph-bluestore-tool says that repair 
was done, no errors are found and all is ok.

I ask myself what really does that tool.
Mario

On Thu, 29 Nov 2018 at 11:03, Wido den Hollander  wrote:




On 11/29/18 10:45 AM, Mario Giammarco wrote:
> I have only that copy, it is a showroom system but someone put a
> production vm on it.
>

I have a feeling this won't be easy to fix or actually fixable:

- Compaction error: Corruption: block checksum mismatch
- submit_transaction error: Corruption: block checksum mismatch

RocksDB got corrupted on that OSD and won't be able to start now.

I wouldn't know where to start with this OSD.

Wido

> Il giorno gio 29 nov 2018 alle ore 10:43 Wido den Hollander
> mailto:w...@42on.com> >> ha scritto:
>
>
>
>     On 11/29/18 10:28 AM, Mario Giammarco wrote:
>     > Hello,
>     > I have a ceph installation in a proxmox cluster.
>     > Due to a temporary hardware glitch now I get this error on
osd startup
>     >
>     >     -6> 2018-11-26 18:02:33.179327 7fa1d784be00  0 osd.0 1033
>     crush map
>     >     has features 1009089991638532096, adjusting msgr
requires for
>     osds
>     >    -5> 2018-11-26 18:02:34.143084 7fa1c33f9700  3 rocksdb:
>     >
>
  [/build/ceph-12.2.9/src/rocksdb/db/db_impl_compaction_flush.cc:1591]
>     >     Compaction error: Corruption: block checksum mismatch
>     >     -4> 2018-11-26 18:02:34.143123 7fa1c33f9700 4 rocksdb:
>     (Original Log
>     >     Time 2018/11/26-18:02:34.143021)
>     >  [/build/ceph-12.2.9/src/rocksdb/db/compaction_job.cc:621]
>     [default]
>     >     compacted to: base level 1 max bytes
base268435456 files[17$
>     >
>     >     -3> 2018-11-26 18:02:34.143126 7fa1c33f9700 4 rocksdb:
>     (Original Log
>     >     Time 2018/11/26-18:02:34.143068) EVENT_LOG_v1
{"time_micros":
>     >     1543251754143044, "job": 3, "event":
"compaction_finished",
>     >     "compaction_time_micros": 1997048, "out$
>     >    -2> 2018-11-26 18:02:34.143152 7fa1c33f9700  2 rocksdb:
>     >
>
  [/build/ceph-12.2.9/src/rocksdb/db/db_impl_compaction_flush.cc:1275]
>     >     Waiting after background compaction error: Corruption:
block
>     >     checksum mismatch, Accumulated background err$
>     >    -1> 2018-11-26 18:02:34.674171 7fa1c4bfc700 -1 rocksdb:
>     >     submit_transaction error: Corruption: block checksum
mismatch
>     code =
>     >     2 Rocksdb transaction:
>     >     Delete( Prefix = O key =
>     >
>
  
0x7f7ffb6400217363'rub_3.26!='0xfffe'o')

>     >     Put( Prefix = S key = 'nid_max' Value size = 8)
>     >     Put( Prefix = S key = 'blobid_max' Value size = 8)
>     >     0> 2018-11-26 18:02:34.675641 7fa1c4bfc700 -1
>     >  /build/ceph-12.2.9/src/os/bluestore/BlueStore.cc: In function
>     'void
>     >     BlueStore::_kv_sync_thread()' thread 7fa1c4bfc700 time
2018-11-26
>     >     18:02:34.674193
>     >  /build/ceph-12.2.9/src/os/bluestore/BlueStore.cc: 8717:
FAILED
>     >     assert(r == 0)
>     >
>     >     ceph version 12.2.9
(9e300932ef8a8916fb3fda78c58691a6ab0f4217)
>     >     luminous (stable)
>     >     1: (ceph::__ceph_assert_fail(char const*, char const*,
int, char
>     >     const*)+0x102) [0x55ec83876092]
>     >     2: (BlueStore::_kv_sync_thread()+0x24b5) [0x55ec836ffb55]
>     >     3: (BlueStore::KVSyncThread::entry()+0xd)
[0x55ec8374040d]
>     >     4: (()+0x7494) [0x7fa1d5027494]
>     >     5: (clone()+0x3f) [0x7fa1d4098acf]
>     >
>     >
>     > I have tried to recover it using ceph-bluestore-tool fsck
and repair
>     > DEEP but it says it is ALL ok.
>     > I see that rocksd ldb tool needs .db files to recover and
not a
>     > partition so I cannot use it.
>     > I do not understand why I cannot start osd if
ceph-bluestore-tools
>     says
>     > me I have lost no data.
>     > Can you help me?
>
>     Why would you try to recover a individual OSD? If all your
Placement
> 

Re: [ceph-users] How to recover from corrupted RocksDb

2018-11-29 Thread Paul Emmerich
Does objectstore-tool still work? If yes:

export all the PGs on the OSD with objectstore-tool and import them
into a new OSD.

Paul


-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Thu, 29 Nov 2018 at 13:06, Igor Fedotov wrote:
>
> 'ceph-bluestore-tool repair' checks and repairs BlueStore metadata 
> consistency not RocksDB one.
>
> It looks like you're observing CRC mismatch during DB compaction which is 
> probably not triggered during the repair.
>
> Good point is that it looks like Bluestore's metadata are consistent and 
> hence data recovery is still possible  - potentially, can't build up a 
> working procedure using existing tools though..
>
> Let me check if one can disable DB compaction using rocksdb settings.
>
>
> On 11/29/2018 1:42 PM, Mario Giammarco wrote:
>
> The only strange thing is that ceph-bluestore-tool says that repair was done, 
> no errors are found and all is ok.
> I ask myself what really does that tool.
> Mario
>
> On Thu, 29 Nov 2018 at 11:03, Wido den Hollander wrote:
>>
>>
>>
>> On 11/29/18 10:45 AM, Mario Giammarco wrote:
>> > I have only that copy, it is a showroom system but someone put a
>> > production vm on it.
>> >
>>
>> I have a feeling this won't be easy to fix or actually fixable:
>>
>> - Compaction error: Corruption: block checksum mismatch
>> - submit_transaction error: Corruption: block checksum mismatch
>>
>> RocksDB got corrupted on that OSD and won't be able to start now.
>>
>> I wouldn't know where to start with this OSD.
>>
>> Wido
>>
>> > Il giorno gio 29 nov 2018 alle ore 10:43 Wido den Hollander
>> > mailto:w...@42on.com>> ha scritto:
>> >
>> >
>> >
>> > On 11/29/18 10:28 AM, Mario Giammarco wrote:
>> > > Hello,
>> > > I have a ceph installation in a proxmox cluster.
>> > > Due to a temporary hardware glitch now I get this error on osd 
>> > startup
>> > >
>> > > -6> 2018-11-26 18:02:33.179327 7fa1d784be00  0 osd.0 1033
>> > crush map
>> > > has features 1009089991638532096, adjusting msgr requires for
>> > osds
>> > >-5> 2018-11-26 18:02:34.143084 7fa1c33f9700  3 rocksdb:
>> > >
>> >  [/build/ceph-12.2.9/src/rocksdb/db/db_impl_compaction_flush.cc:1591]
>> > > Compaction error: Corruption: block checksum mismatch
>> > > -4> 2018-11-26 18:02:34.143123 7fa1c33f9700 4 rocksdb:
>> > (Original Log
>> > > Time 2018/11/26-18:02:34.143021)
>> > > [/build/ceph-12.2.9/src/rocksdb/db/compaction_job.cc:621]
>> > [default]
>> > > compacted to: base level 1 max bytes base268435456 files[17$
>> > >
>> > > -3> 2018-11-26 18:02:34.143126 7fa1c33f9700 4 rocksdb:
>> > (Original Log
>> > > Time 2018/11/26-18:02:34.143068) EVENT_LOG_v1 {"time_micros":
>> > > 1543251754143044, "job": 3, "event": "compaction_finished",
>> > > "compaction_time_micros": 1997048, "out$
>> > >-2> 2018-11-26 18:02:34.143152 7fa1c33f9700  2 rocksdb:
>> > >
>> >  [/build/ceph-12.2.9/src/rocksdb/db/db_impl_compaction_flush.cc:1275]
>> > > Waiting after background compaction error: Corruption: block
>> > > checksum mismatch, Accumulated background err$
>> > >-1> 2018-11-26 18:02:34.674171 7fa1c4bfc700 -1 rocksdb:
>> > > submit_transaction error: Corruption: block checksum mismatch
>> > code =
>> > > 2 Rocksdb transaction:
>> > > Delete( Prefix = O key =
>> > >
>> >  
>> > 0x7f7ffb6400217363'rub_3.26!='0xfffe'o')
>> > > Put( Prefix = S key = 'nid_max' Value size = 8)
>> > > Put( Prefix = S key = 'blobid_max' Value size = 8)
>> > > 0> 2018-11-26 18:02:34.675641 7fa1c4bfc700 -1
>> > > /build/ceph-12.2.9/src/os/bluestore/BlueStore.cc: In function
>> > 'void
>> > > BlueStore::_kv_sync_thread()' thread 7fa1c4bfc700 time 2018-11-26
>> > > 18:02:34.674193
>> > > /build/ceph-12.2.9/src/os/bluestore/BlueStore.cc: 8717: FAILED
>> > > assert(r == 0)
>> > >
>> > > ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217)
>> > > luminous (stable)
>> > > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> > > const*)+0x102) [0x55ec83876092]
>> > > 2: (BlueStore::_kv_sync_thread()+0x24b5) [0x55ec836ffb55]
>> > > 3: (BlueStore::KVSyncThread::entry()+0xd) [0x55ec8374040d]
>> > > 4: (()+0x7494) [0x7fa1d5027494]
>> > > 5: (clone()+0x3f) [0x7fa1d4098acf]
>> > >
>> > >
>> > > I have tried to recover it using ceph-bluestore-tool fsck and repair
>> > > DEEP but it says it is ALL ok.
>> > > I see that rocksd ldb tool needs .db files to recover and not a
>> > > partition so I cannot use it.
>> > > I do not under

Re: [ceph-users] How to recover from corrupted RocksDb

2018-11-29 Thread Igor Fedotov

Yeah, that may be the way.

Preferably disable compaction during this procedure, though.

To do that, please set

bluestore rocksdb options = "disable_auto_compactions=true"

in the [osd] section of ceph.conf.
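
As a ceph.conf fragment that would look like the following (placed on the node
hosting the affected OSD - an assumption), so it is picked up the next time the
store is opened:

  [osd]
  bluestore rocksdb options = "disable_auto_compactions=true"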


Thanks,

Igor


On 11/29/2018 4:54 PM, Paul Emmerich wrote:

does objectstore-tool still work? If yes:

export all the PGs on the OSD with objectstore-tool and import them
into a new OSD.

Paul




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to recover from corrupted RocksDb

2018-11-29 Thread Paul Emmerich
If this is really the last copy of important data: consider making a
full raw clone of the disk before running any ceph-objectstore-tool
commands on it and consider getting some professional help if you are
not too familiar with the inner workings of Ceph.

That being said, it's basically just:

ceph-objectstore-tool --bluestore --op export --pgid <pgid> --data-path
/var/lib/ceph/osd/ceph-XX --file <filename>

and --op import into the new OSD. I don't think there's a command for
all PGs at once, so you'll have to iterate over the broken PGs (or
--op list-pgs)
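
A hedged sketch of that loop (paths and the backup directory are assumptions;
run it with the OSD stopped):

for pg in $(ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-XX --op list-pgs); do
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-XX \
    --op export --pgid "$pg" --file /backup/"$pg".export
done

and then --op import each file into the freshly created OSD.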

good luck.


Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Thu, 29 Nov 2018 at 15:36, Davide Accetturi wrote:
>
> Hi guys
> Thanks for your replies,
> What’s the right syntax of Ceph-objectstore-tool in order to export all of 
> the PGs??
>
> Thanks you so much
> Davide
>
> Sent from my iPhone
>
> > On 29 Nov 2018, at 15:15, Igor Fedotov  wrote:
> >
> > Yeah, that may be the way.
> >
> > Preferably to disable compaction during this procedure though.
> >
> > To do that please set
> >
> > bluestore rocksdb options = "disable_auto_compactions=true"
> >
> > in [osd] section in ceph.conf
> >
> >
> > Thanks,
> >
> > Igor
> >
> >
> >> On 11/29/2018 4:54 PM, Paul Emmerich wrote:
> >> does objectstore-tool still work? If yes:
> >>
> >> export all the PGs on the OSD with objectstore-tool and important them
> >> into a new OSD.
> >>
> >> Paul
> >>
> >>
> >
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Removing orphaned radosgw bucket indexes from pool

2018-11-29 Thread Bryan Stillwell
Wido,

I've been looking into this large omap objects problem on a couple of our 
clusters today and came across your script during my research.

The script has been running for a few hours now and I'm already over 100,000 
'orphaned' objects!

It appears that ever since upgrading to Luminous (12.2.5 initially, followed by 
12.2.8) this cluster has been resharding the large bucket indexes at least once 
a day and not cleaning up the previous bucket indexes:

for instance in $(radosgw-admin metadata list bucket.instance | jq -r '.[]' | grep go-test-dashboard); do
  mtime=$(radosgw-admin metadata get bucket.instance:${instance} | grep mtime)
  num_shards=$(radosgw-admin metadata get bucket.instance:${instance} | grep num_shards)
  echo "${instance}: ${mtime} ${num_shards}"
done | column -t | sort -k3
go-test-dashboard:default.188839135.327804:  "mtime": "2018-06-01 22:35:28.693095Z",  "num_shards": 0,
go-test-dashboard:default.617828918.2898:    "mtime": "2018-06-02 22:35:40.438738Z",  "num_shards": 46,
go-test-dashboard:default.617828918.4:       "mtime": "2018-06-02 22:38:21.537259Z",  "num_shards": 46,
go-test-dashboard:default.617663016.10499:   "mtime": "2018-06-03 23:00:04.185285Z",  "num_shards": 46,
[...snip...]
go-test-dashboard:default.891941432.342061:  "mtime": "2018-11-28 01:41:46.777968Z",  "num_shards": 7,
go-test-dashboard:default.928133068.2899:    "mtime": "2018-11-28 20:01:49.390237Z",  "num_shards": 46,
go-test-dashboard:default.928133068.5115:    "mtime": "2018-11-29 01:54:17.788355Z",  "num_shards": 7,
go-test-dashboard:default.928133068.8054:    "mtime": "2018-11-29 20:21:53.733824Z",  "num_shards": 7,
go-test-dashboard:default.891941432.359004:  "mtime": "2018-11-29 20:22:09.201965Z",  "num_shards": 46,

The num_shards is typically around 46, but looking at all 288 instances of that 
bucket index, it has varied between 3 and 62 shards.

Have you figured anything more out about this since you posted this originally 
two weeks ago?

Thanks,
Bryan

From: ceph-users  on behalf of Wido den 
Hollander 
Date: Thursday, November 15, 2018 at 5:43 AM
To: Ceph Users 
Subject: [ceph-users] Removing orphaned radosgw bucket indexes from pool

Hi,

Recently we've seen multiple messages on the mailinglists about people
seeing HEALTH_WARN due to large OMAP objects on their cluster. This is
due to the fact that starting with 12.2.6 OSDs warn about this.

I've got multiple people asking me the same questions and I've done some
digging around.

Somebody on the ML wrote this script:

for bucket in `radosgw-admin metadata list bucket | jq -r '.[]' | sort`; do
  actual_id=`radosgw-admin bucket stats --bucket=${bucket} | jq -r '.id'`
  for instance in `radosgw-admin metadata list bucket.instance | jq -r
'.[]' | grep ${bucket}: | cut -d ':' -f 2`
  do
if [ "$actual_id" != "$instance" ]
then
  radosgw-admin bi purge --bucket=${bucket} --bucket-id=${instance}
  radosgw-admin metadata rm bucket.instance:${bucket}:${instance}
fi
  done
done

That partially works, but 'orphaned' objects in the index pool do not work.

So I wrote my own script [0]:

#!/bin/bash
INDEX_POOL=$1

if [ -z "$INDEX_POOL" ]; then
echo "Usage: $0 "
exit 1
fi

INDEXES=$(mktemp)
METADATA=$(mktemp)

trap "rm -f ${INDEXES} ${METADATA}" EXIT

radosgw-admin metadata list bucket.instance|jq -r '.[]' > ${METADATA}
rados -p ${INDEX_POOL} ls > $INDEXES

for OBJECT in $(cat ${INDEXES}); do
MARKER=$(echo ${OBJECT}|cut -d '.' -f 3,4,5)
grep ${MARKER} ${METADATA} > /dev/null
if [ "$?" -ne 0 ]; then
echo $OBJECT
fi
done

It does not remove anything, but for example, it returns these objects:

.dir.eb32b1ca-807a-4867-aea5-ff43ef7647c6.10406917.5752
.dir.eb32b1ca-807a-4867-aea5-ff43ef7647c6.10289105.6162
.dir.eb32b1ca-807a-4867-aea5-ff43ef7647c6.10289105.6186

The output of:

$ radosgw-admin metadata list|jq -r '.[]'

Does not contain:
- eb32b1ca-807a-4867-aea5-ff43ef7647c6.10406917.5752
- eb32b1ca-807a-4867-aea5-ff43ef7647c6.10289105.6162
- eb32b1ca-807a-4867-aea5-ff43ef7647c6.10289105.6186

So for me these objects do not seem to be tied to any bucket and seem to
be leftovers which were not cleaned up.

For example, I see these objects tied to a bucket:

- b32b1ca-807a-4867-aea5-ff43ef7647c6.10289105.6160
- eb32b1ca-807a-4867-aea5-ff43ef7647c6.10289105.6188
- eb32b1ca-807a-4867-aea5-ff43ef7647c6.10289105.6167

But notice the difference: 6160, 6188, 6167, but not 6162 nor 6186

Before I remove these objects I want to verify with other users if they
see the same and if my thinking is correct.

Wido

[0]: https://gist.github.com/wido/6650e66b09770ef02df89636891bef04

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

[ceph-users] client failing to respond to cache pressure

2018-11-29 Thread Zhenshi Zhou
Hi

I used to get a warning message claiming that clients were failing to respond
to cache pressure. After I switched the rocksdb and wal data to SSD, the
message seemed to disappear.

However, it showed up again yesterday and the message looks a little
different: *MDS_CLIENT_RECALL_MANY: 1 MDSs have many *
*clients failing to respond to cache pressure.*

I added some fuse clients and there are 48 clients mounting the cephfs
at present. The cluster has only one active MDS and one standby.

How can I get the cluster healthy again? I'm happy to provide any
information about the cluster.

Thanks
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] client failing to respond to cache pressure

2018-11-29 Thread Zhenshi Zhou
Hi

The cluster became healthy again after I raised "mds_cache_memory_limit" from
4G to 8G.
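
In case it helps anyone else, one way to apply that is (the value is in bytes,
8 GiB here; on older releases the same option can instead be set under [mds] in
ceph.conf and the MDS restarted):

  ceph config set mds mds_cache_memory_limit 8589934592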

On Fri, 30 Nov 2018 at 11:04, Zhenshi Zhou wrote:

> Hi
>
> I used to get warning message claims client faling to respond
> to cache pressure. After I switch rockdb and wal data to ssd, the
> message seems disappeared.
>
> However it shows again yesterday and the message looks a little
> different: *MDS_CLIENT_RECALL_MANY: 1 MDSs have many *
> *clients failing to respond to cache pressure.*
>
> I added some fuse client and it has 48 clients mounting the cephfs
> at present. The cluster has only one MDS active and one standby.
>
> How can I get the cluster healthy again? I'm happy to provide any
> information of the cluster.
>
> Thanks
>
>
>
>
>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Move Instance between Different Ceph and Openstack Installation

2018-11-29 Thread Konstantin Shalygin

I need to move an instance from one OpenStack-with-Ceph installation to a
different one. The instance boots from a volume and has another volume
attached for data: a 200GB boot volume and a 1TB data volume.

From what I know, I need to export the boot volume from rbd to a raw file,
transfer it to the new OpenStack installation, re-upload it to Glance and
create a new instance from that image. The problem is that uploading to
Glance is quite time consuming because of the size of the image.

Does anyone know a more efficient way of moving instances between different
OpenStack-with-Ceph installations?



1. Export from "one Openstack with Ceph" image (qemu-img/rbd export).

2. Create in "different Openstack with Ceph" image with same size.

3. Get uuid of this image.

4. Import your data to this image (qemu-img/rbd import).
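
A rough sketch with concrete commands (the pool and image names are
assumptions; Cinder volumes usually live in the "volumes" pool as
volume-<uuid>):

  # on the source cloud: export the boot volume to a raw file
  rbd export volumes/volume-<source-uuid> boot.raw

  # on the destination cloud: create an empty Cinder volume of the same size,
  # note its uuid, then write the data into the pre-created RBD image
  # without recreating it (-n = skip target creation)
  qemu-img convert -n -f raw -O raw boot.raw rbd:volumes/volume-<dest-uuid>

The data volume can be moved the same way, and nothing has to go through
Glance.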



k

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rbd IO monitoring

2018-11-29 Thread Michael Green
Hello collective wisdom,

Ceph neophyte here, running v13.2.2 (mimic).

Question: what tools are available to monitor IO stats at the RBD level? That is,
IOPS, Throughput, IOs inflight and so on?
I'm testing with FIO and want to verify independently the IO load on each RBD 
image.

--
Michael Green
Customer Support & Integration
gr...@e8storage.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd IO monitoring

2018-11-29 Thread Wido den Hollander



On 11/30/18 5:48 AM, Michael Green wrote:
> Hello collective wisdom,
> 
> Ceph neophyte here, running v13.2.2 (mimic).
> 
> Question: what tools are available to monitor IO stats on RBD level?
> That is, IOPS, Throughput, IOs inflight and so on?
> I'm testing with FIO and want to verify independently the IO load on
> each RBD image.
> 

There is no central point in Ceph where all I/O passes, so those
counters can only be found on the client issuing the I/O.

If you enable the admin socket on a librbd client you can use the 'perf
dump' command to see what it's doing.

This is how you enable the socket:
https://ceph.com/geen-categorie/ceph-validate-that-the-rbd-cache-is-active/
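
A minimal sketch (the socket path pattern and the client name are assumptions;
see the link above for details):

  [client]
  admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok

and then, against the running client:

  ceph --admin-daemon /var/run/ceph/ceph-client.admin.12345.67890.asok perf dump

The librbd sections in that output contain per-image counters such as rd, wr,
rd_bytes and wr_bytes.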

Wido

> --
> *Michael Green*
> Customer Support & Integration
> gr...@e8storage.com 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com