[ceph-users] Slow requests from bluestore osds

2018-09-03 Thread Marc Schöchlin
Hi,

we have also been experiencing this type of behavior for some weeks on our
not-so-performance-critical hdd pools.
We haven't spent much time on this problem yet, because there are
currently more important tasks - but here are a few details:

Running the following loop produces this output:

while true; do ceph health|grep -q HEALTH_OK || (date;  ceph health
detail); sleep 2; done

Sun Sep  2 20:59:47 CEST 2018
HEALTH_WARN 4 slow requests are blocked > 32 sec
REQUEST_SLOW 4 slow requests are blocked > 32 sec
    4 ops are blocked > 32.768 sec
    osd.43 has blocked requests > 32.768 sec
Sun Sep  2 20:59:50 CEST 2018
HEALTH_WARN 4 slow requests are blocked > 32 sec
REQUEST_SLOW 4 slow requests are blocked > 32 sec
    4 ops are blocked > 32.768 sec
    osd.43 has blocked requests > 32.768 sec
Sun Sep  2 20:59:52 CEST 2018
HEALTH_OK
Sun Sep  2 21:00:28 CEST 2018
HEALTH_WARN 1 slow requests are blocked > 32 sec
REQUEST_SLOW 1 slow requests are blocked > 32 sec
    1 ops are blocked > 32.768 sec
    osd.41 has blocked requests > 32.768 sec
Sun Sep  2 21:00:31 CEST 2018
HEALTH_WARN 7 slow requests are blocked > 32 sec
REQUEST_SLOW 7 slow requests are blocked > 32 sec
    7 ops are blocked > 32.768 sec
    osds 35,41 have blocked requests > 32.768 sec
Sun Sep  2 21:00:33 CEST 2018
HEALTH_WARN 7 slow requests are blocked > 32 sec
REQUEST_SLOW 7 slow requests are blocked > 32 sec
    7 ops are blocked > 32.768 sec
    osds 35,51 have blocked requests > 32.768 sec
Sun Sep  2 21:00:35 CEST 2018
HEALTH_WARN 7 slow requests are blocked > 32 sec
REQUEST_SLOW 7 slow requests are blocked > 32 sec
    7 ops are blocked > 32.768 sec
    osds 35,51 have blocked requests > 32.768 sec

Our details:

  * system details:
    * Ubuntu 16.04
    * Kernel 4.13.0-39
    * 30 * 8 TB disks (SEAGATE/ST8000NM0075)
    * 3 * Dell PowerEdge R730xd (Firmware 2.50.50.50)
      * Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
      * 2 * 10 GBit/s SFP+ network adapters
      * 192GB RAM
  * Pools are using replication factor 3, 2MB object size,
    85% write load, 1700 write IOPS
    (ops mainly between 4k and 16k in size), 300 read IOPS
  * we have the impression that this appears during deep-scrub/scrub activity
  * Ceph 12.2.5; we already played with the following OSD settings
    (our assumption was that the problem is related to rocksdb compaction):
    bluestore cache kv max = 2147483648
    bluestore cache kv ratio = 0.9
    bluestore cache meta ratio = 0.1
    bluestore cache size hdd = 10737418240
  * this type of problem only appears on hdd/bluestore OSDs; ssd/bluestore
    OSDs have never experienced that problem
  * the system is healthy, no swapping, no high load, no errors in dmesg
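
In case it helps anyone to cross-check the scrub correlation, here is a
rough sketch of the commands we use for this (osd.43 and the throttle
values are just examples, not recommendations):

# which ops are currently blocked on a suspect OSD (run on its host)
ceph daemon osd.43 dump_blocked_ops
ceph daemon osd.43 dump_historic_ops

# is a (deep-)scrub running while the slow requests appear?
ceph pg dump 2>/dev/null | grep scrubbing

# throttle scrubbing a bit if the correlation holds
ceph tell osd.* injectargs '--osd_scrub_sleep 0.1'
ceph tell osd.* injectargs '--osd_scrub_begin_hour 22 --osd_scrub_end_hour 6'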

I attached a log excerpt of osd.35 - perhaps this is useful for
investigating the problem if someone has deeper bluestore knowledge.
(slow requests appeared on Sun Sep  2 21:00:35)

Regards
Marc


On 02.09.2018 at 15:50, Brett Chancellor wrote:
> The warnings look like this. 
>
> 6 ops are blocked > 32.768 sec on osd.219
> 1 osds have slow requests
>
> On Sun, Sep 2, 2018, 8:45 AM Alfredo Deza wrote:
>
> On Sat, Sep 1, 2018 at 12:45 PM, Brett Chancellor
> <bchancel...@salesforce.com> wrote:
> > Hi Cephers,
> >   I am in the process of upgrading a cluster from Filestore to
> bluestore,
> > but I'm concerned about frequent warnings popping up against the new
> > bluestore devices. I'm frequently seeing messages like this,
> although the
> > specific osd changes, it's always one of the few hosts I've
> converted to
> > bluestore.
> >
> > 6 ops are blocked > 32.768 sec on osd.219
> > 1 osds have slow requests
> >
> > I'm running 12.2.4, have any of you seen similar issues? It
> seems as though
> > these messages pop up more frequently when one of the bluestore
> pgs is
> > involved in a scrub.  I'll include my bluestore creation process
> below, in
> > case that might cause an issue. (sdb, sdc, sdd are SATA, sde and
> sdf are
> > SSD)
>
> Would be useful to include what those warnings say. The ceph-volume
> commands look OK to me
>
> >
> >
> > ## Process used to create osds
> > sudo ceph-disk zap /dev/sdb /dev/sdc /dev/sdd /dev/sdd /dev/sde
> /dev/sdf
> > sudo ceph-volume lvm zap /dev/sdb
> > sudo ceph-volume lvm zap /dev/sdc
> > sudo ceph-volume lvm zap /dev/sdd
> > sudo ceph-volume lvm zap /dev/sde
> > sudo ceph-volume lvm zap /dev/sdf
> > sudo sgdisk -n 0:2048:+133GiB -t 0: -c 1:"ceph block.db sdb"
> /dev/sdf
> > sudo sgdisk -n 0:0:+133GiB -t 0: -c 2:"ceph block.db sdc"
> /dev/sdf
> > sudo sgdisk -n 0:0:+133GiB -t 0: -c 3:"ceph block.db sdd"
> /dev/sdf
> > sudo sgdisk -n 0:0:+133GiB -t 0: -c 4:"ceph block.db sde"
> /dev/sdf
> > sudo ceph-volume lvm create --bluestore --crush-device-class hdd
> --data

Re: [ceph-users] MDS does not always failover to hot standby on reboot

2018-09-03 Thread William Lawton
Which configuration option determines the MDS timeout period?

William Lawton

From: Gregory Farnum 
Sent: Thursday, August 30, 2018 5:46 PM
To: William Lawton 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] MDS does not always failover to hot standby on reboot

Yes, this is a consequence of co-locating the MDS and monitors — if the MDS 
reports to its co-located monitor and both fail, the monitor cluster has to go 
through its own failure detection and then wait for a full MDS timeout period 
after that before it marks the MDS down. :(

We might conceivably be able to optimize for this, but there's not a general 
solution. If you need to co-locate, one thing that would make it better without 
being a lot of work is trying to have the MDS connect to one of the monitors on 
a different host. You can do that by just restricting the list of monitors you 
feed it in the ceph.conf, although it's not a guarantee that will *prevent* it 
from connecting to its own monitor if there are failures or reconnects after 
first startup.
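
One way that could look in ceph.conf on the MDS host (purely illustrative;
the host names are placeholders, and you would list only monitors that do
not share a host with the MDS):

[mds]
    mon host = mon-b:6789,mon-c:6789
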
-Greg
On Thu, Aug 30, 2018 at 8:38 AM William Lawton
<william.law...@irdeto.com> wrote:
Hi.

We have a 5 node Ceph cluster (refer to ceph -s output at bottom of email). 
During resiliency tests we have an occasional problem when we reboot the active 
MDS instance and a MON instance together i.e.  dub-sitv-ceph-02 and 
dub-sitv-ceph-04. We expect the MDS to failover to the standby instance 
dub-sitv-ceph-01 which is in standby-replay mode, and 80% of the time it does 
with no problems. However, 20% of the time it doesn’t and the MDS_ALL_DOWN 
health check is not cleared until 30 seconds later when the rebooted 
dub-sitv-ceph-02 and dub-sitv-ceph-04 instances come back up.

When the MDS successfully fails over to the standby we see in the ceph.log the 
following:

2018-08-25 00:30:02.231811 mon.dub-sitv-ceph-03 mon.0 
10.18.53.32:6789/0 50 : cluster [ERR] Health check 
failed: 1 filesystem is offline (MDS_ALL_DOWN)
2018-08-25 00:30:02.237389 mon.dub-sitv-ceph-03 mon.0 
10.18.53.32:6789/0 52 : cluster [INF] Standby daemon 
mds.dub-sitv-ceph-01 assigned to filesystem cephfs as rank 0
2018-08-25 00:30:02.237528 mon.dub-sitv-ceph-03 mon.0 
10.18.53.32:6789/0 54 : cluster [INF] Health check 
cleared: MDS_ALL_DOWN (was: 1 filesystem is offline)

When the active MDS role does not failover to the standby the MDS_ALL_DOWN 
check is not cleared until after the rebooted instances have come back up e.g.:

2018-08-25 03:30:02.936554 mon.dub-sitv-ceph-03 mon.0 
10.18.53.32:6789/0 55 : cluster [ERR] Health check 
failed: 1 filesystem is offline (MDS_ALL_DOWN)
2018-08-25 03:30:04.235703 mon.dub-sitv-ceph-05 mon.2 
10.18.186.208:6789/0 226 : cluster [INF] 
mon.dub-sitv-ceph-05 calling monitor election
2018-08-25 03:30:04.238672 mon.dub-sitv-ceph-03 mon.0 
10.18.53.32:6789/0 56 : cluster [INF] 
mon.dub-sitv-ceph-03 calling monitor election
2018-08-25 03:30:09.242595 mon.dub-sitv-ceph-03 mon.0 
10.18.53.32:6789/0 57 : cluster [INF] 
mon.dub-sitv-ceph-03 is new leader, mons dub-sitv-ceph-03,dub-sitv-ceph-05 in 
quorum (ranks 0,2)
2018-08-25 03:30:09.252804 mon.dub-sitv-ceph-03 mon.0 
10.18.53.32:6789/0 62 : cluster [WRN] Health check 
failed: 1/3 mons down, quorum dub-sitv-ceph-03,dub-sitv-ceph-05 (MON_DOWN)
2018-08-25 03:30:09.258693 mon.dub-sitv-ceph-03 mon.0 
10.18.53.32:6789/0 63 : cluster [WRN] overall 
HEALTH_WARN 2 osds down; 2 hosts (2 osds) down; 1/3 mons down, quorum 
dub-sitv-ceph-03,dub-sitv-ceph-05
2018-08-25 03:30:10.254162 mon.dub-sitv-ceph-03 mon.0 
10.18.53.32:6789/0 64 : cluster [WRN] Health check 
failed: Reduced data availability: 2 pgs inactive, 115 pgs peering 
(PG_AVAILABILITY)
2018-08-25 03:30:12.429145 mon.dub-sitv-ceph-03 mon.0 
10.18.53.32:6789/0 66 : cluster [WRN] Health check 
failed: Degraded data redundancy: 712/2504 objects degraded (28.435%), 86 pgs 
degraded (PG_DEGRADED)
2018-08-25 03:30:16.137408 mon.dub-sitv-ceph-03 mon.0 
10.18.53.32:6789/0 67 : cluster [WRN] Health check 
update: Reduced data availability: 1 pg inactive, 69 pgs peering 
(PG_AVAILABILITY)
2018-08-25 03:30:17.193322 mon.dub-sitv-ceph-03 mon.0 
10.18.53.32:6789/0 68 : cluster [INF] Health check 
cleared: PG_AVAILABILITY (was: Reduced data availability: 1 pg inactive, 69 pgs 
peering)
2018-08-25 03:30:18.432043 mon.dub-sitv-ceph-03 mon.0 
10.18.53.32:6789/0 69 : cluster [WRN] Health check 
update: Degraded data redundancy: 1286/2572 objects degraded (50.000%), 166 pgs 
degraded (PG_DEGRADED)
2018-08-25 03:30:26.139491 mon.dub-sitv-ceph-03 mon.0 
10.18.53.32:6789/0 71 : cluster [WRN] Health chec

Re: [ceph-users] Ceph-Deploy error on 15/71 stage

2018-09-03 Thread Eugen Block

Hi Jones,

I still don't think creating an OSD on a partition will work. The  
reason is that SES creates an additional partition per OSD resulting  
in something like this:


vdb      253:16   0    5G  0 disk
├─vdb1   253:17   0  100M  0 part /var/lib/ceph/osd/ceph-1
└─vdb2   253:18   0  4,9G  0 part

Even with external block.db and wal.db on additional devices you would  
still need two partitions for the OSD. I'm afraid with your setup this  
can't work.


Regards,
Eugen


Quoting Jones de Andrade:


Hi Eugen.

Sorry for the double email, but now it stopped complaining (too much) about
repositories and NTP and moved forward again.

So, I ran on master:

# salt-run state.orch ceph.stage.deploy
firewall          : disabled
apparmor          : disabled
fsid              : valid
public_network    : valid
cluster_network   : valid
cluster_interface : valid
monitors          : valid
mgrs              : valid
storage           : valid
ganesha           : valid
master_role       : valid
time_server       : valid
fqdn              : valid
[ERROR   ] {'out': 'highstate', 'ret': {'bohemia.iq.ufrgs.br':
{'file_|-/var/lib/ceph/bootstrap-osd/ceph.keyring_|-/var/lib/ceph/bootstrap-osd/ceph.keyring_|-managed':
{'changes': {}, 'pchanges': {}, 'comment': 'File
/var/lib/ceph/bootstrap-osd/ceph.keyring is in the correct state', 'name':
'/var/lib/ceph/bootstrap-osd/ceph.keyring', 'result': True, '__sls__':
'ceph.osd.keyring.default', '__run_num__': 0, 'start_time':
'12:43:51.639582', 'duration': 40.998, '__id__':
'/var/lib/ceph/bootstrap-osd/ceph.keyring'},
'file_|-/etc/ceph/ceph.client.storage.keyring_|-/etc/ceph/ceph.client.storage.keyring_|-managed':
{'changes': {}, 'pchanges': {}, 'comment': 'File
/etc/ceph/ceph.client.storage.keyring is in the correct state', 'name':
'/etc/ceph/ceph.client.storage.keyring', 'result': True, '__sls__':
'ceph.osd.keyring.default', '__run_num__': 1, 'start_time':
'12:43:51.680857', 'duration': 19.265, '__id__':
'/etc/ceph/ceph.client.storage.keyring'}, 'module_|-deploy
OSDs_|-osd.deploy_|-run': {'name': 'osd.deploy', 'changes': {}, 'comment':
'Module function osd.deploy threw an exception. Exception: Mine on
bohemia.iq.ufrgs.br  for cephdisks.list',
'result': False, '__sls__': 'ceph.osd.default', '__run_num__': 2,
'start_time': '12:43:51.701179', 'duration': 38.789, '__id__': 'deploy
OSDs'}}, 'torcello.iq.ufrgs.br ':
{'file_|-/var/lib/ceph/bootstrap-osd/ceph.keyring_|-/var/lib/ceph/bootstrap-osd/ceph.keyring_|-managed':
{'changes': {}, 'pchanges': {}, 'comment': 'File
/var/lib/ceph/bootstrap-osd/ceph.keyring is in the correct state', 'name':
'/var/lib/ceph/bootstrap-osd/ceph.keyring', 'result': True, '__sls__':
'ceph.osd.keyring.default', '__run_num__': 0, 'start_time':
'12:43:51.768119', 'duration': 39.544, '__id__':
'/var/lib/ceph/bootstrap-osd/ceph.keyring'},
'file_|-/etc/ceph/ceph.client.storage.keyring_|-/etc/ceph/ceph.client.storage.keyring_|-managed':
{'changes': {}, 'pchanges': {}, 'comment': 'File
/etc/ceph/ceph.client.storage.keyring is in the correct state', 'name':
'/etc/ceph/ceph.client.storage.keyring', 'result': True, '__sls__':
'ceph.osd.keyring.default', '__run_num__': 1, 'start_time':
'12:43:51.807977', 'duration': 16.645, '__id__':
'/etc/ceph/ceph.client.storage.keyring'}, 'module_|-deploy
OSDs_|-osd.deploy_|-run': {'name': 'osd.deploy', 'changes': {}, 'comment':
'Module function osd.deploy threw an exception. Exception: Mine on
torcello.iq.ufrgs.br  for cephdisks.list',
'result': False, '__sls__': 'ceph.osd.default', '__run_num__': 2,
'start_time': '12:43:51.825744', 'duration': 39.334, '__id__': 'deploy
OSDs'}}, 'patricia.iq.ufrgs.br ':
{'file_|-/var/lib/ceph/bootstrap-osd/ceph.keyring_|-/var/lib/ceph/bootstrap-osd/ceph.keyring_|-managed':
{'changes': {}, 'pchanges': {}, 'comment': 'File
/var/lib/ceph/bootstrap-osd/ceph.keyring is in the correct state', 'name':
'/var/lib/ceph/bootstrap-osd/ceph.keyring', 'result': True, '__sls__':
'ceph.osd.keyring.default', '__run_num__': 0, 'start_time':
'12:43:52.039506', 'duration': 41.975, '__id__':
'/var/lib/ceph/bootstrap-osd/ceph.keyring'},
'file_|-/etc/ceph/ceph.client.storage.keyring_|-/etc/ceph/ceph.client.storage.keyring_|-managed': {'changes': {},
'pchanges': {}, 'comment': 'File /etc/ceph/ceph.client.storage.keyring is
in the correct state', 'name': '/etc/ceph/ceph.client.storage.keyring',
'result': True, '__sls__': 'ceph.osd.keyring.default', '__run_num__': 1,
'start_time': '12:43:52.081767', 'duration': 17.852, '__id__':
'/etc/ceph/ceph.client.storage.keyring'}, 'module_|-deploy
OSDs_|-osd.deploy_|-run': {'name': 'osd.deploy', 'changes': {}, 'comment':
'Module function osd.deploy threw 

[ceph-users] luminous 12.2.6 -> 12.2.7 active+clean+inconsistent PGs workaround (or wait for 12.2.8+ ?)

2018-09-03 Thread SCHAER Frederic
Hi,

For those facing (lots of) active+clean+inconsistent PGs after the luminous 
12.2.6 metadata corruption and 12.2.7 upgrade, I'd like to explain how I 
finally got rid of those.

Disclaimer: my cluster doesn't contain highly valuable data, and I can sort of 
recreate what it actually contains: VMs. The following is risky...

One reason I needed to fix those issues is that I faced IO errors with pool 
overlays/tiering which were apparently related to the inconsistencies, and the 
only way I could get my VMs running again was to completely disable the SSD 
overlay, which is far from ideal.
For those not feeling the need to fix this "harmless" issue, please stop 
reading.
For the others, please understand the risks of the following... or wait for an 
official "pg repair" solution

So :

1st step :
since I was getting an ever growing list of damaged PGs, I decided to 
deep-scrub... all PGs.
Yes. If you have 1+PB data... stop reading (or not ?).

How to do that :
# for j in  ; do for i in `ceph pg ls-by-pool $j |cut -d " " -f 
1|tail -n +2`; do ceph pg deep-scrub $i ; done ; done

I think I already had a full list of damaged PGs until I upgraded to mimic and 
restarted the MONs/OSDs: I believe the daemon restarts caused ceph to forget 
about known inconsistencies.
If you believe the number of damaged PGs is sort of stable for you then skip 
step 1...

2nd step is sort of easy : it is to apply the method described here :

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-September/021054.html

I tried to add some rados locking before overwriting the objects (4M rbd 
objects in my case), but was still able to overwrite a locked object even with 
"rados -p rbd lock get --lock-type exclusive" ... maybe I haven't tried hard 
enough.
It would have been great if it were possible to make sure the object was not 
overwritten between a get and a put :/ - that would make this procedure much 
safer...
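
For reference, the core of that get/overwrite procedure for a single object 
looks roughly like this (pool, PG id and object name are placeholders, and 
again: this rewrites data, use at your own risk):

# list the objects reported as inconsistent in a damaged PG
rados list-inconsistent-obj 2.2bb --format=json-pretty

# rewrite the object with its own contents, then repair the PG
rados -p rbd get rbd_data.123456789abcdef.0000000000000001 /tmp/obj
rados -p rbd put rbd_data.123456789abcdef.0000000000000001 /tmp/obj
ceph pg repair 2.2bb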

In my case, I had 2000+ damaged PGs, so I wrote a small script that should 
process those PGs and should try to apply the procedure:
https://gist.github.com/fschaer/cb851eae4f46287eaf30715e18f14524

My Ceph cluster has been healthy since Friday evening and I haven't seen any 
data corruption nor any hung VM...

Cheers
Frederic
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph Luminous - journal setting

2018-09-03 Thread M Ranga Swami Reddy
Hi - I am using the Ceph Luminous release. What OSD journal settings
are needed here?
NOTE: I used SSDs for the journal up to the Jewel release.

Thanks
Swami
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Packages for debian in Ceph repo

2018-09-03 Thread Abhishek Lekshmanan
arad...@tma-0.net writes:

> Can anyone confirm if the Ceph repos for Debian/Ubuntu contain packages for 
> Debian? I'm not seeing any, but maybe I'm missing something...
>
> I'm seeing ceph-deploy install an older version of ceph on the nodes (from 
> the 
> Debian repo) and then failing when I run "ceph-deploy osd ..." because ceph-
> volume doesn't exist on the nodes.
>
The newer versions of Ceph (from Mimic onwards) require compiler
toolchains supporting C++17, which we unfortunately do not have for
stretch/jessie yet.

-
Abhishek 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Luminous RGW errors at start

2018-09-03 Thread Janne Johansson
Did you change the default pg_num or pgp_num, so that the pools that did get
created pushed the cluster past mon_max_pg_per_osd?
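
A quick way to check, in case it helps (the mon id is a placeholder):

# pg_num/pgp_num and size per pool
ceph osd pool ls detail

# the limit the monitors enforce (default 200 in Luminous, IIRC); run on a mon host
ceph daemon mon.$(hostname -s) config get mon_max_pg_per_osd

# rule of thumb: the sum over all pools of pg_num * size, divided by the
# number of OSDs, has to stay below mon_max_pg_per_osd, otherwise pool
# creation fails with ERANGE like above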


On Fri, 31 Aug 2018 at 17:20, Robert Stanford wrote:

>
>  I installed a new Luminous cluster.  Everything is fine so far.  Then I
> tried to start RGW and got this error:
>
> 2018-08-31 15:15:41.998048 7fc350271e80  0 rgw_init_ioctx ERROR:
> librados::Rados::pool_create returned (34) Numerical result out of range
> (this can be due to a pool or placement group misconfiguration, e.g. pg_num
> < pgp_num or mon_max_pg_per_osd exceeded)
> 2018-08-31 15:15:42.005732 7fc350271e80 -1 Couldn't init storage provider
> (RADOS)
>
>  I notice that the only pools that exist are the data and index RGW pools
> (no user or log pools like on Jewel).  What is causing this?
>
>  Thank you
>  R
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Can't upgrade to MDS version 12.2.8

2018-09-03 Thread Marlin Cremers
So we now have a different error. I ran `ceph fs reset k8s` because of the
map that was in the strange state. Now I'm getting the following error in
the MDS log when it tries to 'join' the cluster (even though it's the only
one):

https://gist.github.com/Marlinc/59d0a9fe3c34fed86c3aba2ebff850fb


 0> 2018-09-03 07:59:05.143026 7f9f9381d700 -1
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/mds/MDCache.cc:
In function 'void MDCache::rejoin_send_rejoins()' thread 7f9f9381d700 time
2018-09-03 07:59:05.140564
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/mds/MDCache.cc:
4029: FAILED assert(auth >= 0)

 ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5) luminous
(stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x110) [0x5604f3f96510]
 2: (MDCache::rejoin_send_rejoins()+0x29b4) [0x5604f3d623e4]
 3: (MDCache::process_imported_caps()+0x12d8) [0x5604f3d66328]
 4: (MDCache::rejoin_open_ino_finish(inodeno_t, int)+0x3ec) [0x5604f3d690dc]
 5: (MDSInternalContextBase::complete(int)+0x1eb) [0x5604f3ee0e2b]
 6: (void finish_contexts(CephContext*,
std::list
>&, int)+0xac) [0x5604f3c6395c]
 7: (MDCache::open_ino_finish(inodeno_t, MDCache::open_ino_info_t&,
int)+0x15c) [0x5604f3d1f05c]
 8: (MDCache::_open_ino_backtrace_fetched(inodeno_t, ceph::buffer::list&,
int)+0x493) [0x5604f3d477a3]
 9: (MDSIOContextBase::complete(int)+0xa4) [0x5604f3ee1144]
 10: (Finisher::finisher_thread_entry()+0x198) [0x5604f3f95488]
 11: (()+0x7e25) [0x7f9f9e870e25]
 12: (clone()+0x6d) [0x7f9f9d950bad]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed
to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
  20/20 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   1/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 1 reserver
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
   1/ 5 compressor
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   4/ 5 memdb
   1/ 5 kinetic
   1/ 5 fuse
   1/ 5 mgr
   1/ 5 mgrc
   1/ 5 dpdk
   1/ 5 eventtrace
  -2/-2 (syslog threshold)
  99/99 (stderr threshold)
  max_recent 1
  max_new 1000
  log_file
--- end dump of recent events ---
*** Caught signal (Aborted) **
 in thread 7f9f9381d700 thread_name:fn_anonymous
 ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5) luminous
(stable)
 1: (()+0x5b7a11) [0x5604f3f55a11]
 2: (()+0xf6d0) [0x7f9f9e8786d0]
 3: (gsignal()+0x37) [0x7f9f9d888277]
 4: (abort()+0x148) [0x7f9f9d889968]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x284) [0x5604f3f96684]
 6: (MDCache::rejoin_send_rejoins()+0x29b4) [0x5604f3d623e4]
 7: (MDCache::process_imported_caps()+0x12d8) [0x5604f3d66328]
 8: (MDCache::rejoin_open_ino_finish(inodeno_t, int)+0x3ec) [0x5604f3d690dc]
 9: (MDSInternalContextBase::complete(int)+0x1eb) [0x5604f3ee0e2b]
 10: (void finish_contexts(CephContext*,
std::list
>&, int)+0xac) [0x5604f3c6395c]
 11: (MDCache::open_ino_finish(inodeno_t, MDCache::open_ino_info_t&,
int)+0x15c) [0x5604f3d1f05c]
 12: (MDCache::_open_ino_backtrace_fetched(inodeno_t, ceph::buffer::list&,
int)+0x493) [0x5604f3d477a3]
 13: (MDSIOContextBase::complete(int)+0xa4) [0x5604f3ee1144]
 14: (Finisher::finisher_thread_entry()+0x198) [0x5604f3f95488]
 15: (()+0x7e25) [0x7f9f9e870e25]
 16: (clone()+0x6d) [0x7f9f9d950bad]
2018-09-03 07:59:05.214546 7f9f9381d700 -1 *** Caught signal (Aborted) **
 in thread 7f9f9381d700 thread_name:fn_anonymous

 ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5) luminous
(stable)
 1: (()+0x5b7a11) [0x5604f3f55a11]
 2: (()+0xf6d0) [0x7f9f9e8786d0]
 3: (gsignal()+0x37) [0x7f9f9d888277]
 4: (abort()+0x148) [0x7f9f9d889968]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x284) [0x5604f3f96684]
 6: (MDCache::rejoin_send_rejoins()+0x29b4) [0x5604f3d623e4]
 7: (MDCache::process_imported_caps()+0x12d8) [0x5604f3d66328]
 8: (MDCache::rejoin_open_ino_finish(inodeno_t, int)+0x3ec) [0x5604f3d690dc]
 9: (MDSInternalContextBase::complete(int)+0x1eb) [0x5604f3ee0e2b]
 10: (void finish_contexts(CephContext*,
std::list
>&, int)+0xac) [0x5604f3c6395c]
 11: (MDCache::open_ino_finish(inodeno_t, MDCache::open_ino_info_t&,
int)+0

Re: [ceph-users] Luminous missing osd_backfill_full_ratio

2018-09-03 Thread David C
In the end it was because I hadn't completed the upgrade with "ceph osd
require-osd-release luminous". After setting that, I had the default
backfillfull ratio (0.9 I think) and was able to change it with ceph osd
set-backfillfull-ratio.

Potential gotcha for a Jewel -> Luminous upgrade: if you delay the
"...require-osd-release luminous" step for whatever reason, it appears to
leave you with no backfillfull limit.
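
For the archives, the commands were roughly the following (the ratio value
is just an example):

ceph osd require-osd-release luminous
ceph osd dump | grep full_ratio        # the ratios now show sane defaults
ceph osd set-backfillfull-ratio 0.92   # adjust if needed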

Still having a bit of an issue with new OSDs over filling but will start a
new thread for that

Cheers,

On Thu, Aug 30, 2018 at 10:34 PM David Turner  wrote:

> This moved to the PG map in luminous. I think it might have been there in
> Jewel as well.
>
> http://docs.ceph.com/docs/luminous/man/8/ceph/#pg
> ceph pg set_full_ratio 
> ceph pg set_backfillfull_ratio 
> ceph pg set_nearfull_ratio 
>
>
> On Thu, Aug 30, 2018, 1:57 PM David C  wrote:
>
>> Hi All
>>
>> I feel like this is going to be a silly query with a hopefully simple
>> answer. I don't seem to have the osd_backfill_full_ratio config option on
>> my OSDs and can't inject it. This a Lumimous 12.2.1 cluster that was
>> upgraded from Jewel.
>>
>> I added an OSD to the cluster and woke up the next day to find the OSD
>> had hit OSD_FULL. I'm pretty sure the reason it filled up was because the
>> new host was weighted too high (I initially add two OSDs but decided to
>> only backfill one at a time). The thing that surprised me was why a
>> backfill full ratio didn't kick in to prevent this from happening.
>>
>> One potentially key piece of info is I haven't run the "ceph osd
>> require-osd-release luminous" command yet (I wasn't sure what impact this
>> would have so was waiting for a window with quiet client I/O).
>>
>> ceph osd dump is showing zero for all full ratios:
>>
>> # ceph osd dump | grep full_ratio
>> full_ratio 0
>> backfillfull_ratio 0
>> nearfull_ratio 0
>>
>> Do I simply need to run ceph osd set -backfillfull-ratio? Or am I missing
>> something here. I don't understand why I don't have a default backfill_full
>> ratio on this cluster.
>>
>> Thanks,
>>
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Luminous new OSD being over filled

2018-09-03 Thread David C
Hi all

Trying to add a new host to a Luminous cluster, I'm doing one OSD at a
time. I've only added one so far but it's getting too full.

The drive is the same size (4TB) as all others in the cluster, all OSDs
have crush weight of 3.63689. Average usage on the drives is 81.70%

With the new OSD I start with a crush weight 0 and steadily increase. It's
currently crush weight 3.0 and is 94.78% full. If I increase to 3.63689
it's going to hit too full.

It's been a while since I've added a host to an existing cluster. Any idea
why the drive is getting too full? Do I just have to leave this one with a
lower crush weight and then continue adding the drives and then eventually
even out the crush weights?

Thanks
David
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Luminous new OSD being over filled

2018-09-03 Thread Marc Roos
 

I am adding a node like this; I think it is more efficient, because in 
your case you will have data being moved within the added node (between 
the newly added osd's there). So far no problems with this.

Maybe limit your backfills with
ceph tell osd.* injectargs --osd_max_backfills=X
because pg's being moved take up space until the move is completed. 

sudo -u ceph ceph osd crush reweight osd.23 1 (all osd's in the node)
sudo -u ceph ceph osd crush reweight osd.24 1 
sudo -u ceph ceph osd crush reweight osd.25 1 
sudo -u ceph ceph osd crush reweight osd.26 1 
sudo -u ceph ceph osd crush reweight osd.27 1 
sudo -u ceph ceph osd crush reweight osd.28 1 
sudo -u ceph ceph osd crush reweight osd.29 1 

And then after recovery

sudo -u ceph ceph osd crush reweight osd.23 2
sudo -u ceph ceph osd crush reweight osd.24 2
sudo -u ceph ceph osd crush reweight osd.25 2
sudo -u ceph ceph osd crush reweight osd.26 2
sudo -u ceph ceph osd crush reweight osd.27 2
sudo -u ceph ceph osd crush reweight osd.28 2
sudo -u ceph ceph osd crush reweight osd.29 2

Etc etc


-Original Message-
From: David C [mailto:dcsysengin...@gmail.com] 
Sent: Monday, 3 September 2018 14:34
To: ceph-users
Subject: [ceph-users] Luminous new OSD being over filled

Hi all


Trying to add a new host to a Luminous cluster, I'm doing one OSD at a 
time. I've only added one so far but it's getting too full.

The drive is the same size (4TB) as all others in the cluster, all OSDs 
have crush weight of 3.63689. Average usage on the drives is 81.70%


With the new OSD I start with a crush weight 0 and steadily increase. 
It's currently crush weight 3.0 and is 94.78% full. If I increase to 
3.63689 it's going to hit too full. 


It's been a while since I've added a host to an existing cluster. Any 
idea why the drive is getting too full? Do I just have to leave this one 
with a lower crush weight and then continue adding the drives and then 
eventually even out the crush weights?

Thanks
David






___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] how to swap osds between servers

2018-09-03 Thread Andrei Mikhailovsky
Hello everyone, 

I am in the process of adding an additional osd server to my small ceph cluster 
as well as migrating from filestore to bluestore. Here is my setup at the 
moment: 

Ceph - 12.2.5 , running on Ubuntu 16.04 with latest updates 
3 x osd servers with 10x3TB SAS drives, 2 x Intel S3710 200GB ssd and 64GB ram 
in each server. The same servers are also mon servers. 

I am adding the following to the cluster: 
1 x osd+mon server with 64GB of ram, 2xIntel S3710 200GB ssds. 
Adding 4 x 6TB disks and 2x 3TB disks. 

Thus, the new setup will have the following configuration: 
4 x osd servers with 8x3TB SAS drives and 1x6TB SAS drive, 2 x Intel S3710 
200GB ssd and 64GB ram in each server. This will make sure that all servers 
have the same amount/capacity drives. There will be 3 mon servers in total. 

As a result, I will have to remove 2 x 3TB drives from the existing three osd 
servers and place them into the new osd server and add a 6TB drive into each 
osd server. As those 6 x 3TB drives which will be taken from the existing osd 
servers and placed to the new server will have the data stored on them, what is 
the best way to do this? I would like to minimise the data migration all over 
the place, as it creates havoc with cluster performance. What is the best 
workflow to achieve the hardware upgrade? If I add the new osd host server into 
the cluster and physically take the osd disk from one server and place it in 
the other server, will it be recognised and accepted by the cluster? 

Thanks 

Andrei 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Luminous new OSD being over filled

2018-09-03 Thread David C
Hi Marc

I like that approach although I think I'd go in smaller weight increments.

Still a bit confused by the behaviour I'm seeing; it looks like I've got
things weighted correctly. Red Hat's docs recommend doing an OSD at a time
and I'm sure that's how I've done it on other clusters in the past, although
they would have been running older versions.

Thanks,

On Mon, Sep 3, 2018 at 1:45 PM Marc Roos  wrote:

>
>
> I am adding a node like this, I think it is more efficient, because in
> your case you will have data being moved within the added node (between
> the newly added osd's there). So far no problems with this.
>
> Maybe limit your
> ceph tell osd.* injectargs --osd_max_backfills=X
> Because pg's being moved are taking space until the move is completed.
>
> sudo -u ceph ceph osd crush reweight osd.23 1 (all osd's in the node)
> sudo -u ceph ceph osd crush reweight osd.24 1
> sudo -u ceph ceph osd crush reweight osd.25 1
> sudo -u ceph ceph osd crush reweight osd.26 1
> sudo -u ceph ceph osd crush reweight osd.27 1
> sudo -u ceph ceph osd crush reweight osd.28 1
> sudo -u ceph ceph osd crush reweight osd.29 1
>
> And then after recovery
>
> sudo -u ceph ceph osd crush reweight osd.23 2
> sudo -u ceph ceph osd crush reweight osd.24 2
> sudo -u ceph ceph osd crush reweight osd.25 2
> sudo -u ceph ceph osd crush reweight osd.26 2
> sudo -u ceph ceph osd crush reweight osd.27 2
> sudo -u ceph ceph osd crush reweight osd.28 2
> sudo -u ceph ceph osd crush reweight osd.29 2
>
> Etc etc
>
>
> -Original Message-
> From: David C [mailto:dcsysengin...@gmail.com]
> Sent: Monday, 3 September 2018 14:34
> To: ceph-users
> Subject: [ceph-users] Luminous new OSD being over filled
>
> Hi all
>
>
> Trying to add a new host to a Luminous cluster, I'm doing one OSD at a
> time. I've only added one so far but it's getting too full.
>
> The drive is the same size (4TB) as all others in the cluster, all OSDs
> have crush weight of 3.63689. Average usage on the drives is 81.70%
>
>
> With the new OSD I start with a crush weight 0 and steadily increase.
> It's currently crush weight 3.0 and is 94.78% full. If I increase to
> 3.63689 it's going to hit too full.
>
>
> It's been a while since I've added a host to an existing cluster. Any
> idea why the drive is getting too full? Do I just have to leave this one
> with a lower crush weight and then continue adding the drives and then
> eventually even out the crush weights?
>
> Thanks
> David
>
>
>
>
>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] how to swap osds between servers

2018-09-03 Thread Ronny Aasen

On 03.09.2018 17:42, Andrei Mikhailovsky wrote:

Hello everyone,

I am in the process of adding an additional osd server to my small 
ceph cluster as well as migrating from filestore to bluestore. Here is 
my setup at the moment:


Ceph - 12.2.5 , running on Ubuntu 16.04 with latest updates
3 x osd servers with 10x3TB SAS drives, 2 x Intel S3710 200GB ssd and 
64GB ram in each server. The same servers are also mon servers.


I am adding the following to the cluster:
1 x osd+mon server with 64GB of ram, 2xIntel S3710 200GB ssds.
Adding 4 x 6TB disks and 2x 3TB disks.

Thus, the new setup will have the following configuration:
4 x osd servers with 8x3TB SAS drives and 1x6TB SAS drive, 2 x Intel 
S3710 200GB ssd and 64GB ram in each server. This will make sure that 
all servers have the same amount/capacity drives. There will be 3 mon 
servers in total.


As a result, I will have to remove 2 x 3TB drives from the existing 
three osd servers and place them into the new osd server and add a 6TB 
drive into each osd server. As those 6 x 3TB drives which will be 
taken from the existing osd servers and placed to the new server will 
have the data stored on them, what is the best way to do this? I would 
like to minimise the data migration all over the place as it creates a 
havoc on the cluster performance. What is the best workflow to achieve 
the hardware upgrade? If I add the new osd host server into the 
cluster and physically take the osd disk from one server and place it 
in the other server, will it be recognised and accepted by the cluster?


Data will migrate no matter how you change the crushmap. Since you want 
to migrate to bluestore, this is also unavoidable.


If it is critical data and you want to minimize impact, I prefer to do 
it the slow and steady way: add a new bluestore drive to the new host 
with weight 0 and gradually increase its weight, while gradually 
lowering the weight of the filestore drive being removed.
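
A rough sketch of that gradual approach (OSD ids and step sizes are just 
examples; wait for recovery to finish between steps):

# new bluestore OSD, created with crush weight 0, stepped up over time
ceph osd crush reweight osd.31 0.5
ceph osd crush reweight osd.31 1.0
# ... and so on, up to its full crush weight

# old filestore OSD, stepped down at the same pace
ceph osd crush reweight osd.7 2.5
ceph osd crush reweight osd.7 1.0
ceph osd crush reweight osd.7 0
# once it is empty: ceph osd out osd.7, stop the daemon, then purge it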


A worse option, if you do not have a drive to spare for that, is to 
gradually drain a drive, remove it from the cluster, move it over, zap 
and recreate it as bluestore, and gradually fill it again. But this takes 
longer, and if you have space issues it can be complicated.


An even worse option is to move the osd drive over (with its journal 
and data) and have the cluster shuffle all the data around; this is a 
big impact.
And then you are still running filestore, so you still need to migrate 
to bluestore.


kind regards
Ronny Aasen

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 3x replicated rbd pool ssd data spread across 4 osd's

2018-09-03 Thread Marc Roos
 

Yes you are right. I had moved the fs_meta (16 pg's) to the ssd's. I had 
to check the crush rules, but that pool is only 200MB. Still puzzles me 
why ceph 'out of the box' is not distributing data more evenly.

I will try the balancer first thing, when remapping of the newly added 
node has finished.
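
In case it is useful to others, the balancer steps I intend to try look 
roughly like this (upmap mode assumes all clients are Luminous or newer):

ceph osd set-require-min-compat-client luminous   # required for upmap mode
ceph mgr module enable balancer
ceph balancer mode upmap        # or crush-compat for older clients
ceph balancer eval              # current score, lower is better
ceph balancer on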

-Original Message-
From: Jack [mailto:c...@jack.fr.eu.org] 
Sent: Sunday, 2 September 2018 15:53
To: Marc Roos; ceph-users
Subject: Re: [ceph-users] 3x replicated rbd pool ssd data spread across 
4 osd's

Well, you have more than one pool here

pg_num = 8, size = 3 -> 24 pgs
The extra 48 pgs comes from somewhere else

About the pg's distribution, check out the balancer module


tl;dr: that distribution is computed by an algorithm, so it is
predictable (that is the point) but not perfect size-wise (there is no
"central point" that could take everything into account). The balancer
module will do just that: move pgs around to get the best repartition.

On 09/02/2018 03:14 PM, Marc Roos wrote:
> 
> So that changes the question to: why is ceph not distributing the pg's 

> evenly across four osd's?
> 
> [@c01 ~]# ceph osd df |egrep '^19|^20|^21|^30'
> 19   ssd 0.48000  1.0  447G   133G   313G 29.81 0.70  16
> 20   ssd 0.48000  1.0  447G   158G   288G 35.40 0.83  19
> 21   ssd 0.48000  1.0  447G   208G   238G 46.67 1.10  20
> 30   ssd 0.48000  1.0  447G   149G   297G 33.50 0.79  17
> 
> rbd.ssd: pg_num 8 pgp_num 8
> 
> I will look into the balancer, but I am still curious why these 8 pg 
> (8x8=64? + 8? = 72) are still not spread evenly. Why not 18 on every 
> osd?
> 
> -Original Message-
> From: Jack [mailto:c...@jack.fr.eu.org]
> Sent: Sunday, 2 September 2018 14:06
> To: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] 3x replicated rbd pool ssd data spread 
> across
> 4 osd's
> 
> ceph osd df will get you more information: variation & pg number for 
> each OSD
> 
> Ceph does not spread object on a per-object basis, but on a pg-basis
> 
> The data repartition is thus not perfect You may increase your pg_num, 

> and/or use the mgr balancer module
> (http://docs.ceph.com/docs/mimic/mgr/balancer/)
> 
> 
> On 09/02/2018 01:28 PM, Marc Roos wrote:
>>
>> If I have only one rbd ssd pool, 3 replicated, and 4 ssd osd's. Why 
>> are these objects so unevenly spread across the four osd's? Should 
>> they all not have 162G?
>>
>>
>> [@c01 ]# ceph osd status 2>&1
>> +----+------+-------+-------+--------+---------+--------+---------+-----------+
>> | id | host |  used | avail | wr ops | wr data | rd ops | rd data |   state   |
>> +----+------+-------+-------+--------+---------+--------+---------+-----------+
>> | 19 | c01  |  133G |  313G |    0   |    0    |    0   |    0    | exists,up |
>> | 20 | c02  |  158G |  288G |    0   |    0    |    0   |    0    | exists,up |
>> | 21 | c03  |  208G |  238G |    0   |    0    |    0   |    0    | exists,up |
>> | 30 | c04  |  149G |  297G |    0   |    0    |    0   |    0    | exists,up |
>> +----+------+-------+-------+--------+---------+--------+---------+-----------+
>>
>> All objects in the rbd pool are 4MB not? Should be easy to spread 
>> them
> 
>> evenly, what am I missing here?
>>
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] data_extra_pool for RGW Luminous still needed?

2018-09-03 Thread Nhat Ngo
Hi all,


I am new to Ceph and we are setting up a new RadosGW and Ceph storage cluster 
on Luminous. We are using only EC for our `buckets.data` pool at the moment.


However, I just read the Red Hat Ceph Object Gateway for Production article and 
it mentions that an extra duplicated `buckets.non-ec` pool is needed for 
multipart uploads, because multipart upload parts must be stored without EC; EC 
will only apply to whole objects, not partial uploads. Does this still hold true 
for Luminous?


The data layout document on Ceph does not make any mention of non-ec pool:

http://docs.ceph.com/docs/luminous/radosgw/layout/


Thanks,

Nhat Ngo | DevOps Engineer

Cloud Research Team, University of Melbourne, 3010, VIC
Email: nhat.n...@unimelb.edu.au
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] No announce for 12.2.8 / available in repositories

2018-09-03 Thread Linh Vu
Version 12.2.8 seems broken. Someone earlier on the ML had a MDS issue. We 
accidentally upgraded an openstack compute node from 12.2.7 to 12.2.8 (librbd) 
and it caused all kinds of issues writing to the VM disks.


From: ceph-users  on behalf of Nicolas 
Huillard 
Sent: Sunday, 2 September 2018 7:31:08 PM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] No announce for 12.2.8 / available in repositories

Hi all,

I just noticed that 12.2.8 was available on the repositories, without
any announce. Since upgrading to unannounced 12.2.6 was a bad idea,
I'll wait a bit anyway ;-)
Where can I find info on this bugfix release ?
Nothing there : http://lists.ceph.com/pipermail/ceph-announce-ceph.com/

TIA

--
Nicolas Huillard
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] No announce for 12.2.8 / available in repositories

2018-09-03 Thread Dan van der Ster
I don't think those issues are known... Could you elaborate on your
librbd issues with v12.2.8 ?

-- dan

On Tue, Sep 4, 2018 at 7:30 AM Linh Vu  wrote:
>
> Version 12.2.8 seems broken. Someone earlier on the ML had a MDS issue. We 
> accidentally upgraded an openstack compute node from 12.2.7 to 12.2.8 
> (librbd) and it caused all kinds of issues writing to the VM disks.
>
> 
> From: ceph-users  on behalf of Nicolas 
> Huillard 
> Sent: Sunday, 2 September 2018 7:31:08 PM
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] No announce for 12.2.8 / available in repositories
>
> Hi all,
>
> I just noticed that 12.2.8 was available on the repositories, without
> any announce. Since upgrading to unannounced 12.2.6 was a bad idea,
> I'll wait a bit anyway ;-)
> Where can I find info on this bugfix release ?
> Nothing there : http://lists.ceph.com/pipermail/ceph-announce-ceph.com/
>
> TIA
>
> --
> Nicolas Huillard
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] No announce for 12.2.8 / available in repositories

2018-09-03 Thread Linh Vu
We're going to reproduce this again in testing (12.2.8 drops right between our 
previous testing and going production) and compare it to 12.2.7. Will update 
with our findings soon. :)


From: Dan van der Ster 
Sent: Tuesday, 4 September 2018 3:41:01 PM
To: Linh Vu
Cc: nhuill...@dolomede.fr; ceph-users
Subject: Re: [ceph-users] No announce for 12.2.8 / available in repositories

I don't think those issues are known... Could you elaborate on your
librbd issues with v12.2.8 ?

-- dan

On Tue, Sep 4, 2018 at 7:30 AM Linh Vu  wrote:
>
> Version 12.2.8 seems broken. Someone earlier on the ML had a MDS issue. We 
> accidentally upgraded an openstack compute node from 12.2.7 to 12.2.8 
> (librbd) and it caused all kinds of issues writing to the VM disks.
>
> 
> From: ceph-users  on behalf of Nicolas 
> Huillard 
> Sent: Sunday, 2 September 2018 7:31:08 PM
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] No announce for 12.2.8 / available in repositories
>
> Hi all,
>
> I just noticed that 12.2.8 was available on the repositories, without
> any announce. Since upgrading to unannounced 12.2.6 was a bad idea,
> I'll wait a bit anyway ;-)
> Where can I find info on this bugfix release ?
> Nothing there : http://lists.ceph.com/pipermail/ceph-announce-ceph.com/
>
> TIA
>
> --
> Nicolas Huillard
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com