Hi,
I'm extremely new to ceph and have a small 4-node/20-osd cluster.
I just upgraded from kraken to luminous without much ado, except now when I
run ceph status, I get a health_warn because "2 osds exist in the crush map
but not in the osdmap"
Googling the error message only took me to the sour
d the easiest way to remove
> it.
>
> On Tue, Jun 27, 2017, 12:12 PM Daniel K wrote:
>
>> Hi,
>>
>> I'm extremely new to ceph and have a small 4-node/20-osd cluster.
>>
>> I just upgraded from kraken to luminous without much ado, except now when
>
Is there anywhere that details the various compression settings for
bluestore backed pools?
I can see compression in the list of options when I run ceph osd pool set,
but can't find anything that details what the valid settings are.
I've tried discovering the options via the command line utilities an
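For reference, these are the pool-level compression settings I believe are
valid on Luminous bluestore pools (pool name is hypothetical):

ceph osd pool set mypool compression_algorithm snappy   # none|snappy|zlib|lz4|zstd
ceph osd pool set mypool compression_mode aggressive    # none|passive|aggressive|force
ceph osd pool set mypool compression_required_ratio 0.875
ceph osd pool set mypool compression_min_blob_size 8192
ceph osd pool set mypool compression_max_blob_size 65536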
Hi,
As mentioned in my previous emails, I'm extremely new to ceph, so please
forgive my lack of knowledge.
I'm trying to find a good way to mount ceph rbd images for export by
LIO/targetcli
rbd-nbd isn't good as it stops at 16 block devices (/dev/nbd0-15)
kernel rbd mapping doesn't have support
thank you!
On Wed, Jun 28, 2017 at 11:48 AM, Mykola Golub wrote:
> On Tue, Jun 27, 2017 at 07:17:22PM -0400, Daniel K wrote:
>
> > rbd-nbd isn't good as it stops at 16 block devices (/dev/nbd0-15)
>
> modprobe nbd nbds_max=1024
>
> Or, if nbd module is loaded by rb
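To make the higher limit persistent across reboots, something like this should
work (a sketch, assuming nbd is loaded as a module rather than built in):

echo "options nbd nbds_max=1024" > /etc/modprobe.d/nbd.conf
modprobe -r nbd && modprobe nbd    # or reboot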
Once again my google-fu has failed me and I can't find the 'correct' way to
map an rbd using rbd-nbd on boot. Everything takes me to rbdmap, which
isn't using rbd-nbd.
If someone could just point me in the right direction I'd appreciate it.
Thanks!
Dan
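For what it's worth, the approach I'd try is a small systemd unit -- an
untested sketch with hypothetical image and device names, since rbdmap itself
only drives the kernel client:

# /etc/systemd/system/rbd-nbd-myimage.service
[Unit]
Description=Map rbd/myimage with rbd-nbd
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/bin/rbd-nbd map rbd/myimage
# rbd-nbd map prints the device it attached; /dev/nbd0 is assumed here
ExecStop=/usr/bin/rbd-nbd unmap /dev/nbd0

[Install]
WantedBy=multi-user.target

Then: systemctl enable rbd-nbd-myimage.service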
Luminous 12.1.0(RC)
I replaced two OSD drives (old ones were still good, just too small), using:
ceph osd out osd.12
ceph osd crush remove osd.12
ceph auth del osd.12
systemctl stop ceph-osd@osd.12
ceph osd rm osd.12
I later found that I also should have unmounted it from /var/lib/ceph/osd/ceph-12
(r
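For reference, the order I believe is generally recommended for removing an
OSD (assuming the default mount location and systemd unit names):

ceph osd out osd.12
systemctl stop ceph-osd@12
umount /var/lib/ceph/osd/ceph-12
ceph osd crush remove osd.12
ceph auth del osd.12
ceph osd rm osd.12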
I am in the process of doing exactly what you are -- this worked for me:
1. mount the first partition of the bluestore drive that holds the missing
PGs (if it's not already mounted)
> mkdir /mnt/tmp
> mount /dev/sdb1 /mnt/tmp
2. export the pg to a suitable temporary storage location:
> ceph-obje
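The export step, roughly, once the OSD is stopped and the partition is mounted
(PG id and destination path are hypothetical):

ceph-objectstore-tool --data-path /mnt/tmp --pgid 1.24 --op export --file /root/pg1.24.export

and the corresponding import on a healthy, stopped OSD:

ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-11 --op import --file /root/pg1.24.export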
arnum wrote:
>
> On Fri, Jul 21, 2017 at 10:23 PM Daniel K wrote:
>
>> Luminous 12.1.0(RC)
>>
>> I replaced two OSD drives(old ones were still good, just too small),
>> using:
>>
>> ceph osd out osd.12
>> ceph osd crush remove osd.12
>> ceph au
List --
I have a 4-node cluster running on bare metal and need to use the
kernel client on 2 nodes. Since I've read that you should not run the kernel
client on a node that runs an OSD daemon, I decided to move the OSD daemons into a VM
on the same device.
Original host is stor-vm2 (bare metal), new hos
I did some bad things to my cluster, broke 5 OSDs and wound up with 1
unfound object.
I mounted one of the OSD drives, used ceph-objectstore-tool to find and
export the object:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-10
162.001c0ed4 get-bytes filename.obj
What's the b
98180 7f25a88af700  1 -- 10.0.15.142:6800/16150 <== mon.1 10.0.15.51:6789/0
9 mon_command_ack([{"prefix": "osd crush set-device-class", "class": "hdd", "ids": ["7"
sd.7
Still reading and learning.
On Tue, Jul 25, 2017 at 2:38 PM, Daniel K wrote:
> Update to this -- I tried building a new host and a new OSD, new disk, and
> I am having the same issue.
>
>
>
> I set osd debug level to 10 -- the issue looks like it's coming from a
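For context, the mon command being acked there is the Luminous device-class
machinery; the related CLI, with a hypothetical OSD id, looks like:

ceph osd crush rm-device-class osd.7      # needed before changing an existing class
ceph osd crush set-device-class hdd osd.7
ceph osd crush class ls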
d extracted
previously)
# rados -p cephfs_data put 162.001c0ed4 162.001c0ed4.obj
Still have more recovery to do but this seems to have fixed my unfound
object problem.
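To double-check that the unfound object is really cleared, I believe these are
the relevant commands (pg id is a placeholder):

ceph health detail
ceph pg <pgid> list_missing

and, only as a last resort when the data is unrecoverable:

ceph pg <pgid> mark_unfound_lost revert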
On Tue, Jul 25, 2017 at 12:54 PM, Daniel K wrote:
> I did some bad things to my cluster, broke 5 OSDs and wo
Does the client track which OSDs are reachable? How does it behave if some
are not reachable?
For example:
Cluster network with all OSD hosts on a switch.
Public network with OSD hosts split between two switches, failure domain is
switch.
copies=3 so with a failure of the public switch, 1 copy w
All 3 of my mons crashed while I was adding OSDs and now error out with:
(/build/ceph-12.1.1/src/mon/OSDMonitor.cc: 3018: FAILED
assert(osdmap.get_up_osd_features() & CEPH_FEATURE_MON_STATEFUL_SUB)
I've resorted to just rebuilding the mon DB and making 3 new mon daemons,
using the steps here:
h
I finally figured out how to get the ceph-monstore-tool (compiled from
source) and am ready to attempt to recover my cluster.
I have one question -- in the instructions,
http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-mon/
under Recovery from OSDs, Known limitations:
->
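The core of the recovery-from-OSDs procedure from that page, as I understand
it (paths are hypothetical, and every OSD on the host has to be stopped):

ms=/tmp/mon-store
mkdir -p $ms
for osd in /var/lib/ceph/osd/ceph-*; do
    ceph-objectstore-tool --data-path $osd --op update-mon-db --mon-store-path $ms
done
ceph-monstore-tool $ms rebuild -- --keyring /etc/ceph/ceph.client.admin.keyring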
Are there any client-side options to encrypt an RBD device?
Using latest luminous RC, on Ubuntu 16.04 and a 4.10 kernel
I assumed adding client-side encryption would be as simple as using
luks/dm-crypt/cryptsetup after adding the RBD device to /etc/ceph/rbdmap
and enabling the rbdmap service --
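Roughly what I had in mind (image and mount point names are hypothetical):

rbd map rbd/secure-image
cryptsetup luksFormat /dev/rbd0
cryptsetup luksOpen /dev/rbd0 secure-image
mkfs.xfs /dev/mapper/secure-image
mount /dev/mapper/secure-image /mnt/secure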
Awesome -- I searched and all I could find was restricting access at the
pool level
I will investigate the dm-crypt/RBD path also.
Thanks again!
On Thu, Aug 24, 2017 at 7:40 PM, Alex Gorbachev
wrote:
>
> On Mon, Aug 21, 2017 at 9:03 PM Daniel K wrote:
>
>> Are there any clie
Just curious why it wouldn't work as long as the IPs were reachable? Is
there something going on in layer 2 with Ceph that wouldn't survive a trip
across a router?
On Wed, Aug 30, 2017 at 1:52 PM, David Turner wrote:
> ALL OSDs need to be running the same private network at the same time.
> AL
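For anyone following along, the two networks in question are just the ones
declared in ceph.conf; the subnets below are hypothetical:

[global]
public network = 10.0.15.0/24
cluster network = 10.0.16.0/24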
I'm trying to understand why adding OSDs would cause pgs to go inactive.
This cluster has 88 OSDs, and had 6 OSDs with device class "hdd_10TB_7.2k"
I added two more OSDs, set the device class to "hdd_10TB_7.2k" and 10% of
pgs went inactive.
I have an EC pool on these OSDs with the profile:
user@a
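For reference, a hypothetical profile along those lines, showing how the
device class is tied into the EC profile (and therefore its CRUSH rule):

ceph osd erasure-code-profile set ec_10tb k=4 m=2 crush-failure-domain=host crush-device-class=hdd_10TB_7.2k
ceph osd erasure-code-profile get ec_10tb
ceph osd crush rule ls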
On Tue, Dec 19, 2017 at 8:57 PM, Daniel K wrote:
> I'm trying to understand why adding OSDs would cause pgs to go inactive.
>
> This cluster has 88 OSDs, and had 6 OSD with device class "hdd_10TB_7.2k"
>
> I added two more OSDs, set the device class to "hdd_10TB_7.2
m one chassis to another (there was no data on it
> yet).
>
> I tried restarting OSD's to no avail.
>
> Couldn't find anything about PGs stuck in the "activating+remapped" state so
> in the end i threw away the pool and started over.
>
> Could this be a bug i
Are all 3 NICs in the same bond together?
I don't think bonding NICs of various speeds is a great idea.
How are you separating the Ceph traffic across the individual NICs?
On Fri, Jul 6, 2018 at 7:04 AM, John Spray wrote:
> On Fri, Jul 6, 2018 at 11:10 AM Marcus Haarmann
> wrote:
> >
> > Hi e
There have been quite a few VMware/Ceph threads on the mailing list in the
past.
One setup I've been toying with is a Linux guest running on the VMware host
on local storage, with the guest mounting a Ceph RBD with a filesystem on
it, then exporting that via NFS to the VMware host as a datastore.
Ex
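A sketch of that setup on the guest (image, mount point and ESXi host names
are hypothetical):

rbd map rbd/vmware-ds1
mkfs.xfs /dev/rbd0
mkdir -p /exports/vmware-ds1
mount /dev/rbd0 /exports/vmware-ds1
echo "/exports/vmware-ds1 esxi-host1(rw,no_root_squash,sync)" >> /etc/exports
exportfs -ra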
I had a similar problem with some relatively underpowered servers (2x
E5-2603, 6 cores at 1.7GHz, no HT, 12-14 2TB OSDs per server, 32GB RAM).
There was a process on a couple of the servers that would hang and chew up
all available CPU. When that happened, I started getting scrub errors on
those servers.
Have a 20 OSD cluster - "my first ceph cluster" - that has another 400 OSDs
en route.
I was "beating up" on the cluster, and had been writing to a 6TB file in
CephFS for several hours, during which I changed the crushmap to better
match my environment, generating a bunch of recovery IO. After about 5.
nt to waste anyone's time on a wild goose
chase.
On Wed, May 24, 2017 at 6:15 AM, John Spray wrote:
> On Tue, May 23, 2017 at 11:41 PM, Daniel K wrote:
> > Have a 20 OSD cluster -"my first ceph cluster" that has another 400 OSDs
> > enroute.
> >
> > I
production the pieces will be separated.
On Wed, May 24, 2017 at 4:55 PM, Gregory Farnum wrote:
> On Wed, May 24, 2017 at 3:15 AM, John Spray wrote:
> > On Tue, May 23, 2017 at 11:41 PM, Daniel K wrote:
> >> Have a 20 OSD cluster -"my first ceph cluster"
Hi,
I see several mentions that compression is available in Kraken for
bluestore OSDs, however, I can find almost nothing in the documentation
that indicates how to use it.
I've found:
- http://docs.ceph.com/docs/master/radosgw/compression/
- http://ceph.com/releases/v11-2-0-kraken-released/
I'm
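Besides the per-pool properties, I believe there are also OSD-wide bluestore
defaults that can go in ceph.conf (values shown are just examples):

[osd]
bluestore compression algorithm = snappy
bluestore compression mode = aggressive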
I've built 'my-first-ceph-cluster' with two of the 4-node, 12-drive
Supermicro servers and dual 10Gb interfaces (one cluster, one public).
I now have 9x 36-drive Supermicro StorageServers made available to me, each
with dual 10Gb and a single Mellanox IB/40G NIC. No 1G interfaces except
IPMI. 2x 6-c
o waste. Might be worth getting a second IB
card for each server.
Again, thanks a million for the advice. I'd rather learn this the easy way
than to have to rebuild this 6 times over the next 6 months.
On Tue, Jun 6, 2017 at 2:05 AM, Christian Balzer wrote:
>
> Hello,
>
> l
I started down that path and got so deep that I couldn't even find where I
went in. I couldn't make heads or tails out of what would or wouldn't work.
We didn't need multiple hosts accessing a single datastore, so on the
client side I just have a single VM guest running on each ESXi host, with
th
12.2.5 on a Proxmox cluster.
6 nodes, about 50 OSDs, bluestore and cache tiering on an EC pool. Mostly
spinners with an SSD OSD drive and an SSD WAL/DB drive on each node. PM863
SSDs with ~75%+ endurance remaining.
Has been running relatively okay besides some spinner failures until I
checked today
Did you ever get anywhere with this?
I have 6 OSDs out of 36 continuously flapping with this error in the logs.
Thanks,
Dan
On Fri, Jun 8, 2018 at 11:10 AM Caspar Smit wrote:
> Hi all,
>
> Maybe this will help:
>
> The issue is with shards 3,4 and 5 of PG 6.3f:
>
> Logs of OSDs 16, 17 & 36
I'm hitting this same issue on 12.2.5. Upgraded one node to 12.2.10 and it
didn't clear.
6 OSDs flapping with this error. I know this is an older issue but are
traces still needed? I don't see a resolution available.
Thanks,
Dan
On Wed, Sep 6, 2017 at 10:30 PM Brad Hubbard wrote:
> These erro
56 OSD, 6-node 12.2.5 cluster on Proxmox
We had multiple drives fail (about 30%) within a few days of each other,
likely faster than the cluster could recover.
After the dust settled, we have 2 out of 896 pgs stuck inactive. The failed
drives are completely inaccessible, so I can't mount them and
I bought the wrong drives trying to be cheap. They were 2TB WD Blue 5400rpm
2.5 inch laptop drives.
They've been replaced now with HGST 10K 1.8TB SAS drives.
On Sat, Mar 2, 2019, 12:04 AM wrote:
>
>
> Saturday, 2 March 2019, 04.20 +0100 from satha...@gmail.com:
>
> 56 O
OS
>
>
> Saturday, 2 March 2019, 14.34 +0100 from Daniel K :
>
> I bought the wrong drives trying to be cheap. They were 2TB WD Blue
> 5400rpm 2.5 inch laptop drives.
>
> They've been replaced now with HGST 10K 1.8TB SAS drives.
>
>
>
> On Sat, Mar 2, 2019,
eph osd
> force-create-pg " to reset the PGs instead.
> Data will obviously be lost afterwards.
>
> Paul
>
> >
> > On Sat, Mar 2, 2019 at 6:08 AM Daniel K wrote:
> >>
> >> They all just started having read errors. Bus resets. Slow reads. Which
>
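The force-create-pg route mentioned above, roughly (pg id is hypothetical,
and as noted the data in those PGs is lost):

ceph pg dump_stuck inactive
ceph osd force-create-pg 1.28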
A few years ago I set up a Ceph cluster for some friends to use with PVE.
It wasn't maintained and is now in bad shape.
They've reached out to me for help, but I do not have the time to assist
right now.
Is there anyone on the list who would be willing to help? As a
professional service of cou