Running 0.94.5 as part of an OpenStack environment, our Ceph setup is 3x OSD
nodes and 3x MON nodes. Yesterday we had an aircon outage in our hosting
environment; 1 OSD node failed (offline with the journal SSD dead), leaving us
with 2 nodes running correctly. 2 hours later a second OSD node failed,
complaining
Hi all,
I just noticed that 12.2.8 was available in the repositories, without
any announcement. Since upgrading to the unannounced 12.2.6 was a bad idea,
I'll wait a bit anyway ;-)
Where can I find info on this bugfix release?
Nothing there: http://lists.ceph.com/pipermail/ceph-announce-ceph.com/
TIA
If I have only one rbd SSD pool, 3x replicated, and 4 SSD OSDs, why are
these objects so unevenly spread across the four OSDs? Should they not all
have 162G?
[@c01 ]# ceph osd status 2>&1
| id | host | used | a
Hello,
I'm trying to understand the output of the dump_historic_ops admin sock
command.
I can't find any information on the meaning of the different states
that an OP can be in.
For example, in the following excerpt:
{
"description": "MOSDPGPush(1.a5 421/239
[PushOp(1:a
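(For reference, the dump above is typically pulled from the OSD admin socket
on the node hosting that OSD; roughly, with osd.0 as a placeholder:
$ sudo ceph daemon osd.0 dump_historic_ops
$ sudo ceph daemon osd.0 dump_ops_in_flight
The first shows recently completed slow ops, the second the ops currently
being processed.)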
Does "ceph health detail" work?
Have you manually confirmed the OSDs on the nodes are working?
What was the replica size of the pools?
Are you seeing any progress with the recovery?
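(Roughly, the checks behind those questions map to commands like the
following; the pool name is a placeholder:
$ ceph health detail
$ ceph -s
$ ceph osd tree
$ ceph osd pool get <poolname> size
ceph -s or ceph -w will also show whether recovery is actually progressing.)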
On Sun, Sep 2, 2018 at 9:42 AM Lee wrote:
> Running 0.94.5 as part of an OpenStack environment, our Ceph setup is
ceph osd df will get you more information: variation & PG count for
each OSD.
Ceph does not spread objects on a per-object basis, but on a per-PG basis.
The data distribution is thus not perfect.
You may increase your pg_num, and/or use the mgr balancer module
(http://docs.ceph.com/docs/mimic/mgr/balance
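(For reference, on Luminous/Mimic the balancer is usually enabled with
something like the following; upmap mode assumes all clients are Luminous or
newer:
$ ceph mgr module enable balancer
$ ceph osd set-require-min-compat-client luminous
$ ceph balancer mode upmap
$ ceph balancer on
$ ceph balancer status)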
There should be useful logs from ceph-volume in
/var/log/ceph/ceph-volume.log that might show a bit more here.
I would also try the command that fails directly on the server (sans
ceph-deploy) to see what it is that is actually failing. Seems like
the ceph-deploy log output is a bit out of order (
On Sat, Sep 1, 2018 at 12:45 PM, Brett Chancellor
wrote:
> Hi Cephers,
> I am in the process of upgrading a cluster from Filestore to bluestore,
> but I'm concerned about frequent warnings popping up against the new
> bluestore devices. I'm frequently seeing messages like this, although the
> sp
Hello there,
I've tried setting up some MDS VMs with version 12.2.8, but they are
unable to replay, which appears to be caused by an error on the monitors.
2018-09-01 18:52:39.101001 7fb7c4c4f700 1 mon.mon2@1(peon).mds e3320
mds mds.? 10.14.4.62:6800/605610442 can't write to fsmap
compat={},rocom
So that changes the question to: why is Ceph not distributing the PGs
evenly across four OSDs?
[@c01 ~]# ceph osd df |egrep '^19|^20|^21|^30'
19 ssd 0.48000 1.0 447G 133G 313G 29.81 0.70 16
20 ssd 0.48000 1.0 447G 158G 288G 35.40 0.83 19
21 ssd 0.48000 1.0 4
The warnings look like this.
6 ops are blocked > 32.768 sec on osd.219
1 osds have slow requests
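(If it helps, those blocked ops can usually be inspected through the admin
socket on the host carrying osd.219:
$ sudo ceph daemon osd.219 dump_ops_in_flight
$ sudo ceph daemon osd.219 dump_historic_ops)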
On Sun, Sep 2, 2018, 8:45 AM Alfredo Deza wrote:
> On Sat, Sep 1, 2018 at 12:45 PM, Brett Chancellor
> wrote:
> > Hi Cephers,
> > I am in the process of upgrading a cluster from Filestore to blue
Well, you have more than one pool here.
pg_num = 8, size = 3 -> 24 PGs
The extra 48 PGs come from somewhere else.
About the PG distribution, check out the balancer module.
tl;dr: that distribution is computed by an algorithm, so it is
predictable (that is the point), but the perfect size-
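(A quick way to account for the extra PGs is to list pg_num and size per
pool and sum pg_num * size across all pools, e.g.:
$ ceph osd pool ls detail
$ ceph pg dump pools)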
When the first node went offline with a dead SSD journal, all of the data
on the OSDs was useless. Unless you could flush the journals, you can't
guarantee that a write the cluster thinks happened actually made it to the
disk. The proper procedure here is to remove those OSDs and add them again
as
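(On a Hammer-era Filestore cluster the usual removal sequence looks roughly
like this, with <id> as a placeholder, the daemon stopped on its host first,
and assuming sysvinit-style service management:
$ ceph osd out <id>
$ sudo service ceph stop osd.<id>
$ ceph osd crush remove osd.<id>
$ ceph auth del osd.<id>
$ ceph osd rm <id>)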
>
>
> Hi David,
>
> Yes, health detail outputs all the errors etc. and recovery / backfill is
> going on, just taking time: 25% misplaced and 1.5% degraded.
>
> I can list out the pools and see sizes etc..
>
> My main problem is I have no client IO from a read perspective, I cannot
> start vms I'm opens
Hi David,
Yes, health detail outputs all the errors etc. and recovery / backfill is
going on, just taking time: 25% misplaced and 1.5% degraded.
I can list out the pools and see sizes etc..
My main problem is I have no client IO from a read perspective, I cannot
start VMs in OpenStack, and ceph -w st
I followed:
$ journal_uuid=$(sudo cat /var/lib/ceph/osd/ceph-0/journal_uuid)
$ sudo sgdisk --new=1:0:+20480M --change-name=1:'ceph journal'
--partition-guid=1:$journal_uuid
--typecode=1:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- /dev/sdk
Then
$ sudo ceph-osd --mkjournal -i 20
$ sudo serv
The problem is that you never got a successful run of `ceph-osd
--flush-journal` on the old SSD journal drive. All of the OSDs that used
the dead journal need to be removed from the cluster, wiped, and added back
in. The data on them is not 100% consistent because the old journal died.
Any word tha
Should I just out the OSDs first, or completely zap them and recreate? Or
delete them and let the cluster repair itself?
On the second node, when it started back up, I had problems with the journals
for IDs 5 and 7; they were also recreated, all the rest are still the
originals.
I know that some PGs are o
Ah, ceph-volume.log pointed out the actual problem:
RuntimeError: Cannot use device (/dev/storage/bluestore). A vg/lv path
or an existing device is needed
When I changed "--data /dev/storage/bluestore" to "--data
storage/bluestore", everything worked fine.
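(For reference, with an LVM-backed BlueStore OSD the data argument is
expected as vg/lv, so the working invocations would look roughly like the
following; the hostname node1 is a placeholder:
$ ceph-deploy osd create --data storage/bluestore node1
$ sudo ceph-volume lvm create --bluestore --data storage/bluestore)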
I agree that the ceph-deploy logs are a
OK, rather than going gung-ho at this..
1. I have set out 31, 24, 21, 18, 15, 14, 13, 6 and 7, 5 (10 is a new OSD)
Which gives me
ID WEIGHT   TYPE NAME           UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 23.65970 root default
-5  8.18990     host data33-a4
13  0.90999         osd.13      up      0
Hi,
I follow and use your articles regularly to help with our Ceph environment.
I am looking for urgent help with our infrastructure after a series of
outages over the weekend brought our Ceph environment to its knees.
The system is 0.94.5 and deployed as part of OpenStack.
In the series of
On 02.09.2018 17:12, Lee wrote:
Should I just out the OSDs first or completely zap them and recreate?
Or delete them and let the cluster repair itself?
On the second node, when it started back up, I had problems with the
journals for IDs 5 and 7; they were also recreated, all the rest are
still the or
Agreed on not zapping the disks until your cluster is healthy again. Marking
them out and seeing how healthy you can get in the meantime is a good idea.
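(In practice that is just marking each affected OSD out and watching
recovery, e.g.:
$ ceph osd out <id>
$ watch -n 5 ceph -s)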
On Sun, Sep 2, 2018, 1:18 PM Ronny Aasen wrote:
> On 02.09.2018 17:12, Lee wrote:
> > Should I just out the OSD's first or completely zap them and
Hey there,
So I now have a problem since none of my MDSes can start anymore.
They are stuck in the resolve state, since Ceph thinks there are still
MDSes alive, which I can see when I run:
ceph mds deactivate k8s:0
Error EEXIST: mds.4:0 not active (???)
ceph mds deactivate k8s:1
Error EEXIST: mds.
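(A couple of starting points for seeing what the monitors think the MDS
ranks are doing on Luminous:
$ ceph fs status
$ ceph fs dump
$ ceph mds stat)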
Hi Ronnie,
Am 2. September 2018 13:32:05 MESZ schrieb Ronnie Lazar
:
>Hello,
>
>I'm trying to understand the output of the dump_historic_ops admin sock
>command.
>I can't find any information on the meaning of the different states
>that an OP can be in.
>For example, in the following excerpt
On Sun, Sep 2, 2018 at 12:00 PM, David Wahler wrote:
> Ah, ceph-volume.log pointed out the actual problem:
>
> RuntimeError: Cannot use device (/dev/storage/bluestore). A vg/lv path
> or an existing device is needed
That is odd; is it possible that the error log wasn't the one that
matched what y
On Sun, Sep 2, 2018 at 1:31 PM Alfredo Deza wrote:
>
> On Sun, Sep 2, 2018 at 12:00 PM, David Wahler wrote:
> > Ah, ceph-volume.log pointed out the actual problem:
> >
> > RuntimeError: Cannot use device (/dev/storage/bluestore). A vg/lv path
> > or an existing device is needed
>
> That is odd, i
Can anyone confirm whether the Ceph repos for Debian/Ubuntu contain packages
for Debian? I'm not seeing any, but maybe I'm missing something...
I'm seeing ceph-deploy install an older version of Ceph on the nodes (from the
Debian repo) and then failing when I run "ceph-deploy osd ..." because ceph-
v
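(One way to check on a node which repository a package would actually be
installed from, assuming the download.ceph.com apt source is configured:
$ apt-cache policy ceph ceph-osd
$ apt-get install -s ceph)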
On Mon, Sep 3, 2018 at 1:57 AM Marlin Cremers
wrote:
>
> Hey there,
>
> So I now have a problem since none of my MDSes can start anymore.
>
> They are stuck in the resolve state since Ceph thinks there are still MDSes
> alive which I can see when I run:
>
Need the MDS log to check why the MDSes are stuck.
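(If nothing obvious shows up in /var/log/ceph/ceph-mds.*.log, the debug
level can be raised temporarily; the MDS name below is a placeholder:
$ ceph tell mds.<name> injectargs '--debug_mds 20 --debug_ms 1')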