I've had a 4 node ceph cluster working well for months.
This weekend I added a 5th node to the cluster, and after many hours of
rebalancing I have the following warning:

HEALTH_WARN 1 pgs incomplete; 1 pgs stuck inactive; 1 pgs stuck unclean
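For reference, the stuck pg can be identified with roughly the following
(no pg-specific arguments needed at this stage):

  ceph health detail            # names the incomplete/stuck pg and its osds
  ceph pg dump_stuck inactive   # the same pgs in table form, with their state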
But, my big problem is that the cluster is almost unusably slow.
This is the same issue as yesterday, but I'm still searching for a
solution. We have a lot of data on the cluster that we need and can't
get to at a reasonable speed (it took over 12 hours to export a 2GB image).
The only thing that status reports as wrong is:
health HEALTH_WARN 1 pgs incomplete;
Thanks! I tried restarting osd.11 (the primary osd for the incomplete pg) and
that helped a LOT. We went from 0/1 op/s to 10-800+ op/s!
We still have "HEALTH_WARN 1 pgs incomplete; 1 pgs stuck inactive; 1 pgs stuck
unclean", but at least we can
use our cluster :-)
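For anyone following along, the restart itself was nothing fancy; roughly the
following, with the exact init syntax depending on the distro/release:

  service ceph restart osd.11      # sysvinit-style
  # on Ubuntu/upstart deployments:  restart ceph-osd id=11
  ceph -w                          # then watch op/s and recovery progress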
ceph pg dump_stuck inactive
OK - so while things are definitely better, we are still not where we
were, and "rbd ls -l" still hangs.
Any suggestions?
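In case it helps, digging into the one remaining incomplete pg looks roughly
like this (the pg id is just an example, not our real one):

  ceph pg map 2.5       # which osds the pg maps to (up/acting sets)
  ceph pg 2.5 query     # detailed peering state, e.g. why it is incomplete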
Greg,
Thanks for the hints. I looked through the logs and found OSDs
with RETRYs. I marked those "out" (marked in orange) and let ceph
rebalance. Then I ran the bench command.
I now have many more errors than before :-(.
health HEALTH_WARN 1 pgs incomplete; 1 pgs stuck inactive; 151
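The out/bench steps were roughly the following (the osd id is just an example):

  ceph osd out 14         # mark a suspect osd out; data rebalances off of it
  ceph -w                 # wait for the rebalance to settle
  ceph tell osd.14 bench  # write-benchmark that osd and report throughput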
After more than a week of trying to restore our cluster I've given up.
I'd like to reset the data, metadata and rbd pools to their initial clean
states (wiping out all data). Is there an easy way to do this? I tried
deleting and adding pools, but still have:
health HEALTH_WARN 32 pgs
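For the rbd pool, the delete/recreate attempt was roughly this (the pg count is
just an example; newer releases require repeating the pool name plus the
confirmation flag, older ones accept just the name):

  ceph osd pool delete rbd rbd --yes-i-really-really-mean-it
  ceph osd pool create rbd 256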
Hi,
I have a 5 node ceph cluster that is running well (no problems using
any of the rbd images, and that's really all we use).
I have replication set to 3 on all three pools (data, metadata and rbd).
"ceph -s" reports:
health HEALTH_WARN 3 pgs degraded;
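For reference, replication was set per pool in the usual way, e.g. for rbd:

  ceph osd pool set rbd size 3
  ceph osd pool get rbd size     # confirms "size: 3"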
Thanks for the suggestion. I had tried stopping each OSD for 30
seconds, then restarting it, waiting 2 minutes and then doing the next
one (all OSDs eventually restarted). I tried this twice.
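Spelled out, the rolling restart was roughly the following (sysvinit syntax
shown; it has to run on the node hosting each osd, or use "service ceph -a ..."
to reach remote nodes on clusters deployed that way):

  for id in $(ceph osd ls); do
      service ceph stop osd.$id
      sleep 30
      service ceph start osd.$id
      sleep 120                  # let it rejoin before touching the next one
  done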
Hi,
The activity on our ceph cluster has gone up a lot. We are using exclusively
RBD storage right now.
Is there a tool/technique that could be used to find out which rbd images are
receiving the most activity (something like "rbdtop")?
Thanks,
Jeff
> On Fri, Aug 9, 2013 at 4:28 AM, Jeff Moskow wrote:
> > Thanks for the suggestion. I had tried stopping each OSD for 30 seconds,
> > then restarting it, waiting 2 minutes and then doing the next one (all OSDs
> > eventually restarted). I tried this twice.
02:41:11PM -0700, Samuel Just wrote:
> Are you using any kernel clients? Will osds 3,14,16 be coming back?
> -Sam
>
> On Mon, Aug 12, 2013 at 2:26 PM, Jeff Moskow wrote:
> > Sam,
> >
> > I've attached both files.
> >
> > Thanks!
> >
Sam,
Thanks that did it :-)
health HEALTH_OK
monmap e17: 5 mons at
{a=172.16.170.1:6789/0,b=172.16.170.2:6789/0,c=172.16.170.3:6789/0,d=172.16.170.4:6789/0,e=172.16.170.5:6789/0},
election epoch 9794, quorum 0,1,2,3,4 a,b,c,d,e
osdmap e23445: 14 osds: 13 up, 13 in
pgmap v1355
After removing the failed osds (ceph auth del osd.x ; ceph osd crush rm osd.x ;
ceph osd rm osd.x) and rebalancing, everything is working fine :-)
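Spelled out for a single osd (osd.3 is just an example id), the removal
sequence was roughly:

  ceph osd out 3            # if it was still "in", let data drain first
  service ceph stop osd.3   # on that osd's host; init syntax varies
  ceph auth del osd.3
  ceph osd crush rm osd.3
  ceph osd rm osd.3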
Jeff
On Wed, Aug 14, 2013 at 01:54:16PM -0700, Gregory Farnum wrote:
> On Thu, Aug 1, 2013 at 9:57 AM, Jeff Moskow wrote:
> > Greg,
> >
> > Thanks for the hints.
Hi,
When we rebuilt our ceph cluster, we opted to make our rbd storage
replication level 3 rather than the previously
configured replication level 2.
Things are MUCH slower (5 nodes, 13 OSDs) than before, even though most
of our I/O is read. Is this to be expected?
What are the options for improving performance?
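My understanding is that with size 3 every write has to be committed by three
osds instead of two, so even a mostly-read workload feels the extra write
latency. A rough way to quantify it (pool name, runtime and thread count are
just examples):

  rados -p rbd bench 30 write -t 16   # aggregate pool write throughput/latency
  ceph tell osd.4 bench               # single-osd disk throughput, for comparison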
Sage Weil wrote:
> On Sat, 17 Aug 2013, Jeff Moskow wrote:
> > Hi,
> >
> > When we rebuilt our ceph cluster, we opted to make our rbd storage
> > replication level 3 rather than the previously configured replication
> > level 2.
> >
> > Things are MUCH slower (5 nodes, 13 OSDs) than before, even though most
> > of our I/O is read. Is this to be expected?
Hi,
More information. If I look in /var/log/ceph/ceph.log, I see 7893 slow
requests in the last 3 hours, of which 7890 are from osd.4. Should I
assume a bad drive? SMART says the drive is healthy. A bad osd?
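This is roughly how the slow requests were tallied (log format varies a bit
between releases, and the device name is only an example for whatever disk is
behind osd.4):

  grep 'slow request' /var/log/ceph/ceph.log | grep -o 'osd\.[0-9]*' | sort | uniq -c | sort -rn
  smartctl -a /dev/sdb     # SMART health and error counters for that disk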
Thanks,
Jeff
Martin,
Thanks for the confirmation about 3-replica performance.
dmesg | fgrep /dev/sdb # returns no matches
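For comparison, the built-in osd benchmark can be run against the suspect osd
and a known-good one (ids below are just examples):

  ceph tell osd.4 bench    # suspect osd
  ceph tell osd.2 bench    # a healthy osd, for comparison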
Jeff