Hi Erwin,
Did you try restarting the primary osd for that pg (24)? Sometimes it
needs a little nudge that way.
Otherwise what does ceph pg dump say about that pg?
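Something like this, assuming a standard sysvinit setup (substitute the real
pgid/osd id):

  ceph pg <pgid> query    # 'up'/'acting' show the primary osd and why it's stuck
  sudo service ceph restart osd.<id>   # on the node holding that osd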
Cheers,
Martin
On Thu, Sep 4, 2014 at 9:00 AM, Erwin Lubbers
wrote:
> Hi,
>
> My cluster is giving one stuck pg which seems t
Hi Dan,
We took a different approach (and our cluster is tiny compared to many
others) - we have two pools; normal and ssd.
We use 14 disks in each osd-server; 8 platter and 4 ssd for ceph, and 2 ssd
for OS/journals. We partitioned the two OS ssd as raid1 using about half
the space for the OS and
On Thu, Sep 4, 2014 at 10:23 PM, Dan van der Ster wrote:
> Hi Martin,
>
> September 4 2014 10:07 PM, "Martin B Nielsen" wrote:
> > Hi Dan,
> >
> > We took a different approach (and our cluster is tiny compared to many
> others) - we have two pools;
Just echoing what Christian said.
Also, iirc the "currently waiting for subops on [" message could also mean a
problem on those osds, as it waits for acks from them (I might remember wrong).
If that is the case you might want to check in on osd 13 & 37 as well.
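If you want to peek at what they are waiting on, the admin socket works
(default socket paths assumed):

  ceph --admin-daemon /var/run/ceph/ceph-osd.13.asok dump_ops_in_flight
  ceph --admin-daemon /var/run/ceph/ceph-osd.37.asok dump_ops_in_flight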
With the cluster load and size you should not hav
Hi,
Or did you mean some OSDs are near full while others are under-utilized?
On Sat, Sep 6, 2014 at 5:04 PM, Christian Balzer wrote:
>
> Hello,
>
> On Fri, 05 Sep 2014 15:31:01 -0700 JIten Shah wrote:
>
> > Hello Cephers,
> >
> > We created a ceph cluster with 100 OSD, 5 MON and 1 MSD and most o
Hi,
I don't recognize that picture; we've been running Samsung 840 Pro in
production for almost 2 years now - and have had 1 fail.
We run an 8-node mixed ssd/platter cluster with 4x Samsung 840 Pro (500GB) in
each, so that is 32x ssd.
They've each written ~25TB of data on average.
Using the dd you had ins
the osds with Samsung journal drive compared with the Intel drive on the
> same server. Something like 2-3ms for Intel vs 40-50ms for Samsungs.
>
> At some point we had enough with Samsungs and scrapped them.
>
> Andrei
>
Hi,
Inside your mounted osd there is a symlink - journal - pointing to a file
or disk/partition used with it.
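E.g. (default mount path assumed):

  ls -l /var/lib/ceph/osd/ceph-*/journal

will list where each journal points.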
Cheers,
Martin
On Thu, May 7, 2015 at 11:06 AM, Patrik Plank wrote:
> Hi,
>
>
> I can't remember on which drive I installed which OSD journal :-||
> Is there any command to show this?
>
Hi,
I'd just like to echo what Wolfgang said about ceph being a complex system.
I initially started out testing ceph with a setup much like yours, and while
it performed OK overall, it was not as good as sw raid on the same machine.
Also, as Mark said, you'll have at the very best half write speeds b
Hi Jeff,
I would be surprised as well - we initially tested on a 2-replica cluster
with 8 nodes having 12 osd each - and went to 3-replica as we re-built the
cluster.
The performance seems to be where I'd expect it (doing consistent writes in
a rbd VM @ ~400MB/sec on 10GbE which I'd expect is eit
Hi Shain,
Those R515 seem to mimic our servers (2U supermicro w. 12x 3.5" bays and 2x
2.5" in the rear for OS).
Since we need a mix of SSD & platter we have 8x 4TB drives and 4x 500GB SSD
+ 2x 250GB SSD for OS in each node (2x 8-port LSI 2308 in IT-mode)
We've partitioned 10GB from each 4x 500GB
Hi Scott,
Just some observations from here.
We run 8 nodes, 2U units with 12x OSD each (4x 500GB ssd, 8x 4TB platter)
attached to 2x LSI 2308 cards. Each node uses an intel E5-2620 with 32G mem.
Granted, we only have like 25 VM (some fairly io-hungry, both iops and
throughput-wise though) on tha
Hi,
Plus reads will still come from your non-SSD disks unless you're using
something like flashcache in front and as Greg said, having much more IOPS
available for your db often makes a difference (depending on load, usage
etc ofc).
We're using Samsung Pro 840 256GB pretty much like Martin descri
Probably common sense, but I was bitten by this once in a similar
situation..
If you run 3x replica and distribute them over 3x hosts (is that the default
now?), make sure the disks on the host with the failed disk have space
for it - the remaining two disks will have to hold the content of the
failed one.
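Quick example with made-up numbers: 3x 4TB disks per host at 70% full each
means ~2.8TB sits on the failing disk; split over the two survivors that is
+1.4TB each, taking them from 70% to ~105% - i.e. toofull territory.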
Hi,
I'd almost always go with more, less beefy nodes rather than fewer big ones.
You're much more vulnerable when one of the big ones dies, and with smaller
nodes replication will not impact your cluster as much.
I also find it easier to extend a cluster with smaller nodes. At least it
feels like you can grow at a smoother rate
Hi,
We settled on Samsung 840 Pro 240GB drives 1½ years ago and we've been happy
so far. We've over-provisioned them a lot (left 120GB unpartitioned).
We have 16x 240GB and 32x 500GB - we've lost 1x 500GB so far.
smartctl states something like
Wear = 092%, Hours = 12883, Datawritten = 15321.83 TB
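For reference, we pull those numbers with something like this (Samsung
attribute names assumed):

  smartctl -A /dev/sdX | egrep 'Wear_Leveling|Power_On_Hours|Total_LBAs_Written'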
A bit late getting back on this one.
On Wed, Oct 1, 2014 at 5:05 PM, Christian Balzer wrote:
> > smartctl states something like
> > Wear = 092%, Hours = 12883, Datawritten = 15321.83 TB avg on those. I
> > think that is ~30TB/day if I'm doing the calc right.
> >
> Something very much does not ad
Hi Luis,
I might remember wrong, but don't you need to actually create the osd
first? (ceph osd create)
Then you can assign it a position using the crush CLI commands.
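From memory it is roughly (weight/host are just examples):

  ceph osd create                        # prints the new osd id, e.g. 12
  ceph osd crush add osd.12 1.0 host=node3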
Like Jason said, can you send the ceph osd tree output?
Cheers,
Martin
On Mon, Jan 12, 2015 at 1:45 PM, Luis Periquito wrote:
>
Hi,
You didn't state which version of ceph or kvm/qemu you're using. I think it
wasn't until qemu 1.5.0 (1.4.2+?) that an async patch from Inktank was
accepted into mainline, which significantly helps in situations like this.
If you're not using that, on top of not limiting recovery threads, you'll prob.
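If it is the recovery part, something like this in ceph.conf tames it
(values are just a conservative starting point):

  [osd]
  osd max backfills = 1
  osd recovery max active = 1
  osd recovery op priority = 1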
Hi,
At least it used to be like that - I'm not sure if that has changed. I
believe this is also part of why it is advised to go with the same kind of hw
and setup if possible.
Since rbd images, at least, are spread as objects throughout the cluster,
you'll prob. have to wait for a slow disk when readin
Hi,
I would prob. start by figuring out exactly which pgs are stuck unclean.
You can do 'ceph pg dump | grep unclean' to get that info - then if your
theory holds you should be able to verify the disk(s) in question.
I can't see any _too_full flags, so I'm curious what the cause could be.
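There's also a shortcut:

  ceph pg dump_stuck unclean
  ceph pg <pgid> query      # for each pg it lists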
You can also a
Hi Pavel,
Will try and answer some of your questions:
My first question will be about monitor data directory. How much space I
> need to reserve for it? Can monitor-fs be corrupted if monitor goes out of
> storage space?
>
We have about 20GB partitions for monitors - they really don't use much
space.
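You can check actual usage with (default path assumed):

  du -sh /var/lib/ceph/mon/*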
Hi,
You can't form quorum with your monitors on cuttlefish if you're mixing <
0.61.5 with any 0.61.5+ ( https://ceph.com/docs/master/release-notes/ ) =>
section about 0.61.5.
I'd advise installing pre-0.61.5, forming quorum and then upgrading to 0.61.9
(if need be) - and then latest dumpling on top.
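Check what each node actually runs first, e.g.:

  ceph --version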
Hi,
I'd probably start by looking at your nodes and check if the SSDs are
saturated or if they have high write access times. If any of that is true,
does that account for all SSD or just some of them? Maybe some of the disks
needs a trim. Maybe test them individually directly on the cluster.
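Something like this per node should show it (sysstat package):

  iostat -x 2    # watch %util and await on the ssd devices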
If y
I'll try accessing the ticket Monday to get all
the details if it is still there.
Cheers,
Martin
>
> Looking forward to your reply, thank you.
>
> Cheers.
>
>
> On Fri, Mar 7, 2014 at 6:10 PM, Martin B Nielsen wrote:
>
>> Hi,
>>
>> I'd probably start b
Hi,
I can see ~17% hardware interrupts which I find a little high - can you
make sure all load is spread over all your cores (/proc/interrupts)?
What about disk util once you restart them? Are they all 100% utilized or
is it 'only' mostly cpu-bound?
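E.g.:

  cat /proc/interrupts    # see if the nic/hba irqs all land on cpu0
  mpstat -P ALL 2         # per-core load breakdown (sysstat)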
Also you're running a monitor on this node - h
Hi,
I experienced this from time to time with older releases of ceph, but
haven't stumbled upon it for some time.
Often I had to revert to the older state by using:
http://ceph.com/docs/master/rados/operations/add-or-rm-mons/#removing-monitors-from-an-unhealthy-cluster
and dump the monlist, find
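The gist of that page, from memory (ids/paths are examples):

  service ceph stop mon.a
  ceph-mon -i a --extract-monmap /tmp/monmap
  monmaptool --print /tmp/monmap
  monmaptool --rm b --rm c /tmp/monmap    # drop the broken ones
  ceph-mon -i a --inject-monmap /tmp/monmap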
Hi,
I can see you're running mon, mds and osd on the same server.
Also, from a quick glance you're using around 13GB resident memory.
If you only have 16GB in your system I'm guessing you'll be swapping about
now (or close). How much mem does the system hold?
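E.g.:

  free -m
  ps aux --sort=-rss | head    # top memory consumers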
Also, how busy are the disks? Or is
Hi,
We're running mysql in multi-master cluster (galera), mysql standalones,
postgresql, mssql and oracle db's on ceph RBD via QEMU/KVM. As someone else
pointed out it is usually faster with ceph, but sometimes you'll get some
odd slow reads.
Latency is our biggest enemy.
Oracle comes with an aw
First off, congrats to Inktank!
I'm sure that with Red Hat backing the project it will see even quicker
development.
My only worry is support for future non-RHEL platforms; like many others
we've built our ceph stack around ubuntu and I'm just hoping it won't
deteriorate into something like how it is
Hi,
I experienced exactly the same with 14.04 and the 0.79 release.
It was a fresh, clean install with the default crushmap and ceph-deploy
install as per the quick-start guide.
Oddly enough, changing the replica size (incl. min_size) from 3->2 (and 2->1)
and back again made it work.
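For reference, that was just (pool name is an example):

  ceph osd pool set rbd size 2
  ceph osd pool set rbd min_size 1
  ceph osd pool set rbd size 3
  ceph osd pool set rbd min_size 2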
I didn't have time to l
ERR] 1.73c missing primary copy of
> 9d7a673c/11b30 6c./head//1,
> unfound
>
>
> Summary: pg wont repair... what do u suggest
>
>
> Regards,
> Femi.
>
>
> On Fri, Feb 22, 2013 at 1:26 PM, Martin B Nielsen wr
Hi,
We did the opposite here, adding some SSDs in free slots after getting a
normal cluster running with SATA.
We just created a new pool for them and separated the two types. I
used this as a template:
http://ceph.com/docs/master/rados/operations/crush-map/?highlight=ssd#placing-different-pools-on
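The workflow boils down to (filenames/pool are examples):

  ceph osd getcrushmap -o crushmap.bin
  crushtool -d crushmap.bin -o crushmap.txt
  # edit crushmap.txt: add an ssd root + rule next to the platter one
  crushtool -c crushmap.txt -o crushmap.new
  ceph osd setcrushmap -i crushmap.new
  ceph osd pool set ssdpool crush_ruleset <rule-id>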
Hi Charles,
http://ceph.com/docs/master/rados/configuration/ceph-conf/#ceph-runtime-config
has a great example.
For all daemons of a type use * ( ceph osd tell \* injectargs
'--debug-osd 20 --debug-ms 1' )
More about loglevels here:
http://ceph.com/docs/master/rados/configuration/ceph-conf/#logs
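And for a single daemon, e.g.:

  ceph tell osd.12 injectargs '--debug-osd 20 --debug-ms 1'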
Hi Ashish,
Yep, that would be the correct way to do it.
If you already have a cluster running, a ceph -s will also show usage, ie
like:
>ceph -s
pgmap v1842777: 8064 pgs: 8064 active+clean; 1069 GB data, 2144 GB used,
7930 GB / 10074 GB avail; 3569B/s wr, 0op/s
This is a small test-cluster with
Hi Kakito,
You def. _want_ scrubbing to happen!
http://ceph.com/docs/master/rados/configuration/osd-config-ref/#scrubbing
If you feel it kills your system you can tweak some of the values, like the
following (example snippet after the list):
osd scrub load threshold
osd scrub max interval
osd deep scrub interval
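E.g. in ceph.conf (values purely illustrative, not recommendations):

  [osd]
  osd scrub load threshold = 0.5
  osd scrub max interval = 604800       # 1 week, in seconds
  osd deep scrub interval = 604800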
I have no experience in chan
Hi Bryan,
I asked the same question a few months ago:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2013-February/000221.html
But basically, that is pretty bad; you'll be stuck on your own and
would need to get in contact with Inktank - they might be able to help
rebuild a monitor for you.
Hi,
We're using ceph 10.2.5 and cephfs.
We had a weird episode where one monitor host (mon0r0), which was also the
current active mds node, had some sort of meltdown.
The monitor called elections on and off over ~1 hour, sometimes with
5-10 min between them.
On every occasion the mds also went through replay, reconnect, rejoin => active
(