We've encountered this problem a lot. As far as I know, the best practice
is to make the distribution of PGs across OSDs as even as you can after
you create the pool and before you write any data.
1. the disk utilization = (PGs per OSD) * (files per PG). Ceph is good at
making (files per PG
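For reference, a quick way to eyeball the PG spread per OSD (a sketch;
"ceph osd df" only appeared in Hammer, on Firefly/Giant you would have to
count PGs from "ceph pg dump" instead):
  ceph osd df   # the PGS column is what you want to be as even as possible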
Hi,
I want to collect the Linux kernel rbd log. From reading the kernel code
I know it uses the dout() method for debugging, but how do I enable it,
and where can I find the output?
Thanks
Jian Li
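A common way to turn that on, assuming the kernel was built with
CONFIG_DYNAMIC_DEBUG (the dout() calls in rbd/libceph compile down to
pr_debug(), so the output ends up in the kernel log):
  mount -t debugfs none /sys/kernel/debug        # if not already mounted
  echo 'module rbd +p'     > /sys/kernel/debug/dynamic_debug/control
  echo 'module libceph +p' > /sys/kernel/debug/dynamic_debug/control
  dmesg | tail                                   # dout() output shows up here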
On 11/19/2014 06:51 PM, Jay Janardhan wrote:
Can someone tell me what I can tune to improve the performance? The
cluster is pushing data at about 13 MB/s with a single copy of data
while the underlying disks can push 100+MB/s.
Can anyone help me with this?
*rados bench results:*
Concurrency R
Can someone tell me what I can tune to improve the performance? The cluster
is pushing data at about 13 MB/s with a single copy of data while the
underlying disks can push 100+MB/s.
Can anyone help me with this?
*rados bench results:*
Concurrency Replication size Write(MB/s) Seq Read(
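For reference, numbers like these are usually gathered along these lines
(a sketch; the pool name and thread count are placeholders):
  rados bench -p testpool 60 write -t 16 --no-cleanup   # -t = concurrency
  rados bench -p testpool 60 seq -t 16   # reads the objects kept by --no-cleanup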
Hi
I'm testing a Giant cluster. There are 6 OSDs on 3 virtual machines. One
OSD is marked down and out. The process still exists; it is in
uninterruptible sleep and has stopped logging.
I've uploaded what I think are relevant fragments of the log to
pastebin: http://pastebin.com/Y42GvGjr
Ca
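A couple of stock ways to see where a D-state process is stuck (the pid
is a placeholder, and sysrq has to be enabled for the second one):
  cat /proc/<pid>/stack          # kernel stack of the hung ceph-osd
  echo w > /proc/sysrq-trigger   # dump all blocked tasks to the kernel log
  dmesg | tail -n 100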
After rebuilding a few OSDs, I see that the PGs are stuck in degraded mode.
Some are unclean and others are stale. Somehow the MDS is also degraded.
How do I get the OSDs and the MDS back to a healthy state? I've read
through the documentation and searched the web, but no luck so far.
p
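Typical first steps for narrowing something like this down (not a fix in
itself, just where to look):
  ceph health detail
  ceph pg dump_stuck unclean
  ceph pg dump_stuck stale
  ceph mds stat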
You don't really need to do much. There are some "ceph mds" commands
that let you clean things up in the MDSMap if you like, but moving an
MDS essentially boils down to:
1) make sure your new node has a cephx key (probably for a new MDS
entity named after the new host, but not strictly necessary
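A sketch of step 1, with 'newhost' standing in for the new machine (the
caps below follow the usual MDS profile, adjust to taste):
  ceph auth get-or-create mds.newhost mon 'allow profile mds' \
      osd 'allow rwx' mds 'allow' \
      -o /var/lib/ceph/mds/ceph-newhost/keyring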
Well, the heartbeats are failing due to networking errors preventing
the heartbeats from arriving. That is causing osds to go down, and
that is causing pgs to become degraded. You'll have to work out what
is preventing the tcp connections from being stable.
-Sam
AM: Sam, I will start the ne
Well, the heartbeats are failing due to networking errors preventing
the heartbeats from arriving. That is causing osds to go down, and
that is causing pgs to become degraded. You'll have to work out what
is preventing the tcp connections from being stable.
-Sam
On Wed, Nov 19, 2014 at 1:39 PM,
>You indicated that osd 12 and 16 were the ones marked down, but it
>looks like only 0,1,2,3,7 were marked down in the ceph.log you sent.
>The logs for 12 and 16 did indicate that they had been partitioned
>from the other nodes. I'd bet that you are having intermittent
>network trouble since
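One way to confirm that from the logs (stock log paths assumed):
  grep 'heartbeat_check: no reply' /var/log/ceph/ceph-osd.*.log  # on OSD nodes
  grep ' failed (' /var/log/ceph/ceph.log                        # mon failure reports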
I think these numbers are about what is expected. You could try a couple
of things to improve it, but neither of them is common:
1) increase the number of PGs (and pgp_num) a lot more. If you decide to
experiment with this, watch your CPU and memory numbers carefully (sketch below).
2) try to correct for the inequ
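A sketch of option 1 (pool name and target count are placeholders;
pgp_num has to be raised to match pg_num afterwards):
  ceph osd pool set rbd pg_num 1024
  ceph osd pool set rbd pgp_num 1024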
Hi Robert,
an improvement to your checks could be the addition of check
parameters (instead of hard-coded values for warn and crit) so
that people can change the thresholds in main.mk. I hope to find some
time soon and send you a PR for it. Nice job, btw!
On 19 November 2014 18:23, Robert Sand
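If it helps, legacy check_mk overrides in main.mk take roughly this shape
(a rough sketch only; the service name and thresholds are made up, and the
exact parameter format depends on the check):
  check_parameters = [
      # (warn, crit), hosts, service description pattern
      ( (85, 95), ALL_HOSTS, [ "Ceph Status" ] ),
  ]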
Setting mds cache size to 5 million seems to have helped significantly, but we're
still seeing occasional issues with metadata reads while under load. Settings
above 5 million don't seem to have any noticeable impact on this problem. I'm
starting the upgrade to Giant today.
--
Kevin Sumner
ke...@su
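For reference, that setting is usually applied like this (a sketch; the
MDS name is a placeholder):
  # ceph.conf, [mds] section:
  mds cache size = 5000000
  # or injected at runtime:
  ceph tell mds.<name> injectargs '--mds-cache-size 5000000'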
Hi,
I noticed that the docs [1] on adding and removing an MDS are not yet
written...
[1] https://ceph.com/docs/master/rados/deployment/ceph-deploy-mds/
I would like to do exactly that, however. I have an MDS on one machine,
but I'd like a faster machine to take over instead. In fact, it would be
Hi,
On 14.11.2014 11:38, Nick Fisk wrote:
> I've just been testing your ceph check and I have made a small modification
> to allow it to adjust itself to suit the autoscaling of the units Ceph
> outputs.
Thanks for the feedback. I took your idea, added PB and KB, and pushed
it to github again:
Hi, I am using radosgw. I have a large number of connections with huge
files, and memory usage keeps growing until the process is killed by the
kernel, I *think* because of buffering for each request.
Is there a way to limit the buffer size for each object (or for each
connection)? What do you suggest?
logs/andrei » grep failed ceph.log.7
2014-11-12 01:37:19.857143 mon.0 192.168.168.13:6789/0 969265 :
cluster [INF] osd.3 192.168.168.200:6818/26170 failed (3 reports from
3 peers after 22.000550 >= grace 20.995772)
2014-11-12 01:37:21.176073 mon.0 192.168.168.13:6789/0 969287 :
cluster [INF] osd.0
Hi Massimiliano,
On Tue, Nov 18, 2014 at 5:23 PM, Massimiliano Cuttini
wrote:
> Then.
> ...very good! :)
>
> Ok, the next bad thing is that I have installed GIANT on Admin node.
>
> However, ceph-deploy ignores the ADMIN node installation and installs FIREFLY.
> Now I have ceph-deploy of Giant on my
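One way to pin the release that ceph-deploy installs (a sketch;
'admin-node' is a placeholder hostname):
  ceph-deploy install --release giant admin-node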
Hi,
I rebooted a node (I'm doing some tests and breaking many things ;)), and I see
that I have:
[root@ceph0 ~]# mount|grep sdp1
/dev/sdp1 on /var/lib/ceph/tmp/mnt.eml1yz type xfs
(rw,noatime,attr2,inode64,noquota)
/dev/sdp1 on /var/lib/ceph/osd/ceph-55 type xfs
(rw,noatime,attr2,inode64,noquota)
[
Hi,
I know a lot of people have already asked these questions, but after
looking at all the answers I'm still having problems with my OSD balancing.
The disks are 4TB drives. We are seeing differences of up to 10%.
Here is a DF from one server:
/dev/sdc1 3.7T 3.0T 680G 82% /var/lib/cep
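The usual knobs for evening that out (a sketch; the threshold, osd id and
weight are only examples):
  ceph osd reweight-by-utilization 110   # reweight OSDs above 110% of the average
  ceph osd reweight 21 0.9               # or nudge a single OSD by hand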
Hello again,
So whatever magic allows the Dell MD1200 to report the slot position for
each disk isn't present in your JBODs. Time for something else.
There are two sides to your problem:
1) Identifying which disk is where in your JBOD
Quite easy. Again I'd go for a udev rule + script that will
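A very rough sketch of that idea (the helper script and symlink name are
hypothetical): have udev call a script that prints the enclosure slot for
a disk, and build a stable symlink from its output.
  # /etc/udev/rules.d/60-jbod-slot.rules
  KERNEL=="sd*", ENV{DEVTYPE}=="disk", PROGRAM=="/usr/local/bin/disk-slot %k", SYMLINK+="jbod/%c"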
Currently Firefly on Debian stable, all updated.
I already tried it with Giant and it's the same.
But it looks like I solved it. I changed the crush tunables to optimal and now it
shows the size correctly. And even when I switch back to default it still shows it correctly.
It's weird, but hopefully it's solved for now.
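For reference, the commands in question (either change can trigger data
movement):
  ceph osd crush tunables optimal
  ceph osd crush tunables default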
Rama,
Thanks for your reply.
My end goal is to use iSCSI (with LIO/targetcli) to export rbd block devices.
I was encountering issues with iSCSI which are explained in my previous emails.
I ended up being able to reproduce the problem at will on various kernel and OS
combinations, even on raw RB
Hi
Thanks.
I hoped it would be it, but no ;)
With this mapping :
lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdb ->
../../devices/pci:00/:00:04.0/:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:0/expander-1:1/port-1:1:0/end_device-1:1:0/target1:0:1/1:0:1:0/block/sdb
lrwxrwxr
Hi,
I am using firefly version 0.80.7.
I am testing the disaster recovery mechanism for rados gateways.
I have followed the federated gateway setup as mentioned in the docs.
There is one region with two zones on the same cluster.
After sync (using radosgw-agent, with "--sync-scope=full"), container
crea
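For reference, that sync is usually kicked off along these lines (a
sketch; the config file path is a placeholder):
  radosgw-agent -c /etc/ceph/radosgw-agent/sync.conf --sync-scope=full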
Maybe I should rephrase my question by asking what the relationship is
between bonding and ethtool?
*Roland*
On 18 November 2014 22:14, Roland Giesler wrote:
> Hi people, I have two identical servers (both Sun X2100 M2's) that form
> part of a cluster of 3 machines (other machines will be adde
Hi all,
I'm running firefly 0.80.5 and here is osd perf output :
ceph osd perf
osdid fs_commit_latency(ms) fs_apply_latency(ms)
0 1111
1 1111
2 1370
3