Hi Christian,
> > Hi David,
> >
> > The planned usage for this CephFS cluster is scratch space for an image
> > processing cluster with 100+ processing nodes.
>
> Lots of clients, how much data movement would you expect, how many images
> come in per timeframe, lets say an hour?
> Typical size o
Hi list,
we have a production Hammer cluster for our OpenStack cloud, and
recently a colleague added a cache tier consisting of 2 SSDs, with
a pool size of 2; we're still experimenting with this topic.
Now we have some hardware maintenance to do and need to shut down
nodes, one at a time
Hi Ceph gurus,
I've got the following problem with our Ceph installation (Jewel): There
are various websites served from the CephFS mount. Sometimes, when I
copy many new (large?) files onto this mount, it seems that after a
certain delay, everything grinds to a halt. No websites are served;
On Tue, 22 Aug 2017 09:54:34 + Eugen Block wrote:
> Hi list,
>
> we have a production Hammer cluster for our OpenStack cloud, and
> recently a colleague added a cache tier consisting of 2 SSDs, with
> a pool size of 2; we're still experimenting with this topic.
>
Risky, but I guess you
Hello,
On Tue, 22 Aug 2017 16:51:47 +0800 Nick Tan wrote:
> Hi Christian,
>
>
>
> > > Hi David,
> > >
> > > The planned usage for this CephFS cluster is scratch space for an image
> > > processing cluster with 100+ processing nodes.
> >
> > Lots of clients, how much data movement would you
I had some issues with the iSCSI software starting too early; maybe this
can give you some ideas.
systemctl show target.service -p After
mkdir /etc/systemd/system/target.service.d
cat << 'EOF' > /etc/systemd/system/target.service.d/10-waitforrbd.conf
[Unit]
After=systemd-journald.socket sys-kernel-config.mount network.target local-fs.target rbdmap.service
EOF
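The tail of that After= line is my reconstruction; the part that matters is
appending rbdmap.service, so that LIO only starts once the RBD devices are
mapped. Then reload systemd and verify the new ordering took:
systemctl daemon-reload
systemctl show target.service -p After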
Hello,
I have a Ceph Cluster with specifications below:
3 x Monitor node
6 x Storage Node (6 disks per Storage Node, 6TB SATA disks, all disks have SSD
journals)
Distributed public and private networks. All NICs are 10Gbit/s
osd pool default size = 3
osd pool default min size = 2
Ceph version is
Neat, hadn't seen that command before. Here's the fsck log from the primary
OSD: https://pastebin.com/nZ0H5ag3
Looks like the OSD's bluestore "filesystem" itself has some underlying errors,
though I'm not sure what to do about them.
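For anyone following along, the check in question is presumably
ceph-bluestore-tool's fsck, run with the OSD stopped (the OSD id below is a
placeholder):
systemctl stop ceph-osd@12
ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-12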
-Original Message-
From: Brad Hubbard [mailto:bhubb.
Hi Ramazan,
I'm no Ceph expert, but what I can say from my experience using Ceph is:
1) During "Scrubbing", Ceph can be extremely slow. This is probably
where your "blocked requests" are coming from. BTW: Perhaps you can even
find out which processes are currently blocking with: ps aux | grep
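A minimal sketch for that, assuming a Linux OSD node: list the processes in
uninterruptible sleep (state D), which is what blocked I/O usually looks like:
ps -eo pid,state,wchan:30,cmd | awk '$2 == "D"'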
Hi all,
I'm still new to ceph and cephfs. Trying out the multi-fs configuration on a
Luminous test cluster. I shut down the cluster to do an upgrade, and when I
brought the cluster back up I now have a warning that one of the file systems
has a failed mds daemon:
2017-08-21 17:00:00.81 m
On Tue, Aug 22, 2017 at 4:58 PM, Bryan Banister
wrote:
> Hi all,
>
>
>
> I'm still new to ceph and cephfs. Trying out the multi-fs configuration on
> a Luminous test cluster. I shut down the cluster to do an upgrade, and when
> I brought the cluster back up I now have a warning that one of the f
Hey folks,
I'm staring at a problem that I have found no solution for and which is causing
major issues.
We've had a PG go down with the first 3 OSDs all crashing and coming back only
to crash again with the following error in their logs:
-1> 2017-08-22 17:27:50.961633 7f4af4057700 -1 osd.
Hi Ranjan,
Thanks for your reply. I did set the noscrub and nodeep-scrub flags, but the
active scrubbing operation still won't stop. It is always on the same
pg (20.1e).
$ ceph pg dump | grep scrub
dumped all in format plain
pg_stat objects mip degr misp unf bytes log
Hm. That's quite weird. On our cluster, when I set "noscrub",
"nodeep-scrub", scrubbing will always stop pretty quickly (a few
minutes). I wonder why this doesn't happen on your cluster. When exactly
did you set the flag? Perhaps it just needs some more time... Or there
might be a disk problem w
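For reference, the flags are toggled cluster-wide with the standard CLI:
ceph osd set noscrub
ceph osd set nodeep-scrub
# and once the disk has been ruled out:
ceph osd unset noscrub
ceph osd unset nodeep-scrub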
Hello everyone,
I've been using ceph to provide storage using RBD for 60 KVM virtual
machines running on proxmox.
The ceph cluster we have is very small (2 OSDs + 1 mon per node, and a
total of 3 nodes) and we are having some performance issues, like high
latency times (apply lat: ~0.5 s; commi
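Those apply/commit latencies can be watched per OSD with the built-in
command, which helps single out a slow disk:
ceph osd perf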
Have you experienced huge memory consumption by flapping OSD daemons? Restarts
could be triggered by running out of memory (the OOM killer).
If yes, this could be connected with an OSD device error (bad blocks?), but we've
experienced something similar on Jewel, not the Kraken release. The solution was to
find the PG that cause
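To confirm the OOM killer is behind the restarts, and to watch the resident
memory of the OSD daemons (plain Linux tools, nothing Ceph-specific):
dmesg -T | egrep -i 'out of memory|killed process'
ps -o pid,rss,vsz,cmd -C ceph-osd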
It is likely your 2 spinning disks cannot keep up with the load. Things
are likely to improve if you double your OSDs, hooking them up to your
existing SSD journal. Technically it would be nice to run a
load/performance tool (either atop/collectl/sysstat) and measure how
busy your resources are, but
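As a concrete starting point, with sysstat installed, extended device
statistics every five seconds; sustained %util near 100 on the spinners
confirms they are the bottleneck:
iostat -x 5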
Dear all,
Some days ago I read about the commands rbd lock add and rbd lock
remove. Will these commands stay maintained in Ceph in future versions, or is
the better way to use locking in Ceph the exclusive-lock feature, with these
commands becoming deprecated?
Thanks a lot,
Marcelo
On 24/07/2017, Jaso
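For context, the advisory lock interface being asked about looks like this
(pool, image, and lock names are placeholders):
rbd lock add rbd/myimage mylock
rbd lock list rbd/myimage              # note the locker id, e.g. client.4123
rbd lock remove rbd/myimage mylock client.4123
whereas exclusive-lock is an image feature:
rbd feature enable rbd/myimage exclusive-lock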
Also examine your network layout. Any saturation in the private cluster
network or client facing network will be felt in clients / libvirt /
virtual machines
As OSD count increases...
- Ensure client network / private cluster network separation - different
nics, different wires, different sw
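In ceph.conf terms the split is just two settings (the subnets here are
placeholders):
[global]
public network = 192.168.0.0/24
cluster network = 10.10.0.0/24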
Hi John,
Seems like you're right... strange that it seemed to work with only one mds
before I shut the cluster down. Here is the `ceph fs get` output for the two
file systems:
[root@carf-ceph-osd15 ~]# ceph fs get carf_ceph_kube01
Filesystem 'carf_ceph_kube01' (2)
fs_name carf_ceph_kube0
I would run some benchmarking throughout the cluster environment to see
where your bottlenecks are before putting time and money into something
that might not be your limiting resource. Sébastien Han put together a
great guide for benchmarking your cluster here:
https://www.sebastien-han.fr/blog/
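A common first pass is the built-in object benchmark against a throwaway pool
(pool name is a placeholder; --no-cleanup keeps the written objects around so
the read tests have something to read):
rados bench -p testpool 60 write --no-cleanup
rados bench -p testpool 60 seq
rados bench -p testpool 60 rand
rados -p testpool cleanup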
On Tue, Aug 22, 2017 at 8:49 PM, Bryan Banister
wrote:
> Hi John,
>
>
>
> Seems like you're right... strange that it seemed to work with only one mds
> before I shut the cluster down. Here is the `ceph fs get` output for the
> two file systems:
>
>
>
> [root@carf-ceph-osd15 ~]# ceph fs get carf_c
All sounds right to me... looks like this is a little too bleeding edge for my
taste! I'll probably drop it at this point and just wait till we are actually
on a 4.8 kernel before checking on status again.
Thanks for your help!
-Bryan
-Original Message-
From: John Spray [mailto:jsp...@
Hello,
I have the boto python API working with our ceph cluster but haven't figured
out a way yet to get boto3 to communicate with our RGWs. Anybody have a simple
example?
Cheers for any help!
-Bryan
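Not an authoritative answer, but a minimal boto3 sketch that generally works
against RGW's S3 API (endpoint and credentials are placeholders):
import boto3

s3 = boto3.client(
    's3',
    endpoint_url='http://rgw.example.com:7480',  # your RGW host:port
    aws_access_key_id='ACCESS_KEY',
    aws_secret_access_key='SECRET_KEY',
)
print([b['Name'] for b in s3.list_buckets()['Buckets']])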
Thanks for your advice, Maged and Chris.
I'll answer below.
On 08/22/2017 04:30 PM, Mazzystr wrote:
Also examine your network layout. Any saturation in the private
cluster network or client facing network will be felt in clients /
libvirt / virtual machines
As OSD count increases...
* Ensur
Hi David
I'll try to perform these tests soon.
Thank you.
On 08/22/2017 04:52 PM, David Turner wrote:
I would run some benchmarking throughout the cluster environment to
see where your bottlenecks are before putting time and money into
something that might not be your limiting resource. Seb
On Mon, Aug 21, 2017 at 5:37 PM, Alessandro De Salvo
wrote:
> Hi,
>
> when trying to use df on a ceph-fuse mounted cephfs filesystem with ceph
> luminous >= 12.1.3 I'm having hangs with the following kind of messages in
> the logs:
>
>
> 2017-08-22 02:20:51.094704 7f80addb7700 0 client.174216 ms_
Thanks for the advice Christian. I think I'm leaning more towards the
'traditional' storage server with 12 disks - as you say they give a lot
more flexibility with the performance tuning/network options etc.
The cache pool is an interesting idea but as you say it can get quite
expensive for the c
On Wed, 23 Aug 2017 13:38:25 +0800 Nick Tan wrote:
> Thanks for the advice Christian. I think I'm leaning more towards the
> 'traditional' storage server with 12 disks - as you say they give a lot
> more flexibility with the performance tuning/network options etc.
>
> The cache pool is an intere