Hello Everyone,
I've got a 3 node Jewel cluster setup, and I think I'm missing
something. When I want to take one of my nodes down for maintenance
(kernel upgrades or the like) all of my clients (running the kernel
module for the cephfs filesystem) hang for a couple of minutes before
the r
On Mon, Jan 21, 2019 at 12:12 PM Albert Yue wrote:
>
> Hi Yan Zheng,
>
> 1. mds cache limit is set to 64GB
> 2. we got the size of the metadata pool by running `ceph df` and saw that the
> metadata pool uses just 200MB of space.
>
That's very strange. One file uses about 1 KB of metadata storage, so 560M
files should take up several hundred GB of metadata, not 200MB.
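For a rough sanity check (assuming roughly 1 KB of metadata per file, which is
only a ballpark figure), something like this compares expected and actual
usage:

# back-of-the-envelope estimate: 560M files * ~1 KB each, in GiB
echo "560000000 * 1024 / 1024 / 1024 / 1024" | bc
# -> about 534 GiB expected

# actual usage of the CephFS metadata pool as reported by the cluster
# (adjust the grep to whatever your metadata pool is called)
ceph df detail | grep -i metadata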
On 21.01.19 09:22, Charles Tassell wrote:
> Hello Everyone,
>
> I've got a 3 node Jewel cluster setup, and I think I'm missing
> something. When I want to take one of my nodes down for maintenance
> (kernel upgrades or the like) all of my clients (running the kernel
> module for the cephfs
I think his downtime is coming from the MDS failover, which takes a while
in my case too. But I am not using cephfs that much yet.
-Original Message-
From: Robert Sander [mailto:r.san...@heinlein-support.de]
Sent: 21 January 2019 10:05
To: ceph-users@lists.ceph.com
Subject: Re: [
Hi,
We are running the following radosgw (Luminous 12.2.8) replication
scenario:
1) We have 2 clusters, each running a radosgw; Cluster1 is defined as master
and Cluster2 as slave.
2) We create a number of buckets with objects via master and slave.
3) We shut down Cluster1.
4) We execute failover on
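For context, the failover step on the secondary follows the usual
promote-the-secondary procedure, roughly like this (zone name and service
instance are placeholders):

# on Cluster2: promote the secondary zone to master and make it the default
radosgw-admin zone modify --rgw-zone=cluster2-zone --master --default
# commit the period change so the new master configuration takes effect
radosgw-admin period update --commit
# restart the gateway so it picks up the new period
systemctl restart ceph-radosgw@rgw.$(hostname -s).service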
Hi Mohamad, how do you do that client side? I currently have two
kernel mounts.
-Original Message-
From: Mohamad Gebai [mailto:mge...@suse.de]
Sent: 17 January 2019 15:57
To: Marc Roos; ceph-users
Subject: Re: [ceph-users] monitor cephfs mount io's
You can do that either st
I will do that next time. Do you know by any chance if using 'timeout'
could prevent this? From the manual I gather that timeout will send a
signal like HUP or KILL; in that case it would not be so different.
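A minimal sketch of that idea (paths are made up); note that a task already
stuck in uninterruptible D state will ignore both signals, so it would
probably not help there:

# send TERM after 1 hour, then KILL 30 seconds later if the process is still alive
timeout -k 30 3600 rsync -a /data/ /mnt/cephfs/backup/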
-Original Message-
From: Yan, Zheng [mailto:uker...@gmail.com]
Sent: 21 Janua
This weekend I had a process stuck in D state writing to a cephfs kernel
mount, causing the load of the server to go to 80 (normally around 1) and
forcing me to reboot it.
I think this problem is related to the networking between this VM and the
ceph nodes. Rsync also sometimes complains about a broke
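In case it is useful, this is roughly how such a stuck task and its pending
ceph requests can be inspected (just a sketch; the second part needs debugfs
mounted and root):

# list tasks in uninterruptible sleep (state D) and the kernel function they wait in
ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /^D/'
# for a cephfs kernel mount, in-flight OSD/MDS requests show up in debugfs
cat /sys/kernel/debug/ceph/*/osdc /sys/kernel/debug/ceph/*/mdsc 2>/dev/null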
Hello!
Sun, Jan 20, 2019 at 09:07:35PM +, robbat2 wrote:
> On Sun, Jan 20, 2019 at 09:05:10PM +, Max Krasilnikov wrote:
> > > Just checking, since it isn't mentioned here: Did you explicitly add
> > > public_network+cluster_network as empty variables?
> > >
> > > Trace the code in the
Hi, we're trying Mimic on a VM farm. It consists of 4 OSD hosts (8 OSDs) and 3
MONs. We tried mounting as RBD and CephFS (fuse and kernel mount) on
different clients without problems.
Then one day we performed a failover test and stopped one of the OSDs. Not sure if
it's related, but after that test
hi marc,
> - how to prevent the D state process from accumulating so much load?
you can't. in linux, uninterruptible tasks themselves count as "load";
this does not mean you e.g. ran out of cpu resources.
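for example, you can see how many such tasks are contributing to the load
right now with something like:

# each task in uninterruptible sleep (state D) adds 1 to the load average,
# even when the cpus are idle
ps -eo stat | grep -c '^D'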
stijn
On Mon, Jan 21, 2019 at 11:43 AM ST Wong (ITSC) wrote:
>
> Hi, we're trying Mimic on a VM farm. It consists of 4 OSD hosts (8 OSDs) and 3
> MONs. We tried mounting as RBD and CephFS (fuse and kernel mount) on
> different clients without problems.
Is this an upgraded or a fresh cluster?
>
> Th
It could also be the kernel client versions. What are you running? I
remember older kernel clients didn't always deal with recovery scenarios
very well.
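If it helps, the MDS admin socket can show what each client is running; a
rough sketch (the daemon name is a placeholder and the exact fields depend on
the release):

# on the active MDS host: list client sessions, which report client metadata
# such as kernel_version / ceph_version for each mount
ceph daemon mds.$(hostname -s) session ls | grep -E 'kernel_version|ceph_version'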
On Mon, Jan 21, 2019 at 9:18 AM Marc Roos wrote:
>
>
> I think his downtime is coming from the MDS failover, which takes a while
> in my case too
Good day!
Mon, Jan 21, 2019 at 10:42:58AM +, pseudo wrote:
> > On Sun, Jan 20, 2019 at 09:05:10PM +, Max Krasilnikov wrote:
> > > > Just checking, since it isn't mentioned here: Did you explicitly add
> > > > public_network+cluster_network as empty variables?
> > > >
> > > > Trace
Hi,
I'm curious. What is the advantage of OSPF in your setup over e.g.
LACP bonding both links?
Regards,
Burkhard
If the ToR switches are L3 then you cannot use LACP.
On Mon, Jan 21, 2019 at 4:02 PM Burkhard Linke
wrote:
>
> Hi,
>
>
> I'm curious. What is the advantage of OSPF in your setup over e.g.
> LACP bonding both links?
>
>
> Regards,
>
> Burkhard
On Sun, Jan 20, 2019 at 11:30 PM Brian Topping wrote:
>
> Hi all, looks like I might have pooched something. Between the two nodes I
> have, I moved all the PGs to one machine, reformatted the other machine,
> rebuilt that machine, and moved the PGs back. In both cases, I did this by
> taking t
On 1/18/2019 6:33 PM, KEVIN MICHAEL HRPCEK wrote:
On 1/18/19 7:26 AM, Igor Fedotov wrote:
Hi Kevin,
On 1/17/2019 10:50 PM, KEVIN MICHAEL HRPCEK wrote:
Hey,
I recall reading about this somewhere but I can't find it in the
docs or list archive and confirmation from a dev or someone who
kn
Burkhard Linke writes:
> I'm curious. What is the advantage of OSPF in your setup over
> e.g. LACP bonding both links?
Good question! Some people (including myself) are uncomfortable with
LACP (in particular "MLAG", i.e. port aggregation across multiple
chassis), and with fancy L2 setups in gen
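To make that concrete, a rough sketch of the per-link L3 approach (addresses
and interface names are invented): each uplink gets its own point-to-point
subnet and a loopback address is announced via OSPF, so losing one link or one
ToR only withdraws one path instead of depending on an MLAG bundle.

ip addr add 192.0.2.1/31 dev eth0    # uplink to ToR switch 1
ip addr add 192.0.2.3/31 dev eth1    # uplink to ToR switch 2
ip addr add 10.0.0.10/32 dev lo      # host address announced by the OSPF daemon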
Hi all,
I'm looking to keep some extra metadata associated with radosgw users
created by radosgw-admin. I saw in the output of 'radosgw-admin
metadata get user:someuser' there is an 'attrs' structure that looked
promising. However, it seems to be strict about what it accepts, so I
wonder if that'
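For reference, the get/edit/put round trip looks roughly like this (the user
name is just an example); it is the 'attrs' list in that JSON that appears to
be picky about keys:

# dump the user's metadata, including the 'attrs' section, to a file
radosgw-admin metadata get user:someuser > someuser.json
# ...edit someuser.json...
# write the (possibly modified) metadata back
radosgw-admin metadata put user:someuser < someuser.json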
On Mon, Jan 21, 2019 at 5:41 PM Marc Roos wrote:
>
>
>
> I will do that next time. Do you know by any chance if using 'timeout'
> could prevent this? From the manual I gather that timeout will send a
> signal like HUP or KILL; in that case it would not be so different.
>
>
no, there is no config for
Hi,
my use case for Ceph is serving as central backup storage.
This means I will back up multiple databases into the Ceph storage cluster.
This is my question:
What is the best practice for creating pools & images?
Should I create multiple pools, i.e. one pool per database?
Or should I create a single
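To illustrate the two options (pool names, image names and sizes are made up):
either one pool per database, or one shared pool with an RBD image per
database, e.g.:

# option A: one pool per database
ceph osd pool create backup-db1 64
rbd create backup-db1/dump --size 500G

# option B: one shared pool, one image per database
ceph osd pool create backups 128
rbd create backups/db1 --size 500G
rbd create backups/db2 --size 500G
# (on Luminous or newer the pool may also need: rbd pool init <pool>)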
On Mon, Jan 21, 2019 at 11:16 AM Albert Yue wrote:
>
> Dear Ceph Users,
>
> We have set up a CephFS cluster with 6 OSD machines, each with 16 8TB
> hard disks. The Ceph version is Luminous 12.2.5. We created one data pool with
> these hard disks and created another metadata pool with 3 SSDs. We create
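For anyone reproducing this layout, the rough shape of such a setup (rule
names, pool names and PG counts are placeholders) is:

# crush rules pinning pools to the hdd / ssd device classes
ceph osd crush rule create-replicated hdd_rule default host hdd
ceph osd crush rule create-replicated ssd_rule default host ssd
# data pool on the hard disks, metadata pool on the SSDs
ceph osd pool create cephfs_data 1024 1024 replicated hdd_rule
ceph osd pool create cephfs_metadata 128 128 replicated ssd_rule
ceph fs new cephfs cephfs_metadata cephfs_data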
Hello list.
Today, while redeploying an OSD, I noticed that the links to the DB/WAL devices
are pointing to the partitions themselves, not to the partition UUIDs as they were
before.
I think that changed with the latest ceph-deploy.
I'm using 12.2.2 on my mon/osd nodes.
ceph-deploy is 2.0.1 on admin node.
All node
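For anyone wanting to compare, this is roughly how the links can be checked
(the OSD id is an example, and block.wal only exists if the OSD has a separate
WAL):

# where the DB/WAL symlinks of an OSD currently point
ls -l /var/lib/ceph/osd/ceph-0/block.db /var/lib/ceph/osd/ceph-0/block.wal
# previously they pointed at stable names like /dev/disk/by-partuuid/<uuid>,
# now they point at the raw partition, e.g. /dev/sdb1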
How can you see that the cache is filling up and you need to execute
"echo 2 > /proc/sys/vm/drop_caches"?
-Original Message-
From: Yan, Zheng [mailto:uker...@gmail.com]
Sent: 21 January 2019 15:50
To: Albert Yue
Cc: ceph-users
Subject: Re: [ceph-users] MDS performance issue
On Mon,
Hey all,
Here's the tech talk recording:
https://www.youtube.com/watch?v=uW6NvsYFX-s
--
Mike Perez (thingee)
On Wed, Jan 16, 2019 at 4:01 PM Sage Weil wrote:
>
> Hi everyone,
>
> First, this is a reminder that there is a Tech Talk tomorrow from Guy
> Margalit about NooBaa, a multi-cloud object
On Fri, 18 Jan 2019 at 12:42, Robert Sander wrote:
> > Assuming BlueStore is too fat for my crappy nodes, do I need to go to
> > FileStore? If yes, then with xfs as the file system? Journal on the SSD as
> > a directory, then?
>
> Journal for FileStore is also a block device.
It can be a file
>>How can you see that the cache is filling up and you need to execute
>>"echo 2 > /proc/sys/vm/drop_caches"?
you can monitor the number of ceph dentry objects in slabinfo.
here is a small script I'm running from cron.
#!/bin/bash
# bail out if another copy of this script is already running
if pidof -o %PPID -x "dropcephinodecache.sh" >/dev/null; then
    echo "Process already running, exiting"
    exit 1
fi
Hey everyone,
Cephalocon Barcelona 2019 early bird registration is now available
through February 15th. After that rates go up, so please register now
to lock in your discounted ticket.
https://ceph.com/cephalocon/barcelona-2019/
As a reminder, the CFP will close February 1st. If you need assist
Hi Sebastien,
Thank you for following up on this. I have resolved the issue by
zapping all the disks and switching to the LVM scenario. I will open
an issue on GitHub if I ever run into the same problem again later.
Thanks and Cheers,
Cody
On Mon, Jan 21, 2019 at 4:23 AM Sebastien Han wrote:
>
> On Jan 18, 2019, at 3:48 AM, Eugen Leitl wrote:
>
>
> (Crossposting this from Reddit /r/ceph , since likely to have more technical
> audience present here).
>
> I've scrounged up 5 old Atom Supermicro nodes and would like to run them
> 365/7 for limited production as RBD with Bluestore (ide
> On Jan 21, 2019, at 6:47 AM, Alfredo Deza wrote:
>
> When creating an OSD, ceph-volume will capture the ID and the FSID and
> use these to create a systemd unit. When the system boots, it queries
> LVM for devices that match that ID/FSID information.
Thanks Alfredo, I see that now. The name co
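For anyone following along, this is roughly how the pieces can be inspected
(the OSD id and fsid in the unit name below are placeholders):

# show the OSD ID / FSID and LVM devices that ceph-volume knows about
ceph-volume lvm list
# the generated unit encodes both in its name, e.g.:
systemctl status 'ceph-volume@lvm-0-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee'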
http://www.brendangregg.com/blog/2017-08-08/linux-load-averages.html
should still be current enough and makes good reading on the subject.
On Mon, Jan 21, 2019 at 8:46 PM Stijn De Weirdt wrote:
>
> hi marc,
>
> > - how to prevent the D state process to accumulate so much load?
> you can't. in lin
On Mon, Jan 21, 2019 at 12:52 AM Yan, Zheng wrote:
> On Mon, Jan 21, 2019 at 12:12 PM Albert Yue
> wrote:
> >
> > Hi Yan Zheng,
> >
> > 1. mds cache limit is set to 64GB
> > 2. we got the size of the metadata pool by running `ceph df` and saw that the
> > metadata pool uses just 200MB of space.
> >
>
> That's v
Hi,
> Is this an upgraded or a fresh cluster?
It's a fresh cluster.
> Does client.acapp1 have the permission to blacklist other clients? You can
> check with "ceph auth get client.acapp1".
No, it's our first Ceph cluster with a basic setup for testing, without any
blacklisting configured.
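If the missing cap turns out to be the problem, the usual fix is to add the
blacklist command to the client's mon caps, something like the following (the
osd/mds caps are only examples; keep whatever the client already has):

ceph auth caps client.acapp1 \
  mon 'allow r, allow command "osd blacklist"' \
  osd 'allow rw pool=cephfs_data' \
  mds 'allow rw'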
Hi Yan Zheng,
In your opinion, can we resolve this issue by moving the MDS to a machine with
512GB or 1TB of memory?
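More RAM only helps if the MDS is allowed to use it; the cache size is
governed by mds_cache_memory_limit, which can be raised at runtime, for
example (the value is just an illustration, in bytes):

# raise the MDS cache limit to ~128 GiB on all MDS daemons
ceph tell mds.* injectargs '--mds_cache_memory_limit=137438953472'
# and persist it in ceph.conf under [mds]:
#   mds_cache_memory_limit = 137438953472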
On Mon, Jan 21, 2019 at 10:49 PM Yan, Zheng wrote:
> On Mon, Jan 21, 2019 at 11:16 AM Albert Yue
> wrote:
> >
> > Dear Ceph Users,
> >
> > We have set up a cephFS cluster with 6 osd machines,