[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-06-03 Thread Marc
Not using cephadm, I would also question other things like:

- If it uses Docker and the Docker daemon fails, what happens to your containers?
- I assume the ceph-osd containers need the Linux capability CAP_SYS_ADMIN. So if you
have to allow this via your OC, all your tasks potentially have access to this
permission. (That is why I chose not to allow the OC access to it; a quick way to check is sketched below.)
- Does cephadm only run with Docker?
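
(Both of the first two questions can be checked on a running host - a minimal
sketch, with "ceph-osd-3" as a placeholder container name:)

docker info --format '{{.LiveRestoreEnabled}}'   # with live-restore, containers survive a dockerd restart
docker ps --format '{{.Names}}'                  # find the actual container names
docker inspect --format '{{.HostConfig.Privileged}} {{.HostConfig.CapAdd}}' ceph-osd-3
podman inspect --format '{{.EffectiveCaps}}' ceph-osd-3   # the podman equivalent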




> -Original Message-
> From: Martin Verges 
> Sent: 02 June 2021 13:29
> To: Matthew Vernon 
> Cc: ceph-users@ceph.io
> Subject: [ceph-users] Re: Why you might want packages not containers for
> Ceph deployments
> 
> Hello,
> 
> I agree with Matthew; here at croit we work a lot with containers all day
> long. No problem with that, and enough knowledge to say for sure it's not
> about getting used to it.
> For us and our decisions here, storage is the most valuable piece of IT
> equipment in a company. If you have problems with your storage, most likely
> you have huge pain, costs, problems, downtime, whatever. Therefore, your
> storage solution must be damn simple: you switch it on, it has to work.
>
> Take a short look at the Ceph documentation on how to deploy a cephadm
> cluster vs croit. We strongly believe ours is much easier, as we take away
> all the pain from the OS up to Ceph while keeping it simple behind the
> scenes. You can still always log in to a node, kill a process, attach
> strace or whatever you like, as you know it from years of Linux
> administration, without any complexity layers like docker/podman/... It's
> just frictionless. In the end, what do you need? A kernel, an initramfs,
> some systemd, a bit of libs and tooling, and the Ceph packages.
>
> In addition, we help lots of Ceph users on a regular basis with their
> hand-made setups, but we don't really want to touch the cephadm ones, as
> they are often harder to debug. But of course we do it anyway :).
>
> To have a perfect storage system, strip away anything unnecessary. Avoid
> any complexity, avoid anything that might affect your system. Keep it
> simple, stupid.
> 
> --
> Martin Verges
> Managing director
> 
> Mobile: +49 174 9335695
> E-Mail: martin.ver...@croit.io
> Chat: https://t.me/MartinVerges
> 
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
> 
> Web: https://croit.io
> YouTube: https://goo.gl/PGE1Bx
> 
> 
> On Wed, 2 Jun 2021 at 11:38, Matthew Vernon  wrote:
> 
> > Hi,
> >
> > In the discussion after the Ceph Month talks yesterday, there was a bit
> > of chat about cephadm / containers / packages. IIRC, Sage observed that
> > a common reason in the recent user survey for not using cephadm was that
> > it only worked on containerised deployments. I think he then went on to
> > say that he hadn't heard any compelling reasons why not to use
> > containers, and suggested that resistance was essentially a user
> > education question[0].
> >
> > I'd like to suggest, briefly, that:
> >
> > * containerised deployments are more complex to manage, and this is not
> > simply a matter of familiarity
> > * reducing the complexity of systems makes admins' lives easier
> > * the trade-off of the pros and cons of containers vs packages is not
> > obvious, and will depend on deployment needs
> > * Ceph users will benefit from both approaches being supported into the
> > future
> >
> > We make extensive use of containers at Sanger, particularly for
> > scientific workflows, and also for bundling some web apps (e.g.
> > Grafana). We've also looked at a number of container runtimes (Docker,
> > singularity, charliecloud). They do have advantages - it's easy to
> > distribute a complex userland in a way that will run on (almost) any
> > target distribution; rapid "cloud" deployment; some separation (via
> > namespaces) of network/users/processes.
> >
> > For what I think of as a 'boring' Ceph deploy (i.e. install on a set of
> > dedicated hardware and then run for a long time), I'm not sure any of
> > these benefits are particularly relevant and/or compelling - Ceph
> > upstream produce Ubuntu .debs and Canonical (via their Ubuntu Cloud
> > Archive) provide .debs of a couple of different Ceph releases per Ubuntu
> > LTS - meaning we can easily separate out OS upgrade from Ceph upgrade.
> > And upgrading the Ceph packages _doesn't_ restart the daemons[1],
> > meaning that we maintain control over restart order during an upgrade.
> > And while we might briefly install packages from a PPA or similar to
> > test a bugfix, we roll those (test-)cluster-wide, rather than trying to
> > run a mixed set of versions on a single cluster - and I understand this
> > single-version approach is best practice.
> >
> > Deployment via containers does bring complexity; some examples we've
> > found at Sanger (not all Ceph-related, which we run from packages):
> >
> > * you now have 2 process supervision points - dockerd and systemd
> > * dock
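
(For reference, the kind of controlled, per-host restart order described above
for a package-based upgrade might look roughly like this - a sketch, assuming
systemd-managed daemons installed from packages and a healthy cluster:)

ceph osd set noout                 # avoid rebalancing while daemons bounce
systemctl restart ceph-mon.target  # on each monitor host, one at a time
systemctl restart ceph-mgr.target  # then the managers
systemctl restart ceph-osd.target  # then each OSD host in turn
ceph -s                            # wait for active+clean before moving to the next host
ceph osd unset noout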

[ceph-users] radosgw-admin bucket delete linear memory growth?

2021-06-03 Thread Janne Johansson
I am seeing huge RAM usage while my bucket delete is churning
over left-over multiparts, and while I realize there are *many* being
done, 1000 at a time, like this:

2021-06-03 07:29:06.408 7f9b7f633240  0 abort_bucket_multiparts
WARNING : aborted 254000 incomplete multipart uploads

My first run ended with radosgw-admin going out-of-memory, so it seems some
part of this keeps data or forgets to free up old parts of the lists
after they have been cancelled? I also don't know if restarting means
it iterates over the same ones again, or if this log line means 254k
are processed, so that if I need to restart again it will at least have
made a few hours of progress.

At this point, RES is 2.1g, so roughly 8 KB per "entry" if it is
linear somehow.

ceph 13.2.10 in this case.
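
(If radosgw-admin keeps running out of memory, one workaround is to abort the
leftover multipart uploads client-side first and only then delete the bucket -
a rough sketch, assuming the aws CLI is configured with the bucket owner's
keys; bucket name and endpoint are placeholders, and keys containing
whitespace would need more careful parsing:)

BUCKET=mybucket
ENDPOINT=http://rgw.example.com:7480
aws --endpoint-url "$ENDPOINT" s3api list-multipart-uploads --bucket "$BUCKET" \
  --query 'Uploads[].[Key,UploadId]' --output text |
while read -r key upload_id; do
  [ "$key" = "None" ] && break   # an empty listing prints "None" in text output
  aws --endpoint-url "$ENDPOINT" s3api abort-multipart-upload \
    --bucket "$BUCKET" --key "$key" --upload-id "$upload_id"
done
# rerun until the listing comes back empty, then retry the delete:
radosgw-admin bucket rm --bucket="$BUCKET" --purge-objects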

-- 
May the most significant bit of your life be positive.


[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-06-03 Thread Sasha Litvak
Podman containers will not be restarted by the restart or failure of a
centralized podman daemon - there isn't one. Containers are not synonymous
with Docker.  This thread reminds me more and more of the systemd-haters
threads, but I guess it is fine.

On Thu, Jun 3, 2021, 2:16 AM Marc  wrote:

> Not using cephadm, I would also question other things like:
>
> - If it uses docker and docker daemon fails what happens to you containers?
> - I assume the ceph-osd containers need linux capability sysadmin. So if
> you have to allow this via your OC, all your tasks have potentially access
> to this permission. (That is why I chose not to allow the OC access to it)
> - cephadm only runs with docker?
>
>
>
>
> > -Original Message-
> > From: Martin Verges 
> > Sent: 02 June 2021 13:29
> > To: Matthew Vernon 
> > Cc: ceph-users@ceph.io
> > Subject: [ceph-users] Re: Why you might want packages not containers for
> > Ceph deployments
> >
> > Hello,
> >
> > I agree to Matthew, here at croit we work a lot with containers all day
> > long. No problem with that and enough knowledge to say for sure it's not
> > about getting used to it.
> > For us and our decisions here, Storage is the most valuable piece of IT
> > equipment in a company. If you have problems with your storage, most
> > likely
> > you have a huge pain, costs, problems, downtime, whatever. Therefore,
> > your
> > storage solution must be damn simple, you switch it on, it has to work.
> >
> > If you take a short look into Ceph documentation about how to deploy a
> > cephadm cluster vs croit. We strongly believe it's much easier as we
> > take
> > away all the pain from OS up to Ceph while keeping it simple behind the
> > scene. You still can always login to a node, kill a process, attach some
> > strace or whatever you like as you know it from years of linux
> > administration without any complexity layers like docker/podman/... It's
> > just friction less. In the end, what do you need? A kernel, an
> > initramfs,
> > some systemd, a bit of libs and tooling, and the Ceph packages.
> >
> > In addition, we help lot's of Ceph users on a regular basis with their
> > hand
> > made setups, but we don't really wanna touch the cephadm ones, as they
> > are
> > often harder to debug. But of course we do it anyways :).
> >
> > To have a perfect storage, strip away anything unneccessary. Avoid any
> > complexity, avoid anything that might affect your system. Keep it simply
> > stupid.
> >
> > --
> > Martin Verges
> > Managing director
> >
> > Mobile: +49 174 9335695
> > E-Mail: martin.ver...@croit.io
> > Chat: https://t.me/MartinVerges
> >
> > croit GmbH, Freseniusstr. 31h, 81247 Munich
> > CEO: Martin Verges - VAT-ID: DE310638492
> > Com. register: Amtsgericht Munich HRB 231263
> >
> > Web: https://croit.io
> > YouTube: https://goo.gl/PGE1Bx
> >
> >
> > On Wed, 2 Jun 2021 at 11:38, Matthew Vernon  wrote:
> >
> > > Hi,
> > >
> > > In the discussion after the Ceph Month talks yesterday, there was a
> > bit
> > > of chat about cephadm / containers / packages. IIRC, Sage observed
> > that
> > > a common reason in the recent user survey for not using cephadm was
> > that
> > > it only worked on containerised deployments. I think he then went on
> > to
> > > say that he hadn't heard any compelling reasons why not to use
> > > containers, and suggested that resistance was essentially a user
> > > education question[0].
> > >
> > > I'd like to suggest, briefly, that:
> > >
> > > * containerised deployments are more complex to manage, and this is
> > not
> > > simply a matter of familiarity
> > > * reducing the complexity of systems makes admins' lives easier
> > > * the trade-off of the pros and cons of containers vs packages is not
> > > obvious, and will depend on deployment needs
> > > * Ceph users will benefit from both approaches being supported into
> > the
> > > future
> > >
> > > We make extensive use of containers at Sanger, particularly for
> > > scientific workflows, and also for bundling some web apps (e.g.
> > > Grafana). We've also looked at a number of container runtimes (Docker,
> > > singularity, charliecloud). They do have advantages - it's easy to
> > > distribute a complex userland in a way that will run on (almost) any
> > > target distribution; rapid "cloud" deployment; some separation (via
> > > namespaces) of network/users/processes.
> > >
> > > For what I think of as a 'boring' Ceph deploy (i.e. install on a set
> > of
> > > dedicated hardware and then run for a long time), I'm not sure any of
> > > these benefits are particularly relevant and/or compelling - Ceph
> > > upstream produce Ubuntu .debs and Canonical (via their Ubuntu Cloud
> > > Archive) provide .debs of a couple of different Ceph releases per
> > Ubuntu
> > > LTS - meaning we can easily separate out OS upgrade from Ceph upgrade.
> > > And upgrading the Ceph packages _doesn't_ restart the daemons[1],
> > > meaning that we maintain control over restart order during an upgrade.
> > > And while we might briefly install

[ceph-users] Re: Can we deprecate FileStore in Quincy?

2021-06-03 Thread Ansgar Jazdzewski
Hi folks,

I'm fine with dropping Filestore in the R release!
Only one thing to add: please add a warning to all versions we can
upgrade from to the R release, so not only Quincy but also Pacific!

Thanks,
Ansgar
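
(For anyone who wants to check whether they would be affected, one way to list
any remaining FileStore OSDs - a small sketch, assuming jq is available:)

ceph osd metadata | jq -r '.[] | select(.osd_objectstore == "filestore") | "osd.\(.id)"'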

Neha Ojha  schrieb am Di., 1. Juni 2021, 21:24:

> Hello everyone,
>
> Given that BlueStore has been the default and more widely used
> objectstore since quite some time, we would like to understand whether
> we can consider deprecating FileStore in our next release, Quincy and
> remove it in the R release. There is also a proposal [0] to add a
> health warning to report FileStore OSDs.
>
> We discussed this topic in the Ceph Month session today [1] and there
> were no objections from anybody on the call. I wanted to reach out to
> the list to check if there are any concerns about this or any users
> who will be impacted by this decision.
>
> Thanks,
> Neha
>
> [0] https://github.com/ceph/ceph/pull/39440
> [1] https://pad.ceph.com/p/ceph-month-june-2021
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-06-03 Thread Nico Schottelius


Dear Sasha, and everyone else as well,

Sasha Litvak  writes:

> Podman containers will not restart due to restart or failure of centralized
> podman daemon.  Container is not synonymous to Docker.  This thread reminds
> me systemd haters threads more and more by I guess it is fine.

calling people "haters" who have reservations or critiques is simply
wrong, harmful, and disrespectful.

I ask you to retract the above statement immediately, for the sake of
the Ceph and open source community.

Nico


--
Sustainable and modern Infrastructures by ungleich.ch


[ceph-users] Re: Can we deprecate FileStore in Quincy?

2021-06-03 Thread Willem Jan Withagen

On 2-6-2021 21:56, Neha Ojha wrote:
> On Wed, Jun 2, 2021 at 12:31 PM Willem Jan Withagen  wrote:
>> On 1-6-2021 21:24, Neha Ojha wrote:
>>> Hello everyone,
>>>
>>> Given that BlueStore has been the default and more widely used
>>> objectstore for quite some time, we would like to understand whether
>>> we can consider deprecating FileStore in our next release, Quincy, and
>>> remove it in the R release. There is also a proposal [0] to add a
>>> health warning to report FileStore OSDs.
>>>
>>> We discussed this topic in the Ceph Month session today [1] and there
>>> were no objections from anybody on the call. I wanted to reach out to
>>> the list to check if there are any concerns about this or any users
>>> who will be impacted by this decision.
>>
>> That means that I really need to get going on finishing the BlueStore
>> stuff in the FreeBSD port. Getting things in has been slow, real slow,
>> also due to me being busy with company stuff.
>>
>> So when is the expected "kill FileStore" date?
>
> Currently, the proposal is to remove it in the R release (one release
> after deprecation), which should be in March 2023. Would that give you
> enough time to migrate?

If I can't get it in by then, that would be odd. In concept I already had
it working, but getting the code in is sometimes hard because the panels
are sliding at quite some pace, and I do not always have time to keep up
due to work requirements.

I'll probably modify/conditionalise the deprecation remark for FreeBSD, so
as not to scare the few testing users that are there.

--WjW


[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-06-03 Thread Robert W. Eckert
My cephadm deployment on RHEL8 created a service for each container, complete
with restarts.  And on the host, the processes run under the 'ceph' user
account.

The biggest issue I had with running as containers is that the generated
unit.run script runs podman with --rm; with --rm, the logs are removed when
there is an issue, so it took extra effort to figure out what the issue I was
having on one machine was. (Although that turned out to be a bad memory chip,
which would have manifested itself anyway.)

As a manager of a team that develops microservices in containers, I have a
mixed attitude towards them - the ability to know the version of Java and the
support libraries deployed for a given container, and to isolate updates one
image at a time, can be a bonus, but with my client's CI/CD pipeline and how
they want our containers to be built, simple tasks like upgrading a package
version, upgrading the version of Java or even a simple replacement of a
certificate have become significantly more difficult, because we need to
rebuild all of the containers and go through the QA processes rather than just
update a cert.

For my usage (at home, running Ceph on older hardware I converted to servers),
I don't want to have to care about Ceph dependencies, and I also want to
isolate things from other things running on the server, so a container
infrastructure works well, but I can see where packages can be much better in
a well-maintained server infrastructure.
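
(Since the --rm in the generated unit.run means the container's own logs vanish
with it, the daemon output can still be pulled back out of journald - a minimal
sketch, assuming jq is installed; the daemon name "osd.3" is a placeholder:)

cephadm ls | jq -r '.[].name'               # daemon names cephadm manages on this host
FSID=$(ceph fsid)                           # assumes a working client keyring on the host
journalctl -u "ceph-${FSID}@osd.3" -n 200   # the per-daemon systemd unit cephadm generates
cephadm logs --name osd.3                   # convenience wrapper around the same journal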



-Original Message-
From: Sasha Litvak  
Sent: Thursday, June 3, 2021 3:57 AM
To: ceph-users 
Subject: [ceph-users] Re: Why you might want packages not containers for Ceph 
deployments

Podman containers will not restart due to restart or failure of centralized 
podman daemon.  Container is not synonymous to Docker.  This thread reminds me 
systemd haters threads more and more by I guess it is fine.

On Thu, Jun 3, 2021, 2:16 AM Marc  wrote:

> Not using cephadm, I would also question other things like:
>
> - If it uses docker and docker daemon fails what happens to you containers?
> - I assume the ceph-osd containers need linux capability sysadmin. So 
> if you have to allow this via your OC, all your tasks have potentially 
> access to this permission. (That is why I chose not to allow the OC 
> access to it)
> - cephadm only runs with docker?
>
>
>
>
> > -Original Message-
> > From: Martin Verges 
> > Sent: 02 June 2021 13:29
> > To: Matthew Vernon 
> > Cc: ceph-users@ceph.io
> > Subject: [ceph-users] Re: Why you might want packages not containers 
> > for Ceph deployments
> >
> > Hello,
> >
> > I agree to Matthew, here at croit we work a lot with containers all 
> > day long. No problem with that and enough knowledge to say for sure 
> > it's not about getting used to it.
> > For us and our decisions here, Storage is the most valuable piece of 
> > IT equipment in a company. If you have problems with your storage, 
> > most likely you have a huge pain, costs, problems, downtime, 
> > whatever. Therefore, your storage solution must be damn simple, you 
> > switch it on, it has to work.
> >
> > If you take a short look into Ceph documentation about how to deploy 
> > a cephadm cluster vs croit. We strongly believe it's much easier as 
> > we take away all the pain from OS up to Ceph while keeping it simple 
> > behind the scene. You still can always login to a node, kill a 
> > process, attach some strace or whatever you like as you know it from 
> > years of linux administration without any complexity layers like 
> > docker/podman/... It's just friction less. In the end, what do you 
> > need? A kernel, an initramfs, some systemd, a bit of libs and 
> > tooling, and the Ceph packages.
> >
> > In addition, we help lot's of Ceph users on a regular basis with 
> > their hand made setups, but we don't really wanna touch the cephadm 
> > ones, as they are often harder to debug. But of course we do it 
> > anyways :).
> >
> > To have a perfect storage, strip away anything unneccessary. Avoid 
> > any complexity, avoid anything that might affect your system. Keep 
> > it simply stupid.
> >
> > --
> > Martin Verges
> > Managing director
> >
> > Mobile: +49 174 9335695
> > E-Mail: martin.ver...@croit.io
> > Chat: https://t.me/MartinVerges
> >
> > croit GmbH, Freseniusstr. 31h, 81247 Munich
> > CEO: Martin Verges - VAT-ID: DE310638492 Com. register: Amtsgericht 
> > Munich HRB 231263
> >
> > Web: https://croit.io
> > YouTube: https://goo.gl/PGE1Bx
> >
> >
> > On Wed, 2 Jun 2021 at 11:38, Matthew Vernon  wrote:
> >
> > > Hi,
> > >
> > > In the discussion after the Ceph Month talks yesterday, there was 
> > > a
> > bit
> > > of chat about cephadm / containers / packages. IIRC, Sage observed
> > that
> > > a common reason in the recent user survey for not using cephadm 
> > > was
> > that
> > > it only worked on containerised deployments. I think he then went 
> > > on
> > to
> > > say that he hadn't heard any compelling reasons why not to use 
> > > conta

[ceph-users] Re: Can we deprecate FileStore in Quincy?

2021-06-03 Thread Neha Ojha
On Thu, Jun 3, 2021 at 2:34 AM Ansgar Jazdzewski
 wrote:
>
> Hi folks,
>
> I'm fine with dropping Filestore in the R release!
> Only one thing to add is: please add a warning to all versions we can upgrade 
> from to the R release son not only Quincy but also pacific!

Sure!

- Neha

>
> Thanks,
> Ansgar
>
> Neha Ojha  schrieb am Di., 1. Juni 2021, 21:24:
>>
>> Hello everyone,
>>
>> Given that BlueStore has been the default and more widely used
>> objectstore since quite some time, we would like to understand whether
>> we can consider deprecating FileStore in our next release, Quincy and
>> remove it in the R release. There is also a proposal [0] to add a
>> health warning to report FileStore OSDs.
>>
>> We discussed this topic in the Ceph Month session today [1] and there
>> were no objections from anybody on the call. I wanted to reach out to
>> the list to check if there are any concerns about this or any users
>> who will be impacted by this decision.
>>
>> Thanks,
>> Neha
>>
>> [0] https://github.com/ceph/ceph/pull/39440
>> [1] https://pad.ceph.com/p/ceph-month-june-2021
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: SAS vs SATA for OSD

2021-06-03 Thread Mark Nelson
I suspect the behavior of the controller and the behavior of the drive 
firmware will end up mattering more than SAS vs SATA.  As always it's 
best if you can test it first before committing to buying a pile of 
them.  Historically I have seen SATA drives that have performed well as 
far as HDDs go though.
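
(One quick way to "test it first", as suggested above - a sketch of a
single-threaded 4k synchronous-write fio run, which is the pattern that tends
to separate HDDs for OSD use; /dev/sdX is a placeholder and the run destroys
data on that device:)

fio --name=sync-write-test --filename=/dev/sdX --ioengine=libaio --direct=1 \
    --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 \
    --runtime=60 --time_based --group_reporting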



Mark

On 6/3/21 4:25 PM, Dave Hall wrote:

Hello,

We're planning another batch of OSD nodes for our cluster.  Our prior nodes
have been 8 x 12TB SAS drives plus 500GB NVMe per HDD.  Due to market
circumstances and the shortage of drives those 12TB SAS drives are in short
supply.

Our integrator has offered an option of 8 x 14TB SATA drives (still
Enterprise).  For Ceph, will the switch to SATA carry a performance
difference that I should be concerned about?

Thanks.

-Dave

--
Dave Hall
Binghamton University
kdh...@binghamton.edu


[ceph-users] Re: SAS vs SATA for OSD

2021-06-03 Thread Jamie Fargen
Dave-

These are just general observations of how SATA drives operate in storage
clusters.

It has been a while since I have run a storage cluster with SATA drives,
but in the past I did notice that SATA drives would drop off the
controllers pretty frequently. Depending on many factors, it might just be
a brief outage where the drive wasn't available but recoverable; sometimes
it meant going into the controller and rescanning for drives before they
could be added back to the system. The worst was one chassis that would
mark the drives as failed after a drive dropped off a certain number of
times; the vendor could not correct the issue with a firmware update and
had to replace the storage chassis.


Regards,
-Jamie

On Thu, Jun 3, 2021 at 5:26 PM Dave Hall  wrote:

> Hello,
>
> We're planning another batch of OSD nodes for our cluster.  Our prior nodes
> have been 8 x 12TB SAS drives plus 500GB NVMe per HDD.  Due to market
> circumstances and the shortage of drives those 12TB SAS drives are in short
> supply.
>
> Our integrator has offered an option of 8 x 14TB SATA drives (still
> Enterprise).  For Ceph, will the switch to SATA carry a performance
> difference that I should be concerned about?
>
> Thanks.
>
> -Dave
>
> --
> Dave Hall
> Binghamton University
> kdh...@binghamton.edu
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>

-- 
Jamie Fargen
Senior Consultant
jfar...@redhat.com
813-817-4430


[ceph-users] Ceph Ansible fails on check if monitor initial keyring already exists

2021-06-03 Thread Jared Jacob
I am running the Ceph ansible script to install ceph version Stable-6.0
(Pacific).

When running the sample yml file supplied by the GitHub repo, it
runs fine up until the "ceph-mon : check if monitor initial keyring already
exists" step. There it will hang for 30-40 minutes before failing.

From my understanding, ceph-ansible should be creating this keyring and
using it for communication between monitors, so does anyone know why the
playbook would have a hard time with this step?

Thanks in advance!
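
(A few things that might help narrow that down - a sketch only, assuming a
non-containerized deploy, an inventory file called "hosts", and mon names that
follow the usual short-hostname convention; adjust to your setup:)

ansible-playbook -vvv -i hosts site.yml --limit mons   # rerun just the mons, verbose
# then, on the first monitor host:
systemctl status "ceph-mon@$(hostname -s)"
journalctl -u "ceph-mon@$(hostname -s)" -n 100
ls -l /etc/ceph/ /var/lib/ceph/mon/                    # did the initial keyring / mon store get created?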


[ceph-users] Re: SAS vs SATA for OSD

2021-06-03 Thread Anthony D'Atri

Agreed.  I think oh …. maybe 15-20 years ago there was often a wider difference 
between SAS and SATA drives, but with modern queuing etc. my sense is that 
there is less of an advantage.  Seek and rotational latency I suspect dwarf 
interface differences wrt performance.  The HBA may be a bigger bottleneck (and 
way more trouble).

500 GB NVMe seems like a lot per HDD, are you using that as WAL+DB with RGW, or 
as dmcache or something?

Depending on your constraints, QLC flash might be more competitive than you 
think ;)

— aad


> I suspect the behavior of the controller and the behavior of the drive 
> firmware will end up mattering more than SAS vs SATA.  As always it's best if 
> you can test it first before committing to buying a pile of them.  
> Historically I have seen SATA drives that have performed well as far as HDDs 
> go though.
> 
> 
> Mark
> 
> On 6/3/21 4:25 PM, Dave Hall wrote:
>> Hello,
>> 
>> We're planning another batch of OSD nodes for our cluster.  Our prior nodes
>> have been 8 x 12TB SAS drives plus 500GB NVMe per HDD.  Due to market
>> circumstances and the shortage of drives those 12TB SAS drives are in short
>> supply.
>> 
>> Our integrator has offered an option of 8 x 14TB SATA drives (still
>> Enterprise).  For Ceph, will the switch to SATA carry a performance
>> difference that I should be concerned about?
>> 
>> Thanks.
>> 
>> -Dave
>> 
>> --
>> Dave Hall
>> Binghamton University
>> kdh...@binghamton.edu
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: SAS vs SATA for OSD - WAL+DB sizing.

2021-06-03 Thread Mark Nelson
FWIW, those guidelines try to be sort of a one-size-fits-all
recommendation that may not apply to your situation.  Typically RBD has
pretty low metadata overhead, so you can get away with smaller DB
partitions; 4% should easily be enough.  If you are running heavy RGW
write workloads with small objects, you will almost certainly use more
than 4% for metadata (I've seen worst cases up to 50%, but that was
before column family sharding, which should help to some extent).  Having
said that, BlueStore will roll the higher RocksDB levels over to the
slow device and keep the WAL, L0, and other lower LSM levels on the
fast device.  It's not necessarily the end of the world if you end up
with some of the more rarely used metadata on the HDD, but having it on
flash certainly is nice.
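
(To see how much of the DB device an OSD is actually using, and whether
anything has already rolled over to the slow device, the bluefs counters can
be read on the OSD host - a sketch, assuming jq and access to the admin
socket; osd.0 is a placeholder:)

ceph daemon osd.0 perf dump bluefs | \
  jq '.bluefs | {db_total_bytes, db_used_bytes, slow_total_bytes, slow_used_bytes}'
ceph health detail | grep -i spillover    # cluster-wide view of any BLUEFS_SPILLOVER warnings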



Mark


On 6/3/21 5:18 PM, Dave Hall wrote:

Anthony,

I had recently found a reference in the Ceph docs that indicated something
like 40GB per TB for WAL+DB space.  For a 12TB HDD that comes out to
480GB.  If this is no longer the guideline I'd be glad to save a couple
dollars.

-Dave

--
Dave Hall
Binghamton University
kdh...@binghamton.edu

On Thu, Jun 3, 2021 at 6:10 PM Anthony D'Atri 
wrote:


Agreed.  I think oh …. maybe 15-20 years ago there was often a wider
difference between SAS and SATA drives, but with modern queuing etc. my
sense is that there is less of an advantage.  Seek and rotational latency I
suspect dwarf interface differences wrt performance.  The HBA may be a
bigger bottleneck (and way more trouble).

500 GB NVMe seems like a lot per HDD, are you using that as WAL+DB with
RGW, or as dmcache or something?

Depending on your constraints, QLC flash might be more competitive than
you think ;)

— aad



I suspect the behavior of the controller and the behavior of the drive

firmware will end up mattering more than SAS vs SATA.  As always it's best
if you can test it first before committing to buying a pile of them.
Historically I have seen SATA drives that have performed well as far as
HDDs go though.


Mark

On 6/3/21 4:25 PM, Dave Hall wrote:

Hello,

We're planning another batch of OSD nodes for our cluster.  Our prior

nodes

have been 8 x 12TB SAS drives plus 500GB NVMe per HDD.  Due to market
circumstances and the shortage of drives those 12TB SAS drives are in

short

supply.

Our integrator has offered an option of 8 x 14TB SATA drives (still
Enterprise).  For Ceph, will the switch to SATA carry a performance
difference that I should be concerned about?

Thanks.

-Dave

--
Dave Hall
Binghamton University
kdh...@binghamton.edu


[ceph-users] Re: SAS vs SATA for OSD - WAL+DB sizing.

2021-06-03 Thread Anthony D'Atri
In releases before … Pacific, I think, there are certain discrete capacities 
that the DB will actually utilize: the sum of the RocksDB levels.  Lots of 
discussion in the archives. AIUI in those releases, with a 500 GB BlueStore 
WAL+DB device, you'll only actually use ~300 GB most of the time with default 
settings, though the extra might accelerate compaction.  With Pacific I believe 
code was merged that shards the OSD's RocksDB to make better use of arbitrary 
partition / device sizes.

With older releases one can (or so I’ve read) game this a bit by carefully 
adjusting rocksdb.max-bytes-for-level-base; ISTR that Karan did that for his 
impressive 10 Billion Object exercise.

I’ve seen threads on the list over the past couple of years that seemed to show 
spillover despite the DB device not being fully utilized; I hope that’s since 
been addressed.

My understanding is that with column sharding, compaction only takes place on a 
fraction of the DB at any one time, so the transient space used for it (and 
thus prone to spillover) should be lessened.

I may of course be out of my Vulcan mind, but HTH.
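
(The level-sum heuristic above can be sketched with the default RocksDB
settings - a 256 MB max_bytes_for_level_base and a 10x size multiplier - as a
rough illustration only; the real accounting also includes the WAL and
compaction slack:)

base=256; total=0; level=$base            # sizes in MiB
for lvl in 1 2 3 4; do
  total=$(( total + level ))
  echo "levels up to L$lvl fit in ~${total} MiB"
  level=$(( level * 10 ))
done
# -> 256, 2816, 28416, 284416 MiB: roughly the oft-quoted ~3 GB / ~30 GB / ~300 GB
#    steps, which is why a 500 GB partition only "uses" ~300 GB on pre-Pacific defaults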

— aad

> On Jun 3, 2021, at 5:29 PM, Dave Hall  wrote:
> 
> Mark,
> 
> We are running a mix of RGW, RDB, and CephFS.  Our CephFS is pretty big,
> but we're moving a lot of it to RGW.  What prompted me to go looking for a
> guideline was a high frequency of Spillover warnings as our cluster filled
> up past the 50% mark.  That was with 14.2.9, I think.  I understand that
> some things have changed since, but I think I'd like to have the
> flexibility and performance of a generous WAL+DB - the cluster is used to
> store research data, and the usage pattern is tending to change as the
> research evolves.  No telling what our mix will be a year from now.
> 
> -Dave
> 
> --
> Dave Hall
> Binghamton University
> kdh...@binghamton.edu
> 607-760-2328 (Cell)
> 607-777-4641 (Office)
> 
> 
> On Thu, Jun 3, 2021 at 7:39 PM Mark Nelson  wrote:
> 
>> FWIW, those guidelines try to be sort of a one-size-fits-all
>> recommendation that may not apply to your situation.  Typically RBD has
>> pretty low metadata overhead so you can get away with smaller DB
>> partitions.  4% should easily be enough.  If you are running heavy RGW
>> write workloads with small objects, you will almost certainly use more
>> than 4% for metadata (I've seen worst case up to 50%, but that was
>> before column family sharding which should help to some extent).  Having
>> said that, bluestore will roll the higher rocksdb levels over to the
>> slow device and keep the wall, L0, and other lower LSM levels on the
>> fast device.  It's not necessarily the end of the world if you end up
>> with some of the more rarely used metadata on the HDD but having it on
>> flash certain is nice.
>> 
>> 
>> Mark
>> 
>> 
>> On 6/3/21 5:18 PM, Dave Hall wrote:
>>> Anthony,
>>> 
>>> I had recently found a reference in the Ceph docs that indicated
>> something
>>> like 40GB per TB for WAL+DB space.  For a 12TB HDD that comes out to
>>> 480GB.  If this is no longer the guideline I'd be glad to save a couple
>>> dollars.
>>> 
>>> -Dave
>>> 
>>> --
>>> Dave Hall
>>> Binghamton University
>>> kdh...@binghamton.edu
>>> 
>>> On Thu, Jun 3, 2021 at 6:10 PM Anthony D'Atri 
>>> wrote:
>>> 
 Agreed.  I think oh …. maybe 15-20 years ago there was often a wider
 difference between SAS and SATA drives, but with modern queuing etc. my
 sense is that there is less of an advantage.  Seek and rotational
>> latency I
 suspect dwarf interface differences wrt performance.  The HBA may be a
 bigger bottleneck (and way more trouble).
 
 500 GB NVMe seems like a lot per HDD, are you using that as WAL+DB with
 RGW, or as dmcache or something?
 
 Depending on your constraints, QLC flash might be more competitive than
 you think ;)
 
 — aad
 
 
> I suspect the behavior of the controller and the behavior of the drive
 firmware will end up mattering more than SAS vs SATA.  As always it's
>> best
 if you can test it first before committing to buying a pile of them.
 Historically I have seen SATA drives that have performed well as far as
 HDDs go though.
> 
> Mark
> 
> On 6/3/21 4:25 PM, Dave Hall wrote:
>> Hello,
>> 
>> We're planning another batch of OSD nodes for our cluster.  Our prior
 nodes
>> have been 8 x 12TB SAS drives plus 500GB NVMe per HDD.  Due to market
>> circumstances and the shortage of drives those 12TB SAS drives are in
 short
>> supply.
>> 
>> Our integrator has offered an option of 8 x 14TB SATA drives (still
>> Enterprise).  For Ceph, will the switch to SATA carry a performance
>> difference that I should be concerned about?
>> 
>> Thanks.
>> 
>> -Dave
>> 
>> --
>> Dave Hall
>> Binghamton University
>> kdh...@binghamton.edu
>> ___
>> ceph-users mailing list --

[ceph-users] OSD Won't Start - LVM IOCTL Error - Read-only

2021-06-03 Thread Dave Hall
Hello,

I had an OSD drop out a couple days ago.  This is 14.2.16, Bluestore, HDD +
NVMe, non-container.  The HDD sort of went away.  I powered down the node,
reseated the drive, and it came back.  However, the OSD won't start.
Systemctl --failed shows that the lvm2 pvscan failed, preventing the OSD
unit from starting.

Running the pvscan activate command manually with --verbose gave
'device-mapper: reload ioctl on  (253:7) failed: Read-only file system'.  I
have been looking at this for a while, but I can't figure out what is
read-only and causing the problem.  The full output of the pvscan is:

# pvscan --cache --activate ay --verbose '8:48'
pvscan devices on command line.
activation/auto_activation_volume_list configuration setting not
defined: All logical volumes will be auto-activated.
Activating logical volume
ceph-block-b1fea172-71a4-463e-a3e3-8cdcc1bc7b79/osd-block-425faf92-449e-4b57-98f2-a90a7f60e2a4.
activation/volume_list configuration setting not defined: Checking only
host tags for
ceph-block-b1fea172-71a4-463e-a3e3-8cdcc1bc7b79/osd-block-425faf92-449e-4b57-98f2-a90a7f60e2a4.
Creating
ceph--block--b1fea172--71a4--463e--a3e3--8cdcc1bc7b79-osd--block--425faf92--449e--4b57--98f2--a90a7f60e2a4
Loading table for
ceph--block--b1fea172--71a4--463e--a3e3--8cdcc1bc7b79-osd--block--425faf92--449e--4b57--98f2--a90a7f60e2a4
(253:7).
  device-mapper: reload ioctl on  (253:7) failed: Read-only file system
Removing
ceph--block--b1fea172--71a4--463e--a3e3--8cdcc1bc7b79-osd--block--425faf92--449e--4b57--98f2--a90a7f60e2a4
(253:7)
Activated 0 logical volumes in volume group
ceph-block-b1fea172-71a4-463e-a3e3-8cdcc1bc7b79.
  0 logical volume(s) in volume group
"ceph-block-b1fea172-71a4-463e-a3e3-8cdcc1bc7b79" now active
  ceph-block-b1fea172-71a4-463e-a3e3-8cdcc1bc7b79: autoactivation failed.
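
(A few places to look for where the read-only flag is coming from - a sketch;
/dev/sdX stands for the affected HDD and 253:7 for its device-mapper node:)

blockdev --getro /dev/sdX                         # 1 = the kernel has the disk write-protected
cat /sys/block/sdX/ro                             # the same flag via sysfs
dmsetup info | grep -B6 'READ-ONLY'               # device-mapper maps that were loaded read-only
dmesg | grep -iE 'sdX|write protect|i/o error'    # the kernel usually logs why it flipped a disk to RO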

-Dave
--
Dave Hall
Binghamton University
kdh...@binghamton.edu


[ceph-users] SAS vs SATA for OSD

2021-06-03 Thread Dave Hall
Hello,

We're planning another batch of OSD nodes for our cluster.  Our prior nodes
have been 8 x 12TB SAS drives plus 500GB NVMe per HDD.  Due to market
circumstances and the shortage of drives those 12TB SAS drives are in short
supply.

Our integrator has offered an option of 8 x 14TB SATA drives (still
Enterprise).  For Ceph, will the switch to SATA carry a performance
difference that I should be concerned about?

Thanks.

-Dave

--
Dave Hall
Binghamton University
kdh...@binghamton.edu


[ceph-users] Re: SAS vs SATA for OSD - WAL+DB sizing.

2021-06-03 Thread Dave Hall
Anthony,

I had recently found a reference in the Ceph docs that indicated something
like 40GB per TB for WAL+DB space.  For a 12TB HDD that comes out to
480GB.  If this is no longer the guideline I'd be glad to save a couple
dollars.

-Dave

--
Dave Hall
Binghamton University
kdh...@binghamton.edu

On Thu, Jun 3, 2021 at 6:10 PM Anthony D'Atri 
wrote:

>
> Agreed.  I think oh …. maybe 15-20 years ago there was often a wider
> difference between SAS and SATA drives, but with modern queuing etc. my
> sense is that there is less of an advantage.  Seek and rotational latency I
> suspect dwarf interface differences wrt performance.  The HBA may be a
> bigger bottleneck (and way more trouble).
>
> 500 GB NVMe seems like a lot per HDD, are you using that as WAL+DB with
> RGW, or as dmcache or something?
>
> Depending on your constraints, QLC flash might be more competitive than
> you think ;)
>
> — aad
>
>
> > I suspect the behavior of the controller and the behavior of the drive
> firmware will end up mattering more than SAS vs SATA.  As always it's best
> if you can test it first before committing to buying a pile of them.
> Historically I have seen SATA drives that have performed well as far as
> HDDs go though.
> >
> >
> > Mark
> >
> > On 6/3/21 4:25 PM, Dave Hall wrote:
> >> Hello,
> >>
> >> We're planning another batch of OSD nodes for our cluster.  Our prior
> nodes
> >> have been 8 x 12TB SAS drives plus 500GB NVMe per HDD.  Due to market
> >> circumstances and the shortage of drives those 12TB SAS drives are in
> short
> >> supply.
> >>
> >> Our integrator has offered an option of 8 x 14TB SATA drives (still
> >> Enterprise).  For Ceph, will the switch to SATA carry a performance
> >> difference that I should be concerned about?
> >>
> >> Thanks.
> >>
> >> -Dave
> >>
> >> --
> >> Dave Hall
> >> Binghamton University
> >> kdh...@binghamton.edu
> >> ___
> >> ceph-users mailing list -- ceph-users@ceph.io
> >> To unsubscribe send an email to ceph-users-le...@ceph.io
> >>
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


[ceph-users] Re: SAS vs SATA for OSD - WAL+DB sizing.

2021-06-03 Thread Dave Hall
Mark,

We are running a mix of RGW, RBD, and CephFS.  Our CephFS is pretty big,
but we're moving a lot of it to RGW.  What prompted me to go looking for a
guideline was a high frequency of Spillover warnings as our cluster filled
up past the 50% mark.  That was with 14.2.9, I think.  I understand that
some things have changed since, but I think I'd like to have the
flexibility and performance of a generous WAL+DB - the cluster is used to
store research data, and the usage pattern is tending to change as the
research evolves.  No telling what our mix will be a year from now.

-Dave

--
Dave Hall
Binghamton University
kdh...@binghamton.edu
607-760-2328 (Cell)
607-777-4641 (Office)


On Thu, Jun 3, 2021 at 7:39 PM Mark Nelson  wrote:

> FWIW, those guidelines try to be sort of a one-size-fits-all
> recommendation that may not apply to your situation.  Typically RBD has
> pretty low metadata overhead so you can get away with smaller DB
> partitions.  4% should easily be enough.  If you are running heavy RGW
> write workloads with small objects, you will almost certainly use more
> than 4% for metadata (I've seen worst case up to 50%, but that was
> before column family sharding which should help to some extent).  Having
> said that, bluestore will roll the higher rocksdb levels over to the
> slow device and keep the wall, L0, and other lower LSM levels on the
> fast device.  It's not necessarily the end of the world if you end up
> with some of the more rarely used metadata on the HDD but having it on
> flash certain is nice.
>
>
> Mark
>
>
> On 6/3/21 5:18 PM, Dave Hall wrote:
> > Anthony,
> >
> > I had recently found a reference in the Ceph docs that indicated
> something
> > like 40GB per TB for WAL+DB space.  For a 12TB HDD that comes out to
> > 480GB.  If this is no longer the guideline I'd be glad to save a couple
> > dollars.
> >
> > -Dave
> >
> > --
> > Dave Hall
> > Binghamton University
> > kdh...@binghamton.edu
> >
> > On Thu, Jun 3, 2021 at 6:10 PM Anthony D'Atri 
> > wrote:
> >
> >> Agreed.  I think oh …. maybe 15-20 years ago there was often a wider
> >> difference between SAS and SATA drives, but with modern queuing etc. my
> >> sense is that there is less of an advantage.  Seek and rotational
> latency I
> >> suspect dwarf interface differences wrt performance.  The HBA may be a
> >> bigger bottleneck (and way more trouble).
> >>
> >> 500 GB NVMe seems like a lot per HDD, are you using that as WAL+DB with
> >> RGW, or as dmcache or something?
> >>
> >> Depending on your constraints, QLC flash might be more competitive than
> >> you think ;)
> >>
> >> — aad
> >>
> >>
> >>> I suspect the behavior of the controller and the behavior of the drive
> >> firmware will end up mattering more than SAS vs SATA.  As always it's
> best
> >> if you can test it first before committing to buying a pile of them.
> >> Historically I have seen SATA drives that have performed well as far as
> >> HDDs go though.
> >>>
> >>> Mark
> >>>
> >>> On 6/3/21 4:25 PM, Dave Hall wrote:
>  Hello,
> 
>  We're planning another batch of OSD nodes for our cluster.  Our prior
> >> nodes
>  have been 8 x 12TB SAS drives plus 500GB NVMe per HDD.  Due to market
>  circumstances and the shortage of drives those 12TB SAS drives are in
> >> short
>  supply.
> 
>  Our integrator has offered an option of 8 x 14TB SATA drives (still
>  Enterprise).  For Ceph, will the switch to SATA carry a performance
>  difference that I should be concerned about?
> 
>  Thanks.
> 
>  -Dave
> 
>  --
>  Dave Hall
>  Binghamton University
>  kdh...@binghamton.edu
>  ___
>  ceph-users mailing list -- ceph-users@ceph.io
>  To unsubscribe send an email to ceph-users-le...@ceph.io
> 
> >>> ___
> >>> ceph-users mailing list -- ceph-users@ceph.io
> >>> To unsubscribe send an email to ceph-users-le...@ceph.io
> >> ___
> >> ceph-users mailing list -- ceph-users@ceph.io
> >> To unsubscribe send an email to ceph-users-le...@ceph.io
> >>
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>