[ceph-users] What if etcd is lost

2019-07-15 Thread Oscar Segarra
Hi,

I'm planning to deploy a ceph cluster using etcd as kv store.

I'm planning to deploy a stateless etcd docker to store the data.

I'd like to know if the ceph cluster will be able to boot when the etcd
container restarts (and loses all data written to it).

If the etcd container restarts while the ceph cluster (osd, mds, mon, mgr)
is working and stable, will everything continue working, or will some
component stop working?

Will the mons be able to regenerate the keys?

Thanks a lot in advance
Óscar
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Returning to the performance in a small cluster topic

2019-07-15 Thread Drobyshevskiy, Vladimir
Dear colleagues,

  I would like to ask you for help with a performance problem on a site
backed by a ceph storage backend. Cluster details below.

  I've got a big problem with PostgreSQL performance. It runs inside a VM
with a virtio-scsi ceph rbd image, and I see constant ~100% disk load with
latencies of up to hundreds of milliseconds (via atop) even when pg_top shows
10-20 tps. All other resources are almost untouched - there is a lot of memory
and plenty of free CPU cores, and the DB fits in memory, but it still has
performance issues.

  The cluster itself:
  nautilus
  6 nodes, 7 SSDs with 2 OSDs per SSD (14 OSDs in total).
  Each node: 2x Intel Xeon E5-2665 v1 (governor = performance, powersaving
disabled), 64GB RAM, Samsung SM863 1.92TB SSD, QDR Infiniband.

  I've done fio benchmarking with three types of measurements:
  a VM with the virtio-scsi driver,
  a bare-metal host with a mounted rbd image,
  and the same bare-metal host with a mounted LVM partition on the SM863 SSD.

  I've set bs=8k (as Postgres writes 8k blocks) and tried 1 and 8 jobs.
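
  For reference, a sketch of that kind of invocation (the target file, size
and runtime below are placeholders rather than the precise command used):

  fio --name=pg8k --rw=write --direct=1 --ioengine=libaio --bs=8k
  --numjobs=1 --size=2G --runtime=60 --group_reporting
  --filename=/mnt/test/fio.dat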

  Here are some results: https://pastebin.com/TFUg5fqA
  Drive load on the OSD hosts is very low, just a few percent.

  Here is my ceph config: https://pastebin.com/X5ZwaUrF

  The numbers don't look very good from my point of view, but they are also
not really bad (are they?). However, I don't really know which direction to go
next to solve the problem with PostgreSQL.

  I've tried to make a RAID0 with mdraid and 2 virtual drives but haven't
noticed any difference.

  Could you please tell me:
  Are these performance numbers good or bad for this hardware?
  Is it possible to tune anything further? Maybe you can point me to docs or
other papers?
  Is there any special VM tuning for running PostgreSQL on ceph?

  Thank you in advance!

--
Best regards,
Vladimir
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Changing the release cadence

2019-07-15 Thread Sage Weil
On Mon, 15 Jul 2019, Kaleb Keithley wrote:
> On Wed, Jun 5, 2019 at 11:58 AM Sage Weil  wrote:
> 
> > ...
> >
> > This has mostly worked out well, except that the mimic release received
> > less attention than we wanted due to the fact that multiple downstream
> > Ceph products (from Red Hat and SUSE) decided to base their next release
> > on nautilus.  Even though upstream every release is an "LTS" release, as a
> > practical matter mimic got less attention than luminous or nautilus.
> >
> 
> Speaking as (one of) the Ceph packager(s) in Fedora:
> 
> 1) I have been told (don't recall by whom now) that odd-numbered releases
> are experimental, short-term-support, or red-headed-stepchild releases.

In principle that hasn't been the case since 12.2.z (luminous).  13 has 
gotten somewhat less backport attention because the people doing backports 
work for Red Hat and SUSE and neither shipped a product based on 13.

> 2) even though there have been ceph-{10,11,12}.0.z tarballs on
> http://download.ceph.com/tarballs/ in the past, there still isn't a
> ceph-15.0.0.tar.gz there. And I was told that there wouldn't be one. Yes,
> I'm aware that X.0.z are alpha releases, X.1.z are beta releases, and X.2.z
> are GA releases.

In the past we haven't published tarballs until x.1.z (rc), since the dev 
checkpoints are not meant to be used by anyone other than developers.  
(They're really just a 'pause and make sure the qa suites are nice and 
clean', nothing more.)

> 3) ceph-13 was skipped over in Fedora because I could never get it to build
> in Fedora in the limited time I make to do upstream packaging of Ceph and
> other things (because it's not my $dayjob.) And there seemed to be no
> interest on the part of anyone in the Ceph devel community to fix it
> (because see #1?)
> 
> If Octopus is really an LTS release like all the others, and you want
> bleeding edge users to test/use it and give early feedback, then Fedora is
> probably one of the better places to get that feedback.

I think the first release worth testing outside of the dev community is 
the release candidate.  I don't like the idea of having any distro carry 
an untested dev checkpoint or else someone will lose data... even the rc 
should be tested cautiously and, since it is only relevant for a week or 
two, I'm not sure that distros can play much of a role there?

sage


> FWIW, In the absence of a ceph-14.0.z.tar.gz from
> http://download.ceph.com/tarballs/ branto came up with one (possibly from
> https://github.com/ceph/ceph/releases/...???)  but I haven't had any luck
> building ceph-15 with the tarball from there.
> 
> My 2¢.
> ___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Changing the release cadence

2019-07-15 Thread Thore Bödecker
Hey,

On 15.07.19 09:58, Kaleb Keithley wrote:
> Speaking as (one of) the Ceph packager(s) in Fedora:

Arch Linux packager for Ceph here o/

> If Octopus is really an LTS release like all the others, and you want
> bleeding edge users to test/use it and give early feedback, then Fedora is
> probably one of the better places to get that feedback.
> 
> FWIW, In the absence of a ceph-14.0.z.tar.gz from
> http://download.ceph.com/tarballs/ branto came up with one (possibly from
> https://github.com/ceph/ceph/releases/...???)  but I haven't had any luck
> building ceph-15 with the tarball from there.

I've been watching the RSS feed of the GitHub releases and use the
GitHub release tarballs for building it.

Due to unforeseen complications and lack of time (a situation similar to
yours), the last packaged version of ceph in the Arch Linux repos is
13.2.1 right now.
Over the last couple of weeks I have spent numerous hours working with
people in oftc.#ceph-devel on the issues I was experiencing while
building it.
Right now I'm finalizing the 14.2.1 build and have some last kinks to
work out; for some reason the zstd compression tests are failing for me,
so I'm going to look into that over the coming days.

Seeing that Arch Linux and Fedora should have a few more things in common
than, let's say, Ubuntu or Debian, I would like to offer my help however
I can. I'm not that familiar with building packages on/for Fedora, but
I've had my fair share of debugging ceph builds with bleeding-edge
components (gcc, cmake, boost, python) and could possibly help out there.


Feel free to get in touch, maybe we can help each other out.

Cheers,
Thore


PS: I was thinking about maybe getting Arch Linux (and Fedora)
onboarded to the ceph-dev-build matrix on jenkins.ceph.com.
The idea would be to spot arising issues and/or incompatibilities with
upstream / bleeding-edge compilers, helpers and dependencies early, which
should be of common interest to Ceph upstream.
Although I haven't talked to anyone upstream about this and am just
throwing it out here.

-- 
Thore Bödecker

GPG ID: 0xD622431AF8DB80F3
GPG FP: 0F96 559D 3556 24FC 2226  A864 D622 431A F8DB 80F3


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Changing the release cadence

2019-07-15 Thread Sage Weil
On Mon, 15 Jul 2019, Kaleb Keithley wrote:
> On Mon, Jul 15, 2019 at 10:10 AM Sage Weil  wrote:
> 
> > On Mon, 15 Jul 2019, Kaleb Keithley wrote:
> > >
> > > If Octopus is really an LTS release like all the others, and you want
> > > bleeding edge users to test/use it and give early feedback, then Fedora
> > is
> > > probably one of the better places to get that feedback.
> >
> > I think the first release worth testing outside of the dev community is
> > the release candidate.  I don't like the idea of having any distro carry
> > an untested dev checkpoint or else someone will lose data... even the rc
> > should be tested cautiously and, since it is only relevant for a week or
> > two, I'm not sure that distros can play much of a role there?
> >
> 
> To be clear, when I talk about packaging a new version of Ceph, it starts out
> by _only_ going into Fedora Rawhide. It would be extremely foolish, IMO,
> for anyone to run Rawhide for anything that's mission critical.
> 
> Sometimes it takes a long time to work through build and packaging issues.
> The switches to gcc-8 and gcc-9 are good illustrations of the kinds of issues
> that we, as packagers, run into. Those kinds of things are why I — at least
> — like to get as early a start as I can. Even short-lived release
> candidates are useful stepping stones to the eventual GA.
> 
> Personally I'd say that any fears anyone has of people losing data by using
> an early dev checkpoint of Ceph on Rawhide are probably a teeny bit
> misplaced.

Okay, it sounds like the rc x.1.z releases are a good fit, then.  They 
should start ~2 months before the first stable release.  I'm not so 
sure about the dev checkpoints since they are quite arbitrary.  I suspect 
less effort would be needed to just do a manual build a few times 
partway through the cycle (e.g., now), identify any issues, and open PRs 
with build fixes.

sage
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Returning to the performance in a small cluster topic

2019-07-15 Thread Jordan Share

We found shockingly bad committed IOPS/latencies on ceph.

We could get roughly 20-30 IOPS when running this fio invocation from 
within a vm:
fio --name=seqwrite --rw=write --direct=1 --ioengine=libaio --bs=32k 
--numjobs=1 --size=2G --runtime=60 --group_reporting --fsync=1


For non-committed IO, we get about 2800 iops, with this invocation:
fio --name=seqwrite --rw=write --direct=1 --ioengine=libaio --bs=32k 
--numjobs=1 --size=2G --runtime=150 --group_reporting


So, maybe, if PostgreSQL has a lot of committed IO needs, you might not 
have the performance you're expecting.


You could try running your fio tests with "--fsync=1" and see if those 
numbers (which I'd expect to be very low) would be in line with your 
PostgreSQL performance.
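
Something along these lines, reusing your 8k block size (the size/runtime 
values are just placeholders):

fio --name=syncwrite --rw=write --direct=1 --ioengine=libaio --bs=8k 
--numjobs=1 --size=2G --runtime=60 --group_reporting --fsync=1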


Jordan


On 7/15/2019 7:08 AM, Drobyshevskiy, Vladimir wrote:

Dear colleagues,

   I would like to ask you for help with a performance problem on a site 
backed with ceph storage backend. Cluster details below.


   I've got a big problem with PostgreSQL performance. It runs inside a 
VM with virtio-scsi ceph rbd image. And I see constant ~100% disk load 
with up to hundreds milliseconds latencies (via atop) even when pg_top 
shows 10-20 tps. All other resources are almost untouched - there is a 
lot of memory and free CPU cores, DB fits memory but still has 
performance issues.


   The cluster itself:
   nautilus
   6 nodes, 7 SSD with 2 OSDs per SSD (14 OSDs in overall).
   Each node: 2x Intel Xeon E5-2665 v1 (governor = performance, 
powersaving disabled), 64GB RAM, Samsung SM863 1.92TB SSD, QDR Infiniband.


   I've made fio benchmarking with three type of measures:
   a VM with virtio-scsi driver,
   baremetal host with mounted rbd image
   and the same baremetal host with mounted lvm partition on SM863 SSD 
drive.


   I've set bs=8k (as Postgres writes 8k blocks) and tried 1 and 8 jobs.

   Here are some results: https://pastebin.com/TFUg5fqA
   Drives load on the OSD hosts are very low, just a few percent.

   Here is my ceph config: https://pastebin.com/X5ZwaUrF

   Numbers don't look very good from my point of view but they are also 
not really bad (are they?). But I don't really know the next direction I 
can go to solve the problem with PostgreSQL.


   I've tried to make an RAID0 with mdraid and 2 virtual drives but 
haven't noticed any difference.


   Could you please tell me:
   Are these performance numbers good or bad according to the hardware?
   Is it possible to tune anything more? May be you can point me to docs 
or other papers?

   Does any special VM tuning for the PostgreSQL\ceph cooperation exist?
   Thank you in advance!

--
Best regards,
Vladimir

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] What if etcd is lost

2019-07-15 Thread Frank Schilder
Hi Oscar,

ceph itself does not use etcd for anything. Hence, a deployed and operational 
cluster will not notice the presence or absence of an etcd store.

How much the loss of etcd means for your work depends on what you plan to store 
in it. If you look at the ceph/daemon container on docker, the last time I 
checked the code it stored only very little data, and all of it would be 
re-built from the running cluster if you create and run a new etcd container. 
In this framework, it only affects how convenient deployment of new servers is. 
You could easily copy the few files it holds by hand to a new server. So etcd 
is not critical at all.

You should have a look at the deploy scripts/method to check under what 
conditions you can lose and re-build an etcd store. In the example of 
ceph/daemon on docker, a rebuild requires execution on a node with the admin 
keyring (e.g. a mon node) against a running cluster with mons in quorum.
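
As an illustration, if you ever had to repopulate a fresh etcd by hand, it 
would come down to something like the following. The key paths and local file 
locations here are only assumptions - check the ceph/daemon kv scripts for the 
exact layout your version expects, and note that images using the etcd v2 API 
take "etcdctl set" instead of "etcdctl put":

etcdctl put /ceph-config/ceph/ceph.conf "$(cat /etc/ceph/ceph.conf)"
etcdctl put /ceph-config/ceph/adminKeyring "$(cat /etc/ceph/ceph.client.admin.keyring)"
etcdctl put /ceph-config/ceph/monKeyring "$(cat /etc/ceph/ceph.mon.keyring)"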

Best regards,

=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: ceph-users  on behalf of Oscar Segarra 

Sent: 15 July 2019 11:55
To: ceph-users
Subject: [ceph-users] What if etcd is lost

Hi,

I'm planning to deploy a ceph cluster using etcd as kv store.

I'm planning to deploy a stateless etcd docker to store the data.

I'd like to know if the ceph cluster will be able to boot when the etcd 
container restarts (and loses all data written to it).

If the etcd container restarts while the ceph cluster (osd, mds, mon, mgr) is 
working and stable, will everything continue working, or will some component 
stop working?

Will the mons be able to regenerate the keys?

Thanks a lot in advance
Óscar
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Returning to the performance in a small cluster topic

2019-07-15 Thread Paul Emmerich
You are effectively measuring the latency with jobs=1 here (which is
appropriate considering that the WAL of a DB is effectively limited by
latency) and yeah, a networked file system will always be a little bit
slower than a local disk.

But I think you should be able to get a higher performance here:
* It sometimes helps to disable the write cache on the disks: hdparm -W 0
/dev/sdX
* Sometimes this helps: sysctl -w net.ipv4.tcp_low_latency=1
* More esoteric things like pinning processes to NUMA nodes usually
don't help much with latency (only with throughput).

You can run "ceph daemon osd.X perf dump" to get really detailed statistics
about how much time the OSD is spending on the individual steps.
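
A minimal sketch of pulling a couple of latency counters out of that output
(jq is assumed to be installed; the exact counter names differ between
releases, so treat "commit_lat" and "op_w_latency" as examples rather than
guaranteed keys):

ceph daemon osd.0 perf dump | jq '{commit_lat: .bluestore.commit_lat, op_w_latency: .osd.op_w_latency}'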

Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


On Mon, Jul 15, 2019 at 8:16 PM Jordan Share  wrote:

> We found shockingly bad committed IOPS/latencies on ceph.
>
> We could get roughly 20-30 IOPS when running this fio invocation from
> within a vm:
> fio --name=seqwrite --rw=write --direct=1 --ioengine=libaio --bs=32k
> --numjobs=1 --size=2G --runtime=60 --group_reporting --fsync=1
>
> For non-committed IO, we get about 2800 iops, with this invocation:
> fio --name=seqwrite --rw=write --direct=1 --ioengine=libaio --bs=32k
> --numjobs=1 --size=2G --runtime=150 --group_reporting
>
> So, maybe, if PostreSQL has a lot of committed IO needs, you might not
> have the performance you're expecting.
>
> You could try running your fio tests with "--fsync=1" and see if those
> numbers (which I'd expect to be very low) would be in line with your
> PostgreSQL performance.
>
> Jordan
>
>
> On 7/15/2019 7:08 AM, Drobyshevskiy, Vladimir wrote:
> > [original message trimmed]
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] enterprise support

2019-07-15 Thread Void Star Nill
Hello,

Other than Redhat and SUSE, are there other companies that provide
enterprise support for Ceph?

Thanks,
Shridhar
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] enterprise support

2019-07-15 Thread Eddy Castillon
Hi Void,

Canonical has a very interesting offering:

https://ubuntu.com/openstack/storage

On Mon, Jul 15, 2019 at 14:53, Void Star Nill (
void.star.n...@gmail.com) wrote:

> Hello,
>
> Other than Redhat and SUSE, are there other companies that provide
> enterprise support for Ceph?
>
> Thanks,
> Shridhar
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


-- 

Sincerely,

Eddy Castillon
+51 934782232
eddy.castil...@qualifacts.com

Qualifacts, Inc. 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] enterprise support

2019-07-15 Thread Brady Deetz
https://www.mirantis.com/software/ceph/

On Mon, Jul 15, 2019 at 2:53 PM Void Star Nill 
wrote:

> Hello,
>
> Other than Redhat and SUSE, are there other companies that provide
> enterprise support for Ceph?
>
> Thanks,
> Shridhar
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Nautilus, RBD-Mirroring & Cluster Names

2019-07-15 Thread DHilsbos
All;

I'm digging deeper into the capabilities of Ceph, and I ran across this:
http://docs.ceph.com/docs/nautilus/rbd/rbd-mirroring/
Which seems really interesting, except...

This feature seems to require custom cluster naming to function, which is 
deprecated in Nautilus, and not all commands adhere to a passed cluster name 
parameter.

Does RBD-Mirroring still work in Nautilus?
Does RBD-Mirroring in Nautilus still depend on custom cluster names?
How does a custom cluster name get properly implemented?

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International, Inc.
dhils...@performair.com 
www.PerformAir.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] enterprise support

2019-07-15 Thread Robert LeBlanc
We recently used Croit (https://croit.io/) and they were really good.

Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Mon, Jul 15, 2019 at 12:53 PM Void Star Nill 
wrote:

> Hello,
>
> Other than Redhat and SUSE, are there other companies that provide
> enterprise support for Ceph?
>
> Thanks,
> Shridhar
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Nautilus, RBD-Mirroring & Cluster Names

2019-07-15 Thread Paul Emmerich
No worries, that's just the names of the config files/keyrings on the
mirror server, which needs to access both clusters and hence needs two
different ceph.conf files.

Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


On Mon, Jul 15, 2019 at 10:25 PM  wrote:

> All;
>
> I'm digging deeper into the capabilities of Ceph, and I ran across this:
> http://docs.ceph.com/docs/nautilus/rbd/rbd-mirroring/
> Which seems really interesting, except...
>
> This feature seems to require custom cluster naming to function, which is
> deprecated in Nautilus, and not all commands adhere to a passed cluster
> name parameter.
>
> Does RBD-Mirroring still work in Nautilus?
> Does RBD-Mirroring in Nautilus still depend on custom cluster names?
> How does a custom cluster name get properly implemented?
>
> Thank you,
>
> Dominic L. Hilsbos, MBA
> Director - Information Technology
> Perform Air International, Inc.
> dhils...@performair.com
> www.PerformAir.com
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Nautilus, RBD-Mirroring & Cluster Names

2019-07-15 Thread DHilsbos
Paul;

If I understand you correctly:
I will have 2 clusters, each named "ceph" (internally).
As such, each will have a configuration file at: /etc/ceph/ceph.conf
I would copy the other clusters configuration file to something like: 
/etc/ceph/remote.conf
Then the commands (run on the local mirror) would look like this:
rbd mirror pool peer add image-pool [client-name]@ceph (uses default cluster 
name to reference local cluster)
rbd --cluster remote mirror pool peer add image-pool [client-name]@remote

Thank you,

Dominic L. Hilsbos, MBA 
Director – Information Technology 
Perform Air International, Inc.
dhils...@performair.com 
www.PerformAir.com


From: Paul Emmerich [mailto:paul.emmer...@croit.io] 
Sent: Monday, July 15, 2019 1:31 PM
To: Dominic Hilsbos
Cc: Ceph Users
Subject: Re: [ceph-users] Nautilus, RBD-Mirroring & Cluster Names

No worries, that's just the names of the config files/keyrings on the mirror 
server which needs to access both clusters and hence two different ceph.conf 
files.

Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


On Mon, Jul 15, 2019 at 10:25 PM  wrote:
All;

I'm digging deeper into the capabilities of Ceph, and I ran across this:
http://docs.ceph.com/docs/nautilus/rbd/rbd-mirroring/
Which seems really interesting, except...

This feature seems to require custom cluster naming to function, which is 
deprecated in Nautilus, and not all commands adhere to a passed cluster name 
parameter.

Does RBD-Mirroring still work in Nautilus?
Does RBD-Mirroring in Nautilus still depend on custom cluster names?
How does a custom cluster name get properly implemented?

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International, Inc.
dhils...@performair.com 
www.PerformAir.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Nautilus, RBD-Mirroring & Cluster Names

2019-07-15 Thread Michel Raabe

Hi,


On 15.07.19 22:42, dhils...@performair.com wrote:

Paul;

If I understand you correctly:
I will have 2 clusters, each named "ceph" (internally).
As such, each will have a configuration file at: /etc/ceph/ceph.conf
I would copy the other clusters configuration file to something like: 
/etc/ceph/remote.conf
Then the commands (run on the local mirror) would look like this:
rbd mirror pool peer add image-pool [client-name]@ceph (uses default cluster 
name to reference local cluster)
rbd --cluster remote mirror pool add image-pool [client-name]@remote


yes...and the same for the keyring - remote.client.admin.keyring or 
remote.rbd-mirror.keyring
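
So, as a sketch, the rbd-mirror host ends up with something like the
following (file names assumed from the defaults discussed above):

/etc/ceph/ceph.conf
/etc/ceph/ceph.client.admin.keyring
/etc/ceph/remote.conf
/etc/ceph/remote.client.admin.keyring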


Regards,
Michel
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] What if etcd is lost

2019-07-15 Thread Oscar Segarra
Hi Frank,

Thanks a lot for your quick response.

Yes, the use case that concerns me is the following:

1.- I bootstrap a complete cluster (mons, osds, mgr, mds, nfs, etc.) using
etcd as a key store.
2.- There is an electrical blackout, all nodes of my cluster go down, and
all data in my etcd is lost (but my osd disks still have useful data).

I'd like to know whether, if I power on all servers, the mons will be able to
"join" all the other ceph daemons, including the osds (which only use data
stored in the kv store).

docker run -d --net=host \
--privileged=true \
--pid=host \
-v /dev/:/dev/ \
-e OSD_DEVICE=/dev/vdd \
-e KV_TYPE=etcd \
-e KV_IP=192.168.0.20 \
ceph/daemon osd


The other use case that concerns me is the following:

1.- I bootstrap a complete cluster (mons, osds, mgr, mds, nfs, etc.) using
etcd as a key store.
2.- The etcd container restarts and loses all its data (it is stateless).

In this scenario, will I be able to add a new osd to the cluster?

You are talking about a "rebuild"... is there any documentation about this?

Thanks a lot for your help,
Óscar



On Mon, 15 Jul 2019 at 20:22, Frank Schilder () wrote:

> Hi Oscar,
>
> ceph itself does not use etcd for anything. Hence, a deployed and
> operational cluster will not notice the presence or absence of an etcd
> store.
>
> How much a loss of etcd means for your work depends on what you plan to
> store in it. If you look at the ceph/daemon container on docker, the last
> time I checked the code it stored only very little data and all of this
> would be re-build from the running cluster if you create and run a new etcd
> container. In this framework, it only affects how convenient deployment of
> new servers is. You could easily copy the few files it holds by hand to a
> new server. So etcd is not critical at all.
>
> You should have a look at the deploy scripts/method for checking under
> what conditions you can loose and re-build an etcd store. In the example of
> ceph/daemon on docker, a rebuild requires execution on a node with the
> admin keyring (eg. a mon node) against a running cluster with mons in
> quorum.
>
> Best regards,
>
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: ceph-users  on behalf of Oscar
> Segarra 
> Sent: 15 July 2019 11:55
> To: ceph-users
> Subject: [ceph-users] What if etcd is lost
>
> [original message trimmed]
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Returning to the performance in a small cluster topic

2019-07-15 Thread Marc Roos
 

Isn't that why you're supposed to test up front, so you do not have shocking 
surprises? You can also find some performance references in the mailing 
list archives. 
I think it would be good to publish some performance results on the 
ceph.com website. It can't be too difficult to put some default scenarios, 
the hardware used, and the resulting performance there in some nice graphs. 
I take it some people here would be willing to contribute test results of 
their test/production clusters. This way new ceph'ers know what to expect 
from similar setups.



-Original Message-
From: Jordan Share [mailto:readm...@krotus.com] 
Sent: maandag 15 juli 2019 20:16
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Returning to the performance in a small 
cluster topic

We found shockingly bad committed IOPS/latencies on ceph.

We could get roughly 20-30 IOPS when running this fio invocation from 
within a vm:
fio --name=seqwrite --rw=write --direct=1 --ioengine=libaio --bs=32k
--numjobs=1 --size=2G --runtime=60 --group_reporting --fsync=1

For non-committed IO, we get about 2800 iops, with this invocation:
fio --name=seqwrite --rw=write --direct=1 --ioengine=libaio --bs=32k
--numjobs=1 --size=2G --runtime=150 --group_reporting

So, maybe, if PostreSQL has a lot of committed IO needs, you might not 
have the performance you're expecting.

You could try running your fio tests with "--fsync=1" and see if those 
numbers (which I'd expect to be very low) would be in line with your 
PostgreSQL performance.

Jordan


On 7/15/2019 7:08 AM, Drobyshevskiy, Vladimir wrote:
> [original message trimmed]


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Returning to the performance in a small cluster topic

2019-07-15 Thread Jordan Share
All "normal" VM usage is about what you'd expect, since a lot of apps or 
system software is still written from the days of spinning disks, when 
this (tens of ops) is the level of committed IOPS you can get from them. 
 So they let the OS cache writes and only sync when needed.


Some applications, like etcd, are very careful about their state (which 
is reasonable) and call sync after (basically) every IO.  The etcd docs 
talk about needing a high amount of "sequential IO", which we tested, and 
which is fine.  But what they actually need is a high amount of 
*committed* (or sync'd, I'm not sure there is a general term for this) 
IO, which we did not test.
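
A sketch of how one could measure that kind of sync'd IO with fio (the
path, size and runtime are just placeholders):

fio --name=syncd --rw=write --ioengine=sync --fdatasync=1 --bs=8k 
--size=512m --runtime=60 --group_reporting --filename=/var/lib/etcd/fio.dat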


Our cluster works great (~1500-2500 IOPS) for the normal VM use case 
(occasional syncs, cached writes), and thus I don't think it is 
hyperbole to say it is shocking how much lower (100x) the committed IOPS 
are.


I do agree that this could be a lot better documented, or a lot more 
clearly laid out.


Another documentation problem (which the balancer has more or less 
eliminated) is that the docs tended (a couple of years ago) to make you 
think you'd get more even utilization if you just added more PGs, when 
really that just gives you a smoother curve vs. a taller/narrower one.


Jordan

On 7/15/2019 3:00 PM, Marc Roos wrote:
  


Isn't that why you suppose to test up front? So you do not have shocking
surprises? You can find in the mailing list archives some performance
references also.
I think it would be good to publish some performance results on the
ceph.com website. Can’t be to difficult to put some default scenarios,
used hardware and performance there in some nice graphs. I take it some
here would be willing to contribute test results of their
test/production clusters. This way new ceph’ers know what to expect
from similar setups.



[earlier quoted messages trimmed]
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] Nautilus, RBD-Mirroring & Cluster Names

2019-07-15 Thread Jason Dillaman
On Mon, Jul 15, 2019 at 4:50 PM Michel Raabe  wrote:
>
> Hi,
>
>
> On 15.07.19 22:42, dhils...@performair.com wrote:
> > Paul;
> >
> > If I understand you correctly:
> > I will have 2 clusters, each named "ceph" (internally).
> >   As such, each will have a configuration file at: /etc/ceph/ceph.conf
> > I would copy the other clusters configuration file to something like: 
> > /etc/ceph/remote.conf
> > Then the commands (run on the local mirror) would look like this:
> > rbd mirror pool peer add image-pool [client-name]@ceph (uses default 
> > cluster name to reference local cluster)
> > rbd --cluster remote mirror pool add image-pool [client-name]@remote
>
> yes...and the same for the keyring - remote.client.admin.keyring or
> remote.rbd-mirror.keyring

With Ceph Nautilus, you actually do not need to copy / modify any
ceph.conf (like) files. You can store the mon address and CephX key
for the remote cluster within the local cluster's config-key store
(see --remote-mon-host and --remote-key-file rbd CLI options) or use
the Ceph Dashboard to provide the credentials.
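
As a sketch, assuming those options attach to the peer add command the same
way as above (pool name, peer user, monitor address, and key file path are
all placeholders):

rbd mirror pool peer add image-pool client.rbd-mirror-peer@remote 
--remote-mon-host 192.168.0.50 --remote-key-file /root/remote.key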

> Regards,
> Michel
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph performance IOPS

2019-07-15 Thread Christian Wuerdig
Option 1 is the official way; option 2 will be a lot faster if it works for
you (I was never in a situation requiring it, so I can't say); and option 3
is for filestore, so it is not applicable to bluestore.
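
For option 1, a rough sketch of the recreate flow for a single OSD (the OSD
id and device paths are placeholders; do one OSD at a time and let the
cluster finish rebalancing before purging):

ceph osd out 12
# wait until data has migrated off the OSD
systemctl stop ceph-osd@12
ceph osd purge 12 --yes-i-really-mean-it
ceph-volume lvm zap /dev/sdc --destroy
ceph-volume lvm prepare --bluestore --data /dev/sdc --block.db /dev/sdj1 --block.wal /dev/sdj2
ceph-volume lvm activate --all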

On Wed, 10 Jul 2019 at 07:55, Davis Mendoza Paco 
wrote:

> What would be the most appropriate procedure to move blockdb/wal to SSD?
>
> 1.- remove the OSD and recreate it (affects the performance)
> ceph-volume lvm prepare --bluestore --data  --block.wal
>  --block.db 
>
> 2.- Follow the documentation
>
> http://heiterbiswolkig.blogs.nde.ag/2018/04/08/migrating-bluestores-block-db/
>
> 3.- Follow the documentation
>
> https://swamireddy.wordpress.com/2016/02/19/ceph-how-to-add-the-ssd-journal/
>
> Thanks for the help
>
> On Sun, 7 Jul 2019 at 14:39, Christian Wuerdig (<
> christian.wuer...@gmail.com>) wrote:
>
>> One thing to keep in mind is that the blockdb/wal becomes a Single Point
>> Of Failure for all OSDs using it. So if that SSD dies essentially you have
>> to consider all OSDs using it as lost. I think most go with something like
>> 4-8 OSDs per blockdb/wal drive but it really depends how risk-averse you
>> are, what your budget is etc. Given that you only have 5 nodes I'd probably
>> go for fewer OSDs per blockdb device.
>>
>>
>> On Sat, 6 Jul 2019 at 02:16, Davis Mendoza Paco 
>> wrote:
>>
>>> Hi all,
>>> I have installed ceph luminous, witch 5 nodes(45 OSD), each OSD server
>>> supports up to 16HD and I'm only using 9
>>>
>>> I wanted to ask for help to improve IOPS performance since I have about
>>> 350 virtual machines of approximately 15 GB in size and I/O processes are
>>> very slow.
>>> You who recommend me?
>>>
>>> In the documentation of ceph recommend using SSD for the journal, my
>>> question is
>>> How many SSD do I have to enable per server so that the journals of the
>>> 9 OSDs can be separated into SSDs?
>>>
>>> I currently use ceph with OpenStack, on 11 servers with SO Debian
>>> Stretch:
>>> * 3 controller
>>> * 3 compute
>>> * 5 ceph-osd
>>>   network: bond lacp 10GB
>>>   RAM: 96GB
>>>   HD: 9 disk SATA-3TB (bluestore)
>>>
>>> --
>>> *Davis Mendoza P.*
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
>
> --
> *Davis Mendoza P.*
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com