
I talked with the person in charge about your initial feedback and questions. 
The plan is to switch to a new setup, and I was asked to pass it along and ask 
whether you think it would be sufficient.

Use case:
Overview: We need to provide shared storage/high availability for (usually) 
low-volume web server instances using a distributed, POSIX-compliant filesystem, 
running in Amazon Web Services. Database storage is not part of the cluster.
Logic: We know Ceph is probably overkill for our current use (and probably also 
for our future use), so why Ceph? Its performance when using CephFS, and its 
ability to support RBD (if we ever move to a container approach for web 
servers). I’ve tried Amazon EFS (NFS-as-a-service) and GlusterFS (both NFS and 
native client), and because of the number of small files we’re working with, 
an operation that takes ~15 sec. in Ceph takes several minutes with the NFS or 
GlusterFS solutions.
Current Load: ~100 connected clients accessing ~20 GB of e-commerce-related 
website source files.
Expected Future Load: ~5,000 connected clients accessing ~1 TB of data.

Ceph Clients:
Primary Role: Web server & load balancer w/ SSL termination
Hardware Configuration: 1 vCPU, 512 MB RAM, Ubuntu 16.04 LTS (per 
website/domain/subdomain: 2 t2.nano instances, load-balanced behind HAProxy, 
occasionally scaled up manually with new instances during expected load spikes. 
After the initial “hits,” most of the website stays in the local cache, so there 
are generally few IOPS against the Ceph cluster.)

Ceph Clusters:
Overall: 3 co-located clusters across 9 servers, spanning 3 AWS Availability 
Zones in a single region. 3 MDS per cluster, 3 MON per cluster, 2 OSD per cluster.
Hardware Configuration (MON/MDS): r4.large instance class, 2 vCPU, ~15 GB RAM, 
“up to 10Gbit” network (“Enhanced Networking” enabled), EBS/SSD for root (not 
provisioned-IOPS), Ubuntu 16.04 LTS
Hardware Configuration (OSD): i3.large instance class, 2 vCPU, ~15 GB RAM, “up to 
10Gbit” network (“Enhanced Networking” enabled), EBS/SSD for root (not 
provisioned-IOPS, but “EBS optimized” for bandwidth), ~475 GB of attached NVMe 
ephemeral storage for the OSD (journal and data co-located)

Proposed Layout:
AZ “A”:

  *   Server A-MM (r4.large instance):
     *   Mon.A & MDS.A for Cluster X
     *   Mon.A & MDS.A for Cluster Y
     *   Mon.A & MDS.A for Cluster Z
  *   Server A-OSD-1 (i3.large instance):
     *   OSD.0 for Cluster X
  *   Server A-OSD-2 (i3.large instance):
     *   OSD.0 for Cluster Z

AZ “B”:

  *   Server B-MM (r4.large instance):
     *   Mon.B & MDS.B for Cluster X
     *   Mon.B & MDS.B for Cluster Y
     *   Mon.B & MDS.B for Cluster Z
  *   Server B-OSD-1 (i3.large instance):
     *   OSD.1 for Cluster X
  *   Server B-OSD-2 (i3.large instance):
     *   OSD.0 for Cluster Y

AZ “C”:

  *   Server C-MM (r4.large instance):
     *   Mon.C & MDS.C for Cluster X
     *   Mon.C & MDS.C for Cluster Y
     *   Mon.C & MDS.C for Cluster Z
  *   Server C-OSD-1 (i3.large instance):
     *   OSD.1 for Cluster Y
  *   Server C-OSD-2 (i3.large instance):
     *   OSD.1 for Cluster Z
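To sanity-check the proposed layout against a full Availability Zone outage, here is a minimal Python sketch that encodes the servers listed above and reports, for each AZ lost, whether each cluster still has a MON majority and how many of its OSDs survive. It models only MON and OSD placement (MDS placement mirrors the MONs); `LAYOUT`, `daemon_counts` and `az_outage_report` are illustrative names, and whether a cluster left with a single OSD keeps serving I/O depends on pool size/min_size settings the message does not state.

```python
from collections import Counter

# Proposed layout: (AZ, server) -> list of (daemon type, cluster).
# Only MONs and OSDs are modelled; the MDS daemons sit on the same MM servers as the MONs.
LAYOUT = {
    ("A", "A-MM"): [("mon", "X"), ("mon", "Y"), ("mon", "Z")],
    ("A", "A-OSD-1"): [("osd", "X")],
    ("A", "A-OSD-2"): [("osd", "Z")],
    ("B", "B-MM"): [("mon", "X"), ("mon", "Y"), ("mon", "Z")],
    ("B", "B-OSD-1"): [("osd", "X")],
    ("B", "B-OSD-2"): [("osd", "Y")],
    ("C", "C-MM"): [("mon", "X"), ("mon", "Y"), ("mon", "Z")],
    ("C", "C-OSD-1"): [("osd", "Y")],
    ("C", "C-OSD-2"): [("osd", "Z")],
}


def daemon_counts(layout, lost_az=None):
    """Count (cluster, daemon type) pairs, skipping every server in lost_az."""
    counts = Counter()
    for (az, _server), daemons in layout.items():
        if az == lost_az:
            continue
        for kind, cluster in daemons:
            counts[(cluster, kind)] += 1
    return counts


def az_outage_report(layout):
    """Map (lost AZ, cluster) -> (MON majority kept?, surviving OSD count)."""
    total = daemon_counts(layout)
    clusters = sorted({cluster for cluster, _kind in total})
    azs = sorted({az for az, _server in layout})
    report = {}
    for az in azs:
        alive = daemon_counts(layout, lost_az=az)
        for cluster in clusters:
            majority = total[(cluster, "mon")] // 2 + 1
            quorum_kept = alive[(cluster, "mon")] >= majority
            report[(az, cluster)] = (quorum_kept, alive[(cluster, "osd")])
    return report


if __name__ == "__main__":
    for (az, cluster), (quorum, osds) in sorted(az_outage_report(LAYOUT).items()):
        print(f"AZ {az} down -> cluster {cluster}: quorum={quorum}, OSDs left={osds}")
```

In this model every cluster keeps MON quorum through any single-AZ outage, but each outage leaves two of the three clusters with only one surviving OSD.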

Alternative Layout:
Split, by half, the NVMe storage between 2 OSDs, and provide 3ea OSDs per 
cluster for higher availability at the expense of disk read-write performance, 
and increase the number of clusters to 4.
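A rough capacity comparison of the two layouts, as a Python sketch. Several inputs are assumptions, not figures from the message: both layouts use the same 6 i3.large OSD servers (so the alternative's 4 clusters × 3 OSDs = 12 half-NVMe OSDs), each server contributes the full ~475 GB, the pool replication size equals the number of OSDs per cluster, and Ceph and filesystem overhead is ignored. `layout_capacity` is a hypothetical helper.

```python
# Assumptions (not stated in the message): ~475 GB NVMe per i3.large, the same
# 6 OSD servers in both layouts, one replica per OSD in each cluster, no overhead.
NVME_GB_PER_SERVER = 475
OSD_SERVERS = 6


def layout_capacity(clusters: int, osds_per_server: int, replicas: int) -> dict:
    """Per-OSD, per-cluster and total capacity in GB for one layout."""
    osd_count = OSD_SERVERS * osds_per_server
    if osd_count % clusters:
        raise ValueError("OSDs do not divide evenly between clusters")
    osd_gb = NVME_GB_PER_SERVER / osds_per_server
    osds_per_cluster = osd_count // clusters
    raw_per_cluster = osds_per_cluster * osd_gb
    usable_per_cluster = raw_per_cluster / replicas
    return {
        "osds_per_cluster": osds_per_cluster,
        "osd_gb": osd_gb,
        "raw_per_cluster_gb": raw_per_cluster,
        "usable_per_cluster_gb": usable_per_cluster,
        "usable_total_gb": usable_per_cluster * clusters,
    }


if __name__ == "__main__":
    print("proposed:   ", layout_capacity(clusters=3, osds_per_server=1, replicas=2))
    print("alternative:", layout_capacity(clusters=4, osds_per_server=2, replicas=3))
```

Under those assumptions, the alternative gives up one third of the total usable space (about 1,425 GB down to 950 GB) in exchange for a third OSD per cluster, which is slightly less than the ~1 TB expected future load if that data is spread across all clusters.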

Thank you for your time,


From: Christian Balzer <ch...@gol.com>
Sent: Thursday, March 16, 2017 2:30:49 AM
To: Ceph Users
Cc: Robin H. Johnson; Rich Rocque
Subject: Re: [ceph-users] Ceph Cluster Failures


On Thu, 16 Mar 2017 02:44:29 +0000 Robin H. Johnson wrote:

> On Thu, Mar 16, 2017 at 02:22:08AM +0000, Rich Rocque wrote:
> > Has anyone else run into this or have any suggestions on how to remedy it?
> We need a LOT more info.

> > After a couple months of almost no issues, our Ceph cluster has
> > started to have frequent failures. Just this week it's failed about
> > three times.
> >
> > The issue appears to be that an MDS or Monitor will fail and then all
> > clients hang. After that, all clients need to be forcibly restarted.
> - Can you define monitor 'failing' in this case?
> - What do the logs contain?
> - Is it running out of memory?
> - Can you turn up the debug level?
> - Has your cluster experienced continual growth and now might be
>   undersized in some regard?
A single MON failure should not cause any problems to boot.
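The reason a single MON failure should be harmless with 3 MONs is that Ceph monitors keep quorum as long as a strict majority of them can reach each other. A minimal sketch of that arithmetic; the function names are illustrative, not part of any Ceph tool, and it covers only the monitor majority rule, not MDS failover or client behaviour:

```python
def quorum_size(monitors: int) -> int:
    """Smallest strict majority of a monitor set."""
    return monitors // 2 + 1


def tolerated_failures(monitors: int) -> int:
    """How many monitors can be down while a majority is still reachable."""
    return monitors - quorum_size(monitors)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        print(f"{n} MON(s): quorum needs {quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

This is also why an even monitor count buys nothing: 4 MONs tolerate the same single failure as 3.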

Output of "ceph -s", "ceph osd tree" and "ceph osd pool ls detail" as well.

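For collecting the command output requested above in one go, here is a minimal Python sketch that runs the three commands through `subprocess` and returns each command's output (or its error text). It assumes the `ceph` CLI and an admin keyring are available on the host; `collect_diagnostics` is a hypothetical helper, and its `run` parameter exists only so it can be exercised without a cluster.

```python
import subprocess

# The three commands requested in the reply.
DIAGNOSTIC_COMMANDS = [
    ["ceph", "-s"],
    ["ceph", "osd", "tree"],
    ["ceph", "osd", "pool", "ls", "detail"],
]


def collect_diagnostics(run=subprocess.run):
    """Run each command; map 'command string' -> stdout, stderr, or failure text."""
    results = {}
    for cmd in DIAGNOSTIC_COMMANDS:
        key = " ".join(cmd)
        try:
            proc = run(cmd, capture_output=True, text=True, timeout=30)
        except (FileNotFoundError, subprocess.TimeoutExpired) as exc:
            # ceph binary missing, or the cluster did not answer in time
            results[key] = f"<failed: {exc}>"
            continue
        results[key] = proc.stdout if proc.returncode == 0 else proc.stderr
    return results


if __name__ == "__main__":
    for command, output in collect_diagnostics().items():
        print(f"$ {command}\n{output}")
```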
> > The architecture for our setup is:
> Are these virtual machines? The overall specs seem rather like VM
> instances rather than hardware.
There are small servers like that, but a valid question indeed.
In particular, if it is dedicated HW, FULL specs.

> > 3 ea MON, MDS instances (co-located) on 2cpu, 4GB RAM servers
> What sort of SSD are the monitor datastores on? ('mon data' in the
> config)
He doesn't mention SSDs in the MON/MDS context, so we could be looking at
something even slower. FULL SPECS.

4GB RAM would be fine for a single MON, but combined with MDS it may
be a bit tight.

> > 12 ea OSDs (ssd), on 1cpu, 1GB RAM servers
> 12 SSDs to a single server, with 1cpu/1GB RAM? That's absurdly low-spec.
> How many OSD servers, what SSDs?
I think he means 12 individual servers. Again, there are micro servers
like that around, like:
Super Micro Computer, Inc. - Products | SuperServers | 2U Twin^2 (CSE-217HQ 
chassis, BPN-SAS-217HQ backplane: up to 24x 2.5-inch SAS/SATA drives, 6 per node)

IF the SSDs are decent, CPU may be tight but 1GB RAM for a combination of
OS _and_ OSD is way too little for my taste and experience.


> What is the network setup & connectivity between them (hopefully
> 10Gbit).

Christian Balzer        Network/Systems Engineer
ch...@gol.com    Global OnLine Japan/Rakuten Communications
ceph-users mailing list
