Hi Reid,

Many thanks for this thoughtful response - very helpful and much
appreciated.

No doubt some additional experimentation will pay off, as you noted.

One additional question: we currently use this heap setting:

-XX:MaxRAMFraction=2

I realize every environment and its tuning goals are different, but just
generally: what do you think of MaxRAMFraction=2 with Java 8?

If the stateful set is configured with 16Gi of memory, that setting would
allocate roughly 8Gi to the heap, which seems like a safe balance between
heap and non-heap. No worries if you don't have enough information to answer
(I haven't shared our tuning goals), but any feedback is, again, appreciated.
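
For context, here are the full flags we pass, plus a quick way to sanity-check
what heap the JVM actually picks inside a pod (the pod/container names below
are just illustrative):

-XX:+UnlockExperimentalVMOptions
-XX:+UseCGroupMemoryLimitForHeap
-XX:MaxRAMFraction=2

kubectl exec cassandra-0 -c cassandra -- \
  java -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap \
  -XX:MaxRAMFraction=2 -XX:+PrintFlagsFinal -version | grep -i maxheapsize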


On Mon, Nov 4, 2019 at 10:28 AM Reid Pinchback <rpinchb...@tripadvisor.com>
wrote:

> Hi Ben, just catching up over the weekend.
>
>
>
> The typical advice, per Sergio’s link reference, is an obvious starting
> point.  We use G1GC and normally I’d treat 8gig as the minimal starting
> point for a heap.  What sometimes doesn’t get talked about amid the myriad
> of tunings is that you have to have a clear goal in mind for what you are
> tuning **for**. You could be tuning for throughput, or average latency,
> or 99’s latency, etc.  How you tune varies quite a lot according to your
> goal.  The more your goal is about latency, the more work you have ahead of
> you.
>
>
>
> I will suggest that, if your data footprint is going to stay low, you
> give yourself permission to do some experimentation.  As you’re using K8s,
> you are in a bit of a position where if your usage is small enough, you can
> get 2x bang for the buck on your servers by sizing the pods to about 45% of
> server resources and using the C* rack metaphor to ensure you don’t
> co-locate replicas.
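>
> (Purely as an illustration of the rack metaphor - values made up, not a
> config to copy: with GossipingPropertyFileSnitch, each pod reports the
> physical server it landed on as its rack, so NetworkTopologyStrategy keeps
> replicas on distinct servers.)
>
> # cassandra-rackdc.properties, illustrative values only
> dc=dc1
> rack=server-01   # one rack per physical server, so replicas never share a box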
>
>
>
> For example, were I you, I’d start asking myself if SSTable compression
> mattered to me at all.  The reason I’d start asking myself questions like
> that is that C* has multiple uses of memory, and one of the balancing acts is
> chunk cache and the O/S file cache.  If I could find a way to make my O/S
> file cache act as a de facto C* cache, I’d roll up my shirt sleeves and see
> what kind of performance numbers I could squeeze out with some creative
> tuning experiments.  Now, I’m not saying **do** that, because your write
> volume also plays a role, and you said you’re expecting a relatively even
> balance of reads and writes.  I’m just saying, by way of example, I’d start
> weighing whether the advice I get online was based on experience similar to
> my current circumstances, or on circumstances that were very different.
>
>
>
> R
>
>
>
> *From: *Ben Mills <b...@bitbrew.com>
> *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Date: *Monday, November 4, 2019 at 8:51 AM
> *To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Subject: *Re: ***UNCHECKED*** Re: Memory Recommendations for G1GC
>
>
>
>
> Hi (yet again) Sergio,
>
>
>
> Finally, note that we use this sidecar
> <https://github.com/Stackdriver/stackdriver-prometheus-sidecar> for
> shipping metrics to Stackdriver. It runs as a second container within our
> Prometheus stateful set.
>
>
>
>
>
> On Mon, Nov 4, 2019 at 8:46 AM Ben Mills <b...@bitbrew.com> wrote:
>
> Hi (again) Sergio,
>
>
>
> I forgot to note that along with Prometheus, we use Grafana (with
> Prometheus as its data source) as well as Stackdriver for monitoring.
>
>
>
> As Stackdriver is still maturing (i.e. it does not yet have all the features
> we need), we tend to use it for the basics (i.e. monitoring and alerting on
> memory, CPU and disk (PV) thresholds). More specifically, the
> Prometheus JMX exporter (noted above) scrapes all the MBeans inside
> Cassandra, exporting in the Prometheus data model. Its config map filters
> (allows) our metrics of interest, and those metrics are sent to our Grafana
> instances and to Stackdriver. We use Grafana for more advanced metric
> configs that provide deeper insight into Cassandra - e.g. read/write
> latencies and so forth. For memory utilization, we monitor both at the pod
> level in Stackdriver (i.e. to avoid having a Cassandra pod oomkilled by the
> kubelet) and inside the JVM (heap space).
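>
> For a sense of what that filtering looks like, here is a trimmed, purely
> illustrative fragment of the exporter config (not our exact rules; bean
> names abbreviated):
>
> lowercaseOutputName: true
> whitelistObjectNames:
>   - "org.apache.cassandra.metrics:type=ClientRequest,*"
>   - "org.apache.cassandra.metrics:type=Table,*"
> rules:
>   - pattern: "org.apache.cassandra.metrics<type=(\\w+), scope=(\\w+), name=(\\w+)><>Value"
>     name: cassandra_$1_$3
>     labels:
>       scope: "$2"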
>
>
>
> Hope this helps.
>
>
>
> On Mon, Nov 4, 2019 at 8:26 AM Ben Mills <b...@bitbrew.com> wrote:
>
> Hi Sergio,
>
>
>
> Thanks for this and sorry for the slow reply.
>
>
>
> We are indeed still running Java 8 and so it's very helpful.
>
>
>
> This Cassandra cluster has been running reliably in Kubernetes for several
> years, and while we've had some repair-related issues, they are not related
> to container orchestration or the cloud environment. We don't use operators
> and have simply built the needed Kubernetes configs (YAML manifests) to
> handle deployment of new Docker images (when needed), and so forth. We have:
>
>
>
> (1) ConfigMap - Cassandra environment variables
>
> (2) ConfigMap - Prometheus configs for this JMX exporter
> <https://github.com/prometheus/jmx_exporter>,
> which is built into the image and runs as a Java agent
>
> (3) PodDisruptionBudget - with minAvailable: 2 as the important setting (sketch below)
>
> (4) Service - this is a headless service (clusterIP: None) which specifies
> the ports for cql, jmx, prometheus, intra-node
>
> (5) StatefulSet - 3 replicas, ports, health checks, resources, etc - as
> you would expect
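>
> A minimal sketch of (3), with an illustrative app label:
>
> apiVersion: policy/v1beta1
> kind: PodDisruptionBudget
> metadata:
>   name: cassandra
> spec:
>   minAvailable: 2
>   selector:
>     matchLabels:
>       app: cassandra   # illustrative label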
>
>
>
> We store data on persistent volumes using an SSD storage class, and use:
> an updateStrategy of OnDelete, some affinity rules to ensure an even
> spread of pods across our zones, Prometheus annotations for scraping the
> metrics port, a nodeSelector and tolerations to ensure the Cassandra pods
> run in their dedicated node pool, and a preStop hook that runs nodetool
> drain to help with graceful shutdown when a pod is rolled.
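>
> The relevant StatefulSet fragments look roughly like this (heavily trimmed,
> names illustrative):
>
> spec:
>   updateStrategy:
>     type: OnDelete
>   template:
>     spec:
>       containers:
>       - name: cassandra
>         lifecycle:
>           preStop:
>             exec:
>               command: ["/bin/sh", "-c", "nodetool drain"]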
>
>
>
> I'm guessing your installation is much larger than ours and so operators
> may be a good way to go. For our needs the above has been very reliable as
> has GCP in general.
>
>
>
> We are currently updating our backup/restore implementation to provide
> better granularity with respect to restoring a specific keyspace and also
> exploring Velero
> <https://github.com/vmware-tanzu/velero>
> for DR.
>
>
>
> Hope this helps.
>
>
>
>
>
> On Fri, Nov 1, 2019 at 5:34 PM Sergio <lapostadiser...@gmail.com> wrote:
>
> Hi Ben,
>
> Well, I had a similar question, and Jon Haddad preferred ParNew + CMS
> over G1GC for Java 8.
> https://lists.apache.org/thread.html/283547619b1dcdcddb80947a45e2178158394e317f3092b8959ba879@%3Cuser.cassandra.apache.org%3E
> It depends on your JVM; in any case, I would test it against your own
> workload.
>
> What's your experience running Cassandra in k8s? Are you using the
> Cassandra Kubernetes Operator?
>
> How do you monitor it and how do you perform disaster recovery backup?
>
>
> Best,
>
> Sergio
>
>
>
> Il giorno ven 1 nov 2019 alle ore 14:14 Ben Mills <b...@bitbrew.com> ha
> scritto:
>
> Thanks Sergio - that's good advice and we have this built into the plan.
>
> Have you heard a solid, consistent recommendation (or requirement) for how
> much heap memory G1GC needs?
>
>
>
> On Fri, Nov 1, 2019 at 5:11 PM Sergio <lapostadiser...@gmail.com> wrote:
>
> In any case, I would test any configuration with tlp-stress or the
> cassandra-stress tool.
>
>
>
> Sergio
>
>
>
> On Fri, Nov 1, 2019, 12:31 PM Ben Mills <b...@bitbrew.com> wrote:
>
> Greetings,
>
>
>
> We are planning a Cassandra upgrade from 3.7 to 3.11.5 and considering a
> change to the GC config.
>
>
>
> What is the minimum amount of memory that needs to be allocated to heap
> space when using G1GC?
>
>
>
> For GC, we currently use CMS. Along with the version upgrade, we'll be
> running the stateful set of Cassandra pods on new machine types in a new
> node pool with 12Gi memory per node. Not a lot of memory but an
> improvement. We may be able to go up to 16Gi memory per node. We'd like to
> continue using these heap settings:
>
>
> -XX:+UnlockExperimentalVMOptions
> -XX:+UseCGroupMemoryLimitForHeap
> -XX:MaxRAMFraction=2
>
>
>
> which (if 12Gi per node) would provide 6Gi memory for heap (i.e. half of
> total available).
>
>
>
> Here are some details on the environment and configs in the event that
> something is relevant.
>
>
>
> Environment: Kubernetes
> Environment Config: Stateful set of 3 replicas
> Storage: Persistent Volumes
> Storage Class: SSD
> Node OS: Container-Optimized OS
> Container OS: Ubuntu 16.04.3 LTS
> Data Centers: 1
> Racks: 3 (one per zone)
> Nodes: 3
> Tokens: 4
> Replication Factor: 3
> Replication Strategy: NetworkTopologyStrategy (all keyspaces)
> Compaction Strategy: STCS (all tables)
> Read/Write Requirements: Blend of both
> Data Load: <1GB per node
> gc_grace_seconds: default (10 days - all tables)
>
> GC Settings: (CMS)
>
> -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC
> -XX:+CMSParallelRemarkEnabled
> -XX:SurvivorRatio=8
> -XX:MaxTenuringThreshold=1
> -XX:CMSInitiatingOccupancyFraction=75
> -XX:+UseCMSInitiatingOccupancyOnly
> -XX:CMSWaitDuration=30000
> -XX:+CMSParallelInitialMarkEnabled
> -XX:+CMSEdenChunksRecordAlways
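>
> If we do switch, we'd likely start from something close to the
> (commented-out) G1 section in Cassandra's stock jvm.options rather than
> hand-tuning, e.g.:
>
> -XX:+UseG1GC
> -XX:G1RSetUpdatingPauseTimePercent=5
> -XX:MaxGCPauseMillis=500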
>
>
>
> Any ideas are much appreciated.
>
>
