
Elasticsearch Garbage Collection

Sep 10, 2021 · 2 mins read

Elasticsearch, as you might already know, is a free and open-source search and analytics engine written in Java. In this post we are going to cover the Garbage Collection (GC) side of ES. Garbage Collection in Java is the process that performs automatic memory management, reclaiming heap memory held by objects that are no longer in use. It does pretty much the same thing in Elasticsearch, so let’s see which GCs are available and explore their configuration.

Prerequisites

  • Elasticsearch Cluster

GC configuration

The default Elasticsearch configuration currently covers two Garbage Collectors: the Concurrent Mark Sweep (CMS) Collector and the Garbage-First (G1GC) Collector. The configuration for both can be found in the jvm.options file under the /etc/elasticsearch directory. Let’s look at the sample GC configuration from ES 7.14, the latest version at the time of writing.

## GC configuration
8-13:-XX:+UseConcMarkSweepGC
8-13:-XX:CMSInitiatingOccupancyFraction=75
8-13:-XX:+UseCMSInitiatingOccupancyOnly

## G1GC Configuration
# to use G1GC, uncomment the next two lines and update the version on the
# following three lines to your version of the JDK
# 8-13:-XX:-UseConcMarkSweepGC
# 8-13:-XX:-UseCMSInitiatingOccupancyOnly
14-:-XX:+UseG1GC

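The prefix on each line is a JDK version range: lines starting with 8-13: apply only on JDK 8 through 13, while 14-: applies on JDK 14 and later. As we can see, CMS is the default Garbage Collector on JDK 8 to 13, and G1GC is the default on JDK 14 and later. Let’s go through each of the configuration flags in turn.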

  • UseConcMarkSweepGC: Use the CMS Garbage Collector.
  • CMSInitiatingOccupancyFraction: The Old Generation occupancy, as a percentage, at which a CMS collection is triggered. The value set here is 75, which means CMS starts collecting when the Old Generation space is 75% full. On its own the value is not binding, though: the JVM may start a collection at a different point (usually before this percentage is reached). To make CMS trigger strictly at the specified percentage, use the next flag.
  • UseCMSInitiatingOccupancyOnly: Use this only if you want the value of CMSInitiatingOccupancyFraction to be the sole criterion for starting a collection.
  • UseG1GC: Use the Garbage First (G1) Collector.
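
To see which of the sample lines actually take effect on a given JDK, the version prefixes can be evaluated with a short script. Below is a minimal Python sketch, assuming the jvm.options prefix rules (n: for exactly version n, n-: for version n and later, n-m: for versions n through m). The function name active_options is hypothetical and not part of Elasticsearch.

```python
import re

# Optional JDK version prefix ("8:", "8-:" or "8-13:") followed by a JVM option.
LINE_RE = re.compile(r"^(?:(\d+)(-)?(\d+)?:)?(-.*)$")


def active_options(text, jdk_major):
    """Return the JVM options from jvm.options text that apply to jdk_major.

    Hypothetical helper for illustration only.
    """
    options = []
    for raw in text.splitlines():
        line = raw.strip()
        # Blank lines and comments are ignored.
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue
        low, dash, high, option = match.groups()
        if low is None:
            # No prefix: the option applies to every JDK version.
            options.append(option)
            continue
        lower = int(low)
        if dash is None:
            upper = lower          # "8:" means exactly version 8
        elif high is None:
            upper = None           # "14-:" means version 14 and later
        else:
            upper = int(high)      # "8-13:" means versions 8 through 13
        if jdk_major >= lower and (upper is None or jdk_major <= upper):
            options.append(option)
    return options


SAMPLE = """\
8-13:-XX:+UseConcMarkSweepGC
8-13:-XX:CMSInitiatingOccupancyFraction=75
8-13:-XX:+UseCMSInitiatingOccupancyOnly
# 8-13:-XX:-UseConcMarkSweepGC
14-:-XX:+UseG1GC
"""

print(active_options(SAMPLE, 11))  # the three CMS flags
print(active_options(SAMPLE, 16))  # only the G1GC flag
```

With the sample configuration, a JDK 11 node ends up with the three CMS flags, while a JDK 16 node ends up with -XX:+UseG1GC only.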

Keep in mind, these are just the defaults. There are many more configuration flags out there, like -XX:ParallelGCThreads, -XX:ConcGCThreads, -XX:InitiatingHeapOccupancyPercent, and -XX:MaxGCPauseMillis, that can be tweaked as well when tuning Elasticsearch clusters.

Tuning

Speaking of tuning, here are a few tips:

  • Lowering the value of CMSInitiatingOccupancyFraction triggers the collection earlier, which also lowers the chance of memory allocation failures. A decent value range to test would be from 50 to 75.
  • MaxGCPauseMillis sets the target maximum GC pause time in ms. It is a goal the collector tries to meet, not a hard limit. The default is 200ms, but values in the range from 200 to 400 are worth testing as well. You will be trading latency for throughput, though.
  • If the heap size is smaller than 8GB, a tuned CMS GC should do fine. Everything above 8GB should be using G1GC. The heap size is the amount of memory allocated to the JVM of an Elasticsearch node.
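
The heap-size rule of thumb above can be sketched as a small helper. This is only an illustration in Python using the thresholds given in this post. The function name suggest_gc_flags and the choice to treat a heap of exactly 8GB as G1GC are assumptions, and the CMS flags apply only on JDK 8 to 13, since CMS was removed in JDK 14.

```python
GIB = 1024 ** 3  # bytes in one gibibyte


def suggest_gc_flags(heap_bytes, cms_fraction=75, g1_pause_ms=200):
    """Return jvm.options-style flags following the post's rule of thumb.

    Hypothetical helper: heaps under 8 GiB get a tuned CMS setup,
    anything larger gets G1GC. Exactly 8 GiB is treated as G1GC (assumption).
    """
    if heap_bytes < 8 * GIB:
        # CMS flags are only valid on JDK 8 to 13.
        return [
            "-XX:+UseConcMarkSweepGC",
            f"-XX:CMSInitiatingOccupancyFraction={cms_fraction}",
            "-XX:+UseCMSInitiatingOccupancyOnly",
        ]
    return ["-XX:+UseG1GC", f"-XX:MaxGCPauseMillis={g1_pause_ms}"]


# Example: a 4 GiB heap with an earlier CMS trigger point.
for flag in suggest_gc_flags(4 * GIB, cms_fraction=60):
    print(flag)
```

The returned lines would still need the right JDK version prefix before going into jvm.options, and any value picked this way is a starting point for testing rather than a recommendation.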

Conclusion

Tuning the ES Garbage Collection configuration is a broad topic, and I’ve just covered the basics. Messing around with the GC configuration can be fun, but always apply and test changes on non-production ES clusters first.