Elasticsearch, as you might already know, is a free and open-source search and analytics engine developed in Java. In today’s topic we are going to cover the Garbage Collection part of ES. Garbage Collection in Java is a process that performs automatic memory management. It does pretty much the same thing in Elasticsearch, so let’s see what types of GCs we have and explore their configuration.
- Elasticsearch Cluster
Currently, there are two types of Garbage Collectors: the Concurrent Mark Sweep (CMS) Collector and the Garbage-First (G1GC) Collector. The configuration for both can be found in the jvm.options file under the /etc/elasticsearch directory. Let’s list the sample GC configuration from ES 7.14, the latest version at the time of writing.
## GC configuration
## G1GC Configuration
# to use G1GC, uncomment the next two lines and update the version on the
# following three lines to your version of the JDK
As we can see, CMS is the default enabled Garbage Collector. Let’s go through each of the configuration flags respectively.
UseConcMarkSweepGC: Use the CMS Garbage Collector.
CMSInitiatingOccupancyFraction: Sets the Old Generation occupancy, as a percentage, at which the CMS GC is triggered. The default value is 75, which means the CMS starts a collection when the Old Generation space is 75% full, but not necessarily so: without the next flag the JVM treats this value as a hint and relies on its own heuristics (usually it triggers before it reaches this percentage). If we need the CMS to start exactly at the specified percentage, we use the next parameter.
UseCMSInitiatingOccupancyOnly: Use this only if you want the collection to start on the value of CMSInitiatingOccupancyFraction.
UseG1GC: Use the Garbage First (G1) Collector.
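The jvm.options file can prefix a flag with a JDK version or version range, which is what the "update the version" comment in the listing refers to. As a rough illustration, the Python sketch below works out which GC flags would apply for a given JDK major version. The SAMPLE lines are an assumption assembled from the flags described above, not a verbatim copy of the file shipped with 7.14, and active_flags is a hypothetical helper, not part of Elasticsearch.

```python
import re

# Illustrative sample only: assembled from the flags described above,
# not a verbatim copy of the jvm.options file shipped with 7.14.
SAMPLE = """\
## GC configuration
8-13:-XX:+UseConcMarkSweepGC
8-13:-XX:CMSInitiatingOccupancyFraction=75
8-13:-XX:+UseCMSInitiatingOccupancyOnly

## G1GC Configuration
14-:-XX:+UseG1GC
"""

# Optional prefix "N:", "N-:" or "N-M:", followed by a JVM flag starting with "-".
LINE = re.compile(r"^(?:(\d+)(-)?(\d+)?:)?(-.*)$")


def active_flags(text, jdk_major):
    """Return the JVM flags in `text` that apply to the given JDK major version."""
    flags = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = LINE.match(line)
        if match is None:
            continue  # not an option line this sketch understands
        low, dash, high, flag = match.groups()
        if low is not None:
            lo = int(low)
            if dash is None:
                hi = lo              # "8:"    exactly this version
            elif high is None:
                hi = float("inf")    # "8-:"   this version and newer
            else:
                hi = int(high)       # "8-13:" inclusive range
            if not lo <= jdk_major <= hi:
                continue
        flags.append(flag)
    return flags


print(active_flags(SAMPLE, 11))
print(active_flags(SAMPLE, 16))
```

With these sample lines, a JDK 11 node would pick up the three CMS flags, while a JDK 16 node would pick up only the G1GC flag. Running the same kind of check against your own jvm.options is a quick way to confirm which collector a node will actually start with.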
Keep in mind, these are just the defaults. There are a lot more configuration flags out there, like -XX:MaxGCPauseMillis, that can be tweaked as well when tuning Elasticsearch clusters.
Speaking of tuning, here are a few tips:
- Lowering the value of CMSInitiatingOccupancyFraction will trigger the collection earlier, which also lowers the chance of running into memory allocation issues. A decent value range to test would be from
- MaxGCPauseMillis sets a target for the maximum GC pause time, in milliseconds. The default is 200 ms, but other values could be tested as well, up to 400. You will be trading latency for throughput though.
- If the heap size is smaller than 8GB, a tuned CMS GC should do fine. Everything above 8GB should be using G1GC. The heap size is the amount of memory allocated to the JVM of an Elasticsearch node.
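The rules of thumb above can be put into a few lines of Python. This is only a sketch under stated assumptions: suggest_collector and cms_trigger_bytes are hypothetical helper names, the 8GB cut-off is the guideline from this post rather than a hard limit (a heap of exactly 8GB is treated as G1GC here), and the threshold arithmetic assumes UseCMSInitiatingOccupancyOnly is set so that the fraction is honoured as written.

```python
GIB = 1024 ** 3  # bytes in one gibibyte


def suggest_collector(heap_bytes, cutoff_bytes=8 * GIB):
    """Rule of thumb from this post: CMS below the cut-off, G1GC otherwise.

    Assumption: a heap of exactly `cutoff_bytes` is treated as G1GC.
    """
    return "CMS" if heap_bytes < cutoff_bytes else "G1GC"


def cms_trigger_bytes(old_gen_bytes, occupancy_fraction=75):
    """Old Generation usage, in bytes, at which a CMS cycle would start.

    Assumes UseCMSInitiatingOccupancyOnly is enabled, so the percentage
    is applied exactly rather than used as a hint.
    """
    if not 0 <= occupancy_fraction <= 100:
        raise ValueError("occupancy_fraction must be between 0 and 100")
    return old_gen_bytes * occupancy_fraction // 100


print(suggest_collector(4 * GIB))         # CMS
print(suggest_collector(16 * GIB))        # G1GC
print(cms_trigger_bytes(4 * GIB) / GIB)   # 3.0
```

Lowering occupancy_fraction in the second helper shows directly how much earlier, in bytes of Old Generation usage, a collection would begin.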
Tuning the ES Garbage Collection configuration is a broad topic, and I’ve just covered the basics. Messing around with the GC configuration can be fun, but only when it is applied and tested on non-production ES clusters first.