
How to resolve an AWS Elasticsearch cluster in red or yellow state

Jun 30, 2021 · 3 mins read

Understanding the ES cluster status is crucial when troubleshooting Elasticsearch. If your job is to keep an ES cluster healthy, you need to understand all three states it can report: green, yellow, and red. You can find more about the cluster state in the ES cluster health API documentation.

This tutorial will show you how to debug your cluster status and get it back to a green state.
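
As a starting point, the current status can be read from the `_cat/health` API and translated into what it means for the shards. The snippet below is a minimal sketch: `describe_status` is a hypothetical helper name, and `ES_Endpoint` is the same placeholder used throughout this tutorial.

```shell
# Map a cluster status word (read from stdin) to a short explanation.
# describe_status is an illustrative helper, not part of Elasticsearch.
describe_status() {
  local status
  read -r status
  case "$status" in
    green)  echo "green: all primary and replica shards are assigned" ;;
    yellow) echo "yellow: all primaries are assigned, at least one replica is not" ;;
    red)    echo "red: at least one primary shard is unassigned" ;;
    *)      echo "unknown status: $status"; return 1 ;;
  esac
}

# Usage:
# curl -s 'ES_Endpoint/_cat/health?h=status' | describe_status
```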

Prerequisites

  • AWS account
  • Full access to the Amazon Elasticsearch Service domain

Troubleshooting the ES cluster

Step 1. To check whether your ES cluster is in a red or yellow state because of UNASSIGNED shards, run:

curl -XGET 'ES_Endpoint/_cat/shards?h=index,shard,prirep,state,unassigned.reason' | grep UNASSIGNED

If there are UNASSIGNED shards, the output should look like this:

index1               1 r UNASSIGNED ALLOCATION_FAILED
index12              0 r UNASSIGNED ALLOCATION_FAILED
index11              5 r UNASSIGNED ALLOCATION_FAILED
index22              4 r UNASSIGNED NODE_LEFT
index25              2 r UNASSIGNED NODE_LEFT
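
On a cluster with many shards, it can help to group the unassigned shards by type and reason instead of reading them line by line. An unassigned primary (`p`) makes its index red, while an unassigned replica (`r`) only makes it yellow. The following is a minimal sketch, assuming the column order requested in the command above; `summarize_unassigned` is a hypothetical helper name.

```shell
# Count unassigned shards per shard type (p/r) and reason.
# Expects `_cat/shards?h=index,shard,prirep,state,unassigned.reason` output on stdin.
summarize_unassigned() {
  awk '$4 == "UNASSIGNED" { count[$3 " " $5]++ }
       END { for (k in count) print count[k], k }' | sort -k1,1nr -k2
}

# Usage:
# curl -s 'ES_Endpoint/_cat/shards?h=index,shard,prirep,state,unassigned.reason' | summarize_unassigned
```

For the sample output above, this prints `3 r ALLOCATION_FAILED` followed by `2 r NODE_LEFT`.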

Step 2. Run the next command to identify the root cause of the unassigned shards. Called without a request body, it explains the first unassigned shard it finds:

curl -XGET 'ES_Endpoint/_cluster/allocation/explain'

Possible reasons, as they appear in the response:

"details": "failed shard on node [25kOkdsad15OPSre5fsa_g]: failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[index11][1]: obtaining shard lock timed out after 5000ms, previous lock details: [shard creation] trying to lock for [shard creation]];"

"allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes"

"explanation": "shard has exceeded the maximum number of retries [5] on failed allocation attempts"
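
To ask about one particular shard from Step 1 rather than whichever unassigned shard Elasticsearch picks, the allocation explain API also accepts a request body with the `index`, `shard`, and `primary` fields. The sketch below only builds that body; `explain_body` is a hypothetical helper name, and the index and shard number in the usage comment are taken from the sample output of Step 1.

```shell
# Build the request body for a targeted allocation explain call.
# Arguments: index name, shard number, "true" for a primary or "false" for a replica.
explain_body() {
  printf '{"index": "%s", "shard": %d, "primary": %s}\n' "$1" "$2" "$3"
}

# Usage:
# curl -s -XGET 'ES_Endpoint/_cluster/allocation/explain?pretty' \
#   -H 'Content-Type: application/json' \
#   -d "$(explain_body index11 5 false)"
```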

Step 3. Once you know the cause, find out which indices are putting the cluster into red or yellow status with the following queries:

  1. Red State
    curl -XGET 'ES_Endpoint/_cat/indices?v&health=red'

Output:

health status index                                uuid                   pri rep docs.count docs.deleted store.size pri.store.size
red    open   index11                              9orZMYvPTA2TXR0UZIFyxA   5   0      55210        11870    302.2mb        302.2mb
red    open   index12                              1vKg-jGZSy2hA7-3H-qJrA   5   0     123580        29962    538.9mb        538.9mb

  2. Yellow State
    curl -XGET 'ES_Endpoint/_cat/indices?v&health=yellow'

Output:

health status index                        uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   index15                      xJ0KMwBiToagSwepcI663w   6   2     328022        54806    108.4mb         40.5mb
yellow open   index25                      2uqLbFH-Tl-Y5qCntiQVng   5   3      30280         5931    131.8mb         43.9mb
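
If the index names are needed for further commands, they can be pulled out of the verbose output by locating the `index` column in the header row. This is a minimal sketch, and `index_names` is a hypothetical helper name; requesting `h=index` instead of `v` from the `_cat/indices` API is a simpler alternative when only the names matter.

```shell
# Print the index column from `_cat/indices?v` output read on stdin.
# The column position is looked up in the header row, so column order does not matter.
index_names() {
  awk 'NR == 1 {
         for (i = 1; i <= NF; i++) if ($i == "index") col = i
         next
       }
       col { print $col }'
}

# Usage:
# curl -s 'ES_Endpoint/_cat/indices?v&health=red' | index_names
```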

Step 4. By default, Elasticsearch retries a failed shard allocation at most 5 times (the index.allocation.max_retries setting), which is not always enough. Increase the limit for each red and yellow index with the following query:

curl -XPUT 'ES_Endpoint/<indexname>/_settings' -H 'Content-Type: application/json' -d '
{
  "index.allocation.max_retries" : 10
}'

This updates the index settings through the ES API, after which the master node tries again to allocate the shards of the specified index.
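
When several indices are affected, the same request can be repeated in a loop. The sketch below reads index names from stdin, one per line (for example from `_cat/indices?h=index&health=red`), and by default only prints what it would send. `raise_max_retries` and the `DRY_RUN` variable are hypothetical names introduced for this illustration.

```shell
# Raise index.allocation.max_retries for every index name read from stdin.
# Arguments: endpoint, new retry limit (defaults to 10).
# With DRY_RUN=1 (the default in this sketch) the requests are only printed.
raise_max_retries() {
  local endpoint="$1" retries="${2:-10}" index
  while IFS= read -r index; do
    [ -n "$index" ] || continue   # skip empty lines
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "PUT ${endpoint}/${index}/_settings index.allocation.max_retries=${retries}"
    else
      curl -s -XPUT "${endpoint}/${index}/_settings" \
        -H 'Content-Type: application/json' \
        -d "{\"index.allocation.max_retries\": ${retries}}"
      echo
    fi
  done
}

# Usage (remove the dry run once the printed requests look right):
# curl -s 'ES_Endpoint/_cat/indices?h=index&health=red' | DRY_RUN=0 raise_max_retries ES_Endpoint 10
```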

Step 5. Also check that the number of replicas set on each index matches the number your cluster is meant to run. For example, if index25 is supposed to have "number_of_replicas": "2" but the index actually carries 3 or more (the rep column in the Step 3 output), the extra replicas may have no node to go to, because Elasticsearch never places two copies of the same shard on one node, and the cluster stays yellow. Set the value back to the intended number:

curl -XPUT 'ES_Endpoint/<indexname>/_settings' -H 'Content-Type: application/json' -d '
{
  "index" : {
    "number_of_replicas" : 2
  }
}'
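
To spot indices whose replica count differs from the intended one, the `rep` column of the `_cat/indices?v` output can be compared against the expected value. This is a minimal sketch that assumes a single intended replica count for all listed indices; `replica_mismatches` is a hypothetical helper name.

```shell
# Print "<index> <rep>" for every index whose replica count differs from the expected one.
# Argument: expected number of replicas. Input: `_cat/indices?v` output on stdin.
replica_mismatches() {
  awk -v want="$1" '
    NR == 1 {
      # Locate the index and rep columns in the header row.
      for (i = 1; i <= NF; i++) {
        if ($i == "index") ic = i
        if ($i == "rep")   rc = i
      }
      next
    }
    ic && rc && $rc != want { print $ic, $rc }'
}

# Usage:
# curl -s 'ES_Endpoint/_cat/indices?v&health=yellow' | replica_mismatches 2
```

For the yellow sample output in Step 3 and an expected count of 2, this prints `index25 3`.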

Conclusion

If you follow the steps above closely, your AWS Elasticsearch cluster should return to its previous green state. Feel free to leave a comment below, and if you found this tutorial useful, follow our official channel on Telegram.