Elastic container is not being removed

Sometimes when we execute tutor local stop, all containers are removed except the elasticsearch container. It just times out when executing “tutor local stop”.

We are not using the latest tutor version; we are on 3.8.0.

Any ideas on how to debug this issue? Any help will be appreciated.

Hi @cacciaresi!

  1. I understand that the elasticsearch container is not removed, but is it stopped?
  2. What do you mean by “It just times out”? Is there an error log?

Hi @regis, thanks for the reply. About your questions:

  1. The container was not stopped. When running tutor local stop, all the other containers were stopped and removed, but the elastic container was not stopped.
  2. After some time the command tutor local stop finishes, but the elastic container was not stopped, and I was not able to find an error log :(.

I’ve checked the installed docker version: it was 17, and it is now 19.03.6. Since that upgrade, the elastic container is always at 100% CPU (this was happening before too).

So maybe the inability to stop the container was related to the version of docker that we were using.

@regis do you know if there is a way to re-index all the content and start again with elasticsearch data? I really can’t find the root cause of why elastic is always using 100% CPU.

A lot of these are shown in the elasticsearch logs:

[2020-02-25 20:30:38,799][INFO ][monitor.jvm ] [Kid Nova] [gc][old][189][63] duration [5s], collections [1]/[5.3s], total [5s]/[3.3m], memory [1.3gb]->[1.3gb]/[1.4gb], all_pools {[young] [252mb]->[258.9mb]/[266.2mb]}{[survivor] [0b]->[0b]/[33.2mb]}{[old] [1.1gb]->[1.1gb]/[1.1gb]}
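To make that line easier to read, here is a minimal sketch that pulls the old-generation pool figures out of a monitor.jvm GC line. The variable names (line, old_pool) are just illustrative, and the sample is the log line quoted above; in practice you would pipe the container logs through the same sed expression.

```shell
# Sample line copied from the elasticsearch log quoted above.
line='[2020-02-25 20:30:38,799][INFO ][monitor.jvm ] [Kid Nova] [gc][old][189][63] duration [5s], collections [1]/[5.3s], total [5s]/[3.3m], memory [1.3gb]->[1.3gb]/[1.4gb], all_pools {[young] [252mb]->[258.9mb]/[266.2mb]}{[survivor] [0b]->[0b]/[33.2mb]}{[old] [1.1gb]->[1.1gb]/[1.1gb]}'

# Extract "before after max" for the old generation pool.
# Lines without an {[old] ...} section produce no output.
old_pool=$(printf '%s\n' "$line" |
    sed -n 's|.*{\[old\] \[\([^]]*\)]->\[\([^]]*\)]/\[\([^]]*\)]}.*|\1 \2 \3|p')

echo "$old_pool"
```

When the three values are equal, as they are here, the old generation is full both before and after the collection, so each 5s collection frees nothing. That pattern would be consistent with the constant CPU usage and the OutOfMemoryError messages.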

We’ve changed the default configuration to

  • “ES_JAVA_OPTS=-Xms1500m -Xmx1500m”

as we were getting a lot of errors referring to “java.lang.OutOfMemoryError: Java heap space docker elasticsearch”

Any recommendations on improving the default settings? I’ve looked at https://www.elastic.co/guide/en/elasticsearch/reference/master/setup-configuration-memory.html#bootstrap-memory_lock but we are not able to set memory_lock to true: the line bootstrap.mlockall=true doesn’t seem to take effect.

Interesting. We could make the 1500m part be a tutor configuration value, like ELASTICSEARCH_HEAP_SIZE. You could then set it with tutor config save --set ELASTICSEARCH_HEAP_SIZE=1500m. Would that solve your problem?

Indeed. Actually, your comment makes me realise that none of the environment variables of the elasticsearch container are taken into account:

$ tutor local run lms curl elasticsearch:9200 | grep cluster_name
  "cluster_name" : "elasticsearch",
$ tutor local run lms curl elasticsearch:9200/_nodes/process?pretty | grep mlock
        "mlockall" : false

The cluster name should be “openedx” and “mlockall” should be true.
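The two checks above can be wrapped in a small helper so they are easy to repeat after changing the configuration. This is only a sketch: check_setting is a hypothetical function name, it relies on the pretty-printed output having one setting per line, and the sample input below is copied from the output above rather than fetched from a live container.

```shell
# check_setting NAME EXPECTED
# Reads a pretty-printed JSON response on stdin and reports whether the
# line mentioning "NAME" also contains EXPECTED.
check_setting() {
    if grep "\"$1\"" | grep -q -- "$2"; then
        echo "$1: ok"
    else
        echo "$1: expected $2"
    fi
}

# Sample lines copied from the output above.
printf '  "cluster_name" : "elasticsearch",\n' | check_setting cluster_name '"openedx"'
printf '        "mlockall" : false\n' | check_setting mlockall true

# Against a running platform, the same commands as above would feed it:
#   tutor local run lms curl elasticsearch:9200 | check_setting cluster_name '"openedx"'
#   tutor local run lms curl elasticsearch:9200/_nodes/process?pretty | check_setting mlockall true
```

With the sample input, both checks report the expected value rather than "ok", which matches the observation that neither setting is being applied.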

After investigating, I discovered this is because we run an older version of elasticsearch (1.5.2). Docker environment variables are not properly taken into account. This is a tutor issue which will be fixed in the next release. This will also address the “mlockall” issue.

EDIT: here is the proposed fix: https://github.com/overhangio/tutor/commit/83459d43d561820420d9bd7d995e5a6686e1325c

Thanks a lot for your reply, will cherry-pick that commit and let you know how it works.

I can confirm that this fix resolves the issue that we had. Now cpu is normal :wink:
