K8s: pymongo.errors.ServerSelectionTimeoutError: mongodb:27017: [Errno -2] Name or service not known

Hello Community,

We are running Open edX on our private network, deployed with Tutor on Kubernetes (K8s).

In the LMS logs we very frequently see pymongo.errors.ServerSelectionTimeoutError: mongodb:27017: [Errno -2] Name or service not known. This error sometimes results in a 502 error for end users.

Traceback from the LMS logs:

Traceback (most recent call last):
  File "/openedx/venv/lib/python3.8/site-packages/mongodb_proxy.py", line 55, in wrapper
    return func(*args, **kwargs)
  File "/openedx/edx-platform/common/lib/xmodule/xmodule/contentstore/mongo.py", line 134, in find
    fp = self.fs.get(content_id)
  File "/openedx/venv/lib/python3.8/site-packages/gridfs/__init__.py", line 153, in get
    gout._ensure_file()
  File "/openedx/venv/lib/python3.8/site-packages/gridfs/grid_file.py", line 486, in _ensure_file
    self._file = self.__files.find_one({"_id": self.__file_id},
  File "/openedx/venv/lib/python3.8/site-packages/pymongo/collection.py", line 1273, in find_one
    for result in cursor.limit(-1):
  File "/openedx/venv/lib/python3.8/site-packages/pymongo/cursor.py", line 1156, in next
    if len(self.__data) or self._refresh():
  File "/openedx/venv/lib/python3.8/site-packages/pymongo/cursor.py", line 1050, in _refresh
    self.__session = self.__collection.database.client._ensure_session()
  File "/openedx/venv/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1810, in _ensure_session
    return self.__start_session(True, causal_consistency=False)
  File "/openedx/venv/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1763, in __start_session
    server_session = self._get_server_session()
  File "/openedx/venv/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1796, in _get_server_session
    return self._topology.get_server_session()
  File "/openedx/venv/lib/python3.8/site-packages/pymongo/topology.py", line 482, in get_server_session
    self._select_servers_loop(
  File "/openedx/venv/lib/python3.8/site-packages/pymongo/topology.py", line 208, in _select_servers_loop
    raise ServerSelectionTimeoutError(
pymongo.errors.ServerSelectionTimeoutError: mongodb:27017: [Errno -2] Name or service not known

We also noticed something odd about CONTENTSTORE. This is its value on the Tutor server:

>>> settings.CONTENTSTORE
{
    "ENGINE": "xmodule.contentstore.mongo.MongoContentStore",
    "ADDITIONAL_OPTIONS": {},
    "DOC_STORE_CONFIG": {
        "host": "mongodb",
        "port": 27017,
        "user": None,
        "password": None,
        "db": "openedx",
    },
}

On our native installation, CONTENTSTORE has different values:

>>> settings.CONTENTSTORE
{
    "ADDITIONAL_OPTIONS": {},
    "DOC_STORE_CONFIG": {
        "authsource": "",
        "collection": "modulestore",
        "connectTimeoutMS": 2000,
        "db": "edxapp",
        "host": "172.42.21.109",
        "password": "TEST",
        "port": 27017,
        "read_preference": "SECONDARY_PREFERRED",
        "replicaSet": "",
        "socketTimeoutMS": 3000,
        "ssl": False,
        "user": "edxapp",
    },
    "ENGINE": "xmodule.contentstore.mongo.MongoContentStore",
    "OPTIONS": {
        "auth_source": "",
        "db": "edxapp",
        "host": "172.42.21.109",
        "password": "TEST",
        "port": 27017,
        "ssl": False,
        "user": "edxapp",
    },
}
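To make the differences between the two DOC_STORE_CONFIG values explicit, here is a minimal sketch of a hypothetical helper, diff_settings, that compares two settings dicts key by key. The two dicts are copied from the outputs above. One difference relevant to the error is host: the Tutor value "mongodb" is a name that has to be resolved through DNS, while the native value is a literal IP address that needs no lookup. The Tutor config also leaves out connectTimeoutMS, socketTimeoutMS and read_preference; whether MongoContentStore passes those options to pymongo is not checked here.

```python
from typing import Any, Dict, List

# DOC_STORE_CONFIG values copied from the two settings.CONTENTSTORE outputs.
TUTOR_DOC_STORE_CONFIG = {
    "host": "mongodb", "port": 27017, "user": None, "password": None, "db": "openedx",
}
NATIVE_DOC_STORE_CONFIG = {
    "authsource": "", "collection": "modulestore", "connectTimeoutMS": 2000,
    "db": "edxapp", "host": "172.42.21.109", "password": "TEST", "port": 27017,
    "read_preference": "SECONDARY_PREFERRED", "replicaSet": "",
    "socketTimeoutMS": 3000, "ssl": False, "user": "edxapp",
}


def diff_settings(tutor: Dict[str, Any], native: Dict[str, Any], path: str = "") -> List[str]:
    """Hypothetical helper: list keys missing on one side or holding different values."""
    lines = []
    for key in sorted(set(tutor) | set(native)):
        where = f"{path}.{key}" if path else key
        if key not in tutor:
            lines.append(f"only in native: {where} = {native[key]!r}")
        elif key not in native:
            lines.append(f"only in tutor: {where} = {tutor[key]!r}")
        elif isinstance(tutor[key], dict) and isinstance(native[key], dict):
            # Recurse into nested dicts such as DOC_STORE_CONFIG or OPTIONS.
            lines.extend(diff_settings(tutor[key], native[key], where))
        elif tutor[key] != native[key]:
            lines.append(f"differs: {where}: tutor={tutor[key]!r} native={native[key]!r}")
    return lines


if __name__ == "__main__":
    for line in diff_settings(TUTOR_DOC_STORE_CONFIG, NATIVE_DOC_STORE_CONFIG):
        print(line)
```

Running it lists every key that is only set on the native side (including the two timeouts) and the keys whose values differ (db, host, password, user). Port is the only shared setting with the same value.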

Whenever this error occurs, we have checked the status of the MongoDB pod: it is up and running, has not restarted once, and shows no errors in its logs.
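Because the MongoDB pod looks healthy, it may help to note that "[Errno -2] Name or service not known" is a hostname lookup failure (getaddrinfo's EAI_NONAME on Linux), wrapped here in ServerSelectionTimeoutError. That points at resolving the "mongodb" service name rather than at MongoDB itself. The following minimal sketch resolves the name repeatedly and counts failures. It assumes you run it with Python inside the LMS container (for example through kubectl exec, depending on your setup). The function name probe_dns and its defaults are hypothetical.

```python
import socket
import time
from collections import Counter


def probe_dns(host: str = "mongodb", port: int = 27017,
              attempts: int = 20, delay: float = 0.5) -> Counter:
    """Hypothetical diagnostic: resolve host repeatedly and tally the outcomes."""
    results = Counter()
    for _ in range(attempts):
        try:
            # Same kind of lookup pymongo needs before it can open a TCP connection.
            socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
            results["ok"] += 1
        except socket.gaierror as exc:
            # On Linux, errno -2 is EAI_NONAME ("Name or service not known").
            results[f"gaierror {exc.errno}"] += 1
        time.sleep(delay)
    return results


if __name__ == "__main__":
    print(probe_dns())
```

If the counts mix "ok" with gaierror entries, name resolution inside the cluster is failing intermittently, and the cluster DNS service is worth investigating. If every attempt succeeds, the cause is more likely elsewhere.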

Tutor version:
tutor, version 12.2.0

Any help with this would be appreciated.

Thanks.