Tutor k8s - Discovery job keeps failing

tutor, version 10.1.0
notes==10.1.0
minio==10.1.0
lts==9.1.6 (disabled)
ecommerce==10.1.0
discovery==10.1.0

I keep getting this error when running “tutor k8s quickstart”.
I have tried stopping the platform, rebuilding the images, and running init, all to no avail.

Here’s the log output of the discovery-job:

(tutor) admin@As-MacBook-Pro tutor % kubectl logs --namespace=openedx --selector=job-name=discovery-job-20200812100016
  File "/openedx/discovery/course_discovery/apps/course_metadata/management/commands/refresh_course_metadata.py", line 25, in execute_loader
    loader_class(*loader_args).ingest()
  File "/openedx/discovery/course_discovery/apps/course_metadata/data_loaders/api.py", line 37, in ingest
    response = self._make_request(initial_page)
  File "/openedx/venv/lib/python3.5/site-packages/backoff/_sync.py", line 100, in retry
    if giveup(e) or max_tries_exceeded or max_time_exceeded:
  File "/openedx/discovery/course_discovery/apps/course_metadata/data_loaders/api.py", line 68, in _fatal_code
    return ex.response.status_code != 429 and ex.response.status_code != 504  # pylint: disable=no-member
AttributeError: 'NoneType' object has no attribute 'status_code'
CommandError: One or more of the data loaders above failed.
  File "/openedx/discovery/course_discovery/apps/course_metadata/management/commands/refresh_course_metadata.py", line 25, in execute_loader
    loader_class(*loader_args).ingest()
  File "/openedx/discovery/course_discovery/apps/course_metadata/data_loaders/api.py", line 37, in ingest
    response = self._make_request(initial_page)
  File "/openedx/venv/lib/python3.5/site-packages/backoff/_sync.py", line 100, in retry
    if giveup(e) or max_tries_exceeded or max_time_exceeded:
  File "/openedx/discovery/course_discovery/apps/course_metadata/data_loaders/api.py", line 68, in _fatal_code
    return ex.response.status_code != 429 and ex.response.status_code != 504  # pylint: disable=no-member
AttributeError: 'NoneType' object has no attribute 'status_code'
CommandError: One or more of the data loaders above failed.

Any pointers in the right direction will be appreciated!

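This AttributeError is hiding another error: ex.response is None when it should not be. That most likely means the request made by the data loader failed before any HTTP response came back, and the real exception is being masked by the error-handling callback.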
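To illustrate, here is a minimal sketch of the failure mode. It is not the actual course-discovery code nor the patch: FakeRequestException, FakeResponse and fatal_code_safe are hypothetical names made up for this example. Only the expression in fatal_code_original is copied from line 68 of api.py in the traceback above.

```python
class FakeResponse:
    """Hypothetical stand-in for an HTTP response object."""

    def __init__(self, status_code):
        self.status_code = status_code


class FakeRequestException(Exception):
    """Hypothetical stand-in for an HTTP client exception.

    `response` stays None when the request failed before any
    HTTP response was received (assumption: e.g. a connection error).
    """

    def __init__(self, message, response=None):
        super().__init__(message)
        self.response = response


def fatal_code_original(ex):
    # Same expression as api.py line 68 in the traceback:
    # crashes with AttributeError when ex.response is None.
    return ex.response.status_code != 429 and ex.response.status_code != 504


def fatal_code_safe(ex):
    # Hypothetical variant that tolerates a missing response.
    response = getattr(ex, "response", None)
    if response is None:
        # No response at all: treat as fatal so that the retry logic
        # stops and the original exception is re-raised.
        return True
    # Only 429 and 504 are considered worth retrying.
    return response.status_code not in (429, 504)


if __name__ == "__main__":
    error = FakeRequestException("connection failed")  # response is None
    try:
        fatal_code_original(error)
    except AttributeError as exc:
        print("original callback:", exc)
    print("safe callback gives up:", fatal_code_safe(error))
```

With a callback like the second one, the retry decorator would give up and re-raise the underlying request exception, which is the error we actually need to see.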

Are there logs from the lms container?

@regis, the lms container only outputs:

[2020-08-16 15:15:48 +0000] [6] [INFO] Starting gunicorn 20.0.4
[2020-08-16 15:15:48 +0000] [6] [INFO] Listening at: http://0.0.0.0:8000 (6)
[2020-08-16 15:15:48 +0000] [6] [INFO] Using worker: sync
[2020-08-16 15:15:48 +0000] [9] [INFO] Booting worker with pid: 9
[2020-08-16 15:15:48 +0000] [11] [INFO] Booting worker with pid: 11

To debug this further we need to run a fork of the course-discovery project that better handles error callbacks. Can you please try to run a custom “discovery” image that includes the following patch: https://github.com/overhangio/course-discovery/commit/f7e67d38338ad0d77959ad60e931a8630255d00e

On my laptop I build the image with:

tutor images build -a DISCOVERY_REPOSITORY=https://github.com/overhangio/course-discovery.git -a DISCOVERY_VERSION=overhangio/fix-fatal-callback discovery

Since you are running on Kubernetes, you will have to push this image to your own registry so that the cluster can pull it.
