Video transcript upload generates 500 error

nachham · August 5, 2020, 9:22pm

I was going through the system checking that everything was all right, and I found that uploading a video transcript generates an error!

tutor local logs --tail=100 cms

cms_1              | 2020-08-05 21:13:32,032 ERROR 19 [django.request] [user 3] log.py:228 - Internal Server Error: /transcripts/upload
cms_1              | Traceback (most recent call last):
cms_1              |   File "/openedx/venv/lib/python3.5/site-packages/django/core/handlers/exception.py", line 34, in inner
cms_1              |     response = get_response(request)
cms_1              |   File "/openedx/venv/lib/python3.5/site-packages/django/core/handlers/base.py", line 115, in _get_response
cms_1              |     response = self.process_exception_by_middleware(e, request)
cms_1              |   File "/openedx/venv/lib/python3.5/site-packages/django/core/handlers/base.py", line 113, in _get_response
cms_1              |     response = wrapped_callback(request, *callback_args, **callback_kwargs)
cms_1              |   File "/opt/pyenv/versions/3.5.9/lib/python3.5/contextlib.py", line 30, in inner
cms_1              |     return func(*args, **kwds)
cms_1              |   File "/openedx/venv/lib/python3.5/site-packages/django/contrib/auth/decorators.py", line 21, in _wrapped_view
cms_1              |     return view_func(request, *args, **kwargs)
cms_1              |   File "/openedx/edx-platform/cms/djangoapps/contentstore/views/transcripts_ajax.py", line 234, in upload_transcripts
cms_1              |     file_data=ContentFile(sjson_subs),
cms_1              |   File "/openedx/venv/lib/python3.5/site-packages/edxval/api.py", line 387, in create_or_update_video_transcript
cms_1              |     video_transcript, __ = VideoTranscript.create_or_update(video, language_code, metadata, file_data)
cms_1              |   File "/openedx/venv/lib/python3.5/site-packages/edxval/models.py", line 534, in create_or_update
cms_1              |     video_transcript.transcript.save(file_name, transcript_file_data)
cms_1              |   File "/openedx/venv/lib/python3.5/site-packages/django/db/models/fields/files.py", line 87, in save
cms_1              |     self.name = self.storage.save(name, content, max_length=self.field.max_length)
cms_1              |   File "/openedx/venv/lib/python3.5/site-packages/django/core/files/storage.py", line 52, in save
cms_1              |     return self._save(name, content)
cms_1              |   File "/openedx/venv/lib/python3.5/site-packages/storages/backends/s3boto3.py", line 495, in _save
cms_1              |     self._save_content(obj, content, parameters=parameters)
cms_1              |   File "/openedx/venv/lib/python3.5/site-packages/storages/backends/s3boto3.py", line 510, in _save_content
cms_1              |     obj.upload_fileobj(content, ExtraArgs=put_parameters)
cms_1              |   File "/openedx/venv/lib/python3.5/site-packages/boto3/s3/inject.py", line 513, in object_upload_fileobj
cms_1              |     ExtraArgs=ExtraArgs, Callback=Callback, Config=Config)
cms_1              |   File "/openedx/venv/lib/python3.5/site-packages/boto3/s3/inject.py", line 431, in upload_fileobj
cms_1              |     return future.result()
cms_1              |   File "/openedx/venv/lib/python3.5/site-packages/s3transfer/futures.py", line 73, in result
cms_1              |     return self._coordinator.result()
cms_1              |   File "/openedx/venv/lib/python3.5/site-packages/s3transfer/futures.py", line 233, in result
cms_1              |     raise self._exception
cms_1              |   File "/openedx/venv/lib/python3.5/site-packages/s3transfer/tasks.py", line 126, in __call__
cms_1              |     return self._execute_main(kwargs)
cms_1              |   File "/openedx/venv/lib/python3.5/site-packages/s3transfer/tasks.py", line 150, in _execute_main
cms_1              |     return_value = self._main(**kwargs)
cms_1              |   File "/openedx/venv/lib/python3.5/site-packages/s3transfer/upload.py", line 692, in _main
cms_1              |     client.put_object(Bucket=bucket, Key=key, Body=body, **extra_args)
cms_1              |   File "/openedx/venv/lib/python3.5/site-packages/botocore/client.py", line 317, in _api_call
cms_1              |     return self._make_api_call(operation_name, kwargs)
cms_1              |   File "/openedx/venv/lib/python3.5/site-packages/botocore/client.py", line 596, in _make_api_call
cms_1              |     request_signer=self._request_signer, context=request_context)
cms_1              |   File "/openedx/venv/lib/python3.5/site-packages/botocore/hooks.py", line 242, in emit_until_response
cms_1              |     responses = self._emit(event_name, kwargs, stop_on_response=True)
cms_1              |   File "/openedx/venv/lib/python3.5/site-packages/botocore/hooks.py", line 210, in _emit
cms_1              |     response = handler(**kwargs)
cms_1              |   File "/openedx/venv/lib/python3.5/site-packages/botocore/handlers.py", line 209, in conditionally_calculate_md5
cms_1              |     calculate_md5(params, **kwargs)
cms_1              |   File "/openedx/venv/lib/python3.5/site-packages/botocore/handlers.py", line 187, in calculate_md5
cms_1              |     binary_md5 = _calculate_md5_from_file(body)
cms_1              |   File "/openedx/venv/lib/python3.5/site-packages/botocore/handlers.py", line 201, in _calculate_md5_from_file
cms_1              |     md5.update(chunk)
cms_1              | TypeError: Unicode-objects must be encoded before hashing

Thanks

regis · August 15, 2020, 2:31pm

Hi @nachham! It appears that your upload is failing because it contains non-ASCII characters, such as éàÜ… Also, the error occurs in the s3/boto layer of the code, indicating that you are uploading to an S3 bucket. Are you using the tutor-minio plugin? Can you please share your transcript file so that we can attempt to reproduce the issue?

As a general rule, you should remember to indicate the tutor version that you are running as well as the list of enabled plugins. Both pieces of information can be obtained by running:

tutor --version
tutor plugins list

nachham · August 15, 2020, 8:33pm

Hi @regis

Thanks for reaching out!

I was having this issue (I say “was” because I rolled back to Ironwood) with Tutor 10.1.0 and Minio 10.1.0. (No S3 was in use.)

Yes, the srt file name had a special character… But that was not a problem in Ironwood!
What was strange is that Juniper also failed to import the srt file of the default video (that of the founder of openedx). Juniper generates the 500 error there too… and that file has no special character, though!

Sorry, I cannot reproduce the error (as I downgraded Tutor)… but @juansele seems to have a similar issue; he may be able to provide the log file…

Moreover, the decision to remove the vertical navigation in the course main page is “questionable”. Very disappointing, to say the least. See Sections and Subsections' strange behavior on Fresh Install

At least, there could have been an option letting admins decide whether to enable or disable this feature…

Thanks
HN

regis · August 15, 2020, 8:47pm

Nonetheless, can you please share the transcript file that causes the issue? I’d like to investigate this further.

nachham · August 16, 2020, 12:41am

Sent to your email box.

nguyennk92 · August 18, 2020, 9:13am

I’m having a similar error. I think there are some breaking changes when moving from Python 2 to Python 3.
This works perfectly fine with Python 2 but breaks with Python 3.

So I came up with a patch:
https://github.com/nguyennk92/edx-platform/commit/e70cf63180b3acd5e19032235595c880cb35bc20.patch

It basically just encodes the transcript content before passing it to ContentFile.

Apply it in the Dockerfile and rebuild the image:

RUN curl https://github.com/nguyennk92/edx-platform/commit/e70cf63180b3acd5e19032235595c880cb35bc20.patch | git apply -
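
For anyone curious why encoding helps, here is a minimal sketch of the failure, not the actual edx-platform or botocore code. It assumes the upload path hashes the file body chunk by chunk, as the traceback suggests; the helper name md5_of_stream and the sample transcript strings are made up for illustration. In Python 3, md5 only accepts bytes, so a text stream fails while the encoded bytes go through:

```python
import hashlib
import io


def md5_of_stream(stream, chunk_size=1024 * 1024):
    """Hash a file-like object chunk by chunk (hypothetical helper)."""
    md5 = hashlib.md5()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # Raises TypeError in Python 3 if chunk is str instead of bytes.
        md5.update(chunk)
    return md5.hexdigest()


# Hypothetical transcript content held as text (str), not bytes.
sjson_subs = '{"text": ["Héllo wörld"]}'

# Text stream: hashing fails because the chunks are str.
try:
    md5_of_stream(io.StringIO(sjson_subs))
    text_failed = False
except TypeError as exc:
    text_failed = True
    print("text stream failed:", exc)

# Plain ASCII text fails in the same way, since it is still str.
try:
    md5_of_stream(io.StringIO('{"text": ["Hello world"]}'))
    ascii_failed = False
except TypeError:
    ascii_failed = True

# Encoding to bytes first makes the same hashing code work.
digest = md5_of_stream(io.BytesIO(sjson_subs.encode("utf-8")))
print("bytes stream ok:", digest)
```

Note that in this sketch the ASCII-only string fails too, so the type of the content (str versus bytes) matters rather than the presence of special characters.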

nachham · August 18, 2020, 2:24pm

Thanks @nguyennk92
Cheers

regis · August 19, 2020, 8:13am

Oh God, I didn’t see your message @nguyennk92, and I just spent two hours figuring out the exact same solution: https://github.com/edx/edx-platform/pull/24800 Good job solving this, though! The patch will be included in the next tutor release.

regis · December 14, 2020, 11:54am

FYI this problem is also discussed here: https://discuss.openedx.org/t/course-import-issue-from-ironwood-to-juniper/3811/7
It appears that the problem still occurs in Koa, so I re-opened my PR: https://github.com/edx/edx-platform/pull/24800#issuecomment-744389775