Export Library and Course error with MINIO (S3) integrated

Hi Team,

I have integrated MINIO (S3), and objects are correctly stored in their respective buckets. However, I get the error below when exporting a library or a course, even though the generated library and course tar.gz files are saved to the respective S3 bucket correctly. There is a similar issue with learners’ profile images as well. Any suggestions?

<Error>
<Code>InvalidRequest</Code>
<Message>The authorization mechanism you have provided is not supported. Please use AWS4-HMAC-SHA256.</Message>
<Resource>/openedx-xxxxxxxxxxxxx/user_tasks/2022/01/18/library.lzws9teu.tar.gz</Resource>
<RequestId/>
<HostId>xxxxxxxxxxxxxxxxxxxx</HostId>
</Error>
<Error>
<Code>InvalidRequest</Code>
<Message>The authorization mechanism you have provided is not supported. Please use AWS4-HMAC-SHA256.</Message>
<Resource>/openedx-xxxxxxxxxxx/user_tasks/2022/01/18/course.q3ljxjq1.tar.gz</Resource>
<RequestId/>
<HostId>xxxxxxxxxxxxxxxxxxxxx</HostId>
</Error>

Hi, @regis

Tutor version: 13.1.0

I confirm that I get the same error when exporting a course with the minio plugin activated:

minio_1                      | API: SYSTEM()
minio_1                      | Time: 17:02:51 UTC 01/18/2022
minio_1                      | DeploymentID: 10659882-5f67-4d3b-9bb8-2a77299296a5
minio_1                      | Error: invalid semicolon separator in query (*errors.errorString)
minio_1                      |        3: cmd/auth-handler.go:110:cmd.getRequestAuthType()
minio_1                      |        2: cmd/auth-handler.go:489:cmd.setAuthHandler.func1()
minio_1                      |        1: net/http/server.go:2046:http.HandlerFunc.ServeHTTP()

@sbernesto I haven’t been able to fix it so far. Did you find a fix for it?
@regis Any suggestions for fixing this issue?

Hi all,
I did not manage to reproduce your issue. However, I did find a different issue that affects only local instances (running at files.local.overhang.io):

cms-worker_1                 | [2022-01-24 08:24:15,569: ERROR/ForkPoolWorker-1] cms.djangoapps.contentstore.tasks.export_olx[3faa2aa3-e318-4c07-bc58-16264c2d9a7e]: Error exporting course course-v1:org+test101+alpha
cms-worker_1                 | Traceback (most recent call last):                                                                                                                                                  
cms-worker_1                 |   File "/openedx/venv/lib/python3.8/site-packages/botocore/vendored/requests/packages/urllib3/connectionpool.py", line 542, in urlopen                                              
cms-worker_1                 |     httplib_response = self._make_request(conn, method, url,                                                                                                                        
cms-worker_1                 |   File "/openedx/venv/lib/python3.8/site-packages/botocore/vendored/requests/packages/urllib3/connectionpool.py", line 349, in _make_request                                        
cms-worker_1                 |     conn.request(method, url, **httplib_request_kw)                                                                                                                                 
cms-worker_1                 |   File "/opt/pyenv/versions/3.8.12/lib/python3.8/http/client.py", line 1256, in request                                                                                             
cms-worker_1                 |     self._send_request(method, url, body, headers, encode_chunked)                                                                                                                  
cms-worker_1                 |   File "/openedx/venv/lib/python3.8/site-packages/botocore/awsrequest.py", line 129, in _send_request                                                                               
cms-worker_1                 |     rval = HTTPConnection._send_request(                                                                                                                                            
cms-worker_1                 |   File "/opt/pyenv/versions/3.8.12/lib/python3.8/http/client.py", line 1302, in _send_request                                                                                       
cms-worker_1                 |     self.endheaders(body, encode_chunked=encode_chunked)                                                                                                                            
cms-worker_1                 |   File "/opt/pyenv/versions/3.8.12/lib/python3.8/http/client.py", line 1251, in endheaders                                                                                          
cms-worker_1                 |     self._send_output(message_body, encode_chunked=encode_chunked)                                                                                                                  
cms-worker_1                 |   File "/openedx/venv/lib/python3.8/site-packages/botocore/awsrequest.py", line 157, in _send_output                                                                                
cms-worker_1                 |     self.send(msg)                                                                                                                                                                  
cms-worker_1                 |   File "/openedx/venv/lib/python3.8/site-packages/botocore/awsrequest.py", line 242, in send                                                                                        
cms-worker_1                 |     return HTTPConnection.send(self, str)                                                                                                                                           
cms-worker_1                 |   File "/opt/pyenv/versions/3.8.12/lib/python3.8/http/client.py", line 951, in send                                                                                                 
cms-worker_1                 |     self.connect()                                                                                                                                                                  
cms-worker_1                 |   File "/openedx/venv/lib/python3.8/site-packages/botocore/vendored/requests/packages/urllib3/connection.py", line 155, in connect                                                  
cms-worker_1                 |     conn = self._new_conn()                                                                                                                                                         
cms-worker_1                 |   File "/openedx/venv/lib/python3.8/site-packages/botocore/vendored/requests/packages/urllib3/connection.py", line 133, in _new_conn                                                
cms-worker_1                 |     conn = connection.create_connection(                                                                                                                                            
cms-worker_1                 |   File "/openedx/venv/lib/python3.8/site-packages/botocore/vendored/requests/packages/urllib3/util/connection.py", line 88, in create_connection                                    
cms-worker_1                 |     raise err                                                                                                                                                                       
cms-worker_1                 |   File "/openedx/venv/lib/python3.8/site-packages/botocore/vendored/requests/packages/urllib3/util/connection.py", line 78, in create_connection
cms-worker_1                 |     sock.connect(sa)                                                      
cms-worker_1                 | ConnectionRefusedError: [Errno 111] Connection refused

A fix for this will be pushed shortly (see PR).

To better understand your own issue, please provide the following details:

  1. Is https enabled?
  2. Did you enable S3 storage?
  3. Did you customize your minio settings in any way? If yes, please precisely describe how.

EDIT: I managed to replicate the issue. What I had missed was clicking the “Download Exported Course” button after generating the course. Now investigating…

Hi,

  1. No
  2. No
  3. No

In my case it is a fresh installation: I just activated the plugin, ran the tutor local quickstart command, and started testing with an imported course.

Here are my findings:

  1. The InvalidRequest error stems from the fact that the URL query string is not properly URL-encoded. The offending argument is response-content-disposition: response-content-disposition=attachment; filename="course.g7wg323t.tar.gz" triggers the error, and replacing it with response-content-disposition=attachment%3B+filename%3D%22course.g7wg323t.tar.gz%22 fixes the error.
  2. The response-content-disposition argument is set in the import_export.export_status_handler function (see edx-platform/import_export.py at 91a4ab7c1b28904cba341d423f664d13d71f9887 in openedx/edx-platform on GitHub). I would expect the parameters argument to be properly URL-encoded by boto3.
  3. To reproduce the incorrect url, run:
./manage.py cms shell
from django.conf import settings
from django.core.files.storage import get_storage_class
Storage = get_storage_class(settings.COURSE_IMPORT_EXPORT_STORAGE)
s = Storage()
s.url(name="pouac.xml", parameters={'ResponseContentDisposition': 'attachment; filename="pouac.xml"'})

We can see that the generated url does not properly encode its querystring.
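
To make the difference concrete, here is a minimal sketch using only the Python standard library. It does not go through boto3 or django-storages; it only shows what the Content-Disposition value from finding 1 looks like once it is percent-encoded.

```python
from urllib.parse import quote, quote_plus

# Raw value, as it appears (unencoded) in the broken URL.
raw = 'attachment; filename="course.g7wg323t.tar.gz"'

# Form-style encoding: the space becomes "+".
# This matches the replacement value quoted in finding 1.
plus_encoded = quote_plus(raw)

# Strict percent-encoding: the space becomes "%20".
percent_encoded = quote(raw, safe="")

print("response-content-disposition=" + plus_encoded)
print("response-content-disposition=" + percent_encoded)
```

Either form avoids a raw semicolon in the query string, which is the character the minio log complains about.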

  1. This seems to be a bug in boto3. We are using boto3==1.4.8. The latest version is 1.20.41. Let’s try to upgrade… The piece of code above still returns a url that contains a querystring which is not url-encoded.
  2. Interestingly, the bug does not occur when we call the boto3 API directly:
>>> s.bucket.meta.client.generate_presigned_url('get_object', Params={'ResponseContentDisposition': 'attachment; filename="pouac.xml"', 'Bucket': s.bucket.name, "Key": "pouac.xml"})
'http://files.local.overhang.io/openedx/pouac.xml?response-content-disposition=attachment%3B%20filename%3D%22pouac.xml%22&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=openedx%2F20220124%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20220124T161250Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=e6840e6ebc2202f0aeff6180fc688a2e20511917168fb0f2f99010e2b425d846'

This means that the bug stems from django-storages.

  1. Indeed, when looking at the source code of the url function in the django-storages package, we see that it is the _strip_signing_parameters method that un-escapes the querystring:
>>> s._strip_signing_parameters(s.bucket.meta.client.generate_presigned_url('get_object', Params={'ResponseContentDisposition': 'attachment; filename="pouac.xml"', 'Bucket': s.bucket.name, "Key": "pouac.xml"}, ExpiresIn=s.querystring_expire))
'http://files.local.overhang.io/openedx/pouac.xml?response-content-disposition=attachment; filename="pouac.xml"'
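
The un-escaping can be reproduced outside Django. The sketch below is a simplified stand-in written for illustration, not the django-storages source: the function names and the shortened list of signing parameters are assumptions. It parses the query string (which decodes it), drops the signing parameters, and then either joins the remaining pairs back naively or re-encodes them.

```python
from urllib.parse import parse_qsl, quote, urlencode, urlsplit

# Assumption: a shortened list of signing parameters, for illustration only.
SIGNING_PARAMS = {
    "x-amz-algorithm", "x-amz-credential", "x-amz-date",
    "x-amz-expires", "x-amz-signedheaders", "x-amz-signature",
}


def _filtered_pairs(url):
    """Split the URL and return (parts, decoded non-signing query pairs)."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)  # values are decoded here
    return parts, [(k, v) for k, v in pairs if k.lower() not in SIGNING_PARAMS]


def strip_naive(url):
    """Re-join the decoded pairs as-is: the query string ends up un-escaped."""
    parts, pairs = _filtered_pairs(url)
    query = "&".join(f"{k}={v}" for k, v in pairs)
    return parts._replace(query=query).geturl()


def strip_reencoded(url):
    """Re-join the pairs with percent-encoding restored."""
    parts, pairs = _filtered_pairs(url)
    return parts._replace(query=urlencode(pairs, quote_via=quote)).geturl()


signed = (
    "http://files.local.overhang.io/openedx/pouac.xml"
    "?response-content-disposition=attachment%3B%20filename%3D%22pouac.xml%22"
    "&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Expires=3600"
    "&X-Amz-SignedHeaders=host&X-Amz-Signature=e6840e6e"
)
print(strip_naive(signed))
print(strip_reencoded(signed))
```

The naive variant yields the same kind of un-escaped URL as the output above, while the re-encoding variant keeps the `%3B%20` sequence intact.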
  1. We are using django-storages==1.8. The latest release is 1.12.3. Let’s try to upgrade… Nope, the issue still occurs.
  2. We are only going through _strip_signing_parameters because we are using AWS_QUERYSTRING_AUTH = False. That’s because Open edX assets are supposed to be public (see this tutor-minio commit for instance).
  3. Interestingly, the ImportExportS3Storage class, which is the default storage class for course import/export, defines querystring_auth=True (source).
  4. Let’s try to switch to the default ImportExportS3Storage class by setting COURSE_IMPORT_EXPORT_STORAGE = "cms.djangoapps.contentstore.storage.ImportExportS3Storage" in the tutor-minio plugin…
  5. For some reason, it looks like the course export generation task is completely bypassing the COURSE_IMPORT_EXPORT_STORAGE setting and is making use of the DEFAULT_STORAGE (!!!) :sob:
  6. Let’s try instead to set the USER_TASKS_ARTIFACT_STORAGE = "cms.djangoapps.contentstore.storage.ImportExportS3Storage" setting, as it is actually done in cms/envs/production.py… bingo! exporting the course fails with a different error: {'raw_error_msg': 'S3ResponseError: 403 Forbidden\n'}. This means that the course export does make use of the USER_TASKS_ARTIFACT_STORAGE setting. But we probably can’t use it because, well, it uses the outdated S3BotoStorage class.
    (But WHY for god’s sake do we keep using the outdated S3BotoStorage class in edx-platform?!?)
  7. I attempted to make boto use sigv4 with os.environ["S3_USE_SIGV4"] = "True". This changed the error from 403 to 400, so I guess it must have had some kind of effect. Here is the traceback:
cms-worker_1                 | [2022-01-24 17:10:55,601: ERROR/ForkPoolWorker-1] cms.djangoapps.contentstore.tasks.export_olx[2eb9e318-2d7a-4cce-96c9-5cf2e4d42ed9]: Error exporting course course-v1:org+test101+alpha
cms-worker_1                 | Traceback (most recent call last):                                                                                                                                                  
cms-worker_1                 |   File "/openedx/edx-platform/cms/djangoapps/contentstore/tasks.py", line 322, in export_olx                                                                                        
cms-worker_1                 |     artifact.file.save(name=os.path.basename(tarball.name), content=File(tarball))                                                                                                  
cms-worker_1                 |   File "/openedx/venv/lib/python3.8/site-packages/django/db/models/fields/files.py", line 89, in save                                                                               
cms-worker_1                 |     self.name = self.storage.save(name, content, max_length=self.field.max_length)                                                                                                  
cms-worker_1                 |   File "/openedx/venv/lib/python3.8/site-packages/django/core/files/storage.py", line 54, in save                                                                                   
cms-worker_1                 |     name = self._save(name, content)                                                                                                                                                
cms-worker_1                 |   File "/openedx/venv/lib/python3.8/site-packages/storages/backends/s3boto.py", line 425, in _save                                                                                  
cms-worker_1                 |     key = self.bucket.get_key(encoded_name)                                                                                                                                         
cms-worker_1                 |   File "/openedx/venv/lib/python3.8/site-packages/boto/s3/bucket.py", line 193, in get_key                                                                                          
cms-worker_1                 |     key, resp = self._get_key_internal(key_name, headers, query_args_l)                                                                                                             
cms-worker_1                 |   File "/openedx/venv/lib/python3.8/site-packages/boto/s3/bucket.py", line 230, in _get_key_internal                                                                                
cms-worker_1                 |     raise self.connection.provider.storage_response_error(                                                                                                                          
cms-worker_1                 | boto.exception.S3ResponseError: S3ResponseError: 400 Bad Request 
  1. Let’s attempt to correctly configure S3BotoStorage, even though I find this utterly depressing.
  2. Let’s try a different approach. We would like to have a course import/export storage class that sets querystring_auth=True (like ImportExportS3Storage) but inherits from S3Boto3Storage (unlike ImportExportS3Storage). So let’s modify the ImportExportS3Storage class.
  3. Damn, my dev environment is broken… I need to fix it (see PR).
  4. I modified the ImportExportS3Storage class to inherit from S3Boto3Storage instead of S3BotoStorage. URL generation works, and I am able to download the file.

Thus, there are two possible solutions:

a. Make it so that the tutor-minio plugin implements and uses its own ImportExportS3Storage class.
b. Push an upstream fix to migrate the ImportExportS3Storage to inherit from S3Boto3Storage instead of S3BotoStorage. As usual, this is probably the best but also the most difficult solution…
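
For reference, option (a) could look roughly like the fragment below. This is an untested sketch, not the code from any PR: the class name is hypothetical, and it assumes that django-storages' S3Boto3Storage honours a querystring_auth class attribute. It can only be imported inside a configured Django environment with django-storages installed.

```python
from storages.backends.s3boto3 import S3Boto3Storage


class MinioImportExportStorage(S3Boto3Storage):  # hypothetical name
    """boto3-based storage that always serves presigned URLs."""

    # Sign URLs even though AWS_QUERYSTRING_AUTH = False globally, so that
    # url() does not go through _strip_signing_parameters.
    querystring_auth = True
```

The class would then be referenced from the USER_TASKS_ARTIFACT_STORAGE setting, with a dotted path that depends on where the plugin defines it.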

Hi @regis ,

  1. Yes
  2. Yes
  3. No

When do you expect to push a fix for this issue?

As far as I am concerned, this PR fixes both this issue and this other one.

Hi @regis, thanks for providing the fix. Two issues are now fixed: library export and course export. But there is a similar third issue: I am not able to view a learner’s uploaded profile photo from the LMS. When I compare the two URLs, the library export URL contains X-Amz-SignedHeaders=host&X-Amz-Signature, but these are missing from the profile photo URL. Here is the profile photo URL: https://files.lms.xxx.com/bucketnamexxxxx/openedx/media/profile-images/5b2df33821de654464635be371881ce7_500.jpg?v=1999996638 . It gives an AccessDenied error, although the object is public. I am using Tutor version 13.1.3. Any suggestions?
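
A quick way to compare the two URLs is to list which presigning parameters each one carries. This is only a diagnostic sketch using the Python standard library; the helper name is made up for illustration.

```python
from urllib.parse import parse_qs, urlsplit


def presign_params(url):
    """Return the sorted X-Amz-* query parameter names found in url."""
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return sorted(name for name in query if name.lower().startswith("x-amz-"))


profile_url = (
    "https://files.lms.xxx.com/bucketnamexxxxx/openedx/media/profile-images/"
    "5b2df33821de654464635be371881ce7_500.jpg?v=1999996638"
)

# The profile photo URL carries only the cache-busting "v" parameter,
# so the list of signing parameters comes back empty.
print(presign_params(profile_url))
```

An unsigned URL like this one relies on the bucket allowing anonymous reads, so it may be worth checking the bucket policy in addition to the storage settings.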

@regis, I couldn’t find a fix for this. Any suggestions?
