Problem fetching SAML IdP metadata

Hello,
I am trying to configure SAML SSO in my Tutor 11.2.3 instance.

Following Getting started with plugin development (Tutor documentation), I installed the SAML plugin, enabled it, created the content file according to the README on the pdebruic/tutor-saml GitHub page (sorry, new users can only post 2 links…), saved and restarted. The plugin is enabled:

ubuntu@openedx-dev:~$ tutor plugins list
saml==0.1.0

Generated .key and .crt with openssl req -new -x509 -days 3652 -nodes -out saml.crt -keyout saml.key. This is the content file, $(tutor plugins printroot)/saml.yml:

name: saml
version: 0.1.0
patches:
  common-env-features: |
    "ENABLE_THIRD_PARTY_AUTH": true

  openedx-lms-common-settings: |
    # saml special settings
    THIRD_PARTY_AUTH_BACKENDS = "third_party_auth.saml.SAMLAuthBackend"

  openedx-auth: |
    "SOCIAL_AUTH_SAML_SP_PRIVATE_KEY": "-----BEGIN PRIVATE KEY-----
    redacted
    -----END PRIVATE KEY-----"
    "SOCIAL_AUTH_SAML_SP_PUBLIC_CERT": "-----BEGIN CERTIFICATE-----
    redacted
    -----END CERTIFICATE-----"

Next, following 4.23.3.2.2. Integrating with a SAML Identity Provider (Installing, Configuring, and Running the Open edX Platform documentation), I configured my Tutor/Open edX instance as a Service Provider, registered my service at my IdP and added the SAML IdP to Open edX with the corresponding values (e.g. EntityID, Metadata URL, etc.).

I triple-checked that all values correspond, yet the Metadata Ready indicator is still a red cross (and the slug is default). When I tried to run the pull manually with tutor local run lms ./manage.py lms saml --pull --settings=tutor.production, the command stops at Fetching https://id.myredacteddomain.com/metadata. After I pressed Control-C I got this output:

^CTraceback (most recent call last):
  File "./manage.py", line 123, in <module>
    execute_from_command_line([sys.argv[0]] + django_args)
  File "/openedx/venv/lib/python3.8/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
    utility.execute()
  File "/openedx/venv/lib/python3.8/site-packages/django/core/management/__init__.py", line 375, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/openedx/venv/lib/python3.8/site-packages/django/core/management/base.py", line 323, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/openedx/venv/lib/python3.8/site-packages/django/core/management/base.py", line 364, in execute
    output = self.handle(*args, **options)
  File "/openedx/edx-platform/common/djangoapps/third_party_auth/management/commands/saml.py", line 32, in handle
    total, skipped, attempted, updated, failed, failure_messages = fetch_saml_metadata()
  File "/openedx/venv/lib/python3.8/site-packages/celery/local.py", line 191, in __call__
    return self._get_current_object()(*a, **kw)
  File "/openedx/venv/lib/python3.8/site-packages/celery/app/task.py", line 393, in __call__
    return self.run(*args, **kwargs)
  File "/openedx/edx-platform/common/djangoapps/third_party_auth/tasks.py", line 84, in fetch_saml_metadata
    response = requests.get(url, verify=True)  # May raise HTTPError or SSLError or ConnectionError
  File "/openedx/venv/lib/python3.8/site-packages/requests/api.py", line 76, in get
    return request('get', url, params=params, **kwargs)
  File "/openedx/venv/lib/python3.8/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/openedx/venv/lib/python3.8/site-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/openedx/venv/lib/python3.8/site-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/openedx/venv/lib/python3.8/site-packages/requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "/openedx/venv/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
  File "/openedx/venv/lib/python3.8/site-packages/urllib3/connectionpool.py", line 382, in _make_request
    self._validate_conn(conn)
  File "/openedx/venv/lib/python3.8/site-packages/urllib3/connectionpool.py", line 1010, in _validate_conn
    conn.connect()
  File "/openedx/venv/lib/python3.8/site-packages/urllib3/connection.py", line 411, in connect
    self.sock = ssl_wrap_socket(
  File "/openedx/venv/lib/python3.8/site-packages/urllib3/util/ssl_.py", line 420, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(
  File "/openedx/venv/lib/python3.8/site-packages/urllib3/util/ssl_.py", line 464, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
  File "/opt/pyenv/versions/3.8.6/lib/python3.8/ssl.py", line 500, in wrap_socket
    return self.sslsocket_class._create(
  File "/opt/pyenv/versions/3.8.6/lib/python3.8/ssl.py", line 1040, in _create
    self.do_handshake()
  File "/opt/pyenv/versions/3.8.6/lib/python3.8/ssl.py", line 1309, in do_handshake
    self._sslobj.do_handshake()
KeyboardInterrupt
Error: Command failed with status 130: docker-compose -f /home/ubuntu/.local/share/tutor/env/local/docker-compose.yml -f /home/ubuntu/.local/share/tutor/env/local/docker-compose.prod.yml --project-name tutor_local run --rm lms ./manage.py lms saml --pull --settings=tutor.production

Any idea what I am doing wrong? I tried to curl the metadata.xml and it works; its URL is publicly available.
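The traceback above shows the fetch blocked inside the TLS handshake, with no timeout set on the request. As a rough diagnostic sketch (not part of edx-platform; the function name `fetch_metadata` and the 10-second default are assumptions), the same URL can be fetched with an explicit timeout from inside the LMS container, so that a stalled handshake is reported as an error instead of hanging:

```python
import http.client
import urllib.request


def fetch_metadata(url, timeout=10.0):
    """Fetch `url` and return (ok, detail) instead of blocking indefinitely.

    Hypothetical helper for diagnosis only; it uses the standard library
    rather than the `requests` call seen in the traceback.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read()
        return True, f"fetched {len(body)} bytes"
    except (OSError, http.client.HTTPException) as exc:
        # URLError, ssl.SSLError and socket timeouts all derive from OSError.
        return False, f"{type(exc).__name__}: {exc}"


if __name__ == "__main__":
    # URL taken from the post above; replace it with your IdP metadata URL.
    print(fetch_metadata("https://id.myredacteddomain.com/metadata"))
```

If this succeeds on the host but times out inside the container, the difference is more likely in the container's network path than in the SAML configuration.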

I’m pretty sure THIRD_PARTY_AUTH_BACKENDS should be an array, and not a string. Thus:

THIRD_PARTY_AUTH_BACKENDS = ["third_party_auth.saml.SAMLAuthBackend"]

and not:

THIRD_PARTY_AUTH_BACKENDS = "third_party_auth.saml.SAMLAuthBackend"

Apart from that, I see no obvious error (but I never played with SAML before).
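A small illustration of why the type could matter, assuming the setting is eventually iterated over or passed to `list(...)` (I have not verified how edx-platform consumes it). The helper `as_backend_list` is hypothetical and only there for the demonstration:

```python
BACKEND = "third_party_auth.saml.SAMLAuthBackend"


def as_backend_list(value):
    """Return the setting as a list of dotted paths, wrapping a bare string."""
    if isinstance(value, str):
        return [value]
    return list(value)


# Iterating over a bare string yields single characters, not one backend path.
print(list(BACKEND)[:3])         # ['t', 'h', 'i']
print(as_backend_list(BACKEND))  # ['third_party_auth.saml.SAMLAuthBackend']
```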

Thanks for the reply. I’ve changed it to an array, but the result is still the same.

By running grep -r saml "$(tutor config printroot)/env/apps/openedx/settings/lms/production.py" I should be able to see the changes, but the output is empty. Is that correct?

You should definitely see the “saml” string in the LMS settings after you run tutor config save. Can you confirm that you do?

I confirm that I do not see the saml string.

My config.yml is:

...

PLUGINS:
- saml

...

TUTOR_PLUGINS_ROOT: /home/ubuntu/.local/share/tutor-plugins

...

And I did:

ubuntu@openedx-dev:~/.local/share/tutor$ tutor plugins enable saml
Plugin saml enabled
Configuration saved to /home/ubuntu/.local/share/tutor/config.yml
You should now re-generate your environment with `tutor config save`.
ubuntu@openedx-dev:~/.local/share/tutor$ tutor config save
Configuration saved to /home/ubuntu/.local/share/tutor/config.yml
Environment generated in /home/ubuntu/.local/share/tutor/env
ubuntu@openedx-dev:~/.local/share/tutor$ grep -r saml "$(tutor config printroot)/env/apps/openedx/settings/lms/production.py"
ubuntu@openedx-dev:~/.local/share/tutor$ tutor plugins list
saml==0.1.0

What is the output of cat ~/.local/share/tutor-plugins/saml.yml?

The output is:

name: saml
version: 0.1.0
patches:
  common-env-features: |
    "ENABLE_THIRD_PARTY_AUTH": true

  openedx-lms-common-settings: |
    # saml special settings
    THIRD_PARTY_AUTH_BACKENDS = [ "third_party_auth.saml.SAMLAuthBackend" ]

  openedx-auth: |
    "SOCIAL_AUTH_SAML_SP_PRIVATE_KEY": "-----BEGIN PRIVATE KEY-----
    ...
    -----END PRIVATE KEY-----"
    "SOCIAL_AUTH_SAML_SP_PUBLIC_CERT": "-----BEGIN CERTIFICATE-----
    ...
    -----END CERTIFICATE-----"

Here is what I get:

$ nano saml.yml
$ tutor plugins install ./saml.yml
Plugin installed at /home/regis/.local/share/tutor-plugins/saml.yml
$ tutor plugins enable saml
Plugin saml enabled
Configuration saved to /home/data/regis/.local/share/tutor/config.yml
You should now re-generate your environment with `tutor config save`.
$ tutor config save
Configuration saved to /home/data/regis/.local/share/tutor/config.yml
$ grep -r saml "$(tutor config printroot)/env/apps/openedx/settings/lms/production.py"
# saml special settings
THIRD_PARTY_AUTH_BACKENDS = [ "third_party_auth.saml.SAMLAuthBackend" ]

I do not understand what is happening with your installation. Can you check that you don’t have multiple plugins with the same name in ~/.local/share/tutor-plugins/?

On a side note (should be unrelated to your problem), you can remove the TUTOR_PLUGINS_ROOT variable from config.yml, which is useless there. If you want to define a custom plugins root, you should set this value as an environment variable.

So the problem was that I had previously installed the plugin with pip install git+https://github.com/pdebruic/tutor-saml, as written in the README. I uninstalled it with pip uninstall tutor-saml, then installed it with tutor plugins install ./saml.yml, enabled it and saved, and now the grep shows the same output as yours. Cool.

But tutor local run lms ./manage.py lms saml --pull --settings=tutor.production still hangs :frowning: .

Unfortunately I cannot help you here, as I have never tried to install this plugin. I suggest you report the issue to the plugin author. If you find a solution, please post your findings here :pray:

@geckiss did you ever get this working?

Actually, we did yesterday.

The problem was the MTU. We had set the MTU in /etc/docker/daemon.json to 1442, but since Tutor uses docker-compose, the compose files and the network they create did not inherit that MTU from the daemon.

So we manually defined the network in docker-compose.yml and docker-compose.prod.yml (not sure if docker-compose.jobs.yml also needs it):

networks:
  tutor_local_default:
    name: tutor_local_default
    driver: bridge
    driver_opts:
      com.docker.network.driver.mtu: 1442

Then we assigned containers to the network and brought up with docker-compose -f $TUTOR_ROOT/env/local/docker-compose.yml -f $TUTOR_ROOT/env/local/docker-compose.prod.yml up -d.

Then we did docker network inspect tutor_local_default and near the bottom of the output is ‘Options’ field with the correct MTU value. We did not see it before the custom network creation, with the default compose files.

Then we tried curl-ing from LMS container and it worked. The metadata were fetched correctly and we can see the green tick in SAML config in admin panel.
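The inspection step described above can be scripted. This is a minimal sketch, assuming the JSON printed by `docker network inspect` is a list of network objects with an `Options` mapping; the function name `network_mtu` and the sample document are made up for illustration:

```python
import json

MTU_OPTION = "com.docker.network.driver.mtu"


def network_mtu(inspect_json, name):
    """Return the MTU configured on network `name`, or None if the option is unset."""
    for network in json.loads(inspect_json):
        if network.get("Name") == name:
            value = (network.get("Options") or {}).get(MTU_OPTION)
            return int(value) if value is not None else None
    raise KeyError(name)


# Hypothetical, heavily trimmed sample of `docker network inspect` output.
SAMPLE = """
[{"Name": "tutor_local_default",
  "Driver": "bridge",
  "Options": {"com.docker.network.driver.mtu": "1442"}}]
"""

print(network_mtu(SAMPLE, "tutor_local_default"))  # 1442
```

On a real host the input could come from `subprocess.run(["docker", "network", "inspect", "tutor_local_default"], capture_output=True, text=True, check=True).stdout`. A result of `None` means the network was created without an explicit MTU option.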

TIL a new word: MTU

Are you saying that the default Docker MTU is too low for SAML authentication? This would match what this guy observed.

Oh and kudos on resolving this issue :bowing_man: this must have been a seriously hardcore bug to figure out. I don’t think I would have found the issue on my own.

Although Tutor is not directly responsible for this issue, it would make sense to either write a tutorial or add some words to the docs to explain how to set up SAML. @geckiss do you think you would be able to do that?

Are you saying that the default Docker MTU is too low for SAML authentication? This would match what this guy observed.

The default MTU is too high. For Ethernet, it’s 1500, but since Docker adds some overhead, you need to lower it to 1442. I’m not sure why the guy lowered it to 1400, maybe because of his LDAP overhead (?). In our case, 1442 works fine.
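To make the numbers concrete, here is a back-of-the-envelope sketch of the arithmetic, assuming plain IPv4 and TCP headers without options (20 bytes each). Whatever the source of the lower limit, a full-size segment built for a 1500-byte MTU does not fit through a 1442-byte path, and large TLS handshake messages such as the server certificate are a likely candidate for such segments. The function names are illustrative only:

```python
IPV4_HEADER = 20  # bytes, without IP options
TCP_HEADER = 20   # bytes, without TCP options


def max_tcp_payload(mtu):
    """Largest TCP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - TCP_HEADER


def fits(payload, path_mtu):
    """True if a segment carrying `payload` bytes fits within `path_mtu`."""
    return payload + IPV4_HEADER + TCP_HEADER <= path_mtu


print(max_tcp_payload(1500))              # 1460
print(max_tcp_payload(1442))              # 1402
print(fits(max_tcp_payload(1500), 1442))  # False
```

Small requests and responses stay under the limit, which would explain why some traffic works while the TLS handshake stalls.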

Although Tutor is not directly responsible for this issue, it would make sense to either write a tutorial or add some words to the docs to explain how to set up SAML. @geckiss do you think you would be able to do that?

I’d gladly do that, but I’m not sure where exactly should I write it.

I didn’t have to change any MTU settings to get SAML working with Tutor. I was able to pull the IdP metadata once I added the data in the Django interface.

If you write the words here, or wherever suits you best (google doc, gist, wiki…) we’ll take care of putting them in the right place in the docs, or in this forum as a tutorial.

In my opinion it would be better to support changing the docker-compose MTU by default, for example via a new variable in “$(tutor config printroot)/config.yml”. From time to time (I'm not sure at the moment with which command) Tutor replaces our modified docker-compose.yml, docker-compose.prod.yml and docker-compose.jobs.yml with the default ones, so I then need to replace them again.
Another problem with not changing the MTU in those config files is that, for example, every HTTPS download from the Tutor containers times out, because of the default MTU of 1500 in docker-compose and our virtualisation platform (OpenStack), which can handle a maximum MTU of 1442 in VMs.
We have now fixed the problem with the files below, but it would be better to support it officially, without us “hacking” Tutor. It could become a problem in the future for other administrators who want to deploy Tutor on a virtualisation platform with lower MTU values.

Our fixed files with 1442 MTU:

Sorry, as a new user I am limited to two links per post, so the last file is below:

Hi @mlebeda! I’d rather avoid adding new config variables in Tutor for every edge case. Instead, I suggest you add your custom docker-compose contents to a docker-compose.override.yml file in $(tutor config printroot)/env/local. The changes in this file will automatically be picked up by Tutor. Kinda similar to this section from the docs.
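For illustration, a sketch of what generating such an override file could look like. Everything here is an assumption rather than a confirmed recipe: the function name `render_override` is made up, the `default` network key relies on the project's default network being the one named tutor_local_default earlier in the thread, and the `version` value must match whatever the generated docker-compose.yml declares ("3.7" is only a guess for this Tutor release):

```python
def render_override(mtu, compose_version="3.7"):
    """Render a docker-compose.override.yml that sets the default network MTU.

    Hypothetical helper: check the compose version against the generated
    docker-compose.yml before using the output.
    """
    if not isinstance(mtu, int) or not 68 <= mtu <= 65535:
        raise ValueError(f"implausible MTU: {mtu!r}")
    return (
        f'version: "{compose_version}"\n'
        "networks:\n"
        "  default:\n"
        "    driver_opts:\n"
        f"      com.docker.network.driver.mtu: {mtu}\n"
    )


if __name__ == "__main__":
    # Redirect the output to env/local/docker-compose.override.yml under the Tutor root.
    print(render_override(1442), end="")
```

Generating the file rather than editing the rendered compose files keeps the change out of the files that Tutor rewrites when the environment is regenerated.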

Hi @regis. Thank you for the suggestion. It works great, without using my regex for generating new yml files. Here is the entire file that I am using at the moment: docker-compose.override.yml - Pastebin.com

Just a few observations:

  • It crashes if docker-compose.override.yml is already present at the start of a deployment. When I run $ tutor local init I get: ERROR: Network "tutor_local_default" needs to be recreated - option "com.docker.network.driver.mtu" has changed. So I just copy docker-compose.override.yml into place at the end of the deployment with the included Ansible tasks: mtu.yml - Pastebin.com

  • The same crash, with a similar solution, happens when I run $ tutor local quickstart. I run the same script but with a delete (instead of a copy) of docker-compose.override.yml, then run $ tutor local quickstart, and after that run the script with the copy again.

edit:

It is not ideal, but it works. :slight_smile: