OpenShift 4.6 (k8s 1.19) Install

Hi all,

Earlier this year I explored using Tutor to install OpenEdx onto a test OpenShift cluster. I was able to get everything running after asking our cluster admin to allow containers to run as root. Encouraging but not immediately helpful as my organization would not allow this config in a real cluster.

I recently checked back and noticed updates around k8s and root containers. I decided to give it another try and report my results. My cluster is v4.6, which is k8s v1.19.

Upon re-initializing my config, I ran tutor k8s start. The only pod that came up initially was caddy. Caddy’s log, however, ended with an error, which I haven’t yet looked into:

run: loading initial config: loading new config: http app module: start:
 tcp: listening on :80: listen tcp :80: bind: permission denied

All of the replica sets that had failed to bring up a pod had a similar error:

Error creating: pods "cms-857476898f-" is forbidden: unable to validate 
against any security context constraint: 
[spec.containers[0].securityContext.runAsUser: Invalid value: 1000: must
 be in the ranges: [1001340000, 1001349999]]

I looked through the deployment yaml and noticed all the securityContext settings. I don’t have deep expertise around security context constraints but from what I can tell, Red Hat images (pre-packaged or those built on the cluster) do not alter securityContext. OpenShift manages the UID and it just works (the acceptable range is random and per-namespace so there is no way to specify a UID that will work for everyone).
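For illustration, this is the kind of fragment that trips the SCC check (an example shape, not the exact Tutor template — the container name and UID are just placeholders):

```yaml
# Example only: a pod spec that pins a fixed UID at the container level.
# OpenShift's restricted SCC rejects this, because the allowed UID range
# is random and per-namespace (e.g. [1001340000, 1001349999]).
spec:
  template:
    spec:
      containers:
        - name: cms
          securityContext:
            runAsUser: 1000   # rejected: must fall inside the namespace's range
```

Dropping the securityContext block entirely lets OpenShift assign a UID from the namespace's allowed range automatically.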

I went through the yaml and removed these settings, and upon re-initialization Minio and MongoDB were fine. Exim starts up, but all I see in the log is:

 exim: permission denied

The pods for the OpenEdx variants made it further but the application logs were reporting trouble with read-only directories. E.g.,

PermissionError: [Errno 13] Permission denied: '/openedx/data/logs'

I added emptyDir mounts for the following to see if I could get further:

CMS variant:

  • /openedx/media
  • /openedx/data/ora2

LMS variant:

  • /openedx/data/logs
  • /openedx/media
  • /openedx/data/ora2
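A sketch of what that workaround looks like for the LMS deployment (the container and volume names here are from my manifests and may differ from yours):

```yaml
# emptyDir volumes mounted over the directories the app writes to,
# so the arbitrary OpenShift-assigned UID has a writable location.
spec:
  template:
    spec:
      containers:
        - name: lms
          volumeMounts:
            - name: logs
              mountPath: /openedx/data/logs
            - name: media
              mountPath: /openedx/media
            - name: ora2
              mountPath: /openedx/data/ora2
      volumes:
        - name: logs
          emptyDir: {}
        - name: media
          emptyDir: {}
        - name: ora2
          emptyDir: {}
```

Note that emptyDir contents are lost when a pod is rescheduled, so this is a stopgap rather than a real fix for media and logs.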

After making these changes, the LMS/CMS pods along with their worker pods all came up and, based on their logs, appeared to be ready to go.

That’s all I have for now. Look forward to feedback from the community and to helping with efforts to get Tutor/OpenEdx working smoothly on OpenShift.

Michael

Aaaaargh, that’s really bad news! Let me summarize the two issues here:

  1. This means that we cannot bind a load balancer to port 80? (and I suspect 443 as well?) How are we supposed to deploy load balancers then?

  2. This second error is really annoying. It means that OpenShift assigns arbitrary user IDs to the running containers. I do not understand how to get this to work with our containers and with the exim container. For instance, how are we supposed to write to volumes? Containers also need write-access to their own filesystem; how can we achieve that?

> This means that we cannot bind a load balancer to port 80? (and I suspect 443 as well?) How are we supposed to deploy load balancers then?

I don’t think that’s the case. After some reading and examination of my config and resource declarations, I noticed that I had turned off proxy so I didn’t have the LoadBalancer declaration for caddy. I regenerated my config from scratch and tried to deploy.

➭ oc get svc/caddy
NAME    TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)                      AGE
caddy   LoadBalancer   172.30.179.19   <external IP>   80:30161/TCP,443:31422/TCP   17h

This appears to be correct. I don’t think there’s any issue using this approach with OpenShift.

Caddy however still fails with a similar message. I don’t know yet why it’s complaining about https now vs http before.

run: loading initial config: loading new config: http app module: start: tcp: listening on :443: listen tcp :443: bind: permission denied

If the container isn’t running with root privileges, Caddy won’t be able to bind to 80/443. I think that’s what is going on here.

LoadBalancer can specify a targetPort as well. Could we run Caddy on 8080 and 8443 and handle the forwarding in the load balancer?

Regarding the random uid, it’s discussed a bit here.

Volumes work perfectly fine out of the box. Here’s what I see in one of our other Django apps; I have not done anything in terms of permissions or mode in the volumeMounts section of my Deployment:

(app-root) sh-4.4$ whoami
1001220000
(app-root) sh-4.4$ ls -lah /opt/app-root/src/<mysite>/site_media/
total 8.0K
drwxrwxr-x.  1 default root         19 Nov 10 11:27 .
drwxrwxr-x.  1 default root       4.0K Nov  8 12:07 ..
drwxrwxrwx.  2 root    1001220000    0 Oct 21 14:36 media
drwxrwxr-x. 20 default root       4.0K Nov  8 12:07 static

This is a nice writeup on some of the considerations for maintaining compatibility between k8s and OpenShift. (In the case above, the volume mount is actually owned by root.<uid> and not root.root. Maybe this was a typo in the writeup, but all volume mounts in a project look like this in terms of ownership. It certainly makes sense this way.)

Without knowing a little more about what you’ve tested and feel like you need for k8s I am hesitant to make changes but I’d like to help here. Let me know what I can do.

I believe that the issues related to Caddy should be fixed now that this PR was merged in the nightly branch: Multiple improvements to the Caddy load balancer in Kubernetes by regisb · Pull Request #539 · overhangio/tutor · GitHub
On OpenShift, Tutor should be run with ENABLE_WEB_PROXY=false and CADDY_HTTP_PORT=8080.

There remain the following items:

  1. Make it possible to remove the securityContext.
  2. Clarify the situation concerning uid and permissions: it is unclear to me whether writing to volumes or containers works for you on OpenShift. If not, what errors are triggered?

I updated and reinstalled the nightly client and regenerated my files with those variables set but the same error is there.

The issue is that the Caddy image is set up to listen on ports 80 and 443. I think we’re going to have to change that, and then adjust the service to forward to the unprivileged ports.

I looked at the Caddy configuration, but found it a bit opaque in terms of setting the ports, so I will look again. If I can figure that out, I can tinker on my end and submit a PR once I have it working.

Yes, disabling securityContext would solve the issue on OpenShift. Alternatively, I believe setting runAsUser to “auto” will work on OpenShift. I am not certain whether any of the other security context settings will break things, but perhaps setting an environment variable (e.g. RUN_AS_USER="auto", to override a default of 1001) is an approach we could look at.
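One hypothetical shape for this in the deployment templates (the RUN_AS_USER variable name is made up here; Tutor templates are Jinja2, so the conditional could live in the template itself):

```yaml
# Hypothetical sketch: emit runAsUser only when a concrete UID is requested,
# and omit the whole securityContext for "auto" so OpenShift can assign a
# UID from the namespace's allowed range.
{%- if RUN_AS_USER != "auto" %}
securityContext:
  runAsUser: {{ RUN_AS_USER }}
{%- endif %}
```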

OpenShift mounts the volumes as the user that runs the pod so volumes will just work. Once we figure out the proxy configuration I will confirm but I’m not worried about this at all.

Had a chance to get back to this and wanted to follow up on my message above. I added a port specification to the generated Caddyfile’s global section (this was pretty easy, don’t know why I missed it earlier…):

{
    http_port   8080
    https_port  8443
}

And altered the deployment with the new ports and Caddy now starts up fine on OpenShift.
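For reference, the Deployment change was roughly the following (container name as generated in my manifests):

```yaml
# Expose Caddy's unprivileged ports, matching the http_port/https_port
# values set in the Caddyfile's global options block.
containers:
  - name: caddy
    ports:
      - name: http
        containerPort: 8080
      - name: https
        containerPort: 8443
```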

There is still an error about auto-saving the config:

{"level":"error","ts":1640035008.2335553,"msg":"unable to autosave config","file":"/config/caddy/autosave.json","error":"open /config/caddy/autosave.json: permission denied"}

Based on this issue in the caddy-docker repo, it appears it is not critical and could be circumvented by changing the data and config home directories. We’ll ignore it for now…
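If it ever matters, Caddy locates its config and data homes via the standard XDG environment variables, so a sketch of the workaround would be to point them at a writable emptyDir (volume names and paths here are made up for illustration):

```yaml
# Redirect Caddy's config/data homes to a writable volume so that
# autosave.json (written under $XDG_CONFIG_HOME/caddy/) succeeds.
containers:
  - name: caddy
    env:
      - name: XDG_CONFIG_HOME
        value: /data/config
      - name: XDG_DATA_HOME
        value: /data/data
    volumeMounts:
      - name: caddy-writable
        mountPath: /data
volumes:
  - name: caddy-writable
    emptyDir: {}
```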

I updated the service accordingly:

apiVersion: v1
kind: Service
metadata:
  name: caddy
  labels:
    app.kubernetes.io/name: caddy
    app.kubernetes.io/component: loadbalancer
spec:
  type: LoadBalancer
  ports:
    - port: 80
      name: http
      targetPort: 8080
    - port: 443
      name: https
      targetPort: 8443
  selector:
    app.kubernetes.io/name: caddy

This setup works and should be compatible with k8s and gets Caddy away from running as root. (Note: we’re running on a cloud provider that required a special annotation on the service in order to get an address from an externally routable block but this bit will be provider-specific afaik…)

I don’t want to forget that the Exim relay also has issues running as an arbitrary user in OpenShift. But setting that and the securityContext concerns above aside for a minute, there’s another minor issue, which I alluded to in my initial post, that I have realized may be easily fixable on the OpenShift side. The sub-directories of media and data underneath /openedx are never chowned in the Dockerfile. As a result, the cms and lms pods complain, and the worker pods fail to start, when run as a non-root user that isn’t the app user. I worked around this earlier by adding emptyDir declarations for each directory, but this could be fixed in the Dockerfile, just below the point where the data directory is created…

The Red Hat docs (similar to the compat docs I linked to above) would suggest:

# Give group 0 ownership and make group permissions match user permissions,
# so an arbitrary UID (which OpenShift always places in GID 0) can write.
RUN chown -R app:0 /openedx && \
    chmod -R g=u /openedx

This should be compatible with k8s as well.
