Blue/Green Deployment

andy-thomas-83 · November 20, 2020, 3:12pm

We have been using Tutor/OpenEDX for about 3 months now and so far so good. We had some initial challenges setting it up with EKS on AWS using Terraform and S3, especially signed URLs, but we managed to iron out the issues we had. We were hoping to look at Blue/Green deployments. We are using a Continuous Delivery pipeline which deploys changes pretty quickly (~1 to 2 minutes) but, as expected, it does take the pods down and incurs downtime (somewhere between 3 and 5 minutes). Has anybody gotten blue/green working? Or tried it?

We have a few different ideas on how we might get this working, but we are really interested to hear if anybody has any experience (and a working setup).

regis · November 20, 2020, 3:25pm

Can you describe more precisely which steps cause downtime?

andy-thomas-83 · November 20, 2020, 4:07pm

It has been stuff like adding an xblock, making a change to the lms and cms files and any theme changes.

I suspect these will become less frequent as we stop finding out about new bits to turn on and use.

regis · November 20, 2020, 5:01pm

No, I meant: which steps in your CI cause downtime? Do you run tutor k8s quickstart or anything like that?

Those steps should not be causing downtime, as it’s just a matter of adding/removing running containers. As far as I understand, the only things that should cause downtime are backward-incompatible database migrations.

andy-thomas-83 · November 20, 2020, 6:49pm

Sorry - the bit that we do that causes a restart is the k8s reboot:

  - if kubectl get namespace openedx; then tutor k8s reboot; else tutor k8s quickstart -I; fi
  when: 
    branch: master
    event: push

Should we be looking at doing it differently?

andy-thomas-83 · November 20, 2020, 6:58pm

I went back to look at the documentation.

Could we run tutor k8s start, if it’s running kubectl apply under the hood? I don’t know why I thought we would have to run tutor k8s stop and then start. I assumed it was imperative, and that this would take longer.

regis · November 21, 2020, 4:07pm

Yes, absolutely. The quickstart command was designed for basic deployments, but it should not be used for more advanced scenarios, such as zero downtime deployments.
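
For illustration, the CI step from earlier in the thread could be reworked roughly as follows. This is only a sketch under the assumptions discussed above: the platform runs in the openedx namespace, and tutor k8s start re-applies the manifests rather than stopping everything first. The deploy_command helper is a hypothetical name, not part of Tutor.

```shell
#!/bin/sh
# Sketch of a CI deploy step that avoids `tutor k8s reboot`.
# Assumption: the platform lives in the "openedx" namespace, as in the
# CI snippet posted earlier in this thread.

# deploy_command (hypothetical helper): print the tutor command to run.
deploy_command() {
    if kubectl get namespace openedx >/dev/null 2>&1; then
        # Namespace already exists: start (apply) instead of reboot,
        # so the running pods are not all taken down first.
        echo "tutor k8s start"
    else
        # First deployment: run the full non-interactive bootstrap.
        echo "tutor k8s quickstart -I"
    fi
}

# Possible usage inside the pipeline step:
#   $(deploy_command)
```

Keeping the decision in a small function makes it easy to check in isolation which command a given cluster state would trigger. Backward-incompatible database migrations would still need separate handling.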