I recently spent some time upgrading our Juju environments from 2.1 to 2.3. Below are a few lessons learned aimed at other Juju enthusiasts doing the same experiment.
We run an academic cloud, HUNT Cloud, where we utilize a highly available Juju deployment, in concert with MAAS, to run things like OpenStack and Ceph. For this upgrade, we were looking forward to some of the new features such as cross model relations and overlay bundles.
How to upgrade Juju (for dummies)
Upgrading a Juju environment is usually a straightforward task completed with a cup of coffee and a couple of commands. The main steps are:
- Upgrade your Juju client (the client talking to the Juju controllers, usually on your local machine): apt upgrade juju or snap refresh juju
- Upgrade your Juju controller (the controller managing the agents): juju upgrade-juju --model controller
- Upgrade your Juju model (the model containing your deployed applications): juju upgrade-juju --model <name-of-model>
Check out the official docs for a more thorough explanation.
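The three steps above can be wrapped in a small helper that runs one step at a time, which leaves room to check juju status before moving on. This is only a sketch: upgrade_step is a name I made up, and it assumes the client was installed as a snap.

```shell
# Sketch: run ONE upgrade step at a time, so you can check `juju status`
# between steps. upgrade_step is a made-up helper, not a Juju command.
upgrade_step() {
    case "$1" in
        client)
            # Assumes a snap-installed client; use apt if that is how you installed it.
            sudo snap refresh juju
            ;;
        controller)
            juju upgrade-juju --model controller
            ;;
        model)
            [ -n "$2" ] || { echo "model name required" >&2; return 1; }
            juju upgrade-juju --model "$2"
            ;;
        *)
            echo "usage: upgrade_step client|controller|model [name-of-model]" >&2
            return 1
            ;;
    esac
}
```

Typical use would be upgrade_step client, then upgrade_step controller, then (once the controllers have settled) upgrade_step model <name-of-model>.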
Our task at hand was to upgrade the Juju environment from 2.1.2 to 2.3. Step 1 was as easy as can be. The remaining steps, however, provided a few lessons that might prove useful for others.
Issue No. 1
We ran the following command to upgrade our controllers:
$ juju upgrade-juju --model controller
started upgrade to 2.2.9
Now, if you look closely, the output above says 2.2.9, not 2.3.2 which was the latest version at the time and the one I actually wanted.
Well, the upgrade to 2.2.9 succeeded, so I ran juju upgrade-juju --model controller once more to reach 2.3.2.
This time things didn’t go as smoothly for the controllers: they got stuck upgrading, which rendered the environment unusable. It did, however, produce some rather bleak yet humorous error messages.
2018-01-30 11:15:22 WARNING juju.worker.upgradesteps worker.go:275 stopped waiting for other controllers: tomb: dying
2018-01-30 11:15:22 ERROR juju.worker.upgradesteps worker.go:379 upgrade from 2.2.9 to 2.3.2 for "machine-0" failed (giving up): tomb: dying
I was able to reproduce this in one of our larger staging areas and the bug got fixed in 2.3.3 in lp#1746265.
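When the controllers are stuck, it helps to filter the machine log down to the upgrade worker's fatal lines. The sketch below matches the ERROR format shown above. The function name is mine, and the log path in the usage note is the usual location on a controller machine, which I am assuming rather than quoting from our setup.

```shell
# Sketch: read a Juju log on stdin and keep only failed upgrade steps.
# The pattern is modelled on the ERROR line quoted in this post; other
# Juju versions may word the message differently.
upgrade_failures() {
    grep -E 'ERROR juju\.worker\.upgradesteps .*upgrade from [0-9.]+ to [0-9.]+ for "[^"]+" failed'
}
```

On a controller machine this could be run as upgrade_failures < /var/log/juju/machine-0.log.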
Issue No. 2
So, after getting stuck with the issue above, I was encouraged to try upgrading straight to 2.3.2, skipping 2.2.9 altogether.
Juju allows you to specify the target version using the --agent-version flag. The command you end up with is juju upgrade-juju --model controller --agent-version 2.3.2.
Sticking to good form and the rule of three, the controllers got stuck upgrading, rendering the environment unusable once again. Fortunately, it was easy to reproduce both in our staging area and on local LXD deployments, so this one also got fixed in 2.3.3 in lp#1748294.
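Since an unpinned upgrade picked 2.2.9 for me, a small guard around the pinned command can prevent a typo from silently changing the target. This is a sketch under my own assumptions: upgrade_controller_to is a hypothetical helper, and the major.minor.patch check is deliberately strict (it would reject pre-release version strings).

```shell
# Sketch: upgrade the controller model to an explicitly pinned version.
# upgrade_controller_to is a made-up wrapper around the real CLI command.
upgrade_controller_to() {
    version="$1"
    # Accept only plain major.minor.patch strings such as 2.3.2.
    if ! printf '%s\n' "$version" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$'; then
        echo "refusing to upgrade: '$version' is not major.minor.patch" >&2
        return 1
    fi
    juju upgrade-juju --model controller --agent-version "$version"
}
```

For example, upgrade_controller_to 2.3.2 runs the command from the text, while upgrade_controller_to 2.3 stops before touching the controller.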
Issue No. 3
We gave the upgrade a new try when version 2.3.4 rolled around late in February.
Things looked good after multiple runs in staging, so I finally upgraded one of our production controllers using juju upgrade-juju --model controller --agent-version 2.3.4.
The upgrade process took around 15 minutes. After a lot of log spam in the controller logs and some unnerving error messages in the juju status --model controller output, things seemed to settle.
We noticed charm agent failures and connection errors between the controllers and a small number of the applications in the main production Juju model containing our OpenStack and Ceph deployments.
After filing lp#1755155, I was advised to push on and upgrade the Juju model even though some of the charms had errored out. This approach resolved the connection errors.
The root cause was most likely lp#1697936, which was reported last year. It turned out 2.1 agents could fail to read from 2.2 and newer controllers. I did eventually find a mention of the bug in the changelog for 2.2.0; however, the description did not contain the error messages, so my searches in Launchpad came up empty.
Upgrading the model with juju upgrade-juju --model openstack --agent-version 2.3.4 and restarting the affected agents finally did the trick, and all components were running smoothly on 2.3.4.
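For the agent restarts, one possible approach is to restart the unit agent's systemd service over juju ssh. The sketch below assumes systemd-based machines where unit agents run as jujud-unit-<application>-<number>. Both helper names are hypothetical, and this is not necessarily how every agent was restarted in our case.

```shell
# Sketch: map a unit name such as "nova-compute/0" to the systemd service
# that runs its agent (assumed naming: jujud-unit-<application>-<number>).
agent_service() {
    printf 'jujud-unit-%s\n' "$(printf '%s' "$1" | tr '/' '-')"
}

# Sketch: restart the agent for one unit in the given model.
restart_agent() {
    unit="$1"
    model="$2"
    juju ssh --model "$model" "$unit" "sudo systemctl restart $(agent_service "$unit")"
}
```

A call such as restart_agent nova-compute/0 openstack would restart the service jujud-unit-nova-compute-0 on that unit's machine.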
To be fair to the Juju team, our production model has a decent number of different charms and therefore a decent number of Juju agents (we are talking about OpenStack after all).
Now you might rightfully ask: Sandor, why on earth didn’t you just upgrade the model right away as described in step 3? Well, I simply became a bit wary of proceeding without any easy way to roll back after running into all the previous bugs where things got stuck.
Lessons learned
- Always read the changelogs. Carefully.
- Always test the upgrades. This goes both for users and the dev team.
- The upgrade UX has room for improvement, with everything from apt upgrade juju and snap refresh juju to juju upgrade-juju --model controller and juju upgrade-juju --model <name-of-model>.
- As things can go awry, it would be nice if juju upgrade-juju would tell you what it will do without the --dry-run flag, as it may not pick the version you want.
- It would also be nice if there was a way to do proper dry runs or even roll back upgrades (both failed and successful) besides backing up and restoring your controllers.
- Even though the controller and the model are upgraded separately and should be able to run different versions, they can break each other.
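Until the tooling improves, the points about dry runs and backups can be combined into a small pre-flight step: take a controller backup, then ask Juju what it would do for a pinned version. This is a sketch, not a rollback mechanism. preflight_upgrade is a name I made up, and it only chains two existing commands.

```shell
# Sketch: back up the controller, then do a dry run of the pinned upgrade.
# Nothing is upgraded here; review the output before running the real command.
preflight_upgrade() {
    version="$1"
    [ -n "$version" ] || { echo "usage: preflight_upgrade <version>" >&2; return 1; }

    # Stop if the backup fails, since it is the only way back.
    juju create-backup --model controller || return 1

    # Report what the upgrade would do without changing anything.
    juju upgrade-juju --model controller --agent-version "$version" --dry-run
}
```

Running preflight_upgrade 2.3.4 first, and only then the real upgrade command, at least makes the chosen version visible before anything changes.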
Many thanks to Rick, Tim, John and the rest of the Juju gang from Canonical for helping out with tips, troubleshooting and fixes.