Automatic ACME SSL Certificate Rotation

Technology needs to be secure, but we also want it to be easy to use. The same applies to us engineers managing SSL certificates and their rotation. You can buy long-life certificates, but why would you when you can get free ones generated via the Automated Certificate Management Environment (ACME) protocol? The catch is that these certificates normally expire within 3 months, and you do not want to keep renewing and deploying them manually every 3 months, especially when you have many services to maintain. Therefore, I have a pattern and design to renew certificates automatically, which can also be adapted for any service or cloud provider.

This design uses specific technologies, but because of how it is composed, each component can be swapped for whatever technology you are using. For example, where I use Azure Key Vault to store the certificate, this could easily be swapped for AWS Certificate Manager. This is also what makes it a good design, as it can support multiple types of services, languages and providers.

For this article I am using the following technologies:

Technology                    | Purpose                          | Link
Azure DevOps                  | Deployment Software              | https://azure.microsoft.com/en-us/services/devops/
Let's Encrypt                 | Certificate Provider             | https://letsencrypt.org/docs/client-options/
Azure Key Vault               | Certificate Store                | https://docs.microsoft.com/en-us/azure/key-vault/general/basic-concepts
Azure Virtual Machine – Linux | Application Host                 | https://azure.microsoft.com/en-us/services/virtual-machines/linux/
Azure DNS                     | DNS Provider                     | https://azure.microsoft.com/en-gb/services/dns/#overview
Posh-ACME                     | Automate Certificate Generation  | https://poshac.me/docs/v4/Tutorial/

How it works

As you can see from the design, it is Azure DevOps that requests the certificate. This means there is a single source making the request per domain, instead of each resource doing it itself, which saves on the number of requests and the number of certificates required per domain. You can request one certificate and all the resources using that domain can reap the benefits.

Request the Certificate

This section explains the job Azure DevOps performs to get the new certificate from Let's Encrypt and store it within Azure Key Vault.

Get a Certificate Script

We start by setting the variables that configure how the script will be used.

env is the environment, which is used later on to decide which Let's Encrypt server to use. When using the production server you are limited in how many requests you can make per domain per day, so for lower environments it makes sense to use the staging server, where you might be requesting multiple times during deployments.

The acmeContact is the email contact used for the Posh-ACME account; it can be any address as long as it is formatted as an email.

The domain is the Fully Qualified Domain Name of the URL you will be requesting the certificate for.

Finally, you have the name of the Azure subscription that holds the Azure DNS resource, which will be used later to get the access token for the request. This is for when your resources are not hosted in the same Azure subscription; if they are, you can just use the az CLI to get the current subscription.

$env="staging"
$acmeContact="me@email.com"
$domain="www.example.com"
$dnsSubscription="DNS-Subscription-Example"
if ($env -eq "production" -or $env -eq "staging") {
  $leServer="LE_PROD"
} else {
  $leServer="LE_STAGE"
}

We can then install the Posh-ACME PowerShell Module.

# Install the Posh-ACME module
Write-Host "Install Module"
Install-Module -Name Posh-ACME -Scope CurrentUser -Force

Set the Let's Encrypt server and load the Azure plugin for the script.

# Configure Posh-ACME server
Write-Host "Configure LE Server $leServer"
Set-PAServer $leServer
Get-PAPlugin Azure

When using Posh-ACME you need to set up an account, which is attached to the domain for renewals and can be auto-generated using the acmeContact email we set up earlier. The script below also works out if you already have an account set up and, if so, it will use the existing account.

# Configure Posh-ACME account
Write-Host "Setup Account"
$account = Get-PAAccount
if (-not $account) {
    # New account
    Write-Host "Create New Account"
    $account = New-PAAccount -Contact $acmeContact -AcceptTOS
}
elseif ($account.contact -ne "mailto:$acmeContact") {
    # Update account contact
    Write-Host "Set Existing Account $($account.id)"
    Set-PAAccount -ID $account.id -Contact $acmeContact
}

We then need to get the subscription ID of the Azure DNS resource and an access token to pass into the certificate generation. If your DNS resource is not hosted within your current subscription, you can use this script to get the subscription details and then request the access token. If it is, you can remove the part that sets the subscription name and just use the current subscription context (az account show --query 'id' -o tsv).

# Acquire access token for Azure (as we want to leverage the existing connection)
Write-Host "Get Azure Details"
$azAccount = az account show -s $dnsSubscription -o json | ConvertFrom-Json
Write-Host "Azure DNS Sub $($azAccount.name)"
$token = (az account get-access-token --resource 'https://management.core.windows.net/' | ConvertFrom-Json).accessToken

You can now request the new certificate using the Posh-ACME command and the settings obtained above.

# Request certificate
$pArgs = @{
  AZSubscriptionId = $azAccount.id
  AZAccessToken = $token
}
New-PACertificate $domain -Plugin Azure -PluginArgs $pArgs -Verbose
$generatedCert=$(Get-PACertificate)
Write-Host($generatedCert)

Azure DevOps setup

Now, we do not want to run this script on every deployment, so we can add an extra script beforehand to check whether we need to. It checks if the certificate exists in the Key Vault and, if so, whether its expiry is within 14 days.

- task: AzureCLI@2
  displayName: 'Check if Cert expired in ${{ parameters.keyVaultName }}'
  name: cert
  inputs:
    azureSubscription: '${{ parameters.subscriptionName }}'
    scriptType: 'pscore'
    scriptLocation: 'inlineScript'
    inlineScript: |
      $keyVaultName="${{ parameters.keyVaultName }}"
      $certName="${{ parameters.certName}}"
      $exportedCerts = az keyvault certificate list --vault-name $keyVaultName --query "[? name=='$certName']" -o json | ConvertFrom-Json
      $expired=$false
      if ($null -ne $exportedCerts -and $exportedCerts.length -gt 0){
        Write-Host "Certificate Found"
        $exportedCert = $exportedCerts[0]
        Write-Host "Certificate Expires $($exportedCert.attributes.expires)"
        $expiryDate=(get-date $exportedCert.attributes.expires).AddDays(-14)
        Write-Host "Certificate Forced Expiry is $expiryDate"
        if ($expiryDate -lt (get-date)){
          Write-Host "Certificate has expired"
          $expired=$true
        } else {
          Write-Host "Certificate has NOT expired"
        }
      } else {
        Write-Host "Certificate NOT Found"
        $expired=$true
      }
      Write-Host "##vso[task.setvariable variable=expired;isOutput=true]$expired"

This output can then be used in the condition of the next task to decide whether or not to run the certificate request script.

- task: AzureCLI@2
  name: acmecert
  displayName: 'Request LE Cert for ${{ parameters.domain }}'
  condition: and(succeeded(), eq(variables['cert.expired'], 'True'))
  inputs:
    azureSubscription: '${{ parameters.subscriptionName }}'
    scriptType: 'pscore'
    scriptLocation: 'inlineScript'
    inlineScript: |
      $env="${{ parameters.environment }}"
      $acmeContact="me@email.com"
      $domain="${{ parameters.domain }}"
      $dnsSubscription="DNS-Subscription-Example"
      
      if ($env -eq "production" -or $env -eq "staging") {
          $leServer="LE_PROD"
      } else {
          $leServer="LE_STAGE"
      }
      
      # Install the Posh-ACME module
      Write-Host "Install Module"
      Install-Module -Name Posh-ACME -Scope CurrentUser -Force
      
      # Configure Posh-ACME server
      Write-Host "Configure LE Server $leServer"
      Set-PAServer $leServer
      Get-PAPlugin Azure
      
      # Configure Posh-ACME account
      Write-Host "Setup Account"
      $account = Get-PAAccount
      if (-not $account) {
          # New account
          Write-Host "Create New Account"
          $account = New-PAAccount -Contact $acmeContact -AcceptTOS
      }
      elseif ($account.contact -ne "mailto:$acmeContact") {
          # Update account contact
          Write-Host "Set Existing Account $($account.id)"
          Set-PAAccount -ID $account.id -Contact $acmeContact
      }
      
      # Acquire access token for Azure (as we want to leverage the existing connection)
      Write-Host "Get Azure Details"
      $azAccount = az account show -s $dnsSubscription -o json | ConvertFrom-Json
      Write-Host "Azure DNS Sub $($azAccount.name)"
      $token = (az account get-access-token --resource 'https://management.core.windows.net/' | ConvertFrom-Json).accessToken
      
      # Request certificate
      $pArgs = @{
          AZSubscriptionId = $azAccount.id
          AZAccessToken = $token
      }
      New-PACertificate $domain -Plugin Azure -PluginArgs $pArgs -Verbose
      
      $generatedCert=$(Get-PACertificate)
      
      Write-Host($generatedCert)
    
      # chain.cer, chain0.cer and chain1.cer are output files; chain1.cer contains only the intermediate certificate
      $intermediatePath=$($generatedCert.ChainFile -replace 'chain.cer','chain1.cer')
      # Posh-ACME protects the PFX with 'poshacme' by default unless -PfxPass is passed to New-PACertificate
      $pfxPassword='poshacme'
      Write-Host "##vso[task.setvariable variable=certPath;isOutput=true]$($generatedCert.CertFile)"
      Write-Host "##vso[task.setvariable variable=intermediatePath;isOutput=true]$intermediatePath"
      Write-Host "##vso[task.setvariable variable=privateKeyPath;isOutput=true]$($generatedCert.KeyFile)"
      Write-Host "##vso[task.setvariable variable=pfxPath;isOutput=true]$($generatedCert.PfxFullChain)"
      Write-Host "##vso[task.setvariable variable=pfxPass;isOutput=true;issecret=true]$pfxPassword"


Store the Certificate

Finally, once the certificate has been generated, we can take the exported PFX and put it within Azure Key Vault by using the az CLI to import it.

- task: AzureCLI@2
  displayName: 'Import Certificate into ${{ parameters.keyVaultName }}'
  condition: and(succeeded(), eq(variables['cert.expired'], 'True'))
  inputs:
    azureSubscription: '${{ parameters.subscriptionName }}'
    scriptType: 'pscore'
    scriptLocation: 'inlineScript'
    inlineScript: |
    
      $keyVaultName="${{ parameters.keyVaultName }}"
      $certName="${{ parameters.certName}}"
      $password="$(acmecert.pfxPass)"
      $pfxPath="$(acmecert.pfxPath)"
      az keyvault certificate import --vault-name $keyVaultName -n $certName -f $pfxPath --password $password

Install Certificate

For this stage we are assuming the above has been done, so the certificate is generated, valid and imported into the Azure Key Vault.

We will also assume that on the Linux Virtual Machines (VMs) you have installed the Azure CLI and have an Azure Managed Identity (MI) attached to the VMs for authentication.

For storing the certificates on the VMs we are also using keytool, the key and certificate management tool used with Java applications on Linux machines.

In the script below we set the variables and log in to the Azure CLI using the Managed Identity.

miClientId="${managedIdentityClientId}"
az login --identity --username $miClientId

keyVaultName="${keyVaultName}"
certName="${certName}"
domain="${domain}"

jksPath="/usr/local/conf/ssl.jks"
jksPass="${certPassword}"

We then read the certificate from the keystore and generate the date we deem as expired, which is the certificate's expiry date minus 14 days.

expiryDate=$(keytool -list -v -keystore $jksPath -storepass $jksPass | grep until | sed 's/.*until: //')

echo "Certificate Expires $expiryDate"
expiryDate="$(date -d "$expiryDate - 14 days" +%Y%m%d)"
echo "Certificate Forced Expiry is $expiryDate"
today=$(date +%Y%m%d)

If today's date is before the forced expiry date then we do not try to get a new certificate, but if the certificate does not exist or its forced expiry date has passed then we renew.

To do this we download the certificate from the Key Vault, but as it downloads without a password, we use the openssl CLI to import/export the certificate with a password.

This generates a new PFX, which we import into the keystore after deleting the existing certificate.

if [[ $expiryDate -lt $today ]]; then
    echo "Certificate has expired"
    downloadedPfxPath="downloadedCert.pfx"
    signedPfxPath="signedCert.pfx"

    rm -rf $downloadedPfxPath || true

    # Download the PFX from Key Vault (it is exported without a password)
    az keyvault secret download --file $downloadedPfxPath --vault-name $keyVaultName --encoding base64 --name $certName

    rm -rf $signedPfxPath || true
    # Re-export the PFX with the keystore password
    openssl pkcs12 -in $downloadedPfxPath -out tmpmycert.pem -passin pass: -passout pass:$jksPass
    openssl pkcs12 -export -out $signedPfxPath -in tmpmycert.pem -passin pass:$jksPass -passout pass:$jksPass

    # Replace the existing certificate in the keystore
    keytool -delete -alias 1 -keystore $jksPath -storepass $jksPass
    keytool -importkeystore -srckeystore $signedPfxPath -srcstoretype pkcs12 -destkeystore $jksPath -deststoretype JKS -deststorepass $jksPass -srcstorepass $jksPass
else
    echo "Certificate has NOT expired"
fi

You can then put this on a daily cron job to check if the certificate is valid.
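
As a minimal sketch, assuming the script above has been saved on the VM as /opt/scripts/pull-certificate.sh (an illustrative path), the daily cron entry could look like this:

# Run the certificate check/renewal script every day at 01:00 (the script path is illustrative)
echo "0 1 * * * root /opt/scripts/pull-certificate.sh" | sudo tee -a /etc/crontab > /dev/null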

The only issue this comes up against is the overlap between the pipeline schedule and the renewal script schedule above. If you put them both on a daily schedule, one to renew the certificate and one to pull the new certificate, the certificate pulling schedule may run before the certificate has been renewed. Although this is not ideal, as we renew 14 days in advance you would still have 13 days for it to catch up.

Setup Certbot for Azure Virtual Machines

Certbot is a tool for automating the issuance and renewal of ACME (Automated Certificate Management Environment) SSL certificates and can be handy to install on your Azure Virtual Machines. However, as of writing there is a little snag with the Azure plugin for Certbot, as it has not yet been merged into the official plugin list. I have gone through the process to learn how to get this set up correctly.

I would suggest reading up on Certbot from their website https://certbot.eff.org/ and also information on the Azure Plugin here https://certbot-dns-azure.readthedocs.io/en/latest/

In this article I have assumed the machines are Linux Virtual Machines and that Bash is used for the scripting.

Overview of Actions

  1. Generate Authentication file
  2. Install Certbot and Certbot Azure Plugin
  3. Generate Certificates
  4. Save Certificates into store
  5. Renew Certificate
  6. Automate Renewal

Generate Authentication File

First, we need to set up authentication so Certbot can complete the DNS challenge against Azure. The below is a setup for a single DNS zone using the more secure method of a Managed Identity for authentication; other methods, and how to set up multiple DNS zones, are outlined here https://certbot-dns-azure.readthedocs.io/en/latest/#configuration

The script below takes the Managed Identity client ID, tenant ID, DNS zone resource group ID and DNS zone name. It then creates the directory and file to store these details with the correct format and permissions.

#!/bin/bash

clientId="98f63a12-3192-4999-a369-570c4a2f506c"
tenantId="2d38d338-e6a8-4d09-a4c0-0d1dc35ede1d"
dnsZoneResourceGroup="/subscriptions/99800903-fb14-4992-9aff-12eaf2744622/resourceGroups/rgName"
dnsZoneName="example.com"

filePath=".secrets/certbot/azure.ini"

mkdir -p ".secrets/certbot"
touch $filePath

sudo chmod 755 $filePath

echo "" > $filePath
echo "dns_azure_msi_client_id = $clientId" >> $filePath
echo "dns_azure_tenant_id = $tenantId" >> $filePath
echo "" >> $filePath
echo "dns_azure_zone1 = $dnsZoneName:$dnsZoneResourceGroup" >> $filePath

sudo chmod 400 $filePath

Install Certbot and Certbot Azure Plugin

We can then set up the installation of Certbot and the Azure plugin. Following the Certbot documentation we would use the `snap` tool; however, as the Azure plugin has not been added to the official plugin list, we cannot install it via snap. Therefore, the way to install the Azure plugin is with Python (pip), and so we also need to install Certbot by the same method for them to work together.

Below we install Python 3 pip, Certbot, the Certbot Azure plugin and a couple of Python modules that need to be pinned to specific versions for them to work with Certbot.

# Install certbot and certbot-dns-azure
sudo apt-get -y install python3-pip
sudo pip3 install certbot certbot-dns-azure 
sudo pip3 install -Iv zope.interface==5.4.0
sudo pip3 install -Iv cryptography==2.5

Generate Certificates

By this point we have the authentication file set up and Certbot installed. Below is the command that runs Certbot on the Virtual Machine. It declares that Certbot will use the configuration file we set up previously, perform a DNS challenge to prove ownership of the domain and create the certificate for the URL provided. This assumes you have the DNS zone, DNS record, Virtual Machine and associated Managed Identity in place.

This is all automated to run non-interactively, so this can be run via a command on the VM or a pipeline task during deployment.

# Install while running
url="www.example.com"
supportEmail="example@example.com"

sudo certbot certonly \
--authenticator dns-azure \
--preferred-challenges dns \
--noninteractive \
--agree-tos \
--email ${supportEmail} \
--dns-azure-config .secrets/certbot/azure.ini \
-d ${url}

Save Certificates into store

Once the certificate has been generated, the script below saves it into the keystore on the Virtual Machine. It retrieves the certificate from Certbot's auto-generated location, repackages it as a password-protected PFX and then stores it in the keystore.

keytool is a tool for Java applications, but it can be swapped for other tools and setups. For more information on keytool you can read here https://jenkov.com/tutorials/java-cryptography/keytool.html

url="www.example.com"
certPassword="***"

certPath="/etc/letsencrypt/live/${url}/fullchain.pem"
privateKeyPath="/etc/letsencrypt/live/${url}/privkey.pem"
certDir="/var/lib/waagent/"
pfxName="myPfxFile.pfx"

cd $certDir

openssl pkcs12 -inkey $privateKeyPath -in $certPath -export -out $pfxName -passin pass: -passout pass:${certPassword}

keytool -importkeystore -srckeystore $pfxName -srcstoretype pkcs12 -destkeystore /usr/local/store/ssl.store.jks -deststoretype JKS -deststorepass ${certPassword} -srcstorepass ${certPassword}

Renew Certificate

This is now all great, as the setup is fully automated and dynamically generates the certificate. However, these certificates only have a 3 month life, so we would like them to be renewed automatically as well.

The script below assumes that you have saved the script above, which stores the certificate in the keystore, as a shell script file, for example `save-certificate.sh`. When we ran the Certbot command it created an order; the script below renews that order and generates a new certificate.

#!/bin/bash

# Renew the certificate quietly (only renews when it is close to expiry)
sudo certbot renew -q

sudo /home/certbot/save-certificate.sh

Automate Renewal

We can then save this to a file, for example `renew-certbot.sh`, and with the following command create a cron job to automatically run the renewal script above.

# Auto Renewal > https://eff-certbot.readthedocs.io/en/stable/using.html#setting-up-automated-renewal
SLEEPTIME=$(awk 'BEGIN{srand(); print int(rand()*(3600+1))}'); echo "0 0,12 * * * root sleep $SLEEPTIME && sudo /home/certbot/renew-certbot.sh" | sudo tee -a /etc/crontab > /dev/null

Terraform Code Quality

Terraform is like any other coding language: there should be code quality and pride in what you produce. In this post I would like to describe why you should care about code quality and detail some of the practices you should be applying consistently every time you produce code.

Code quality is like learning to drive: you don't indicate for yourself, you indicate for others, so they know what you are doing and where you are going. It is the same with your code; it should be easy enough to follow that another developer could come along and understand what has been produced. With Infrastructure as Code this extends beyond just your code: the resources it produces should also have the continuity and consistency that the admins of the product can understand.

This isn't something you should only do for production-ready work; you should apply code quality to your proof of concepts and even learning projects. It creates habits and routines, so they become second nature.

What follows here are some formatting and organisation practices I employ when writing code which you may find beneficial to adopt when writing Terraform code. 

Files

Although with Terraform you can call the files anything you like, I feel you should have a structure and a pattern to how you name them. It gives every reader an understanding of where to go to find what they need without having to hunt through the files. Below are the standard files I always use, which ensure you always have a base level of consistency. Beyond these files, it generally comes down to your company/team standards as to what files you create.

Main

The ‘main.tf’ file is a standard file name that even HashiCorp use in their examples, so it is great to have as the starting point for your code journey. With this as the starting point, other developers will naturally go to this file to see where the resources start. I do not put everything in this file like you might for a smaller project; instead it normally contains templated ‘local’ variables. These can be things like resource prefixes, environment name manipulation or converting variables into booleans. I might also have some shared data resources and even a resource group (if we are talking Azure, where all the resources tend to live in a resource group): basically all the artifacts that will be used across the other files, which gives the reader the base information.
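
As an illustrative sketch of what such a ‘main.tf’ might hold (the prefix and variables follow the example later in this post; the data source and boolean local are assumptions added for the example):

locals {
  resource_prefix         = "cmp-${var.env}"
  resource_prefix_no_dash = replace(local.resource_prefix, "-", "")
  is_production           = var.env == "prod"
}

data "azurerm_client_config" "current" {}

resource "azurerm_resource_group" "rg" {
  name     = "${local.resource_prefix}-rg"
  location = var.location
}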

Providers

The ‘providers.tf’ file is what it says on the tin: it contains all the providers, their versioning and their customised features. Providers should only ever be declared once, here in this file, so that the versioning can flow downstream and not cause dependency issues with other providers.
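
A minimal ‘providers.tf’ sketch, assuming the azurerm provider with version constraints chosen purely for illustration:

terraform {
  required_version = ">= 1.3.0"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0"
    }
  }
}

provider "azurerm" {
  features {}
}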

Variables

The ‘variables.tf’ file should only contain variable definitions, with no local variables or modules within it. This keeps it clean, with a single purpose for the file.
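
For example, a ‘variables.tf’ would hold nothing but definitions like these (the names are illustrative and match the examples used elsewhere in this post):

variable "env" {
  type        = string
  description = "Environment name, for example dev, test or prod"
}

variable "location" {
  type        = string
  description = "Azure region to deploy the resources into"
  default     = "uksouth"
}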

Output

There might be an ‘output.tf’ file for the resource properties that you would like to output, but you should only output data, even if it is not sensitive, if you have to. The less information you output, the more secure the resources are, so you can consider this file optional.
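
If you do need one, an ‘output.tf’ sketch could be as small as this, outputting a property of the resource group from the example later in this post:

output "resource_group_name" {
  description = "Name of the resource group the resources were deployed into"
  value       = azurerm_resource_group.rg.name
}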

Tfvars

I like to place these within a folder called ‘environments’ (more on that below) and then name each file after its environment, for example ‘dev.tfvars’. You can then also have a ‘local.tfvars’ file for local testing and experimenting.
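
An ‘environments/dev.tfvars’ file then just holds the values for that environment, for example (values are illustrative):

env      = "dev"
location = "uksouth"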

Override

The ‘override.tf’ file will be something you exclude from your git repository (via .gitignore) to avoid checking in sensitive data. This can be where you configure your remote state for plan testing, without the need to add values to the checked-in files or pass them via the CLI.
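
As a sketch, an ‘override.tf’ used for local plan testing might configure the remote state backend with values you don't want checked in (all values here are placeholders):

terraform {
  backend "azurerm" {
    resource_group_name  = "my-state-rg"
    storage_account_name = "mystatestorage"
    container_name       = "tfstate"
    key                  = "dev.terraform.tfstate"
  }
}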

Folders and File Patterns

This tends to be driven by the size of your project, your company and the restrictions within that company. For larger companies that have a lot of independent projects, a standard approach is to have a collection of shared modules in their own repository. This provides flexible and configurable modules that keep a certain standard across the company. You should ensure a module has a big enough purpose, though, or you could end up creating a lot of modules for little impact. An example I would give is a database, for which you would create the server, users, security, networking and possibly the databases themselves; having a module for this makes sense.

For smaller companies or projects, creating separate module repositories might not make sense, requiring too much effort to maintain for minimal impact. For example, for a small single-product company it would mean one change to a database causes multiple repository changes. However, you can keep this maintainable and flexible pattern by putting your modules locally in the project. For this I suggest having a parent folder called ‘modules’; then, if you have multiple providers like Azure and AWS, create a folder for each of them. If you don't have multiple providers like this, just keep it at the modules folder level. Within this you can then add a folder for each module, naming it relative to its purpose and containing the Terraform files as per above.

Example:

> modules
>> azure
>>> postgresSql
>>>> main.tf
>>>> output.tf
>>>> variables.tf

Some people prefer not to use modules and to have everything flat within the root directory. This is not a bad thing, but you still need to give the reader a journey to make finding resources easy and avoid very large Terraform files. The consistent thing to do is to split your resources into multiple files, so each file has its own purpose. How far you split it down depends on how big the content is. For example, if you just have 2 storage accounts being created then you might keep them in one file, but if you have a Virtual Network and then multiple Subnets you might want to split them into more files.

The standard would be to just drop these into the root and leave them named as per their resource, but I feel this gives no order to the files, as they would be sorted alphabetically, resulting in the files being in a seemingly random order. To combat this, I have seen people prefix the file names with numbers, so they create an order.

Example:

00-main.tf
01-variables.tf
02-storage-account.tf

One challenge with this is if you now want to add a file in between 00 and 01, you need to rename all the following files, which causes a lot of work and pain.

My preferred approach is to use a pattern that merges both ideas, by prefixing all resource files with ‘tf’ so all the resources sit in one group together. I then follow it with an acronym of the resource and ‘main’ for the root file of the resource, for example ‘tf-kv-main.tf’ for a Key Vault. Then if I would like to add another file for certificate generation I would call it ‘tf-kv-cert.tf’. This results in all the resource files being kept together, each related resource being kept together, and some indication of what each file does.

Example:

main.tf
tf-kv-main.tf
tf-kv-cert.tf

Variables

I feel variables sometimes get overlooked, as the person writing the code knows what they are and what they are for, and sees that Terraform will handle a lot for them. But what you want to ensure is that when someone else comes along to look at your variables, they'll actually be able to make sense of them.

Naming is key: variables should have a descriptive name, so that when you see one throughout the files you know what it is and its purpose. They should be lowercase, use underscores and follow a pattern. I prefer to prefix the name with the resource type and then the variable name, for example a Storage Account name would be either ‘storage_account_name’ or, to make it more compact, ‘sa_name’.

The type is one that gets ignored, as Terraform can and will interpret whatever data you push in, but it is worth declaring it so readers know what type it is and how they might be able to expand on it, especially if it is an object or a list of objects. I have seen variables without a type added and then battled with what I can pass in and what will work downstream with the different functions used on the data, like count vs for_each.

Descriptions don't need to be war and peace, but they do give an easy, human-readable explanation of what the variable is for, plus you can add helpful information such as the allowed values. This can be even more impactful if used with something like terraform-docs, which will use these descriptions to produce a README file for your project.

Validation conditions take some work, but they can also make life easier down the line, as they make sure users do not pass values that are not going to work, or that you don't want them to use. A great example of this in Azure is the SKU value for resources. Not only might you want to restrict the string to values that match valid SKUs, but also restrict which SKUs can be used. This validates that the user only uses the SKUs you want, without them having to keep attempting and failing, or creating a resource only for an admin to tell them to rebuild it.

variable "mysql_sku_name" {
  type        = string
  description = "MySQL Server SKU. Limited Values are B_Gen4_1, B_Gen4_2, B_Gen5_1, B_Gen5_2"
  default     = "B_Gen4_1"

  validation {
    condition     = contains(["B_Gen4_1", "B_Gen4_2", "B_Gen5_1", "B_Gen5_2"], var.mysql_sku_name)
    error_message = "MySQL Server SKU are limited to B_Gen4_1, B_Gen4_2, B_Gen5_1, B_Gen5_2."
  }
}

Resources

This is about the resources in general within each file. Each file should have a pattern and flow to how each resource connects.

I always put the local variables at the top, so they are easy to find, and most of the time they are setting up data for use in the following resources. Next should be your resources, starting with the parent and working down to the children. For example, you start with your Storage Account and then the Storage Containers within it, so there is a flow: starting with the big box and going down to the boxes that fit inside it.

Naming of the Terraform resources and the deployed resources should follow the same rules as the variables: there should be a pattern, with consistency and convention. Terraform resource names should be lowercase, use underscores as separators and have a purpose to them. The resource address is already comprised of the provider and resource type, so you should not duplicate these in the custom name. You should give it an alias that describes what it is, or just use an acronym, for example:

An Azure Storage Account for exports I would name ‘exports’, so the full address would be ‘azurerm_storage_account.exports’; then for something like a single Azure Resource Group I would name it ‘rg’ to produce ‘azurerm_resource_group.rg’.

You should then have a pattern for the deployed resources as well, so they also have a naming convention to follow. There is no strict rule to this, as it might depend on company policy and resource limitations. In general, I would prefix everything with the project, then the environment and then the resource type; for example the CMP project, in the development environment, with a Resource Group would be ‘cmp-dev-rg’. This easily groups the resources and keeps consistency between all of them; however, some resources don't allow certain characters and have a maximum length. Therefore, you need to think about how many characters you use so you don't hit the limit, and some resource names might end up looking like ‘cmpdevsa’.

locals {
  resource_prefix         = "cmp-${var.env}"
  resource_prefix_no_dash = replace(local.resource_prefix, "-", "")
}

resource "azurerm_resource_group" "rg" {
  name     = "${local.resource_prefix}-rg"
  location = var.location
}

resource "azurerm_storage_account" "exports" {
  name                     = "${local.resource_prefix_no_dash}sa"
  resource_group_name      = azurerm_resource_group.rg.name
  location                 = azurerm_resource_group.rg.location
  account_tier             = "Standard"
  account_replication_type = "GRS"
}

Ending Remarks

These are all guidelines for a well written Terraform project, and they will vary depending on your setup. The key point is to have consistency, naming conventions and a journey, to make it easier to read, write and develop with.

Lastly, I’d always recommend using the Terraform command ‘fmt’ before checking in code to keep the style consistent as well.
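
For example, running the following from the root of the project formats every .tf file, including those in nested module folders:

terraform fmt -recursive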

What you want in a DevOps Engineer

DevOps and DevOps engineers are not a new concept, but there is still some uncertainty about what they are and what they cover. Therefore, it can be hard to gauge what you should look for when recruiting a DevOps Engineer and the skill set that is required. Not all the skills they need are technical either, because DevOps is a culture and a community that needs a certain kind of attitude.

What is a DevOps Engineer?

First, we start with what a DevOps Engineer is, as this is not an obvious title due to it containing the name of the principle, DevOps. DevOps itself is not a role; it is a culture, a principle and a methodology for developing and then deploying software, therefore you cannot really "be" DevOps. However, the term DevOps Engineer is widely used to describe the role of a person that helps nurture DevOps culture within the company, with both interpersonal and technical skills. DevOps, as you will know, is the combination of Development and Operations team members; historically, Developers would build the code then throw it over the fence to the Operations team to deploy. This introduced lots of challenges when releasing software, which ultimately slowed the software development lifecycle. Think of DevOps as the gel in between that helps enable the Developers to self-serve, while the Operations team build the tools to self-serve with. The ultimate goal of DevOps is to provide business value by allowing businesses to release high-quality software more frequently.

In my experience, it is more often those working on the Operations side that become DevOps Engineers, as they know best how to get the code from A to B and have most of the skill set required. They will use their technical skills to automate the process of packaging, deploying and then monitoring the code. They will also use their soft skills to work with the Developers and other team members on how best to action this, as all projects work differently. A DevOps Engineer should also use these skills to help the wider company adopt DevOps by encouraging frequent, small change releases and continuous monitoring for new features. There are other roles that have grown from the DevOps Engineer, like the Site Reliability Engineer (SRE), which is like the aftercare role that maintains the website/software's uptime with monitoring, automation and alerting.

DevOps Technical Skills

Think of a DevOps Engineer as a jack of all trades. They are in contact with, and building automation around, application code, networking, infrastructure and monitoring, so they need to know a bit about everything. This does not mean they need to be masters of all, as the saying goes:

“A jack of all trades is a master of none, but oftentimes better than a master of one.”

As a DevOps Engineer you will be working in teams where some people will be stronger in certain areas than others, which complements each other as well. For a broad skill range, you would be looking for someone who knows these types of areas:

  • Application development in languages like .NET Core, Java and/or Node.js.
  • Database engineering on technologies like MySQL, PostgreSQL and/or MongoDB.
  • Network engineering, knowledgeable in load balancers, virtual networking and private connections.
  • Security principles for code scanning, network security and firewall protection.
  • Pipeline automation with tools like GitHub, Azure DevOps and/or Jenkins.
  • Infrastructure tooling like Terraform, Ansible and cloud-init.
  • Architecture patterns like Hub-Spoke, Landing Zones and Microservices.

As you can tell this is a lot to ask of any engineer, which is why you would want someone who knows of all of these but may not be a master of them. I think the two key requirements, especially at this time, are to be very effective in the cloud platform they are working with and proficient with Command Line Interfaces (CLIs).

Cloud platforms change a lot, so trying to be great at all of them would be a near-impossible task. Therefore, my advice would be to specialise in one platform; the other skills will then be able to develop further. For example, trying to understand the application setup in AWS compared to GCP compared to Azure would drive you a little mad, but understanding it in a single cloud platform means you will start to know it inside and out.

The CLI is a DevOps Engineer's best friend, as it is a single point where you can run all your tools and communicate with remote ones. You will use the CLI in most, if not all, aspects of the role, so knowing your way around it, and even knowing the commands from memory, makes for a more reliable and efficient engineer. This is the sort of knowledge gained with experience on the job, and it can be invaluable.

Soft Skills

As well as technical skills, the candidate should have soft skills, so they can communicate with all parties within a project. Communication is a big part of DevOps culture; in any software development project there will be project managers, developers, business analysts, solution architects and more. Therefore, they should be good at describing complex and technical solutions to others without the same background or knowledge level.

As well as talking to these members, they also need to work well with them. A DevOps Engineer won't just work with other DevOps Engineers but with everyone else involved in the project, as they will be heavily involved in the end-to-end development lifecycle of the product.

Another key soft skill is problem-solving, as this will sometimes be the main part of the job when things go wrong. DevOps Engineers often come across new technologies or tools they aren't knowledgeable about; this is where they should be able to use their problem-solving skills to understand and learn, allowing them to investigate any issues. This is also how they can develop the areas they are less skilled in. Continual learning is certainly something most DevOps Engineers will be familiar with and expect.

Career Background

If you're looking to recruit a DevOps Engineer, what skills should you look for? Ideally, you'd want a candidate that has previously been either a DevOps Engineer or an SRE, as they should have the same skill set you are looking for. However, this is still a developing practice and role, so you will normally get a lot of candidates transferring over from a previously skilled role. Therefore, the next best thing would be recruiting someone that has come from a closely related practice, like development, database engineering, network engineering or operations. They should, though, have a high level of interest and some experience in the other skill sets, as a demonstration that they know what this role change involves. Lastly, I'd suggest they should have some cloud experience; much of the work they will be doing is likely to be in the cloud, and having a good understanding of your target platform will be very beneficial.

Azure DevOps loop a complex object

This is a subject I have not found much, or any, documentation on, so I wanted to share what I did. The parameters you pass into a template can get very long, and when two arrays depend on each other it can get very complex, so ironically a complex object makes this simple.

For my example I have a template where I would like to process Virtual Machines (VMs) and their associated disks, if they have any. Therefore I need each VM's name and its disk names. This could be done in many other ways, but it suits this specific example.

You could pass in the parameters as lists of VM names and disk names as per the example below, but then if a VM doesn't have a disk, or has many disks, the indexes would not line up.

- name: virtualMachineNames
  type: object
  default: ['vm1','vm2']
- name: diskNames
  type: object
  default: ['vm1Disk1','vm1Disk2','vm2Disk1']

Instead the parameter can be just that, a complex object, but the catch I found is that you can't define a strict schema for it. Therefore, I would suggest adding a comment to the file to demonstrate an example format. In the example below I have added a default value just for this demo.

- name: virtualMachines
  type: object
  default:
    - Vm:
        Name: 'vm1'
        Disks: ['vm1Disk1','vm1Disk2']
    - Vm:
        Name: 'vm2'
        Disks: ['vm2Disk1']

You can keep all of your properties on one level, which removes the ‘Vm’ part, and still do the looping below, but for prettiness, and to stay as close to a JSON object as possible, I like doing it like this.

We can then loop through these items just like we would a list, and access the properties like an object.

- ${{ each vm in parameters.virtualMachines }}:
      - task: AzureCLI@2
        displayName: Check ${{ vm.name }} Disks