As part of a project to automate TLS certificates for Azure Application Gateways, I ran into an unexpected issue with Azure Functions and managed identities. The tool I was using tried to fetch a managed identity token and failed. I had to read through the tool’s code and many pages of documentation before discovering that Azure Functions expose the token endpoint through environment variables, and that this endpoint differs ever so slightly from the standard one.
Editor’s Note: Please don’t take any of this in a negative light. The functionality that an SDK would normally provide couldn’t be used here, and running inside an Azure Function is an uncommon enough setup that I wouldn’t expect a tool like acme.sh, which covers so many different providers, to have this tiny edge case covered. It is a great tool, and I’m very happy with it.
I firmly believe that all TLS certificates should be automated, and the project I’m working on is to automatically request certificates from Let’s Encrypt/ZeroSSL and store them in Azure Key Vault so that Azure Application Gateway can use them. This is the first in what will likely be a series of posts documenting this journey.
Overview
My idea was to run acme.sh in an Azure Function, using Azure’s managed identity to authenticate with Azure Key Vault. This would allow the certificates to be stored directly in Key Vault for App Gateway to use.
Since the Azure Function would run on a schedule, this would ensure certificates are always renewed well before expiration. Fresh certificates, without any manual intervention, and no need to worry about expiration dates.
The Problem
Installing acme.sh in the Azure Function was quick enough, but when I tried to run it with DNS validation using a managed identity, it was unable to fetch a token.
The error was that it was unable to connect to the managed identity token service. In most Azure environments, services access the managed identity token service via a link-local address (typically 169.254.169.254). This endpoint is well documented and widely used. In most cases the acme.sh script would’ve been using the correct endpoint.
After SSH’ing into the Azure Function itself, I was able to confirm that it wasn’t able to connect to the metadata endpoint. It was only after reading through the Python SDK docs that I discovered that I should be using the IDENTITY_ENDPOINT environment variable to get the correct endpoint.
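To make the difference concrete, here is a minimal Python sketch of a token request against the endpoint that Azure Functions advertises. It rests on assumptions that go beyond what this post states: to my understanding of the App Service managed identity documentation, the platform also sets an IDENTITY_HEADER variable whose value is sent in an X-IDENTITY-HEADER request header, and the API version is 2019-08-01. The function names and the Key Vault resource URI are only illustrative.

```python
import json
import os
import urllib.parse
import urllib.request


def build_functions_token_request(env, resource):
    """Return (url, headers) for the identity endpoint advertised via env vars.

    Assumes IDENTITY_ENDPOINT and IDENTITY_HEADER are both set, which should be
    the case when a managed identity is assigned to the Function App.
    """
    endpoint = env["IDENTITY_ENDPOINT"]
    secret = env["IDENTITY_HEADER"]
    query = urllib.parse.urlencode(
        {"resource": resource, "api-version": "2019-08-01"}
    )
    return f"{endpoint}?{query}", {"X-IDENTITY-HEADER": secret}


def fetch_token(env=os.environ, resource="https://vault.azure.net"):
    """Request a token and return the access_token field of the JSON reply."""
    url, headers = build_functions_token_request(env, resource)
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["access_token"]
```

Calling fetch_token() only makes sense inside the Function itself, since the endpoint is local to that environment. The request builder is kept separate so the URL and header can be inspected without any network access.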
The Solution
Once I had discovered the correct endpoint (along with the additional header used for authentication), I was able to fetch the managed identity token successfully with curl. Then I searched through acme.sh’s code to find out how it was attempting to fetch the token. It made the same assumption I had, namely that the endpoint could be hardcoded, so I was able to quickly put together a code change to fix the issue.
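The shape of such a fix can be sketched in Python as follows. This is not the actual acme.sh change (acme.sh is a shell script, and its patch is not reproduced here); it only illustrates the idea of preferring the endpoint from the environment and falling back to the well-known link-local address. The function name is hypothetical, and the IDENTITY_HEADER variable, header names, and API versions are my understanding of the Azure documentation rather than something taken from this post.

```python
from urllib.parse import urlencode

# Standard instance metadata endpoint used in most Azure environments.
IMDS_URL = "http://169.254.169.254/metadata/identity/oauth2/token"


def choose_token_request(env, resource):
    """Return (url, headers), preferring the endpoint advertised via env vars."""
    params = {"resource": resource}
    if env.get("IDENTITY_ENDPOINT") and env.get("IDENTITY_HEADER"):
        # Azure Functions / App Service style: endpoint and secret from the environment.
        base = env["IDENTITY_ENDPOINT"]
        params["api-version"] = "2019-08-01"
        headers = {"X-IDENTITY-HEADER": env["IDENTITY_HEADER"]}
    else:
        # Fallback: the hardcoded link-local metadata endpoint.
        base = IMDS_URL
        params["api-version"] = "2018-02-01"
        headers = {"Metadata": "true"}
    return base + "?" + urlencode(params), headers
```

Checking the environment first keeps the existing behaviour for virtual machines and similar hosts, where neither variable is set, while letting the same code work inside a Function.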
In the spirit of open source, and to thank the authors for their work on acme.sh, I sent that change as a PR so that no one else has to go through the same debugging process.
What’s Next?
In the series of blog posts that I plan to write about the system design for auto-renewal of TLS certs using ACME for Azure App Gateways, I’ll dive deeper into the actual implementation of the certificate automation process, and how I deeply integrated it into the Azure ecosystem.
Stay tuned if you’re interested in learning more about how to automate TLS certificates for Azure Application Gateways.