Authenticating with Azure Databricks Service Principal

While working with Azure Databricks, I have discovered a few different ways to authenticate with the Service Principal. Some of these are documented and some I found through a lot of searching, but I have not found a quick, easy guide that covers how to use them all.

Setup 

Step one is, of course, to make sure you have your Service Principal created. I won’t walk you through that, as how it is created is down to your policy. Here is the general guide from Microsoft: Register a Microsoft Entra app and create a service principal – Microsoft identity platform | Microsoft Learn

Assuming that you already have your Databricks Workspace and Account created, you can log in to the Account Console at https://accounts.azuredatabricks.net. You can then give the Service Principal access to the Account by adding it in the Account Console and setting it as `Account Admin`.

Reference: Manage service principals – Azure Databricks | Microsoft Learn 

  1. As an account admin, log in to the account console
  2. In the sidebar, click User management
  3. On the Service principals tab, click Add service principal
  4. Under Management, choose Databricks managed or Microsoft Entra ID managed
  5. If you chose Microsoft Entra ID managed, under Microsoft Entra application ID, paste the application (client) ID for the service principal. 
  6. Enter a name for the service principal. 
  7. Click Add
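
If you would rather script these steps than click through the Account Console, the same result can probably be reached through the account-level SCIM API. The sketch below only builds the request and does not send it. The endpoint path and payload fields are my understanding of that API rather than something taken from the steps above, and `build_add_service_principal_request`, the account ID and the token are hypothetical placeholders.

```python
import json
import urllib.request

ACCOUNTS_HOST = "https://accounts.azuredatabricks.net"


def build_add_service_principal_request(account_id, application_id, display_name, token):
    """Build (but do not send) a request that registers a Microsoft Entra ID
    managed service principal at account level.

    Assumption: the Account SCIM endpoint and field names below match the
    current Azure Databricks API; check the API reference before relying on it.
    """
    url = f"{ACCOUNTS_HOST}/api/2.0/accounts/{account_id}/scim/v2/ServicePrincipals"
    payload = {
        "schemas": ["urn:ietf:params:scim:schemas:core:2.0:ServicePrincipal"],
        "applicationId": application_id,  # the application (client) ID from step 5
        "displayName": display_name,      # the name from step 6
        "active": True,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/scim+json",
        },
        method="POST",
    )
```

Sending it would be a matter of passing the returned request to `urllib.request.urlopen` with a token that belongs to an existing account admin. Granting the `Account Admin` role is a separate step that this sketch does not cover.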

You may then need to add the Service Principal to the Workspace as well, but if it has Contributor rights on the Workspace resource in Azure then it will automatically be added to the `admins` group in that Workspace. This saves you adding it manually and gives it the highest level of rights needed for deployments.

Authenticating 

These are some of the techniques you can then use to authenticate with the Databricks CLI, the REST API and the Terraform Provider.

Databricks CLI 

With Databricks, you can authenticate with the CLI using the `.databrickscfg` file. In this file you describe the authentication, which the CLI will automatically pick up.

This is the format used for Azure Service Principal authentication, but the file also supports other identity types and authentication methods.

[<profile name>]
host = <databricks URL>
azure_tenant_id = <Tenant GUID>
azure_client_id = <Service Principal Client ID>

In the example, `profile name` is a user-defined name for the target you are authenticating against. You can specify the target when running CLI commands with `-p <profile name>`. If you name the profile `DEFAULT`, it is used whenever no profile is passed in the command.

The values for the other parameters should be self-explanatory. I have used federated authentication for the Service Principal, so there has been no requirement to add the client secret. If you are not using that then you will need to add `azure_client_secret`, but this will be stored in the file in plain text, so you will need to be aware of the security implications.

If you are accessing the Account Console instead then instead of the Databricks Workspace URL in `host` you would put `https://accounts.azuredatabricks.net` 
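
If you generate this file as part of a pipeline, something like the following could write the profile for you. This is a minimal sketch using Python's standard `configparser`. `write_profile` is a hypothetical helper of mine, not part of the Databricks CLI, and the default location `~/.databrickscfg` is the path I understand the CLI to read.

```python
import configparser
from pathlib import Path


def write_profile(profile, host, tenant_id, client_id, path=None):
    """Add or replace one profile in a Databricks CLI configuration file.

    Existing profiles in the file are kept, but any comments are lost
    because configparser does not preserve them.
    """
    path = Path(path) if path is not None else Path.home() / ".databrickscfg"
    # interpolation=None so that a '%' in a value is written as-is
    config = configparser.ConfigParser(interpolation=None)
    config.read(path)  # a missing file is skipped without error
    config[profile] = {
        "host": host,
        "azure_tenant_id": tenant_id,
        "azure_client_id": client_id,
    }
    with path.open("w") as handle:
        config.write(handle)


# Example (placeholders, writes to the real file so run it deliberately):
# write_profile("DEFAULT", "<databricks URL>", "<Tenant GUID>", "<Service Principal Client ID>")
# write_profile("account", "https://accounts.azuredatabricks.net", "<Tenant GUID>", "<Service Principal Client ID>")
```

With both profiles written, a Workspace command would use the `DEFAULT` profile and an Account command would be run with `-p account`.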

Reference: configure command group – Azure Databricks | Microsoft Learn 

Databricks REST API 

To authenticate with the REST API, you can use the Azure CLI or the Azure REST API to create a Microsoft Entra ID access token (this is not a Databricks PAT). First you will need a way to authenticate with the Azure CLI or REST API as the Service Principal created earlier. This is a good place to start: Sign in with Azure CLI using a service principal | Microsoft Learn.

Once signed in, you can request an access token for the Azure Databricks resource ID (`2ff814a6-3304-4ab8-85cb-cd0e6f879c1d`):

$accessToken = $(az account get-access-token --resource 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d --query "accessToken" -o tsv)
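
If a call is later rejected, it can help to check what the token actually contains. The sketch below decodes the token's claims without verifying the signature, so it is only a debugging aid. It assumes the token is a standard JWT and that its `aud` claim equals the resource ID passed to `--resource`, which is what I would expect but have not confirmed for every tenant setup. `decode_claims` and `looks_usable` are hypothetical helper names.

```python
import base64
import json
import time

DATABRICKS_RESOURCE_ID = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"


def decode_claims(access_token):
    """Return the claims of a JWT access token WITHOUT verifying its signature."""
    payload = access_token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore the stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def looks_usable(access_token, now=None):
    """True if the token targets Azure Databricks and has not expired yet."""
    claims = decode_claims(access_token)
    now = time.time() if now is None else now
    # Assumption: 'aud' holds the Azure Databricks resource ID
    return claims.get("aud") == DATABRICKS_RESOURCE_ID and claims.get("exp", 0) > now
```

Never paste a real token into an online decoder; decoding it locally like this keeps it on your machine.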

Once generated, you can pass it to the REST API as a Bearer token. In the example below we list the clusters within a target Databricks Workspace.

# Request a Microsoft Entra ID token for the Azure Databricks resource
$accessToken = $(az account get-access-token --resource 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d --query "accessToken" -o tsv)

# Send the token as a Bearer token
$headers = @{
    Authorization = "Bearer $accessToken"
}

# Workspace URL, including https://
$databricks_url = "<databricks URL>"
$uri = "api/2.0/clusters/list"

$response = Invoke-RestMethod -Uri "$databricks_url/$uri" -Headers $headers -Method Get

$response
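
If your deployment scripts are in Python rather than PowerShell, the same flow can be sketched with only the standard library. This assumes the Azure CLI is already signed in as the Service Principal. `get_access_token`, `build_request` and `list_clusters` are hypothetical names of my own.

```python
import json
import shutil
import subprocess
import urllib.request

DATABRICKS_RESOURCE_ID = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"


def get_access_token():
    """Ask the signed-in Azure CLI for a token for the Azure Databricks resource."""
    az = shutil.which("az")  # also resolves az.cmd on Windows
    if az is None:
        raise RuntimeError("Azure CLI (az) was not found on PATH")
    result = subprocess.run(
        [az, "account", "get-access-token",
         "--resource", DATABRICKS_RESOURCE_ID,
         "--query", "accessToken", "-o", "tsv"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def build_request(databricks_url, path, access_token):
    """Build a GET request that carries the token as a Bearer token."""
    url = f"{databricks_url.rstrip('/')}/{path.lstrip('/')}"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {access_token}"}, method="GET"
    )


def list_clusters(databricks_url):
    """Call the clusters list endpoint and return the parsed JSON response."""
    request = build_request(databricks_url, "api/2.0/clusters/list", get_access_token())
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Example (needs a signed-in Azure CLI and a real Workspace URL):
# print(list_clusters("<databricks URL>"))
```

Keeping `build_request` separate from the token call means the request shape can be checked without touching Azure or the Workspace.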

Reference: Get Microsoft Entra ID tokens for Microsoft Entra ID service principals by using the Azure CLI – Azure Databricks | Microsoft Learn 

Databricks Terraform Provider 

Databricks provides a Terraform Provider to access either the Workspace or the Account Console. Both can be set up at the same time, each with its respective parameters. To authenticate, you can use the `.databrickscfg` file as mentioned above, pass the same parameters as the configuration file into the Terraform provider, or simply sign in to the Azure CLI with the preferred Service Principal before calling Terraform.

I have gone for the last option, as I would be signing in to the Azure CLI for the Terraform run either way, so it saves me from further authentication requirements as part of the deployment.

Once authentication is done you can set up the provider. In the example below, we are assuming the Databricks Workspace is created in another Terraform run, so we are using the data source from the azurerm provider to get the Workspace details.

If you would like to authenticate to the Account Console then you will also need the Account ID. You can get this by signing in to the Account Console and clicking your icon in the top right, which shows the Account ID (Manage your Azure Databricks account – Azure Databricks | Microsoft Learn).

terraform {  
  required_providers {  
    azurerm = {  
      source = "hashicorp/azurerm"  
      version = "=3.114.0"  
    }  
    databricks = {  
      source = "databricks/databricks"  
      version = "1.50.0"     
    }     
  }  
}  

provider "azurerm" {
  features {}
}

data "azurerm_databricks_workspace" "this" {  
  name = "<workspace name>" 
  resource_group_name = "<workspace resource group name>"     
}  

provider "databricks" {  
  alias = "workspace"  
  host = data.azurerm_databricks_workspace.this.workspace_url  
}  

provider "databricks" {  
  alias = "account"  
  host = "https://accounts.azuredatabricks.net" 
  account_id = "<account ID>"  
}  

Reference: Docs overview | databricks/databricks | Terraform | Terraform Registry 

Published by Chris Pateman - PR Coder

A Digital Technical Lead, constantly learning and sharing the knowledge journey.
