Understanding Databricks RBAC: Grants, Permissions, and Entitlements

Documentation on Databricks grants, permissions and entitlements can get complex fast, especially when switching between AWS, Azure and different Databricks versions. When I started implementing an RBAC design for Databricks, I looked into the different access approaches for groups and found you almost needed to be a fully functioning Data Engineer using the product to understand the differences and how and when to apply them as a DevOps Engineer. This guide is for the Azure DevOps Engineer who wants to implement these with Terraform following best practices, but the concepts carry over to the other Databricks hosting platforms.

One thing you will notice is that a lot of Terraform and Databricks CLI terms reference the old names for Databricks components compared to the UI (at least in Azure). For example, the Databricks UI has Workflows, but in Terraform and the CLI they are called Jobs. This shows up in a few of the examples below as well.

Access Types

Let’s start off by explaining the difference between Entitlements, Grants, and Permissions.

Entitlements are Boolean permissions for users and groups that control their access and high-level abilities in the workspace. They cover workspace access, the ability to create unrestricted clusters, the ability to create instance pools and access to Databricks SQL. One thing to note, as of writing, the Terraform property for the cluster entitlement is `allow_cluster_create`, which in Databricks translates to allowing unrestricted cluster creation, not general cluster creation.
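As a minimal sketch, assuming the workspace provider alias defined later in this guide and an existing group called dev_pateman_ws_devops (both illustrative), entitlements can be set with the `databricks_entitlements` resource:

data "databricks_group" "devops_ws" {
  provider     = databricks.workspace
  display_name = "dev_pateman_ws_devops" # illustrative group name
}

resource "databricks_entitlements" "devops_ws" {
  provider = databricks.workspace
  group_id = data.databricks_group.devops_ws.id

  workspace_access      = true
  databricks_sql_access = true
  allow_cluster_create  = false # "unrestricted cluster creation" in the UI
}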

Permissions are the different actions a user or group can perform on objects at the Workspace level, such as Clusters, Workflows and Secrets. The key with these is that they are scoped to the Workspace, so anything applied stays in that particular Workspace.
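For illustration, a workspace-level permission on a job (a Workflow in the UI) might look like the following sketch, assuming a job resource called `databricks_job.example` and the group naming used later in this guide:

resource "databricks_permissions" "devops_job" {
  provider = databricks.workspace
  job_id   = databricks_job.example.id # illustrative job resource

  access_control {
    group_name       = "dev_pateman_ws_devops" # illustrative group name
    permission_level = "CAN_MANAGE_RUN"
  }
}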

Grants, on the other hand, are permissions on items stored in the Metastore, so they are applied at the Account Console level. The impact is that access and permissions given to items in the Metastore carry over to every Workspace you have access to. There are some differences in how this plays out; for example, grants on External Locations mean you will see all of them in every Workspace, whereas a Catalog you will only see if it is bound to that Workspace.
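As a hedged sketch, a grant on a Catalog for an account-level group could look like this (the catalog and group names are illustrative); note the `databricks_grants` resource is applied through a workspace attached to the Metastore:

resource "databricks_grants" "devops_catalog" {
  provider = databricks.workspace
  catalog  = "dev_pateman_catalog" # illustrative catalog name

  grant {
    principal  = "dev_pateman_acc_devops" # illustrative account-level group
    privileges = ["USE_CATALOG", "USE_SCHEMA", "SELECT"]
  }
}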

Access Requirements

The goal is to keep Environment, Workspace and Group isolation. This keeps users from having cross permissions that would give them access they should not have. To do this I used groups as the foundation for assigning access and set them up per environment. This results in a group per persona, per environment and per workspace. One of the catches with Grants is you can only attach them to Account Level groups, so I wanted to keep the Workspace Permissions separate from the Account Level Grants. To do this I split the groups further, with one for the Account Level and one for the Workspace Level. The final isolation was for the Workspaces themselves. In some organizations you might have multiple Workspaces in the same environment, and all Account Level groups live in the Account Console, so breaking these up gives a clear sign of which Workspace they align to.

The above isolation brings me to the naming convention for the groups.

<environment name>_<workspace name>_<level indicator>_<persona name>

For example, with the Development environment, a Workspace called pateman and the DevOps persona, we would get these two groups.

dev_pateman_acc_devops
dev_pateman_ws_devops

dev = environment Development
pateman = Workspace name
ws/acc = Workspace/Account
devops = Persona
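To make the convention concrete, here is a small sketch of building both names in locals, using the example values above:

locals {
  environment    = "dev"
  workspace_name = "pateman"
  persona        = "devops"

  account_group_name   = "${local.environment}_${local.workspace_name}_acc_${local.persona}" # dev_pateman_acc_devops
  workspace_group_name = "${local.environment}_${local.workspace_name}_ws_${local.persona}"  # dev_pateman_ws_devops
}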

Also note that permissions on the Account Level groups carry over to all Workspaces. Hence, if a Catalog is visible in two Workspaces and that particular group has permissions set at the Account Level, then they can do the same actions in both Workspaces.

Implementation of Groups

First we should set up both the provider for the Workspace level and the provider for the Account level. I would recommend authenticating either with an Azure CLI login before running Terraform or via the Databricks config file (`~/.databrickscfg`) using a Service Principal.

Reference: the databricks/databricks provider documentation on the Terraform Registry.

provider "databricks" {   
  alias         = "workspace" 
  host          = data.azurerm_databricks_workspace.this.workspace_url 
} 

provider "databricks" {
  alias         = "accounts"
  host          = "https://accounts.azuredatabricks.net"
  account_id    = "00000000-0000-0000-0000-000000000000"
} 
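If you go the config-file route instead of the Azure CLI login, the provider can reference a profile from `~/.databrickscfg`; for example, the accounts provider above could be written like this (the profile name is illustrative):

provider "databricks" {
  alias      = "accounts"
  host       = "https://accounts.azuredatabricks.net"
  account_id = "00000000-0000-0000-0000-000000000000"
  profile    = "ACCOUNT" # a Service Principal profile in ~/.databrickscfg
}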

Variables

This is an example of how you can format the `groups` variable containing all the settings, plus other supporting variables. I have not added concrete properties for the Entitlements, Grants or Permissions, as these depend on how you would like to split and apply them, as I describe later.

variable "groups" { 
  description = "Groups and their access rights" 
  default = [] 
  type = list(object({ 
    name = string 
    entitlements = object({ 
      <entitlement-name> = string
    })
    grants = object({ 
      <grant-name> = string 
    })
    permissions = object({ 
      <permission-name> = string 
    }) 
  })) 
} 

variable "environment" { 
  description = "Environment Name" 
  type = string 
  default = "dev" 
}
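For illustration, a matching `terraform.tfvars` entry could look like the following, where the entitlement, grant and permission keys are placeholders for whichever settings you decide to model:

groups = [
  {
    name = "devops"
    entitlements = {
      workspace_access = "true"
    }
    grants = {
      dev_catalog = "SELECT"
    }
    permissions = {
      clusters = "CAN_RESTART"
    }
  }
]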

Workspace Groups

The Workspace groups are easy to implement using the `databricks_group` resource and setting the provider to the `workspace` alias. We assume the Workspace details come from a data source using the AzureRM provider.

resource "databricks_group" "workspace" { 
  provider = databricks.workspace 
  for_each  = { for group in var.groups : group.name => group} 

  display_name               =  "${var.environment}_${data.azurerm_databricks.workspace.this.name}_ws_${each.value.name}" 
} 

Account Groups

With the Account groups, we need to first create them with the accounts provider and then assign them to the Workspace. When they are created they have no link to the Workspace, so if you were to go to the Workspace UI you would not be able to search for them there.

resource "databricks_group" "account" { 
  provider = databricks.accounts 
  for_each  = { for group in var.groups : group.name => group} 

  display_name               =  "${var.environment}_${data.azurerm_databricks.workspace.this.name}_acc_${each.value.name}" 
} 

resource "databricks_mws_permission_assignment" "account_to_workspace" { 
  provider = databricks.workspace 
  for_each  = databricks_group.account 

  workspace_id = data.azurerm_databricks.workspace.this.workspace_id
  principal_id = each.value.id  
  permissions  = ["USER"]
}

Assign Access

You will have noticed that in the `groups` variable I left the three different types of access blank. This is because it depends on your access model and what you are building.

For example, if you don’t have a SQL Warehouse then you don’t need to give permissions to it. The other decision is whether you apply flat access across all resources or direct access per resource. For example, if you are creating 3 Catalogs, are you giving the group the same Grants on all of them, or different Grants on each?
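For the flat approach, one hedged sketch is looping the same grant over every Catalog (the catalog names are illustrative and the group comes from the resources above):

resource "databricks_grants" "catalogs" {
  provider = databricks.workspace
  for_each = toset(["raw", "curated", "presentation"]) # illustrative catalog names

  catalog = each.key

  grant {
    principal  = databricks_group.account["devops"].display_name
    privileges = ["USE_CATALOG", "USE_SCHEMA", "SELECT"]
  }
}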

Whichever way you go, I would recommend creating a group for your Service Principals and then granting it all the access required. For most of the Grants you can go down the route of giving it `ALL PRIVILEGES`.

I would also recommend a group is assigned as the Owner of Databricks resources like Catalogs and External Locations. There can be times when you need to use the CLI, or make changes in the UI, to correct deployment issues, and if you are not an Admin this will stop you from running certain commands. You could also make a group a Metastore Admin, which gives even more power across all Workspaces, but setting the Owner keeps it isolated to each Workspace.
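As a hedged sketch, assuming a `databricks_catalog` resource in the same configuration (the catalog name is illustrative), the Owner can be set directly on the resource:

resource "databricks_catalog" "example" {
  provider = databricks.workspace
  name     = "dev_pateman_catalog" # illustrative catalog name
  owner    = databricks_group.account["devops"].display_name
}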

Published by Chris Pateman - PR Coder

A Digital Technical Lead, constantly learning and sharing the knowledge journey.
