Emergency Access Done Right: AWS Break Glass Policy Explained

Krzysztof Wiatrzyk

Picture this…

It's 2 AM, most people are asleep, and your phone starts shaking like crazy!

Oh no! It looks like the production database has been deleted (this should never happen, but you never know!). No worries, though: we can restore it from a snapshot, provided we have the necessary AWS permissions. Typically, creating, modifying, and deleting databases is handled by Infrastructure as Code (IaC) Continuous Integration/Continuous Deployment (CI/CD) tools, such as Atlantis for Terraform, which are granted the required AWS permissions through an IAM role.

As pipelines take on a more prominent role in the software development lifecycle in a DevOps model, the necessity for extensive human access to environments decreases. Human users should be granted minimal access necessary for their role, which is usually read-only access that does not allow any modifications or access to sensitive data. For experimentation which is typically hands-on and exploratory, teams should be granted access to sandbox environments which are isolated from system workloads.

~ AWS Well-Architected Framework: Limit human access with just-in-time access

However, IaC with CI/CD requires both a pull request (PR) creator and a PR approver - always the two, like a master and an apprentice.


If you are the only person awake at night, making fixes through CI/CD becomes impossible (no one can approve a PR, <panic-mode-on>). So, what can be done in this situation?

If an incident occurs in the middle of the night and we lack the necessary permissions and cannot reach anyone for assistance, a break-glass policy comes into play. It’s like breaking the glass to access a fire extinguisher during an emergency. In this case, "breaking the glass" means using an account or role with elevated privileges/permissions.

In AWS, "break glass" refers to a method of granting emergency access to resources when standard access methods are unavailable or insufficient. It's a security measure designed to provide temporary, elevated access during critical situations like security incidents, system failures, or unavailability of key personnel. This is typically achieved by assuming a predefined "break glass" role: its trust policy controls who may assume it, and its permissions policies grant access to otherwise restricted resources.
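To make the trust-policy part concrete, here is a minimal Terraform sketch of a standalone IAM break-glass role. It is an illustration under assumptions only: the role name `break-glass` and the variable `trusted_principal_arns` are hypothetical, and the rest of this post uses IAM Identity Center permission sets instead. The trust policy delegates to the account but, through an `ArnLike` condition on `aws:PrincipalArn`, only lets the listed principals assume the role; those principals also need an identity policy that allows `sts:AssumeRole` on it.

```hcl
# Hypothetical example: a standalone IAM break-glass role guarded by its trust policy.
data "aws_caller_identity" "current" {}

variable "trusted_principal_arns" {
  description = "Hypothetical: ARN patterns of the principals allowed to break the glass"
  type        = list(string)
}

data "aws_iam_policy_document" "break_glass_trust" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    # Delegate to this account...
    principals {
      type        = "AWS"
      identifiers = ["arn:aws:iam::${data.aws_caller_identity.current.account_id}:root"]
    }

    # ...but only for the explicitly listed principals.
    condition {
      test     = "ArnLike"
      variable = "aws:PrincipalArn"
      values   = var.trusted_principal_arns
    }
  }
}

resource "aws_iam_role" "break_glass" {
  name                 = "break-glass"
  assume_role_policy   = data.aws_iam_policy_document.break_glass_trust.json
  max_session_duration = 3600 # 1 hour, the lowest value IAM accepts for this setting
}
```

The elevated permissions themselves would be attached separately, for example with `aws_iam_role_policy_attachment`.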

So you "broke the glass" and used this powerful role to restore the database. You're the hero, but what's next?


Post Mortem

It is essential to clearly explain why this role was used and what actions were performed with it. This is necessary for auditability, as this role may be able to DELETE data or CREATE additional access. Therefore, it should only be used in emergencies.
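One way to support that explanation is to record every API call made with a break-glass session, not just the moment the role was assumed. The sketch below is a hypothetical extension, not part of the original setup: it assumes CloudTrail is already feeding EventBridge (like the alerting rule later in this post), and the names `break-glass-activity` and `break_glass_role_patterns` are made up. It matches CloudTrail events whose `userIdentity.sessionContext.sessionIssuer.arn` is a break-glass role and writes them to a CloudWatch Logs group. EventBridge may not deliver read-only calls (Get/List/Describe) by default, so CloudTrail itself remains the complete record.

```hcl
# Hypothetical extension: keep an audit trail of every API call made with a break-glass session.
variable "break_glass_role_patterns" {
  description = "Hypothetical: wildcard ARN patterns of the break-glass roles"
  type        = list(string)
  default     = ["arn:aws:iam::*:role/aws-reserved/sso.amazonaws.com/*AWSReservedSSO_*break-glass_*"]
}

# Log group name follows the /aws/events/ convention used for EventBridge log targets.
resource "aws_cloudwatch_log_group" "break_glass_activity" {
  name              = "/aws/events/break-glass-activity"
  retention_in_days = 365
}

# Allow EventBridge to create streams and write events into the log group.
data "aws_iam_policy_document" "break_glass_activity_logs" {
  statement {
    actions   = ["logs:CreateLogStream", "logs:PutLogEvents"]
    resources = ["${aws_cloudwatch_log_group.break_glass_activity.arn}:*"]

    principals {
      type        = "Service"
      identifiers = ["events.amazonaws.com", "delivery.logs.amazonaws.com"]
    }
  }
}

resource "aws_cloudwatch_log_resource_policy" "break_glass_activity" {
  policy_name     = "break-glass-activity"
  policy_document = data.aws_iam_policy_document.break_glass_activity_logs.json
}

# Match any CloudTrail-recorded call whose session was issued by a break-glass role.
resource "aws_cloudwatch_event_rule" "break_glass_activity" {
  name        = "break-glass-activity"
  description = "API calls made with break-glass role sessions"

  event_pattern = jsonencode({
    "detail-type" = ["AWS API Call via CloudTrail"]
    detail = {
      userIdentity = {
        sessionContext = {
          sessionIssuer = {
            arn = [for p in var.break_glass_role_patterns : { wildcard = p }]
          }
        }
      }
    }
  })
}

resource "aws_cloudwatch_event_target" "break_glass_activity" {
  rule = aws_cloudwatch_event_rule.break_glass_activity.name
  arn  = aws_cloudwatch_log_group.break_glass_activity.arn
}
```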

It’s important to avoid using this role for everyday tasks. Mistakes happen: for example, you might think you are working in the staging AWS account and accidentally delete a database, only to realize later that you were actually in production (have you heard how GitLab did it once?).

In this blog post, you will learn how to establish a break-glass policy and set up Slack alerts for its usage. The remainder of the blog post will delve into technical details.

It's worth noting that implementing break-glass access is one of the practices recommended by the AWS Well-Architected Framework.

Emergencies or unforeseen circumstances might necessitate temporary access beyond regular permissions for day-to-day work. Having break-glass procedures helps ensure that your organization can respond effectively to crises without compromising long-term security. During emergency scenarios, like the failure of the organization's identity provider, security incidents, or unavailability of key personnel, these measures provide temporary, elevated access beyond regular permissions.

~ AWS Well-Architected Framework: Implement break-glass procedures


How to implement Break Glass in AWS

Case A: IAM Users

If you use IAM users to access your AWS accounts, stop using them right now. Configure SSO for your account, and get rid of long-lived credentials.

Case B: SSO

In the Single Sign-On (SSO) context (AWS IAM Identity Center, formerly AWS SSO), a Permission Set is a set of permissions that Identity Center provisions as an Identity and Access Management (IAM) role in each target AWS account it is assigned to. Users assigned to that account and permission set can then assume the role.


For example, a single user may have the following roles available for use on the "production" account:

  • devops

  • devops-break-glass

Each role comes with a different set of permissions. For instance, the "devops" role may have read-only access to databases:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "rds:Describe*",
        "rds:List*",
        "rds:View*"
      ],
      "Resource": "*"
    }
  ]
}

The "devops-break-glass" permission set grants full RDS access, which allows users to create, update, or delete RDS databases:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "rds:*",
      "Resource": "*"
    }
  ]
}
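For reference, this is roughly how the "devops-break-glass" permission set and its RDS policy could be managed in Terraform with the AWS provider's IAM Identity Center (`aws_ssoadmin_*`) resources. It is a sketch under assumptions: the variables `production_account_id` and `break_glass_group_id` are hypothetical, and a single Identity Center instance is assumed. Once assigned, Identity Center creates a role named like `AWSReservedSSO_devops-break-glass_<suffix>` in the target account, so any wildcard used to detect it must match that name.

```hcl
# Hypothetical: managing the "devops-break-glass" permission set with Terraform.
data "aws_ssoadmin_instances" "this" {}

variable "production_account_id" {
  description = "Hypothetical: ID of the production AWS account"
  type        = string
}

variable "break_glass_group_id" {
  description = "Hypothetical: Identity Store group ID of the people allowed to break the glass"
  type        = string
}

locals {
  # Assumes exactly one IAM Identity Center instance.
  sso_instance_arn = tolist(data.aws_ssoadmin_instances.this.arns)[0]
}

resource "aws_ssoadmin_permission_set" "devops_break_glass" {
  name             = "devops-break-glass"
  description      = "Emergency-only elevated access"
  instance_arn     = local.sso_instance_arn
  session_duration = "PT1H" # keep elevated sessions short
}

# Same permissions as the JSON policy above.
data "aws_iam_policy_document" "devops_break_glass" {
  statement {
    effect    = "Allow"
    actions   = ["rds:*"]
    resources = ["*"]
  }
}

resource "aws_ssoadmin_permission_set_inline_policy" "devops_break_glass" {
  instance_arn       = local.sso_instance_arn
  permission_set_arn = aws_ssoadmin_permission_set.devops_break_glass.arn
  inline_policy      = data.aws_iam_policy_document.devops_break_glass.json
}

# Make the permission set available to the break-glass group in production.
resource "aws_ssoadmin_account_assignment" "devops_break_glass" {
  instance_arn       = local.sso_instance_arn
  permission_set_arn = aws_ssoadmin_permission_set.devops_break_glass.arn
  principal_type     = "GROUP"
  principal_id       = var.break_glass_group_id
  target_type        = "AWS_ACCOUNT"
  target_id          = var.production_account_id
}
```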


Deleting a database in an emergency may not sound like something you would ever do, but the same applies to any other privileged operation: opening a shell on an EC2 instance through SSM, purging an SQS queue, removing a file from an S3 bucket, or even reading files from an S3 bucket.
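If the break-glass permission set should also cover those other scenarios, its policy could be extended along these lines. This is a hypothetical sketch: the data source name and statement IDs are made up, and the `*` resources are deliberately broad for brevity, so in practice you would scope them to specific instances, queues, and buckets.

```hcl
# Hypothetical: extra break-glass permissions for the other emergency scenarios.
data "aws_iam_policy_document" "break_glass_extra" {
  # Shell access to EC2 instances through SSM Session Manager
  statement {
    sid       = "SsmShell"
    actions   = ["ssm:StartSession", "ssm:ResumeSession", "ssm:TerminateSession"]
    resources = ["*"]
  }

  # Purging an SQS queue
  statement {
    sid       = "SqsPurge"
    actions   = ["sqs:PurgeQueue"]
    resources = ["*"]
  }

  # Reading and deleting objects in S3 buckets
  statement {
    sid       = "S3Objects"
    actions   = ["s3:GetObject", "s3:DeleteObject"]
    resources = ["arn:aws:s3:::*/*"]
  }
}
```

The rendered `json` attribute could then be used as the permission set's inline policy, for example through `aws_ssoadmin_permission_set_inline_policy`.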

Notifications

It's crucial to send notifications whenever the break-glass role is used. These notifications can drive a process that requires the person using the role to justify why they needed it.

There are many ways to achieve that; I would like to show you one based on EventBridge and API Destinations, using both the AWS Console and Terraform (resources from a Terraform module).

  1. Create an EventBridge connection to the Slack API

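Choose the API Key authorization type, set the key name to Authorization, and set the value to Bearer followed by your Slack Bot Token (as in the Terraform below). EventBridge stores this credential in AWS Secrets Manager.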

# Terraform
resource "aws_cloudwatch_event_connection" "this" {
  name               = var.name
  description        = "Slack"
  authorization_type = "API_KEY"

  auth_parameters {
    api_key {
      key   = "Authorization"
      value = "Bearer ${var.slack_bot_token}"
    }
  }

  lifecycle {
    ignore_changes = [
      auth_parameters[0].invocation_http_parameters,
    ]
  }
}

  2. Create an API destination

# Terraform
resource "aws_cloudwatch_event_api_destination" "this" {
  name                             = var.name
  description                      = "A break glass alert to Slack"
  invocation_endpoint              = "https://slack.com/api/chat.postMessage"
  http_method                      = "POST"
  invocation_rate_limit_per_second = 300
  connection_arn                   = aws_cloudwatch_event_connection.this.arn
}

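In the console, set the endpoint to https://slack.com/api/chat.postMessage and the HTTP method to POST, and select the connection created in step 1. The API destination then uses that connection's authorization settings (the Bearer token) for every request.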

  3. Create an EventBridge rule to catch all usages of Break Glass roles.

Event Pattern:

{
  "detail": {
    "eventName": ["AssumeRole", "AssumeRoleWithSAML"],
    "eventSource": ["sts.amazonaws.com"],
    "requestParameters": {
      "roleArn": [{
        "wildcard": "arn:aws:iam::*:role/aws-reserved/sso.amazonaws.com/eu-west-1/AWSReservedSSO_*break-glass_*"
      }]
    }
  },
  "detail-type": ["AWS API Call via CloudTrail"],
  "source": ["aws.sts"]
}
# Terraform
resource "aws_cloudwatch_event_rule" "this" {
  name        = var.name
  description = "Detect when provided roles were assumed"

  event_pattern = jsonencode({
    source      = ["aws.sts"]
    detail-type = ["AWS API Call via CloudTrail"]
    detail = {
      eventName   = ["AssumeRole", "AssumeRoleWithSAML"]
      eventSource = ["sts.amazonaws.com"]
      requestParameters = {
        roleArn = [
          for role_arn in var.roles_arns : {
            "wildcard" = role_arn
          }
        ]
      }
    }
  })
}

Target:

  • Choose EventBridge API destination as the target type and select the API destination created in step 2.

# Terraform
resource "aws_cloudwatch_event_target" "this" {
  rule      = aws_cloudwatch_event_rule.this.name
  target_id = "Slack"
  arn       = aws_cloudwatch_event_api_destination.this.arn
  role_arn  = aws_iam_role.this.arn

  input_transformer {
    input_paths = {
      account = "$.account"
      region  = "$.detail.awsRegion"
      role    = "$.detail.requestParameters.roleArn"
      time    = "$.detail.eventTime"
      user    = "$.detail.requestParameters.principalTags.UserName"
    }
    input_template = templatefile("${path.module}/data/slack_message.json", {
      environment   = var.environment
      slack_channel = var.slack_channel
    })
  }

  // The plan shows changing empty structs to null, which is not a valid change.
  lifecycle {
    ignore_changes = [
      http_target,
    ]
  }
}

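Input transformer (very important! This is where we prepare the message for Slack). In the console, use the same input paths as in the Terraform above; the template below references them as <account>, <region>, <role>, <user>, and <time>: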

{
    "channel": "#break-glass-audit",
    "blocks": [
        {
            "type": "header",
            "text": {
                "type": "plain_text",
                "text": "🚨 Break Glass Role Assumed 🚨",
                "emoji": true
            }
        },
        {
            "type": "section",
            "fields": [
                {
                    "type": "mrkdwn",
                    "text": "*Account:*\n<account>"
                },
                {
                    "type": "mrkdwn",
                    "text": "*Environment:*\nREPLACE_ME"
                },
                {
                    "type": "mrkdwn",
                    "text": "*Region:*\n<region>"
                },
                {
                    "type": "mrkdwn",
                    "text": "*Role:*\n`<role>`"
                },
                {
                    "type": "mrkdwn",
                    "text": "*User:*\n<user>"
                },
                {
                    "type": "mrkdwn",
                    "text": "*Time:*\n<time>"
                }
            ]
        },
        {
            "type": "divider"
        },
        {
            "type": "section",
            "text": {
                "type": "mrkdwn",
                "text": "📣 *This role grants elevated permissions and should only be used in emergencies.*"
            }
        },
        {
            "type": "context",
            "elements": [
                {
                    "type": "mrkdwn",
                    "text": "📝 *Action required:* <user>, please provide a brief justification for using this role."
                }
            ]
        }
    ]
}
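# Terraform
# Store the above file in the module as `data/slack_message.json`. Because it is rendered
# with templatefile(), replace "#break-glass-audit" with ${slack_channel} and REPLACE_ME
# with ${environment} so the module variables are filled in.

In the console, replace REPLACE_ME with your environment name by hand, then click Save.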


Now, whenever a "Break Glass" role is assumed, we receive a Slack notification built from the template above.

Thanks to EventBridge, we also get monitoring for free: the rule's CloudWatch metrics (for example, TriggeredRules) show how many times a Break Glass role was assumed in the past.
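Those metrics can also drive an alarm. A hypothetical addition to the module (the `alarm_actions` variable and the alarm name are made up) could raise a CloudWatch alarm whenever EventBridge fails to deliver an alert, using the rule's `FailedInvocations` metric in the `AWS/Events` namespace. Keep in mind that it only sees failures EventBridge can detect: Slack's `chat.postMessage` typically reports problems such as an unknown channel with HTTP 200 and `"ok": false` in the response body, which EventBridge treats as a successful delivery.

```hcl
# Hypothetical addition to the module: alarm when a break-glass alert could not be delivered.
variable "alarm_actions" {
  description = "Hypothetical: SNS topic ARNs to notify when Slack delivery fails"
  type        = list(string)
  default     = []
}

resource "aws_cloudwatch_metric_alarm" "delivery_failed" {
  alarm_name        = "${var.name}-delivery-failed"
  alarm_description = "EventBridge failed to deliver a break-glass alert to Slack"
  namespace         = "AWS/Events"
  metric_name       = "FailedInvocations"

  dimensions = {
    RuleName = aws_cloudwatch_event_rule.this.name
  }

  statistic           = "Sum"
  period              = 300
  evaluation_periods  = 1
  threshold           = 0
  comparison_operator = "GreaterThanThreshold"
  treat_missing_data  = "notBreaching" # the metric is only emitted when something fails
  alarm_actions       = var.alarm_actions
}
```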


Wait, there's more!

At this point, you might identify a critical security concern: a single individual can assume an overprivileged role without any oversight or approval. The system only generates audit logs after the role has been assumed, leaving you to investigate and request justification retroactively.

This approach follows the "it's easier to ask forgiveness than permission" philosophy - a pattern that may introduce unacceptable risk depending on your organization's security posture and compliance requirements.

A more robust solution requires an approval workflow before granting access to these highly privileged roles. Following the principle that "with great power comes great responsibility," we plan to enhance our break-glass mechanism:

  1. Request and Notification: When a user requests break-glass role access, the system automatically sends a notification to a designated Slack channel.

  2. Approval Workflow: The Slack notification includes an approval button. Only authorized approvers - such as designated managers or security team members - can approve the request. Once approved, the user gains temporary access to the break-glass role.

This approach maintains emergency access capabilities while adding a critical approval layer that improves auditability, accountability, and security governance.



Appendix #1 - Terraform Module

Together with the Terraform snippets above (for example, in a main.tf file), the following files make up a fully working Terraform module:

# variables.tf
variable "slack_bot_token" {
  description = "Slack Bot Token for API Destination"
  type        = string
  sensitive   = true
}

variable "name" {
  description = "Name of the Break Glass Alert"
  type        = string
}

variable "roles_arns" {
  description = "List of ARNs of roles to monitor for break glass usage, wildcards are supported"
  type        = list(string)
}

variable "environment" {
  description = "The environment name"
  type        = string
}

variable "slack_channel" {
  description = "Slack channel to send break glass alerts to"
  type        = string
  default     = "#break-glass-audit"
}
# role.tf
# Execution role for EventBridge (lets the rule target invoke the API destination)
resource "aws_iam_role" "this" {
  name = var.name

  assume_role_policy = data.aws_iam_policy_document.assume.json
}

resource "aws_iam_role_policy" "api_destination" {
  name   = "ApiDestination"
  role   = aws_iam_role.this.id
  policy = data.aws_iam_policy_document.permissions.json
}

data "aws_iam_policy_document" "assume" {
  statement {
    actions = ["sts:AssumeRole"]
    principals {
      type        = "Service"
      identifiers = ["events.amazonaws.com"]
    }
  }
}

data "aws_iam_policy_document" "permissions" {
  statement {
    effect = "Allow"
    actions = [
      "events:InvokeApiDestination"
    ]
    resources = [
      aws_cloudwatch_event_api_destination.this.arn
    ]
  }
}
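Two optional extra files can round the module off. Both are assumptions rather than part of the original module: the version constraints are examples only (pin whatever versions you have tested), and the outputs simply expose the ARNs of resources already defined above.

```hcl
# versions.tf (example constraints, adjust to the versions you have tested)
terraform {
  required_version = ">= 1.3"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 5.0"
    }
  }
}

# outputs.tf
output "event_rule_arn" {
  description = "ARN of the EventBridge rule that detects break-glass role usage"
  value       = aws_cloudwatch_event_rule.this.arn
}

output "api_destination_arn" {
  description = "ARN of the Slack API destination"
  value       = aws_cloudwatch_event_api_destination.this.arn
}
```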

The module can then be instantiated like this:

data "aws_ssm_parameter" "slack_bot_token" {
  name = "/break-glass-slack-bot-token"
}

module "break_glass_alert" {
  source = "../../modules/aws/break-glass-alert"

  name = "break-glass-alert"

  environment     = var.environment
  slack_bot_token = data.aws_ssm_parameter.slack_bot_token.value

  roles_arns = [
    "arn:aws:iam::*:role/aws-reserved/sso.amazonaws.com/*/AWSReservedSSO_*break-glass_*",
  ]
}

Want to expand the topic?

Address:

Let's Go DevOps Sp z o.o.
Zamknięta Str. 10/1.5
30-554 Cracow, Poland

View our profile
desingrush.com

Let’s arrange a free consultation

Just fill out the form below and we will contact you via email to arrange a free call to discuss your project scope and share our insights from similar projects.

© 2024 Let’s Go DevOps. All rights reserved.