DevOps Access Controls for CI/CD, GitOps, and More


Over the last few years, the terms DevOps and DevSecOps (short for Development and Operations, and Development, Security, and Operations, respectively) have become synonymous with companies trying to become more agile and less monolithic.

For decades, software development and IT operations remained isolated silos in companies across the globe. When a company needed to launch a new software product, kick off a large IT initiative, or update an existing application, it took weeks or months of planning and coordination between these disparate groups. Then, during the deployment (which always seems to happen over a long weekend!), any number of issues could arise and derail or set back the whole project. Ah! Back to the drawing board, and re-planning for another release in a few weeks or months! This cycle of long planning and long releases still goes on in organizations today. Maybe your organization is in the midst of an ‘agile transformation’ and still suffers from these types of processes.

As you can see, this method doesn’t work well in today’s hyper-agile world, where startups can come and go within days. Scrappy startups can spin up hundreds of VMs, Kubernetes clusters, and large-scale databases within minutes on cloud providers like AWS, Azure, GCP, Oracle Cloud, and more. A company that is still locked into the old-school way of deploying and managing resources is quickly being lapped by newer companies, or by older companies that have embraced the DevOps style of thinking.

Once the culture of DevOps has permeated a company and its processes start to migrate from a monolithic, slow deployment model to continuous integration and continuous deployment (the CI/CD in DevOps), it becomes easier to operate faster: deploy faster, release faster, make changes faster, etc.

The data shows that companies that are more agile and deploy more rapidly are able to manage change and get their products and services to market faster. The DevOps Research & Assessment (DORA) group, which is part of Google Cloud, has surveyed companies for a number of years and publishes the results in its State of DevOps report. The report has shown that companies embracing DevOps and DevSecOps are seeing dramatic improvements across the board. This, in turn, enables those organizations to respond to market changes faster, which makes for happier customers.

This table illustrates how DORA defines Elite vs. Low-performing teams across its software delivery performance metrics (this data is from the 2021 State of DevOps report).

| Metric | Elite | Low |
| --- | --- | --- |
| Deployment frequency | On-demand (multiple deploys per day) | Fewer than once per six months |
| Lead time for changes | Less than one hour | More than six months |
| Time to restore service | Less than one hour | More than six months |
| Change failure rate | 0%-15% | 16%-30% |

Comparing the two groups, DORA found that Elite performers have 973x more frequent code deployments and 6570x faster time to recover from incidents than Low performers. That is a huge gap and shows why organizations are striving to integrate more agile processes into their DevOps lifecycle.

So what if your team has integrated tools like Teleport to help manage access to your databases, Kubernetes clusters, virtual machines, and more? How can you continue this DevOps/DevSecOps way of thinking without adding speed bumps to your process? Often when a new tool gets put in place, it can disrupt current processes and cause all kinds of issues! Even teams that implement a tool like Teleport might not escape these disruptions.

Let’s talk about this and go through a few ideas on how you can go faster yet still continue to provide compliance and auditing for your team using Teleport. In the end, proper implementation of a tool like Teleport should help your team remain agile and increase your security.

First up, a conversation around bastion hosts.

Why you should eliminate those home-grown bastion hosts

If you are not using a bastion or jump host to secure remote access to your environments, then stop reading right now and set up your first bastion host to start isolating and securing your resources!

Seriously though, organizations need some sort of bastion environment set up to gate access to end resources, applications, and servers. Bastion hosts are great for narrowing the exposure of your servers and applications. For DevOps teams, restricting access to end devices through bastion hosts lowers the potential exposure of systems and applications and makes life a little bit easier. You can control access to the bastion, and from there, you can limit who can see what on the other side.

The problem with bastion hosts is that they can be challenging to scale, especially for fast-growth companies. Every bastion you stand up has to be patched, have its keys rotated, be scanned for vulnerabilities, etc. The bigger your network becomes, the more bastions you potentially need to deploy. And how do you gate access to your bastion hosts? VPN? Whitelisted IP addresses? Home-grown access solutions?

In my day-to-day conversations with SREs, sysadmins, and security teams, I hear a number of stories about how they have outgrown their bastion setups and are looking for better ways to manage and gate access to remote resources.

Leveraging a tool like Teleport reduces the need for all these home-grown bastion hosts. Teleport proxy nodes, which act like bastion hosts, are stateless and can scale up and down as needed. No worries about local users, keys, etc. Access to these resources is handled by the Auth server, which issues short-lived X.509 or SSH certificates. On top of that, when you tie your Teleport cluster into your identity provider (IdP) or SSO provider (such as Okta, Google, Azure AD, etc.), you can pull all the user information from a single source of truth and manage granular access using detailed RBAC (Role-Based Access Control) roles and permissions.

Take a look at our architecture documents and see how we’ve designed Teleport to help teams manage access and eliminate all the legacy bastion hosts.

Automate your Infrastructure as Code (IaC)

Let’s segue into our second topic: Infrastructure as Code (IaC). Infrastructure as Code involves using software tools to define your infrastructure through a declarative process. As your environments grow and scale out, it becomes more critical to start leveraging IaC processes to help manage these resources. If you are manually configuring VMs, VPCs, users, and environments across your different cloud environments (or even on your on-prem environments), then you need to adopt some form of automation! There is a bit of a learning curve when picking up these tools if you have never used them before, but once you start using them consistently, you’ll wonder why you never did it sooner. Many organizations are using IaC along with GitOps to make things even more automated and seamless!

When it comes to using IaC, there are a large number of open source projects and commercial tools that can help you accomplish automating your infrastructure. Below is just a small selection of some of the more popular tools in no particular order:

If you are new to IaC and automation, the sheer abundance of available tools out there can be overwhelming, but when you pick a tool that fits your organization and use cases, then you really cannot go wrong with what you have selected. My recommendation? Pick a small project and dive into any or all the different tools and try to automate something small. Get a feel of how those tools work and see what level of technical expertise or learning you might have to pick up to be successful at using that tool.

Now how does Teleport fit in with IaC? One way is to leverage the Teleport Terraform plugin. Our team here at Teleport relies heavily on Terraform, from managing internal users, to provisioning Teleport clusters, to providing examples for users to stand up their own Teleport instances. If you want to see one way we leverage Terraform and IaC internally, check out the blog post by Travis Gary, our Director of IT, about using Terraform for Okta Directory security hardening.

The Teleport Terraform plugin was built to help teams manage their Teleport resources and is constantly being improved upon. You can leverage the plugin to configure roles, local non-interactive users (CI/CD bots, etc.), provision tokens, and more. To get started, read our documentation here.

With good IaC practices, you can store your roles and other Teleport work within GitHub (or any other Git-based version control system) and leverage the Terraform module to update your cluster. No more manual editing of YAML or updating each entry from the CLI one at a time! This helps accelerate your adoption of Teleport and makes it easier for change requests to come through as well as providing an audit trail from your VCS on deployments, pull requests, and more.

Lastly, let’s look at leveraging Teleport for tools like CI/CD!

Access controls using non-interactive users for CI/CD processes

One of the big tenets of using DevOps/DevSecOps principles is that you need to stop doing manual builds, integrations, and deployments. Having someone manually kick off a build, run tests, push to UAT, etc. is time-consuming and prone to human error. Automating these processes can rapidly speed up your deployment times and render the development process more agile.

The question that we do hear from customers is: Now that we’re using Teleport, how can we integrate Teleport into our CI/CD processes?

We use the idea of impersonation within Teleport and create non-interactive users that can then be leveraged with CI/CD tools like Jenkins, CircleCI, Bamboo, GitHub Actions, and more.

If you would like to dig into our documentation around impersonation then start here. In our documentation, we use Jenkins as the example, but you can easily transition this to pretty much any CI/CD platform or any system that requires non-interactive users.

For my example here, I’m going to be illustrating this using GitHub Actions.

What we are doing is creating a unique non-interactive user with limited roles and permissions granted and assigned by Teleport. We will export a short-lived certificate (a TTL of 30 days or less is recommended) and integrate it into our CI/CD build process. Then, as our CI/CD build fires off, it will use this user's credentials and Teleport will validate the certificate. If it’s valid, the process will continue and Teleport will audit and log the activity. If the certificate is invalid or expired, your CI/CD process will fail until a new certificate is created and the build system has been updated with the new credentials.

Security note: For the purposes of illustrating how Teleport works with impersonation, my examples below are probably not 100% adhering to GitHub Actions best practices and security. Please review the Security hardening for GitHub Actions documentation if you intend to use this in production or with any publicly facing repositories.

Ok, let’s jump right in!

You have a Teleport user that has a defined impersonation role. This role allows the user to issue certificates on behalf of non-interactive users. In my example here, my user, allen, already has the impersonator role assigned.

[Screenshot: user allen with the impersonator role assigned]

When I look at the impersonator role (YAML) in the Teleport UI, you can see what a user with the impersonate role can and cannot do:

[Screenshot: impersonator role YAML in the Teleport UI]

This user has the ability to impersonate the githubactions user and role.

Let’s look at the githubactions role YAML:

[Screenshot: githubactions role YAML]

With this role, there are a few things I want to point out:

  1. This user can ssh in using the `ubuntu` login. For our demo purposes with GitHub Actions, I’m just going to show how that works. In your scenario, you would want to set this to how your CI/CD systems work/deploy, etc.
  2. I have a 240h TTL. The certificate would be valid for 10 days at the most.
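Since the role definitions are only described here, the following is a minimal sketch of what the two roles could look like, based on the Teleport impersonation guide rather than on my exact cluster configuration. The file name `teleport-ci-roles.yaml`, the `ROLES_FILE` variable, the `version: v4` value, and the wildcard `node_labels` are assumptions; adjust them to what your cluster supports and to the nodes your CI/CD user should actually reach.

```shell
#!/usr/bin/env bash
# Sketch only: writes two role definitions to a local file for review.
# ROLES_FILE and its default name are hypothetical, chosen for illustration.
roles_file=${ROLES_FILE:-teleport-ci-roles.yaml}

cat > "$roles_file" <<'EOF'
# Role for the non-interactive CI/CD user.
kind: role
version: v4                # assumption: use a role version your cluster supports
metadata:
  name: githubactions
spec:
  options:
    max_session_ttl: 240h  # certificates are valid for 10 days at most
  allow:
    logins: ['ubuntu']     # SSH login the CI/CD user may use
    node_labels:
      '*': '*'             # assumption: narrow this to your deployment targets
---
# Role that lets a human user issue certificates for githubactions.
kind: role
version: v4
metadata:
  name: impersonator
spec:
  allow:
    impersonate:
      users: ['githubactions']
      roles: ['githubactions']
EOF

echo "Wrote $roles_file"
# Review the file, then load it into the cluster with:
#   tctl create -f teleport-ci-roles.yaml
```

The script only writes the file so that the definitions can be reviewed (or committed to version control) before anything is applied to a cluster.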

Next, I need to export the certificate for the githubactions user so that I can incorporate it into my CI/CD workflow. I run the following command to export the necessary certificates:

tctl auth sign --user=githubactions --out=githubactions --format=openssh --ttl=240h

(The tctl tool is the Teleport cluster administration tool. You can learn more about it here.)

With this command, I am exporting an identity for the user githubactions, written to files named githubactions, in the openssh format.

You should see the following in the directory where you exported out the identity files:

[Screenshot: exported identity files in the output directory]

These are the githubactions private key and the signed SSH certificate (githubactions-cert.pub). You can view the details of the certificate with the following command:

ssh-keygen -L -f githubactions-cert.pub

The output will show information about the certificate, extensions, principals, validity date, etc.
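Because the build fails once the certificate expires, it can help to check the validity window from a script. The sketch below assumes the usual `ssh-keygen -L` layout, where the validity line reads `Valid: from <start> to <end>`; the helper names `cert_expiry` and `days_between` are hypothetical, and the usage note assumes GNU `date`.

```shell
#!/usr/bin/env bash
# Print the expiry timestamp found in `ssh-keygen -L` output read on stdin.
# Prints nothing for certificates whose validity is "forever".
cert_expiry() {
  awk '$1 == "Valid:" && $2 == "from" { print $5; exit }'
}

# Print the number of whole days between two epoch-second values.
# Arguments: start_epoch end_epoch
days_between() {
  echo $(( ($2 - $1) / 86400 ))
}

# Usage against a real certificate (GNU date assumed for `date -d`):
#   expiry=$(ssh-keygen -L -f githubactions-cert.pub | cert_expiry)
#   days_between "$(date +%s)" "$(date -d "$expiry" +%s)"
```

A scheduled job could run this check and warn when only a day or two remains, so the certificate is re-issued before builds start failing.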

Now that we have our 10-day certificate (240 hours), we need to get this up to our CI/CD system. For GitHub Actions, I have it set as a secret/variable. You can do this from within the GitHub UI or using their command-line tool (gh).

[Screenshot: TELEPORT_CERT and TELEPORT_PRIVATE_KEY secrets in the GitHub UI]

The variable names can be whatever you want. For my examples, I just created ones that were easily understandable from a glance.

If you are using the gh secret method, you can set and even update the secrets using the following commands:

gh secret set TELEPORT_CERT < <path_to_cert.pub>
gh secret set TELEPORT_PRIVATE_KEY < <path_to_key>

More information on the GitHub CLI tools can be found here.
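Since the certificate has to be re-uploaded every time it is re-issued, the two `gh secret set` commands can be wrapped in a small helper. This is a sketch, not part of Teleport or the GitHub CLI: the function name `upload_identity` and the `GH_CMD` override (handy for dry runs) are hypothetical, and the secret names match the ones used in this post.

```shell
#!/usr/bin/env bash
# Upload an exported Teleport identity (private key plus signed certificate)
# as GitHub Actions secrets.
# Argument: base path passed to `tctl auth sign --out`, e.g. ./githubactions
# GH_CMD is a hypothetical override for the CLI binary; it defaults to gh.
upload_identity() {
  local base=$1 gh_cmd=${GH_CMD:-gh} f
  # Refuse to upload anything unless both files are readable.
  for f in "$base" "$base-cert.pub"; do
    if [ ! -r "$f" ]; then
      echo "missing identity file: $f" >&2
      return 1
    fi
  done
  # gh reads the secret value from standard input.
  "$gh_cmd" secret set TELEPORT_CERT < "$base-cert.pub"
  "$gh_cmd" secret set TELEPORT_PRIVATE_KEY < "$base"
}

# Usage:
#   upload_identity ./githubactions
# Dry run that only prints the gh arguments:
#   GH_CMD=echo upload_identity ./githubactions
```

Checking for both files first avoids the case where one secret is updated and the other is left holding a stale value.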

Ok! We have our 10-day certificate exported out from Teleport and imported into GitHub Actions. Let’s test it out!

In your repository, you will need to modify your GitHub Actions workflow YAML file. In my instance, it looks like this:

...SNIP...
     # Relax host key checking for this demo only (not suitable for production).
     - run: |
         echo "Configuring the environment."
         mkdir -p ~/.ssh
         cat >>~/.ssh/config <<EOF
         Host *
           StrictHostKeyChecking no
           UserKnownHostsFile=/dev/null
         EOF
     # Write the signed certificate and the private key from the repository secrets.
     - run: 'echo "$SSH_CERT" > ~/.ssh/githubactions-cert.pub'
       shell: bash
       env:
         SSH_CERT: ${{ secrets.TELEPORT_CERT }}
     - run: 'echo "$SSH_KEY" > ~/.ssh/githubactions'
       shell: bash
       env:
         SSH_KEY: ${{ secrets.TELEPORT_PRIVATE_KEY }}
     #- run: ls ~/.ssh
     # Load the key (ssh-add also picks up githubactions-cert.pub next to it),
     # then jump through the Teleport proxy (port 3023) to the node (port 3022).
     - run: |
         echo "SSHing into remote node..."
         eval $(ssh-agent)
         chmod 400 ~/.ssh/githubactions
         ssh-add ~/.ssh/githubactions
         ssh -J [email protected]:3023 -p 3022 [email protected] 'teleport version'
     - run: echo "It works!"

Again, this is not production-quality code! So bear that in mind if you copy and paste this as an example! How I have it set here is not necessarily how you would set it up in your environment.

When my GitHub Actions workflow runs, it will SSH into one of my existing Teleport managed nodes and get the current version of Teleport and echo it back into the GitHub Actions logs. This all happens when I run a git commit and git push to my upstream GitHub repository.

You can see where my runner is pulling down the exported identity file. This allows the runner to SSH into one of my Teleport nodes (teleport-demo-app-server) and run a teleport version command.

Here is the successful output:

[Screenshot: successful GitHub Actions run output]

Yes! We see that the node is running Teleport v7.3.2!

So what happens if the certificate is invalid? What will we see?

From GitHub Actions you would see the following because the certificate is expired:

[Screenshot: failed GitHub Actions run output]

And you’ll probably get an email letting you know the build failed:

[Screenshot: GitHub Actions build failure email]

Now if you have access to the Teleport UI and look at the audit logs, you will see something like this:

[Screenshot: Teleport UI audit log]

If you open up the audit entry in the Teleport UI, you will see some more details:

[Screenshot: Teleport UI audit entry details]

Let’s summarize and wrap things up.

With Teleport, you can use impersonation to create non-interactive users for CI/CD tools, which are critical in DevOps environments. Short-lived certificates control what those non-interactive users can and cannot do, and you get auditing and compliance because everything runs through Teleport.

In conclusion: Leveraging Teleport can be a powerful addition to your DevOps/DevSecOps lifecycle

We talked about how you can use Teleport for replacing your home-grown bastion hosts, using Infrastructure as Code for managing Teleport resources, and then dove a little further into using Teleport with CI/CD.

Hopefully, this showcases how flexible and powerful Teleport can be and how you can leverage its features to strengthen your DevOps and DevSecOps processes!

Integrating Teleport into your DevOps and DevSecOps processes isn’t overly complicated but does involve some re-thinking and architecting potentially new processes and workflows. Once done, then you’ll have a system that can be easily managed and provides added benefits of auditing and compliance.

If you have questions about anything covered here, then please join our GitHub discussions or our Slack community! I would love to hear from DevOps and DevSecOps teams on how they are using and integrating Teleport into their day-to-day work.

Here are a few links to get started:
