Governance in Infrastructure as Code

Governance for cloud platforms has matured into a well-understood set of principles. Much of this began with the adoption of SaaS solutions like Office 365 and, in particular, Microsoft Teams. The need for governance emerged when we recognized that we kept repeating the same playbook, with the same negative consequences. Many of our past tools led to problems of sprawl and manageability. File shares were one of the first consequential examples. Users create folders that live within the framework of permissions we have established, but we lack many details about the life of those files, and any attempt to change how file sharing is done runs into roadblocks rooted in technical debt. The same pattern existed with Exchange Public Folders and SharePoint sites (SharePoint sites still suffer from this when they’re not used as merely a component of Microsoft Teams). We also faced this challenge with virtual machine sprawl.
Microsoft Teams gave us an opportunity to address these concerns. Controls can be introduced. Use cases can be identified. Processes can be introduced that let users get what they need while living within the controls. The lifecycle of each team can be managed.
These concepts carried forward to our IaaS/PaaS workloads with the cloud providers. Establishing tagging and naming conventions to maintain situational awareness of resources is an easy win. Establishing cost controls is necessary and valuable. Planning architectures for the overall platform that address security concerns provides confidence and comfort in daily operations. Taken together, these practices allow customers to wield the platform effectively and efficiently to achieve their goals.
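As a rough illustration of why tagging and naming conventions are an easy win, here is a minimal sketch of a check that could run before a resource is deployed. The naming pattern, the required tag names, and the function name are all assumptions made up for this example; they are not drawn from any particular cloud provider or policy engine.

```python
import re

# Hypothetical convention: <app>-<env>-<region>-<type>, e.g. "billing-prod-eastus-vm"
NAME_PATTERN = re.compile(r"[a-z0-9]+-(dev|test|prod)-[a-z0-9]+-[a-z]+")

# Hypothetical set of tags every resource is expected to carry
REQUIRED_TAGS = {"owner", "cost-center", "environment"}


def check_resource(name, tags):
    """Return a list of governance violations for a single resource."""
    problems = []
    if not NAME_PATTERN.fullmatch(name):
        problems.append(f"name '{name}' does not match the naming convention")
    # Report each missing tag separately so the owner knows exactly what to fix
    for tag in sorted(REQUIRED_TAGS - set(tags)):
        problems.append(f"missing required tag '{tag}'")
    return problems


if __name__ == "__main__":
    for problem in check_resource("BillingVM", {"owner": "app-team"}):
        print(problem)
```

A check like this is cheap to write and gives immediate situational awareness: a resource that fails it is either unowned, unbudgeted, or unidentifiable.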

Stumbling

Infrastructure as Code (IaC) has offered a promise that many organizations have found difficult to fully realize. We can model our desired infrastructure as code that becomes a self-documenting historian for that infrastructure. We can track changes and attribute them to the actors who made them and the reasons they were made. We can do whatever our imaginations can muster.
The problem seems to be that the staircase leading to this ideal is missing a step, and organizations stumble. Adopting these practices is very attractive, and getting started is not an insurmountable task. But as with many initiatives, the effort loses steam and its priority on the roadmap slips.

IaC Needs Governance

Just like the cloud platforms that are deployed and managed by IaC, the platforms used in the practice of IaC also need governance. Many poor practices are introduced at the beginning with massive monorepos (some types of monorepos are good, others are horrible). For instance, a repo is established for an application and that repo contains everything regarding the application: the underlying infrastructure, the compute workloads, and the application itself. Using a single repository this way becomes a problem. A pipeline for the repository executes the tooling that deploys and manages the infrastructure, and that pipeline runs under a specific identity. Anyone who can contribute to the repository can therefore assume the privileges of the pipeline: contributing code means acting as that identity.
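The way contributors inherit a pipeline's privileges can be sketched with a small model. Everything here is hypothetical: the `Repository` class, the permission strings, and the user names are illustrative assumptions, not the API of any real CI/CD platform.

```python
from dataclasses import dataclass


@dataclass
class Repository:
    name: str
    contributors: set          # users allowed to commit (hypothetical names)
    pipeline_permissions: set  # what the pipeline's identity is allowed to do


def effective_permissions(user, repos):
    """Union of the pipeline permissions of every repo the user can commit to."""
    granted = set()
    for repo in repos:
        if user in repo.contributors:
            # Committing code that the pipeline runs means acting as its identity
            granted |= repo.pipeline_permissions
    return granted


# One repo holding infrastructure, workloads, and application code together
monorepo = Repository(
    name="billing-everything",
    contributors={"app-dev", "net-admin"},
    pipeline_permissions={"deploy-app", "modify-network", "modify-access-controls"},
)

if __name__ == "__main__":
    print(sorted(effective_permissions("app-dev", [monorepo])))
```

In this model the application developer ends up with network and access-control privileges simply by being a contributor, which is the problem described above.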
An application will necessarily require multiple repositories. If the underlying infrastructure, like the network, subnets, route tables, and access controls, is defined within the same repository as the application code, then the application owner has the ability to change the very controls that the underlying infrastructure imposes. An application needs multiple repositories that work together to achieve the deployment. Each repository should have a core mission and only allow commits by the necessary parties. This is the IaC form of least privilege: we need to separate the repositories by areas of concern. Rate of change is another principle that can determine how many repositories make up the definition of an application deployment. The repositories become layered, and the rate of change should decrease toward the foundation and increase toward the apex.
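One way to picture this layering is as an ordered list of repositories, each with its own committers and an expected rate of change. The repository names, team names, and change rates below are invented for illustration; this is a sketch of the idea, not a prescription for how many layers an application needs.

```python
# Layers are listed foundation first, apex last. All names and numbers are hypothetical.
LAYERS = [
    {"repo": "network-foundation", "committers": {"network-team"}, "changes_per_month": 1},
    {"repo": "compute-workloads", "committers": {"platform-team"}, "changes_per_month": 4},
    {"repo": "billing-application", "committers": {"app-team"}, "changes_per_month": 30},
]


def can_commit(team, repo_name, layers=LAYERS):
    """Least privilege: a team may commit only to repos that list it as a committer."""
    return any(
        layer["repo"] == repo_name and team in layer["committers"]
        for layer in layers
    )


def rate_of_change_is_layered(layers=LAYERS):
    """True when change frequency never decreases from foundation toward apex."""
    rates = [layer["changes_per_month"] for layer in layers]
    return all(lower <= upper for lower, upper in zip(rates, rates[1:]))


if __name__ == "__main__":
    print(can_commit("app-team", "billing-application"))  # application owners own their layer
    print(can_commit("app-team", "network-foundation"))   # but not the controls beneath it
    print(rate_of_change_is_layered())
```

If a repository near the foundation changes as often as the application above it, that is usually a sign that the areas of concern have not been separated cleanly.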

What About Trust?

One argument against this may be based on complexity or effort. This comes up frequently. I have worked in the defense industry, which necessarily holds many security principles to a higher standard than other industries. Different enclaves would be established to isolate entire networks from each other, and the discussion would inevitably move to establishing a new Active Directory forest in the enclave. People would start arguing that these additional forests are too great an administrative burden. However, the purpose of the forest is to reduce administrative burden. Without the forest, each system would have to be managed independently, which is surely a heavier burden than managing them through Active Directory. The same principle holds true when breaking application deployments into multiple repositories.
However, when the discussion cannot move past the perceived increase in administrative burden, that objection is often paired with an argument from trust. We trust our application owners, right? We only hire trustworthy developers with good intentions. We use background checks! We have worked with these people for years! They’re not just our colleagues, they’re our friends! We broke down the wall between developers and operations!
This is also not a valid argument. We establish the controls not because we distrust anyone's intentions, but because we must also protect against accidents and breaches. We also need legally defensible attribution. If a breach occurs and we have not introduced such controls, we will have a more difficult time defending our friends from the possibility that they were at fault.