Traditionally, CMSs have been built on top of SQL and NoSQL database backends. However, the reality is that the primary capabilities of databases don’t completely align with modern CMS requirements. For example, much of today’s digital content, like images and videos, is file-based, unlike the highly structured tabular data typically found in databases.
Moreover, databases don’t support standard CMS requirements for enterprise environments today, such as sophisticated versioning, distributed and parallel multi-environment workflows, and branching. In addition, databases can be difficult to scale, distribute and secure. Shortcomings such as these are a major issue for building today’s dynamic, multichannel, high-throughput digital experiences and managing multi-disciplinary teams.
Git is the world’s leading system for source code management and has become increasingly popular as a way for CMS solutions to manage and publish content. Git was built for the very capabilities that databases lack when they underpin a CMS: versioning many files together, branching, and synchronizing distributed copies.
In this post, we explore the capabilities of Git and the advantages it brings as a content repository backend for a CMS. These include eliminating content freezes and other bottlenecks, auditing support, integration with CI/CD tools and processes, cloud-native elastic scalability, and support for DevContentOps processes.
We also highlight CrafterCMS, an open-source, Git-based and API-first headless CMS that major enterprises use to run large-scale personalized websites, global intranets, e-commerce experiences, OTT video platforms, mobile apps, and other digital experiences.
Read on and you will learn:
- The differences between Git-based and database-oriented CMS architectures
- Different approaches for using Git for content management
- How to improve developer and content publishing workflows with a Git-based headless CMS
- Integration with CI/CD tooling and DevOps processes
- How CrafterCMS leverages Git
Git vs. Database for a CMS Architecture
Most CMS platforms rely on a database architecture. There have been attempts to create content-specific repositories, but they have met with limited success. One of the main reasons is that these repositories were often too tightly coupled to specific technologies or designed by committee. As a result, SQL and NoSQL databases remain the dominant storage choice for a CMS backend.
The History of CMS
The early content management systems were rudimentary file-based platforms that could only manage and bake static HTML. After the dot-com bubble, more mature systems emerged that came to define the CMS category, such as the open-source Drupal. However, all of these were backed by SQL databases.
By 2007, the scale of many enterprise sites had pushed CMS platforms to decouple authoring from delivery. This let dynamic and personalized content be rendered efficiently at scale, and it let each layer scale independently. For most CMS platforms, it meant read-only replication of databases and file shares from the authoring servers to the delivery servers.
By the mid-2010s, the need for proper multichannel support was becoming much more critical. At the same time, CMS vendors were hitting their stride with built-in tooling for marketing departments that included rich support for managing targeting, metrics, experimentation, marketing automation, etc.
By 2018, everyone developing for and around CMS platforms could see that a significant architectural change was badly needed. Traditional "headed" DXP solutions, which couple content to a web presentation tier, could no longer support innovation around emerging channels, so the headless CMS took center stage.
Despite Recent Innovation, CMS Architectures Remain Fundamentally the Same
The introduction of the headless CMS architecture changed the presentation tier of the CMS, yet the backend has seen almost no significant change since the early 2000s. From open-source platforms like Drupal to most modern headless CMSs, whether self-hosted or SaaS, the backend remains a database.
By definition, a headless CMS removes the presentation tier and delivers content through APIs. In theory, the approach has been possible since REST was introduced in 2000. However, headless architecture didn’t become prominent until around 2018, when market shifts demanded it. Those shifts included the need for rich responsive websites, the rise of new mediums like video, and connected native experiences becoming easier to build. They also included the widespread adoption of digital assistants, wearables, and other channels.
When the Traditional CMS Finally Broke for Developers
Traditional CMS and DXP platforms were conceived and built to manage content for websites. As a result, there are two primary problems:
- Content intended for digital channels other than website pages had to be forced in through workarounds and hacks.
- These platforms dictate the presentation tier with proprietary, website-specific development frameworks. This is a major obstacle when you need to support a device that doesn’t run a web browser or has no screen at all, such as an Alexa-enabled smart speaker.
Headless CMS Becomes Mainstream
A headless CMS allows you to design content types, defining objects and giving them properties. Each content type is used to render content entry forms and other authoring interfaces, and the captured content is stored in a repository backend. Once that content is stored, we can retrieve it through a REST API in a format such as JSON, without any presentation. This frees the consumer to present it however they like.
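The following minimal Python sketch makes that flow concrete. It is not any vendor’s API. It assumes a hypothetical `Article` content type with `title`, `body`, and `tags` fields, and it uses an in-memory dictionary in place of the repository backend. The content type drives validation of each entry, and retrieval returns the stored entry as presentation-free JSON.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """One property of a content type (hypothetical model)."""
    name: str
    kind: type
    required: bool = True


# Hypothetical "Article" content type; in a real CMS a definition like this
# would also drive the generated content entry form.
ARTICLE_TYPE = [
    Field("title", str),
    Field("body", str),
    Field("tags", list, required=False),
]


def validate(entry: dict, content_type: list) -> dict:
    """Reject entries that don't match the content type before storing them."""
    unknown = set(entry) - {f.name for f in content_type}
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    for f in content_type:
        if f.name not in entry:
            if f.required:
                raise ValueError(f"missing required field: {f.name}")
        elif not isinstance(entry[f.name], f.kind):
            raise TypeError(f"{f.name} must be of type {f.kind.__name__}")
    return entry


# In-memory stand-in for the repository backend: entry id -> stored content.
repository: dict[str, dict] = {}


def save(entry_id: str, entry: dict) -> None:
    repository[entry_id] = validate(entry, ARTICLE_TYPE)


def get_as_json(entry_id: str) -> str:
    """What a headless delivery API returns: content only, no presentation."""
    return json.dumps(repository[entry_id], sort_keys=True)


save("welcome", {"title": "Hello", "body": "First post", "tags": ["news"]})
print(get_as_json("welcome"))
```

Because the JSON carries no markup, a website, a mobile app, and a voice assistant can each render the same entry in their own way.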
What a Headless CMS Doesn’t Solve
A headless CMS solves many of the problems of traditional CMS platforms, allowing our current multichannel world to move faster and faster. However, there are still some challenges. Now that developers are free to use any frontend framework and language they want, an entirely new roadblock stands in the way of development efficiency: DevOps, specifically concerning content.
Support for Modern Development Processes
The content in our headless CMS platforms is trapped in a database that doesn’t fit into modern development CI/CD processes and environments. Moving content between environments to support development has never been straightforward – and headless architecture does nothing to improve its outlook.
Databases are also difficult to scale out. Even if we rely on SaaS so that scale-out is not our immediate problem, we still need confidence that the SaaS vendor can handle our scale and distribution needs. We can build the new channels and capabilities we want, but we haven’t achieved much if we can’t roll them out at scale.
Collaboration Among Content Authoring and Software Development
A headless CMS leaves us with much of the same friction we have long felt between authoring and development teams. People and processes still step on each other’s toes, and cumbersome but unavoidable activities like content freezes remain.
Development is faster than it used to be, but its efficiency still falls short of what the rest of the software industry has gained from better, modern CI/CD support.
Another lingering issue is that databases still can’t version content the way we need them to, meaning in a way that maps to the content management problem domain. Versioning needs to span many objects at once and be auditable. The more connected our customer journeys become, and the more channels and touchpoints we add, the more necessary these capabilities become.
Also, we still find ourselves with clunky processes like double publishing when we’re supporting redesigns and other similar use cases.
The Rise of Git-based Headless CMS
The problems that remain with a headless CMS are on the back end. Over the last several years, there has been significant growth in using Git to replace or supplement databases as the backend for content management, versioning, and DevOps requirements.
Much of the available research seeks to separate Git-based platforms from API-based platforms. However, the two aren’t mutually exclusive: nothing about a Git-based CMS prevents it from also having a server component and an API.
Where Does Git Fit Into a Headless CMS?
To understand where Git fits in a CMS, we need a high-level view of content management. First, there is the authoring layer, where authors and developers create, edit, and manage content, run workflows, and collaborate.
We also have the content delivery layer that handles customer-facing content delivery requirements like serving static and/or dynamic and personalized content, executing searches, enabling shopping, capturing reviews, etc.
Authoring and delivery services are typically connected to one another by publishing and deployment processes. Git provides the most value on the authoring side, where authoring workflow, versioning, and DevOps workflows take place, so Git serves as the underlying store behind the authoring services. On the delivery side, authoring workflow, versioning, and DevOps are not relevant, so simple storage such as local disk or S3 can sit behind the delivery services.
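The publishing step between the two layers can be pictured as diff-and-copy. The toy Python model below assumes each approved authoring commit is available as a `{path: content}` snapshot, and it uses a plain dictionary in place of the delivery store (disk or S3). It illustrates the idea only. It is not Git’s object format or CrafterCMS’s deployment mechanism. Only the files that changed since the last publish are copied or deleted.

```python
Snapshot = dict[str, str]  # path -> file content at one authoring commit


def diff_snapshots(old: Snapshot, new: Snapshot) -> tuple[set[str], set[str]]:
    """Paths added or modified in `new`, and paths removed since `old`."""
    changed = {path for path, content in new.items() if old.get(path) != content}
    removed = set(old) - set(new)
    return changed, removed


def publish(delivery: dict[str, str], last_published: Snapshot,
            approved: Snapshot) -> Snapshot:
    """Apply only the differences to the delivery store; return the new baseline."""
    changed, removed = diff_snapshots(last_published, approved)
    for path in sorted(changed):
        delivery[path] = approved[path]   # e.g. write a file or upload to S3
    for path in sorted(removed):
        delivery.pop(path, None)          # e.g. delete a file or S3 object
    return dict(approved)


# Usage: two successive approved commits from the authoring repository.
v1 = {"/site/index.xml": "home v1", "/site/about.xml": "about"}
v2 = {"/site/index.xml": "home v2", "/site/news.xml": "news"}

delivery_store: dict[str, str] = {}
baseline = publish(delivery_store, {}, v1)
baseline = publish(delivery_store, baseline, v2)
print(sorted(delivery_store))
```

This is why the delivery side can stay simple: it never needs history, only the currently published files.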
Drawbacks of Database-Oriented Content Versioning
In content management, the work product for a single release is usually tens or hundreds of objects. Database-backed CMSs generally don’t let us manage the relationships among these objects and files together as a single version. This kind of versioning is hard to build with standard SQL and NoSQL database technology because that technology was never designed to support it.
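As a rough illustration of versioning many objects together, the following Python sketch assigns one id to an entire set of files, loosely in the spirit of Git’s content-addressed commits. The `ContentRepo` class and the file paths are hypothetical. Real Git hashes objects with a type-and-size header and nested tree objects, so these ids will not match actual Git ids.

```python
import hashlib


def blob_id(content: str) -> str:
    """Content-addressed id for one file (simplified; not a real Git blob id)."""
    return hashlib.sha1(content.encode("utf-8")).hexdigest()


def tree_id(files: dict[str, str]) -> str:
    """A single id for a whole set of files; changing any file changes it."""
    h = hashlib.sha1()
    for path in sorted(files):  # sort so the id doesn't depend on insertion order
        h.update(f"{path}\0{blob_id(files[path])}\n".encode("utf-8"))
    return h.hexdigest()


class ContentRepo:
    """Toy repository in which each commit records every file together."""

    def __init__(self) -> None:
        # Each entry is (message, version id, snapshot of all files).
        self.commits: list[tuple[str, str, dict[str, str]]] = []

    def commit(self, files: dict[str, str], message: str) -> str:
        snapshot = dict(files)  # copy so later edits can't alter history
        version = tree_id(snapshot)
        self.commits.append((message, version, snapshot))
        return version


repo = ContentRepo()
v1 = repo.commit({"/site/home.xml": "<page/>",
                  "/components/hero.xml": "<hero/>"}, "initial content")
# One release touching a page, a component, and an image: a single version.
v2 = repo.commit({"/site/home.xml": "<page rev='2'/>",
                  "/components/hero.xml": "<hero rev='2'/>",
                  "/static-assets/banner.png": "<binary image data>"}, "spring release")
print(v1 != v2, len(repo.commits))
```

Because the id covers every file, a release that touches a page, a component, and an image is recorded and referenced as one version rather than as three unrelated updates.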
Benefits of Git-based CMS
Git, however, was born to version artifacts precisely in the ways we need to meet this requirement.
- Time Machine: Each commit works like a time machine, efficiently capturing the state of the entire repository at that exact moment.
- Rollbacks: A Git-based content repository can effectively roll back deployments. It can also provide a full audit of the precise state of the repository, and of the site or apps, at any moment in time.
- Repository Sync: Because every commit is a complete, identifiable snapshot, Git can synchronize repositories natively. Sync enables a decentralized repository and branching.
- Improved Performance: Git-based systems are designed at a fundamental level to be distributed. They innately understand how to synchronize changes over time and deal with conflicts. Because every clone holds the full history locally, operations such as viewing history or comparing versions don’t require a round trip to a central server.
- Distribution: Git-based systems can help you deploy content globally in a reliable fashion. There’s no such thing as a partial Git pull or push. Further, every change, no matter how small, is tracked because each file is identified by a hash of its contents. This can be leveraged to detect and correct consistency issues or even defacements.
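The rollback and consistency-checking points can be sketched in a few lines of Python. This is a simplified model under stated assumptions. `History` and `find_tampering` are hypothetical names, snapshots are `{path: content}` dictionaries, and SHA-1 hashes of file contents stand in for Git object ids. A rollback is recorded as a new version, similar in spirit to `git revert`, so the audit trail survives. Comparing live files against the committed hashes flags anything that changed outside the repository.

```python
import hashlib


def fingerprint(files: dict[str, str]) -> dict[str, str]:
    """Per-file content hashes, standing in for Git object ids."""
    return {path: hashlib.sha1(content.encode("utf-8")).hexdigest()
            for path, content in files.items()}


class History:
    """Toy append-only history: a rollback adds a version instead of erasing one."""

    def __init__(self) -> None:
        self.snapshots: list[dict[str, str]] = []

    def commit(self, files: dict[str, str]) -> int:
        self.snapshots.append(dict(files))
        return len(self.snapshots) - 1

    def rollback(self, version: int) -> int:
        """Restore an earlier state as a new version, keeping the audit trail."""
        return self.commit(self.snapshots[version])

    def head(self) -> dict[str, str]:
        return self.snapshots[-1]


def find_tampering(live: dict[str, str], expected: dict[str, str]) -> set[str]:
    """Paths whose live content differs from, or is missing in, the committed state."""
    live_fp, expected_fp = fingerprint(live), fingerprint(expected)
    return {path for path in live_fp.keys() | expected_fp.keys()
            if live_fp.get(path) != expected_fp.get(path)}


history = History()
good = history.commit({"/index.html": "Welcome"})
history.commit({"/index.html": "Broken redesign"})
history.rollback(good)  # version 2 restores version 0; version 1 stays auditable

live_site = {"/index.html": "Welcome", "/evil.html": "defaced"}
print(find_tampering(live_site, history.head()))  # {'/evil.html'}
```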
How Git Handles Multiple Environments (Dev, QA, Production, and More)
Another consideration is environments, or what some organizations call landscapes. No one does any significant amount of development in production; we develop and test in lower environments. With traditional SQL and NoSQL databases, moving code and content between environments means exports and imports. These require content freezes, are very time-consuming, and are all-or-nothing.
Git natively supports moving content back from production and moving code forward from lower environments to production, which eliminates content freezes. Because work is merged rather than overwritten, nothing already in the target environment is blown away.
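The "merge instead of overwrite" behavior rests on a three-way merge against the last common state of two environments. The Python function below is a simplified, path-level sketch of that technique; Git also merges line by line within files. The file paths, with code under `/templates/` and content under `/site/`, are hypothetical examples. In this example, development changed a template while production published a new article, and the merge keeps both.

```python
def three_way_merge(base: dict[str, str], ours: dict[str, str],
                    theirs: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Merge two snapshots that diverged from a common `base`, path by path.

    Returns (merged, conflicts). Git also merges line by line inside a file;
    this sketch only decides at the level of whole files.
    """
    merged: dict[str, str] = {}
    conflicts: list[str] = []
    for path in sorted(base.keys() | ours.keys() | theirs.keys()):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:        # identical on both sides (or deleted on both)
            result = o
        elif o == b:      # only `theirs` changed this file
            result = t
        elif t == b:      # only `ours` changed this file
            result = o
        else:             # both changed it differently: flag for a human
            conflicts.append(path)
            result = o
        if result is not None:  # None means the file was deleted
            merged[path] = result
    return merged, conflicts


# Hypothetical paths: code under /templates/, content under /site/.
last_sync = {"/templates/home.ftl": "layout v1", "/site/news.xml": "old news"}
dev = {"/templates/home.ftl": "layout v2", "/site/news.xml": "old news"}
prod = {"/templates/home.ftl": "layout v1", "/site/news.xml": "breaking news"}

merged, conflicts = three_way_merge(last_sync, prod, dev)
print(merged, conflicts)
```

An export/import, by contrast, would replace one whole environment with the other and lose either the new template or the new article.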
Site Redesign and Branching
Rebrands have traditionally been done in copies of environments. But environments are silos, so every content change must be published twice. Branching instead isolates changes within the same repository and lets you merge them when appropriate. Daily content work continues in the main branch, while all rebranding or redesign work happens in a child branch. We can almost think of it as a feature branch for content authors.
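Branching can be modeled as named pointers into a commit graph. The hypothetical `BranchingRepo` sketch below is illustrative only and is not Git’s implementation. A `redesign` branch forks from `main`, both branches keep receiving commits, and the merge base identifies the fork point. The changes `main` made since that point are the content that would otherwise have to be double-published into the redesign. Deletions are ignored for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    files: dict[str, str]
    parents: tuple[int, ...] = ()


@dataclass
class BranchingRepo:
    """Toy commit graph in which a branch is just a movable pointer to a commit."""
    commits: list[Commit] = field(default_factory=list)
    branches: dict[str, int] = field(default_factory=dict)

    def commit(self, branch: str, files: dict[str, str]) -> int:
        parent = self.branches.get(branch)
        self.commits.append(Commit(dict(files), () if parent is None else (parent,)))
        self.branches[branch] = len(self.commits) - 1
        return self.branches[branch]

    def create_branch(self, name: str, start: str) -> None:
        self.branches[name] = self.branches[start]  # no content is copied

    def ancestors(self, commit_id: int) -> set[int]:
        seen: set[int] = set()
        stack = [commit_id]
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.commits[current].parents)
        return seen

    def merge_base(self, a: str, b: str) -> int:
        """Newest commit reachable from both branches."""
        common = self.ancestors(self.branches[a]) & self.ancestors(self.branches[b])
        # Ids only grow, so the highest shared id has no shared descendant.
        return max(common)

    def changes_since_fork(self, branch: str, other: str) -> dict[str, str]:
        """Files `branch` added or modified since it diverged from `other`."""
        base = self.commits[self.merge_base(branch, other)].files
        tip = self.commits[self.branches[branch]].files
        return {p: c for p, c in tip.items() if base.get(p) != c}


repo = BranchingRepo()
repo.commit("main", {"/site/home.xml": "home", "/templates/page.ftl": "old look"})
repo.create_branch("redesign", "main")
repo.commit("redesign", {"/site/home.xml": "home", "/templates/page.ftl": "new look"})
repo.commit("main", {"/site/home.xml": "home", "/templates/page.ftl": "old look",
                     "/site/promo.xml": "spring sale"})
print(repo.changes_since_fork("main", "redesign"))  # {'/site/promo.xml': 'spring sale'}
```

At launch, merging `main` into `redesign` would apply those changes through a three-way merge against the same merge base, so daily content work is published only once.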
Improving Developer and Content Workflows
Git offers real solutions to long-standing, fundamental problems that almost all other CMSs still have, and those problems grow more pressing as the pace of innovation increases. In recent years we’ve made a lot of progress with automation, containerization, and development and operations practices. Yet content, a central piece of the CMS puzzle, has been left out of the DevOps equation because of how CMSs are architected.
DevOps brings developers and operations together to eliminate the friction and efficiency loss between them. We need authors, developers, and operations to work together as equals in the same way. This approach, known as DevContentOps, can be facilitated by Git, which provides a shared repository as the foundation for collaboration. Authors can use UI-based tools to create content, developers can use the tools and processes they want, and operations has the APIs and tools to automate. All of this makes managing and delivering content faster and easier.
CrafterCMS: A Git-based, API-first Headless CMS for the Enterprise
CrafterCMS is ushering in a new era of content management to help enterprises deliver digital experiences faster and more efficiently, without the limitations of traditional or headless-only platforms. CrafterCMS offers API support for both the authoring and delivery layers, support for DevContentOps processes and automation, delivery of dynamic and personalized content, elastic scalability in the cloud, and easy integration with various third-party tools and services.
And for the content authors, CrafterCMS provides best-in-class WYSIWYG content authoring tools for in-context editing, drag/drop experience building, real-time multi-channel preview, approval workflows, and both real-time and scheduled publishing to any number of digital channels.
To learn more, check out our recorded webinar that was hosted by the Linux Foundation: How a Git-based Headless CMS Solves the Challenges of a Traditional Database-Oriented CMS