Data Protection Challenges for Kubernetes Databases

Hailey Mai

In a car interview at KubeCon Europe, Gaurav Rishi, VP of Product and Cloud Native Partnerships at Kasten by Veeam, discusses the data protection challenges that customers face as the databases used by Kubernetes-based applications grow exponentially in variety and complexity. Kasten provides Kubernetes application backup and recovery, disaster recovery, and application mobility. Gaurav's role covers product management and the partnerships that help Kasten's technology come together in a simple way.


Early Kubernetes Focused on Stateless Systems

In the early days of Kubernetes, the focus was on stateless systems and the "pets vs cattle" analogy, according to Gaurav. That analogy emphasized the simplicity and disposability of individual containers. Databases, however, are inherently stateful: they must persist the data that applications generate. Cloud-native applications also rely on a variety of data services rather than a single relational database.

Gaurav notes that as Kubernetes has evolved, databases have become the most popular workload running on the platform. This shift toward stateful systems, together with the rise of "polyglot persistence" (using multiple data services within one application), has created new data protection challenges for Kubernetes.

In the monolithic, on-prem world, enterprises had a single relational database to protect. Now, a cloud-native application may use several different databases, ranging from SQL to NoSQL, each with its own native tools and best practices for backups. Eventual consistency models also complicate backups, requiring database vendors to define how to achieve consistency. All of this variety and state has made data protection an important issue for Kubernetes workloads. What started as a platform for simple, stateless containers must now deal with the complexities of stateful systems and databases at scale.

This transition illustrates how quickly Kubernetes has evolved from its early days and the shifting realities that platform vendors now face in providing solutions for data-centric workloads. The rise of databases as the dominant Kubernetes workload shows how state has become inescapable - and must be managed - within cloud-native environments.


Backups at the Storage and Database Layers

Within Kubernetes clusters, there are two main layers where backups can be performed:

  1. The storage layer through snapshots

Taking snapshots of persistent volumes is a simple way to back up data. Snapshots provide a point-in-time copy of the raw data blocks. However, a snapshot may not capture data that is cached in memory but not yet flushed to disk. This can lead to data loss on recovery: writes that were still in memory when the snapshot was taken are not part of the restored copy.

  2. The database layer through logical backups

Logical backups use the native tools provided by each database to back up the data in a consistent state. This captures data that is in memory as well as on disk. However, with over 300 different databases running on Kubernetes, there are potentially as many different logical backup tools. This adds complexity for platforms aiming to provide database backups across a variety of workloads.
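To make the storage-layer option concrete, here is a minimal sketch of what a snapshot request looks like on a cluster that has the CSI snapshot controller installed. It builds a standard VolumeSnapshot object (API group snapshot.storage.k8s.io/v1) for a single PersistentVolumeClaim. The claim name, namespace, and snapshot class are hypothetical placeholders, and this is not a description of how Kasten requests snapshots.

```python
import json


def volume_snapshot_manifest(pvc_name, namespace, snapshot_class):
    """Build a CSI VolumeSnapshot object for one PersistentVolumeClaim."""
    return {
        "apiVersion": "snapshot.storage.k8s.io/v1",
        "kind": "VolumeSnapshot",
        "metadata": {"name": f"{pvc_name}-snap", "namespace": namespace},
        "spec": {
            # Which CSI driver configuration takes the snapshot.
            "volumeSnapshotClassName": snapshot_class,
            # The volume to copy; only blocks already on disk are captured.
            "source": {"persistentVolumeClaimName": pvc_name},
        },
    }


# Hypothetical names, for illustration only.
manifest = volume_snapshot_manifest("orders-db-data", "shop", "csi-snapclass")
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied like any other Kubernetes object. Because the request only names a volume, nothing in it tells the database to flush its in-memory state first, which is exactly the gap described for snapshots.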

Eventual consistency models further complicate backups by requiring careful handling to achieve a consistent state. This often necessitates using the native tools and best practices defined by the database vendor.

The combination of these factors makes data protection for Kubernetes databases an increasingly complex challenge: the application, including its Kubernetes objects, is the unit of atomicity; backups can be snapshots or logical; there are hundreds of database-specific tools; and eventual consistency must be handled.


Platforms must balance:

  1. Providing a unified interface that hides complexity
  2. Integrating with each database's native backup tools
  3. Following best practices for different databases and consistency models
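One way to picture this balance is a small registry that exposes a single function to the caller while delegating to each database's own dump tool. The sketch below is an assumption-laden illustration, not Kasten's template format: the registry, the function name, and the paths are hypothetical, while pg_dump, mysqldump, and mongodump and the flags shown are the databases' real native tools.

```python
import shlex

# Native dump command per engine. "{db}" and "{out}" are filled in per backup.
NATIVE_BACKUP_COMMANDS = {
    "postgresql": ["pg_dump", "--format=custom", "--file={out}", "{db}"],
    # --single-transaction gives a consistent dump for transactional tables.
    "mysql": ["mysqldump", "--single-transaction", "--result-file={out}", "{db}"],
    "mongodb": ["mongodump", "--db={db}", "--archive={out}"],
}


def backup_command(engine, db, out):
    """Return the native backup command (as an argv list) for one database."""
    try:
        template = NATIVE_BACKUP_COMMANDS[engine]
    except KeyError:
        raise ValueError(f"no backup template registered for {engine!r}") from None
    return [part.format(db=db, out=out) for part in template]


# Hypothetical database name and output path.
cmd = backup_command("postgresql", "orders", "/backups/orders.dump")
print(shlex.join(cmd))
```

The caller sees one interface, and supporting another database means adding one registry entry that encodes that vendor's recommended tool and flags.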

In the end, platforms that can achieve simplicity through flexibility by integrating well with diverse database tools may succeed in this space. But the variety of backup techniques, tools and features represents a minefield that database-as-a-service platforms must navigate. Kasten's approach is to provide extensible templates that use native database tools, while giving freedom of choice. Kasten works with storage, Kubernetes distributions, and security partners to make their platform work across diverse data protection needs.


Backup/Recovery, Disaster Recovery, and Mobility Solutions

Kasten K10 provides application backup and recovery for Kubernetes environments. It discovers applications running in clusters and lets users define policies that set how often backups occur and how long they are retained. K10 intelligently selects between snapshot-based and logical database backups to achieve consistency. During recovery, K10 rehydrates microservices in the correct order based on application requirements. For complex environments with stateful applications and databases, K10 automates the entire backup/recovery lifecycle.
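Restoring components in the correct order is, in general terms, a dependency-ordering problem. The following sketch shows the idea with a topological sort from Python's standard library; it is only an illustration of the concept, not K10's actual recovery logic, and the component names and dependency map are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical application: each component maps to the components it needs
# to be running before it can start.
DEPENDS_ON = {
    "postgres": set(),
    "redis": set(),
    "orders-api": {"postgres", "redis"},
    "web-frontend": {"orders-api"},
}


def restore_order(depends_on):
    """Return components ordered so that dependencies are restored first."""
    return list(TopologicalSorter(depends_on).static_order())


order = restore_order(DEPENDS_ON)
print(order)
```

Data services come out ahead of the services that read from them, so the API is never started against an empty database. The relative order of independent components, such as the two data stores here, is not significant.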

Kasten also provides disaster recovery solutions to ensure business continuity. K10 covers disaster recovery through snapshot replication, storage mirroring and native cloud DR capabilities. Kasten helps replicate backups across different storage types and cloud/on-prem environments for increased redundancy.

Finally, Kasten enables application mobility through data portability across Kubernetes environments and clouds. K10 allows customers to back up applications and restore them to different Kubernetes distributions or clouds. Kasten works across managed and self-managed Kubernetes deployments, hybrid clouds, and edge environments.



As data protection needs for Kubernetes databases grow increasingly complex, the platforms that succeed will be those that manage this complexity through simplicity and flexibility.

Data protection platforms like Kasten K10 that integrate native database tools, provide templates, and allow flexibility can address the complex challenges of protecting applications on Kubernetes in a simple yet powerful way. Complementary open source projects such as Ceph and Kanister can accelerate innovation in managing state and offer a variety of persistence and protection techniques within these dynamic environments.

Full video at: KBE Insider Amsterdam

