
Scaling Data Access Governance

By Nong Li

The rise of data lakes and adjacent patterns such as the data lakehouse has given data teams increased agility and the ability to leverage massive amounts of data. However, this ability comes with potential consequences. Constantly evolving data privacy legislation and the fallout of major cybersecurity breaches have led to calls for responsible data use, requiring new approaches to data privacy and security. But how do CISOs address this looming imperative and find an approach that lets them use data responsibly to accelerate innovation while protecting the business?

The Two Most Common Approaches to Data Security and Their Tradeoffs

There are two common approaches to implementing data security: maintaining curated copies and implementing fine-grained access control in each data platform. While popular, neither is a best practice; both carry significant tradeoffs.

The most frequently used approach is to curate copies, in which sensitive data is removed or transformed. For example, the golden record or authoritative dataset could include data for all countries, with customer information in the clear. The organization then copies that data into smaller sets, one for each geographic region, with personally identifiable information (PII) obfuscated according to that region’s rules. However, this approach is expensive, complex, error-prone, and slow to adapt to changing requirements.
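As a minimal sketch, the per-region pipeline below shows what this curation looks like and why the copies multiply. The region rules and field names are hypothetical illustrations, not any regulation’s actual requirements.

```python
# Minimal sketch of a curated-copy pipeline: one authoritative dataset,
# one obfuscated copy per region. Rules and fields are hypothetical.

GOLDEN_RECORDS = [
    {"customer_id": 1, "country": "DE", "email": "anna@example.com", "phone": "+49-30-1234"},
    {"customer_id": 2, "country": "US", "email": "bob@example.com", "phone": "+1-212-5678"},
]

# Per-region obfuscation rules: which PII fields to mask in each copy.
REGION_RULES = {
    "EU": {"countries": {"DE", "FR"}, "mask_fields": {"email", "phone"}},
    "NA": {"countries": {"US", "CA"}, "mask_fields": {"phone"}},
}

def make_regional_copy(records, region):
    """Copy the golden records for one region, masking PII per its rules."""
    rules = REGION_RULES[region]
    copy = []
    for row in records:
        if row["country"] not in rules["countries"]:
            continue
        row = dict(row)  # a physical copy -- this is where storage multiplies
        for field in rules["mask_fields"]:
            row[field] = "****"
        copy.append(row)
    return copy

# Every region adds another pipeline to build, schedule, and keep in sync
# with changing rules -- the cost and drift described above.
for region in REGION_RULES:
    print(region, make_regional_copy(GOLDEN_RECORDS, region))
```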

Curated copies require larger support, security, and data engineering teams. And while cloud storage is cheap, it is not free; copies of big data mean more big data, which racks up storage costs quickly. Maintaining curated copies also consumes compute resources. Companies don’t want to tie up their Databricks clusters or Snowflake data clouds running routine data pipelines; that compute power is far more valuable when applied to analytics and data science projects.

The second most common approach is to implement fine-grained access control (FGAC) at the data platform level. At first glance, this looks ideal. All data consumers work off authoritative datasets. There’s no lag, as all users get access to the authoritative data as it arrives – no need to wait for a curated copy to be posted. It also greatly reduces the burden on the DataOps or data engineering team. So far, so good, right?

The irony is that as more vendors adopt FGAC, the problem gets worse. Why? Silos. It’s always silos. Why reproduce the same policy in two or more platforms? More to the point, is it even possible to reproduce exactly the same policy in more than one data platform? And can those policies be maintained as requirements change?

To address data access governance in a way that delivers truly sustainable business value, CISOs have to think beyond the capabilities of any single vendor. Organizations used to maintain user login IDs and passwords separately for each business application, but no professional IT team would do that today. For the same reasons that organizations standardize on external identity management systems, they need an external data access governance system that addresses three common issues: clarity, consistency, and economy of scale.

Challenges: Clarity, Consistency, and Scale

Three of the most pressing concerns with data security are establishing data access control policies and procedures that are clear to all data stakeholders, consistent across the organization, and enforceable with little effort as data, user communities, and regulatory requirements come and go. To ensure a successful data access governance initiative, CISOs should start by addressing these issues head-on:

Clarity: Here’s a common question: What does it mean to de-identify PII? Does it mean masking, tokenizing, or returning NULL (empty) data? Even more fundamental, how does the organization classify PII in the first place? And what happens when requirements change, either because new regulations require a change in policy or because a stakeholder recommends a better approach? For example, maybe the data platform team recommends against dedicating computing power to dynamically tokenizing phone numbers. Or a business analyst complains that masking email addresses reduces the data’s utility and asks instead for a token that maintains referential integrity. Many teams have a stake in how data access control is defined and governed, and the easier it is for non-technical data stakeholders to participate in developing and validating those policies, the faster the organization will move. Clarity is key.
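To make those options concrete, here is a minimal sketch of the three most common readings of “de-identify” applied to the same value. The key, formats, and token length are hypothetical; a real deployment would use a managed secret and a vetted tokenization scheme.

```python
import hashlib
import hmac

# Placeholder secret -- a real system would pull this from a secrets manager.
SECRET_KEY = b"rotate-me-in-a-real-system"

def mask(email: str) -> str:
    """Masking: destroys the value but keeps the column's shape."""
    local, _, domain = email.partition("@")
    return local[0] + "***@" + domain

def tokenize(email: str) -> str:
    """Tokenizing: the same input always yields the same opaque token,
    so joins across tables still work (referential integrity)."""
    return hmac.new(SECRET_KEY, email.encode(), hashlib.sha256).hexdigest()[:16]

def nullify(email: str):
    """Returning NULL: maximum protection, minimum analytic utility."""
    return None

email = "anna@example.com"
print(mask(email))      # a***@example.com
print(tokenize(email))  # same token every run, joinable across datasets
print(nullify(email))   # None
```

Only the token survives joins across tables, which is exactly why the hypothetical analyst above prefers it to a mask.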

Consistency: It is rare for an enterprise to standardize on a single data analytics platform. But should the data protection officer (DPO) or CISO care that it’s hard to enforce data privacy controls consistently in a data lake like Amazon S3, a data lakehouse like Databricks, and a data cloud like Snowflake? Vendors provide varying levels of support for data access control, and the complexity of each can vary significantly as well. Without a common framework for defining and enforcing data access control policies, teams have to work harder and are less able to identify and close security gaps. Policies that are not usable across platforms, or that don’t produce consistent results, generate doubt and undermine efforts to increase data literacy across the organization.

Scale: Organizations can run into issues of scale in a variety of ways. Without clarity and consistency, data access governance becomes much harder as organizations scale up users, applications, data use cases, and so on. There are additional issues as well. It’s common to see data access controls defined for specific resources. For example, a rule or permission might restrict users’ access to a named column in a named table in a named database. Resource-level rules do not scale: every new table or column requires new rules, so the rule count grows with the data estate rather than with the intent of the policy.
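A short sketch makes the contrast visible: the resource-level approach below needs a new rule for every new PII column, while a single attribute-based rule covers new resources automatically. All database, table, and tag names are hypothetical.

```python
# Resource-level rules: grow linearly with every table and column added.
RESOURCE_RULES = [
    {"database": "sales", "table": "customers", "column": "email", "action": "mask"},
    {"database": "sales", "table": "customers", "column": "phone", "action": "mask"},
    {"database": "support", "table": "tickets", "column": "email", "action": "mask"},
    # ...one more entry for every new PII column, forever
]

# Attribute-based rule: written once, applied wherever the tag appears.
ATTRIBUTE_RULE = {"classification": "PII", "action": "mask"}

# Column metadata, as a catalog or automated classifier might tag it.
CATALOG = {
    ("sales", "customers", "email"): {"PII"},
    ("sales", "customers", "signup_date"): set(),
    ("marketing", "leads", "email"): {"PII"},  # new table: no new rule needed
}

def action_for(column_key) -> str:
    """Resolve the action for a column from its tags, not its name."""
    tags = CATALOG.get(column_key, set())
    return ATTRIBUTE_RULE["action"] if ATTRIBUTE_RULE["classification"] in tags else "allow"

print(action_for(("marketing", "leads", "email")))  # mask -- covered with zero new rules
```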

Four Steps to Scale Data Access Governance for Sustainable Business Value

Modern data access governance solutions are emerging to help organizations clearly and consistently enforce data access controls with minimal effort. The key building blocks for a scalable data access governance solution are:

  1. Automated Data Classification: It’s important to start by establishing clear taxonomies for data classification, and who has the authority to define those classifications. Organizations can use automated tools to discover and classify data. Combining that technology with oversight from data stewards who validate the classifications builds the foundation for clear, consistent, and scalable data access governance.
  2. Universal Policy Management: Separate the data policy from the data platform. Policies should be abstracted so they are understandable by non-technical data stakeholders and enforceable on a variety of platforms. Write each policy once, leveraging metadata such as user and data attributes, then simply register the data and user communities it should govern.
  3. Dynamic Policy Enforcement: Merely monitoring for and detecting data access violations after the fact is not modern data security. Policies must be enforced at the point of each query: filter, hide, mask, and tokenize sensitive data dynamically to make sure every request is properly authorized (see the sketch after this list).
  4. Data Usage Intelligence: When data access governance is external to the underlying data platforms and centralized, auditing is easier, and so is demonstrating regulatory compliance. With a holistic view, security and compliance teams can easily discover who has access to sensitive data, and how and when they used it.
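Here is a minimal sketch of how steps 1 through 3 can fit together: columns are classified automatically, one platform-neutral policy is written against those classifications, and the policy is enforced dynamically per query. Every name, tag, regex, and rule below is a hypothetical illustration, not any particular product’s API.

```python
import re

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def classify(rows):
    """Step 1 (automated classification): tag columns whose values look
    like emails as PII. A data steward would validate these tags."""
    tags = {}
    for col in rows[0]:
        values = [str(r[col]) for r in rows]
        tags[col] = {"PII"} if all(EMAIL_RE.match(v) for v in values) else set()
    return tags

# Step 2 (universal policy): written once against metadata -- user
# attributes and data classifications -- not against any one platform.
POLICY = {"mask_classifications": {"PII"}, "exempt_user_attribute": "pii_reader"}

def enforce(rows, tags, user_attributes):
    """Step 3 (dynamic enforcement): mask at the point of each query,
    based on who is asking, instead of auditing violations afterward."""
    if POLICY["exempt_user_attribute"] in user_attributes:
        return rows  # authorized users see the data in the clear
    masked = {c for c, t in tags.items() if t & POLICY["mask_classifications"]}
    return [{c: ("****" if c in masked else v) for c, v in r.items()} for r in rows]

rows = [{"email": "anna@example.com", "plan": "pro"},
        {"email": "bob@example.com", "plan": "free"}]
tags = classify(rows)
print(enforce(rows, tags, user_attributes={"analyst"}))     # email masked
print(enforce(rows, tags, user_attributes={"pii_reader"}))  # in the clear
```

Because the policy keys on classifications and user attributes, the same definition can be registered against any platform that exposes that metadata, which is the point of separating policy from platform.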

Finding the Balance

The key to finding the balance between data security and business agility is learning how to scale data access governance. Data, user communities, and applications are always evolving, and so are regulatory requirements. Insider attacks are on the rise, and they can have devastating consequences for the business, its customers, and its partners. With data access governance that scales and adapts to change, CISOs can continue to deliver value from data while protecting sensitive data from misuse or abuse.
