Risk, Utility and Computational Cost in Data Privacy by eXate
Updated: Apr 26, 2022
Gaining insight from data is increasingly important for companies; however, achieving privacy while gaining business insight is a challenge that can consume large amounts of resources and represent a significant cost! Read the article, written by our friends at eXate, on risk, utility and computational cost in data privacy.
Gaining insight from data is increasingly important for companies in a broad range of sectors. The business insight generated can be key to being competitive within a market. At the same time, regulators and consumers are exerting increasing pressure for high levels of privacy when it comes to personal data. Loss of confidential information has damaged companies irreparably, both through fines and through lost customer confidence. Achieving privacy while gaining business insight is also a challenging problem that can consume large amounts of resources and represent a significant cost.
So how can a company use its data to inform its business units while also meeting regulatory requirements to keep customers’ data private, and do so in the most cost-effective manner?
Achieving data privacy while retaining utility requires the application of various Privacy Enhancing Technologies (PETs). Through the correct application of PETs, data sets can be anonymised, allowing them to be shared much more easily and analysed by data scientists with a much lower risk of exposing personal information.
PETs come in many shapes and sizes, and while they are sometimes conceptually simple, choosing the right PET for the job can be incredibly difficult. The application of any single PET does not guarantee the anonymity of a data set and can often leave large proportions of the data vulnerable to attack. Likewise, many PETs can remove large amounts of valuable information from a data set, meaning little business insight can be gained. This leads to a trade-off between the risk of exposure and utility that needs to be determined not only for each use case but also for each individual data set.
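To make the point about a single PET concrete, here is a minimal Python sketch on a small, made-up data set (the records, the column choice and the helper names `suppress_names` and `fraction_unique` are all hypothetical). It applies one PET, suppression of the direct identifier `name`, and then measures how many records are still unique on the remaining quasi-identifiers (age, postcode, gender). The share of unique records is only a rough proxy for re-identification risk, not a complete risk model.

```python
from collections import Counter

# Hypothetical toy records: (name, age, postcode, gender)
records = [
    ("Alice", 34, "SW1A", "F"),
    ("Bob", 34, "SW1A", "M"),
    ("Carol", 29, "EC2V", "F"),
    ("Dan", 41, "SW1A", "M"),
    ("Erin", 34, "SW1A", "F"),
    ("Frank", 52, "N1", "M"),
]

def suppress_names(rows):
    """A single PET: drop the direct identifier (name), keep everything else."""
    return [row[1:] for row in rows]

def fraction_unique(rows):
    """Share of rows whose quasi-identifier combination appears exactly once."""
    counts = Counter(rows)
    return sum(1 for row in rows if counts[row] == 1) / len(rows)

masked = suppress_names(records)
print(f"{fraction_unique(masked):.0%} of records are still unique")
```

Even with every name removed, four of the six toy records remain unique on age, postcode and gender, so anyone holding those three facts about a person could still single them out.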
Data utility is highly dependent on the analysis to be performed on the data and can be described by many different metrics, but some key principles are common across different applications. Consider, for instance, an age attribute. Age is an identifying feature, so it will often need protecting. By applying a generalisation PET, an individual’s age can be transformed into an age range. Broadly speaking, the wider the age range, the higher the privacy, as the specificity of this identifying feature is reduced. At the same time, analysts lose granularity in the data. This simple example gives an idea of the trade-offs faced when trying to anonymise data.
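The age example can be sketched in a few lines of Python. Everything here is an illustrative assumption rather than eXate's approach: the ages are made up, ranges are aligned to multiples of the width, risk is summarised as the smallest group size (k in the k-anonymity sense, but for the age attribute alone), and utility loss is measured as the average distance between a true age and the midpoint of its range.

```python
from collections import Counter

# Hypothetical ages for illustration only
ages = [23, 27, 31, 34, 35, 38, 42, 44, 47, 51, 56, 58]

def generalise(age, width):
    """Map an exact age to the range [lo, lo + width - 1], aligned to multiples of width."""
    lo = (age // width) * width
    return (lo, lo + width - 1)

def smallest_group(ranges):
    """k for this single attribute: size of the smallest group sharing one range."""
    return min(Counter(ranges).values())

def mean_abs_error(values, ranges):
    """Utility loss: average distance between the true age and its range midpoint."""
    return sum(abs(v - (lo + hi) / 2) for v, (lo, hi) in zip(values, ranges)) / len(values)

for width in (5, 10, 20):
    ranges = [generalise(a, width) for a in ages]
    print(f"width={width:2d}  k={smallest_group(ranges)}  "
          f"error={mean_abs_error(ages, ranges):.2f} years")
```

With these made-up ages, widening the range from 5 to 10 to 20 years raises k from 1 to 2 to 6, while the average error grows from 1.00 to about 2.17 to about 5.08 years: more privacy, less granularity.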
Unfortunately, this is not a problem for which a single “right” answer can be found, so the best balance has to be found by trying out and evaluating candidate PETs and settings. This means a third factor must be considered in our trade-off: computational cost. Computational cost refers not only to the cost of applying a PET to a data set but also to the cost of assessing the performance of that PET, or any combination of PETs, in terms of risk and utility. While hardware resources are important and often limited, the time required to complete the computation can also be prohibitive. Having to wait three hours to access a data set would be completely unacceptable in most cases, but it could become a reality if this trade-off is not properly balanced.
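One naive way to look for a good balance is to try every candidate configuration and score each on risk and utility. The Python sketch below does this by brute force; it is not eXate's method, and the synthetic data, the bucket widths per attribute, and the crude proxies (k for risk, number of distinct groups for utility) are all hypothetical. Its purpose is to show where the cost comes from: the number of configurations is the product of the options per attribute (4 × 4 × 3 = 48 here), and each configuration needs a full pass over the data.

```python
import itertools
import random
import time
from collections import Counter

# Hypothetical synthetic data set: (age, postcode number, income) per row
random.seed(42)
ROWS = [(random.randint(18, 90), random.randint(1000, 9999), random.randint(10_000, 200_000))
        for _ in range(10_000)]

# Hypothetical generalisation options: candidate bucket widths per attribute
LEVELS = [
    [1, 5, 10, 20],           # age
    [1, 10, 100, 1000],       # postcode
    [1_000, 10_000, 50_000],  # income
]

def evaluate(rows, widths):
    """Return (k, groups): smallest group size (risk proxy) and
    number of distinct groups (a crude utility proxy)."""
    counts = Counter(tuple(v // w for v, w in zip(row, widths)) for row in rows)
    return min(counts.values()), len(counts)

def search(rows, levels, k_target):
    """Exhaustive search: every combination of widths, one full data pass each."""
    results = [(widths, *evaluate(rows, widths)) for widths in itertools.product(*levels)]
    feasible = [r for r in results if r[1] >= k_target]
    # Keep the most granular configuration that still meets the k target, if any
    best = max(feasible, key=lambda r: r[2], default=None)
    return results, best

start = time.perf_counter()
results, best = search(ROWS, LEVELS, k_target=5)
elapsed = time.perf_counter() - start
print(f"{len(results)} configurations, {len(results) * len(ROWS):,} row evaluations, "
      f"{elapsed:.2f}s; best meeting k>=5: {best}")
```

Adding one more attribute with four options would quadruple the number of passes, and doubling the number of rows roughly doubles the cost of each pass, which is how a naive search can grow from seconds on a toy example to hours on real data sets.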
PETs can offer effective anonymisation of data sets while still allowing insight to be gained; however, achieving this requires a careful balance between utility, risk, and computational cost for each individual application.