When considering privacy and data protection, no data is more important than personal data, whether it is medical, financial or even social. The discussions about access to our data, or even our metadata, come down to who knows what, and whether my personal data is safe. Today’s announcement between Intel, Microsoft and DARPA is a program designed to keep information secure and encrypted while still using it to build better models or provide better statistical analysis, without disclosing the actual data. The technique is called Fully Homomorphic Encryption, but it is so computationally intensive that it is almost useless in practice. This program between the three parties is a driver to provide IP and silicon to accelerate the compute, enabling a more secure environment for collaborative data analysis.
Note your data
Data protection is one of the most important aspects of the future of computing. The amount of personal data is constantly increasing, as is its value and the number of legal protections it requires. This makes processing personal, private and confidential data difficult, and often leads to dedicated data silos, because any processing requires transferring the data along with encryption and decryption, which involves a level of trust that is not always possible. It only takes one key in the chain to be lost or leaked for the whole dataset to be endangered.
There is a way around this, known as Fully Homomorphic Encryption (FHE). FHE introduces the ability to take encrypted data, transfer it to where it needs to go, perform calculations on it and get results without ever knowing the exact underlying data set.
Take, for example, the analysis of medical records. When a researcher has to process a specific data set for an analysis, the traditional method is to encrypt the data, send it, then decrypt and process it. However, giving the researcher access to the details in the records may not be legal, or may face regulatory challenges. With FHE, the researcher can take the encrypted data, perform the analysis and get a result, without ever knowing any details of the dataset. This could involve combined statistical analyses of a population across multiple encrypted data sets, or using the encrypted data sets as additional inputs to train machine learning algorithms, which gain accuracy from having more data. The researcher must, of course, trust that the data provided is complete and genuine, but that is arguably a different topic from enabling computation on encrypted data.
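To make the idea of computing on data you cannot read more concrete, here is a minimal sketch in Python. It does not use an FHE scheme or anything from the announcement. It is a toy version of the Paillier cryptosystem, chosen here purely for illustration, which is only additively (partially) homomorphic. The tiny primes, the function names and the sample readings are all assumptions made for the example, and nothing here is secure.

```python
import math
import secrets

def keygen(p=1009, q=1013):
    # Toy primes for illustration only; real keys use primes of 1024+ bits.
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid shortcut because the generator is g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    # c = g^m * r^n mod n^2, with a fresh random r coprime to n
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    # m = L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n

def add_encrypted(n, c1, c2):
    # Multiplying ciphertexts adds the underlying plaintexts (mod n).
    return (c1 * c2) % (n * n)

# The data owner encrypts three hypothetical readings.
n, priv = keygen()
readings = [120, 135, 128]
ciphertexts = [encrypt(n, m) for m in readings]

# The researcher sums them using only the public value n and the ciphertexts.
total = ciphertexts[0]
for c in ciphertexts[1:]:
    total = add_encrypted(n, total, c)

# Only the key holder can read the result.
print(decrypt(n, priv, total))  # 383
```

The researcher's loop never touches the private key or the individual readings, yet the key holder recovers the correct sum. A fully homomorphic scheme extends this to both addition and multiplication on ciphertexts, which is what allows arbitrary computation, and also what makes it so much more expensive.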
One reason this matters is that the best insights from data come from the largest datasets. That includes training neural networks: even the best neural networks run into the problem of not having enough data, or face regulatory barriers because of the sensitive nature of the data. This is why Fully Homomorphic Encryption, the ability to analyze data without knowing its contents, is important.
Fully homomorphic encryption has existed as a concept for several decades, but it has only been realized in the last twenty years. A number of partially homomorphic encryption (PHE) schemes were proposed in that initial timeframe, and since 2010 several PHE / FHE designs have been developed that can perform basic operations on encrypted data, or ciphertexts, with a number of libraries developed in line with industry standards. Some of these are open source. Many of these methods are computationally complex, for the obvious reason that they operate on encrypted data, although efforts are being made with SIMD-like packing and other features to speed up processing. Even when FHE schemes are accelerated, that is not the same as decryption, because the math does not decrypt the data. Because the data always remains in an encrypted state, it can (arguably) be handled by untrusted third parties, as the underlying information is never exposed. (One could argue that a sufficiently large data set can reveal more than intended, even though it is encrypted.)
Today’s announcement: Custom silicon for FHE
When measuring the performance of FHE compute, the result is compared against the same analysis performed on the unencrypted version of the data. Due to the complexity of FHE calculations, current methods are significantly slower. The encoding methods that enable FHE can increase the size of the data by 100-1000x, and computing on that data is then 10,000x to 1 million times slower than the conventional calculation. This means that one second of compute on the raw data can take anywhere from 3 hours to 12 days.
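The figures above are easy to check with a few lines of Python. This is only a back-of-the-envelope sketch: the function names are made up for the example, and the 1 GB input is an arbitrary assumption.

```python
from datetime import timedelta

def encrypted_runtime(plain_seconds, slowdown):
    # Time for the same computation under FHE, given a slowdown factor.
    return timedelta(seconds=plain_seconds * slowdown)

def encrypted_size_gb(plain_gb, expansion):
    # Size of the data after FHE encoding, given an expansion factor.
    return plain_gb * expansion

# One second of plaintext compute at the quoted 10,000x and 1,000,000x slowdowns.
for slowdown in (10_000, 1_000_000):
    print(f"{slowdown:>9,}x -> {encrypted_runtime(1, slowdown)}")

# A hypothetical 1 GB data set at the quoted 100x and 1000x expansion.
for expansion in (100, 1000):
    print(f"{expansion:>5}x -> {encrypted_size_gb(1, expansion):,} GB")
```

The two runtimes come out at roughly 2 hours 47 minutes and 11 days 14 hours, which the article rounds to 3 hours and 12 days.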
Whether that means combining medical records from hospitals across a state, or customizing a personal service using metadata collected on a user’s smartphone, FHE with a slowdown on that scale is not a viable solution. Enter the DARPA DPRIVE program.
- DARPA: Defense Advanced Research Projects Agency
- DPRIVE: Data Protection in Virtual Environments
Intel has announced that, as part of the DPRIVE program, it has signed an agreement with DARPA to develop custom IP leading to silicon that enables faster FHE in the cloud, specifically with Microsoft on both Azure and the JEDI cloud, initially for the US government. As part of this multi-year project, expertise from Intel Labs, Intel’s Design Engineering and Intel’s Data Platforms Group will come together to create a dedicated ASIC that reduces the compute overhead of FHE compared with existing CPU-based methods. The press release states that the goal is to reduce processing time by five orders of magnitude relative to current methods, bringing calculation times down from days to minutes.
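As a rough sense of what five orders of magnitude means, the sketch below applies a naive 100,000x speedup to a few FHE runtimes. This is purely illustrative arithmetic. The helper name and the sample runtimes are assumptions, and the press release does not say which workloads or baselines the target refers to.

```python
def accelerated_seconds(current_seconds, orders_of_magnitude=5):
    # Naively divide the current FHE runtime by 10^orders_of_magnitude.
    return current_seconds / 10 ** orders_of_magnitude

DAY = 24 * 60 * 60  # seconds in a day

# Hypothetical FHE runtimes today: 12 days, 100 days and 1000 days.
for days in (12, 100, 1000):
    seconds = accelerated_seconds(days * DAY)
    print(f"{days:>5} days -> {seconds:,.1f} s ({seconds / 60:.1f} min)")
```

Under this naive reading, a job that takes 12 days today would finish in about ten seconds, and a job would have to run for months or years today before the accelerated version takes minutes.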
Intel already has a foot in the door when it comes to FHE, with a research team in Intel Labs dedicated to the topic. That work has primarily been on the side of software, standards and regulatory barriers, but it will now also move into hardware design, cloud software stacks and collaborative deployment within Azure and the JEDI cloud for the US government. Other target markets include healthcare, insurance and finance.
During Intel Labs Day in December 2020, Intel explained some of the directions this work was already taking, along with standards development in parallel with traditional encryption, but on an international scale given the additional regulatory barriers. Microsoft will now be part of that discussion through the DPRIVE program, alongside Intel’s continued investments at the academic level.
Apart from the ‘five orders of magnitude’ element, today’s announcement does not set out any definite goals, nor does it offer a time frame beyond saying that this is a ‘multi-year’ agreement. It will be interesting to see how much Intel or its academic partners discuss the subject after today, beyond the standardization work.