Habu has always provided a 100% interoperable place for partners on different cloud platforms to share data without concerns about exposing sensitive first-party data or compromising security. Habu uses end-to-end encryption techniques to ensure data is protected during processing – we encrypt the data, run the query, return the results, and tear down the ephemeral environment. Because Habu performs the encryption, however, this model requires both partners to place a certain level of trust in Habu.
Today we announced that our fully interoperable hybrid clean room pattern has been upgraded to support confidential computing. Collaboration partners can participate in cross-cloud, cross-region data sharing with protections against unauthorized access to data across partners, cloud providers, and even Habu.
Confidential computing is a critical component of the so-called Zero Trust Architecture (ZTA) that is becoming more pervasive for network access and the processing of data. ZTA is especially important for customers in highly regulated industries such as financial services or healthcare, where there are usually strict rules governing the use of cloud platforms to store or process data.
But what is confidential computing, and what are some of the hardware components and software constructs that unlock the ZTA?
Data exists in three states: in transit when it is moving through the network; at rest when stored; and in use, as it’s being processed. Strong encryption methods such as AES and RSA have long protected data at rest. Transport Layer Security (TLS), which is the protocol behind HTTPS, protects data in transit. Protecting data in use has been a gap in the data protection cycle.
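As a point of reference for how routine the first two protections have become, Python's standard-library `ssl` module gives data-in-transit protection with its defaults alone (a minimal sketch; the commented connection code uses a placeholder hostname):

```python
import ssl

# A default context enables certificate validation and hostname checking,
# and refuses protocol versions below the library's safe minimum.
context = ssl.create_default_context()
print(context.verify_mode == ssl.CERT_REQUIRED)  # peer certificates are validated
print(context.check_hostname)                    # server identity is checked

# Wrapping a socket upgrades the connection to TLS before any data moves:
# import socket
# with socket.create_connection(("example.com", 443)) as raw:
#     with context.wrap_socket(raw, server_hostname="example.com") as tls:
#         print(tls.version())
```

No comparably routine mechanism has existed for the in-use state, which is the gap confidential computing closes.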
Organizations that handle sensitive data, such as Personally Identifiable Information (PII), financial data, or health information, want to mitigate data-in-use threats that target the confidentiality and integrity of either the application or the data in system memory. They want guarantees that administrators, data center insiders, or other third parties cannot access the data without the customer’s consent. They also want assurance that the cloud infrastructure itself doesn’t contain exploitable flaws that attackers can abuse.
Some of the more common attack vectors for data-in-use include:
- Device firmware
- Device drivers
- Operating systems
Two techniques have emerged to address the challenge of protecting data in use.
Homomorphic encryption is an extension of public-key cryptography with the additional capacity to perform computations on encrypted data (ciphertext) without access to the secret key (i.e., it does not need to decrypt data to process it). It is software-based, allowing infrastructure providers to support the processing of encrypted data on current, commoditized hardware. This makes it easy to adopt, but the lack of specialized hardware for compute-intensive tasks makes it impractically slow, limiting its real-world applicability.
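The idea can be illustrated with textbook RSA, which happens to be multiplicatively homomorphic: the product of two ciphertexts decrypts to the product of the plaintexts. This toy sketch (tiny primes, no padding – never use it to protect real data) shows a computation performed entirely on ciphertext, without the secret key:

```python
# Textbook RSA with toy parameters: n = p*q, e public, d secret.
p, q = 61, 53
n = p * q
e = 17
d = pow(e, -1, (p - 1) * (q - 1))  # modular inverse (Python 3.8+)

def encrypt(m: int) -> int:
    return pow(m, e, n)

def decrypt(c: int) -> int:
    return pow(c, d, n)

m1, m2 = 7, 6
c1, c2 = encrypt(m1), encrypt(m2)

# Multiply the ciphertexts only -- no decryption, no secret key involved:
c_product = (c1 * c2) % n

assert decrypt(c_product) == (m1 * m2) % n  # the product was computed while encrypted
```

Fully homomorphic schemes extend this property to arbitrary computations, but at a performance cost that remains the main barrier to adoption.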
Confidential computing, in contrast, is hardware-based and uses a portion of the processor and memory to provide an isolated container for the data, called a Trusted Execution Environment (TEE). TEEs are tamper-resistant environments that safely run sensitive workloads inside a secure enclave. Confidential computing using TEEs protects workloads from unauthorized entities, including the host or hypervisor, system administrators, service providers, other VMs, and other processes running on the host.
Confidential computing technology keeps data encrypted and strongly isolated inside a TEE at all times, and only processes it once the cloud environment is verified. This prevents data access from cloud operators, malicious admins, and the attack vectors listed above.
The foundation for TEE includes:
- Hardware Root-of-Trust (RoT) to ensure data is protected and anchored into the silicon. Trust is rooted in the hardware manufacturer so even cloud operators cannot modify the hardware configurations.
- Memory isolation and hardware-based encryption protect data while it is being processed, preventing unauthorized viewing even by someone with physical access in the data center.
- Remote attestation for customers to directly verify the integrity of the environment. Companies can verify that both the hardware and software on which their workloads run are approved versions and secured before allowing them to access data.
- Trusted launch ensures virtual machines (VMs) boot with only authorized software and uses remote attestation to defend against rootkits, bootkits, and malicious firmware.
- Secure key management ensures that keys stay encrypted throughout their lifecycle and are released only to authorized code.
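The building blocks above compose into a simple flow: measure the environment, have a hardware-rooted party sign the measurement, and release the key only if the signed report matches an expected value. This stdlib-only simulation stands in for real hardware attestation (HMAC plays the role of the root-of-trust's signature; all names and values are illustrative):

```python
import hashlib
import hmac
import secrets

ROT_KEY = secrets.token_bytes(32)  # stands in for the silicon root-of-trust

def attest(measurement: bytes, nonce: bytes) -> bytes:
    """'Hardware' signs (measurement, nonce) -- the attestation report."""
    return hmac.new(ROT_KEY, measurement + nonce, hashlib.sha256).digest()

def release_key(report: bytes, measurement: bytes, nonce: bytes,
                expected: bytes, key: bytes):
    """Relying party: verify the report and the measurement before releasing the key."""
    good_report = hmac.compare_digest(
        report, hmac.new(ROT_KEY, measurement + nonce, hashlib.sha256).digest())
    return key if good_report and measurement == expected else None

expected = hashlib.sha256(b"approved-enclave-image-v1").digest()
dek = secrets.token_bytes(32)
nonce = secrets.token_bytes(16)

# Genuine environment: measurement matches, report verifies -> key released.
assert release_key(attest(expected, nonce), expected, nonce, expected, dek) == dek

# Tampered environment: different measurement -> no key.
bad = hashlib.sha256(b"modified-image").digest()
assert release_key(attest(bad, nonce), bad, nonce, expected, dek) is None
```

Real TEEs replace the shared HMAC key with asymmetric signatures chained to the CPU manufacturer, so the relying party never needs a shared secret with the hardware.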
Hardware Support for TEE
In 2016, AMD introduced Secure Encrypted Virtualization (SEV), the first x86 technology designed to isolate VMs from the hypervisor. With SEV, individual VMs could be assigned a unique AES encryption key that is used to automatically encrypt their in-use data. When a component such as the hypervisor attempts to read this data, it is only able to see the encrypted bytes. AMD introduced SEV-ES in 2017 to encrypt the register state on each hypervisor transition so the hypervisor cannot see the data actively being used by the VM.
SEV-SNP (Secure Nested Paging) is the next generation of SEV that builds upon SEV and SEV-ES functionality while adding new hardware-based security protections. SEV-SNP adds strong memory integrity protection to help prevent malicious hypervisor-based attacks such as data replay, memory re-mapping, and more to establish the TEE. The RoT provides support for remote attestation, including issuing an attestation report which may be used by a relying party to verify that the utility VM has been created and configured on a genuine AMD SEV-SNP CPU. These integrity enhancements elevated SEV-SNP to a level where the Confidential Computing Consortium (CCC) was able to declare it ready for the stringent requirements of confidential computing.
Attestation (aka Trust, But Verify!)
Establishing that the underlying cloud infrastructure or cloud service is in a secure state is paramount in a confidential computing environment. This security check, or attestation, allows a software environment to prove that a specific program is running on specific hardware. Attestation can be performed by the TEE environment when it loads an application running in the TEE (such as a data clean room) that can initiate attestation to establish a secure channel and retrieve the secrets by using the tooling available for specific TEEs.
Some of the common scenarios for attestation are:
- Ensure a Confidential VM is running in a TEE on a confidential hardware platform (e.g., backed by AMD SEV-SNP).
- Ensure a Confidential VM is in a desired good state with properties such as vTPM and secure boot enabled to protect against rootkits and malware that target firmware, the bootloader, and the kernel itself.
- Ensure that a relying party is presented with evidence that a Confidential VM is running on a confidential hardware platform and is in a good state before engaging with it for sensitive transactions.
Microsoft Azure Attestation (MAA) is a cloud-based service that provides secure attestation of TEEs running on Azure infrastructure. It uses a combination of hardware- and software-based attestation to verify the identity and integrity of TEEs, ensure they are running on trusted hardware that has not been compromised, verify the integrity and authenticity of software running on VMs or containers, and ensure keys are only released to trusted software components.
Application for Data Clean Rooms
Data Clean Rooms using TEE technology are particularly valuable for privacy-preserving analytics which require a high level of security and data protection. In privacy-preserving analytics, TEEs enable organizations to analyze large datasets in a way that keeps PII confidential. This is accomplished by providing a secured memory-encrypted environment, remote cryptographic verification of the initialized environment, and secure key release.
Habu’s data collaboration platform delivers a highly secure environment for multiple parties to work with sensitive data, ensuring confidentiality and protection. By utilizing multi-party computation, the platform enables customers to share data between partners without revealing any sensitive information. Additionally, the data is safeguarded using the customer’s data encryption key (DEK) and key encryption key (KEK), further enhancing security and privacy throughout the collaboration process.
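The DEK/KEK arrangement described here is envelope encryption: data is encrypted under a DEK, and only a KEK-wrapped copy of the DEK is ever stored or shared. The sketch below uses a toy SHA-256 counter keystream in place of a real cipher such as AES, purely to keep it standard-library-only – it is illustrative, not production cryptography:

```python
import hashlib
import secrets

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR with a SHA-256 counter keystream (demo only)."""
    stream = bytearray()
    for block in range(-(-len(data) // 32)):  # ceil(len/32) blocks
        stream += hashlib.sha256(key + block.to_bytes(8, "big")).digest()
    return bytes(a ^ b for a, b in zip(data, stream))

kek = secrets.token_bytes(32)  # key encryption key, controlled by the customer
dek = secrets.token_bytes(32)  # data encryption key, used per dataset

ciphertext = keystream_xor(dek, b"sensitive first-party records")
wrapped_dek = keystream_xor(kek, dek)  # only the wrapped DEK leaves the key store

# Authorized side: unwrap the DEK with the KEK, then decrypt the data.
recovered_dek = keystream_xor(kek, wrapped_dek)
assert keystream_xor(recovered_dek, ciphertext) == b"sensitive first-party records"
```

Keeping the KEK with the customer means the platform can hold ciphertext and wrapped DEKs without ever being able to read the underlying data on its own.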
We also provide an extra layer of security with a secure key release (SKR) through either Azure Key Vault (AKV) or Azure Managed Hardware Security Module (mHSM) and MAA. AKV allows customers to safely store and manage cryptographic keys, certificates, and other secrets used for encryption and authentication. This reduces the risk of key compromise, demonstrating our commitment to offering advanced security features.
Our platform is designed for high-performance computing while maintaining the confidentiality and integrity of sensitive data. This is particularly beneficial for organizations that handle large volumes of confidential information and need secure computing environments for effective analysis.
Sample Use Case
A three-way MPC can be used to securely compute a lookalike modeling use case involving three parties.
- The first party brings in the lookalike modeling algorithm or model.
- The second party brings in user data, which contains hashed user IDs and associated features such as gender, income, purchasing preferences, etc.
- The third party brings in CRM data, which contains hashed user IDs and corresponding user attributes.
The goal of the MPC is to find matched users between the user data and CRM data and output those users who are mathematically similar in the provided attributes to those in the CRM data. This use case can be beneficial for businesses that want to target users who are likely to purchase their products, but don’t have a user list to target. For example, a personal beauty product brand can use this MPC-based lookalike modeling algorithm to find users who are most willing to purchase their product(s). The first party, which owns the IP for the lookalike modeling algorithm, can use data gathered from the second and third parties to compute the matched users (i.e., model inference) in a secure and private manner, without any party having access to the other parties’ data.
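The matching-and-scoring step of this flow can be sketched as follows: hashed IDs are joined across the two datasets, a seed profile is built from the matched users, and non-customers are scored by similarity to that profile. All data, features, and the similarity threshold below are illustrative, and a real deployment would run this inside the MPC so no party sees the others' inputs:

```python
import hashlib
import math

def h(user_id: str) -> str:
    """Hash a raw user ID so raw identifiers are never exchanged."""
    return hashlib.sha256(user_id.encode()).hexdigest()

# Second party's user data: hashed ID -> feature vector (e.g. income, spend).
users = {h("u1"): [0.9, 0.2], h("u2"): [0.1, 0.8], h("u3"): [0.85, 0.25]}
# Third party's CRM data: hashed IDs of known customers.
crm = {h("u1")}

def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Seed profile: average features of users matched across both datasets.
matched = [users[k] for k in users.keys() & crm]
seed = [sum(col) / len(matched) for col in zip(*matched)]

# Lookalikes: non-customers whose features are close to the seed profile.
lookalikes = [k for k, feats in users.items()
              if k not in crm and cosine(feats, seed) > 0.95]

assert h("u3") in lookalikes      # similar to the matched customer
assert h("u2") not in lookalikes  # dissimilar profile is excluded
```

The first party's actual model would replace the cosine-similarity scoring, but the join-then-score shape of the computation is the same.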
Habu’s data collaboration platform combines confidential multi-party computation, Azure confidential computing, AKS, AKV, and MAA to enable secure and private data analysis across multiple parties. This combination of technologies empowers businesses to unlock insights from sensitive data without sacrificing security, privacy, or performance.
We’re proud to be working with Microsoft at the forefront of this innovation and are excited to see the innovative use cases our customers will develop using this cutting-edge technology. Contact us to learn how your organization can benefit from Habu’s integration with Microsoft Azure.