Usage with Opaque SQL
MC2 offers Opaque SQL, a secure analytics engine built on top of Apache Spark SQL, as a compute service that users can run. Opaque SQL provides Scala and Python interfaces through which users express their desired SQL-like computation. MC2 Client integrates directly with Opaque SQL: it can start the Opaque SQL service, and it can encrypt and decrypt data in a format readable by Opaque SQL.
First, install Opaque SQL by following this guide.
Next, to use MC2 Client with Opaque SQL, you'll need to modify several sections of the configuration: the start, upload, run, and download sections. Once you've finished configuring, see the Quickstart guide for how to securely run a query.
Start
In the start section, you must specify the command to launch the Opaque SQL service on the head node. This is usually one of three things, depending on how you want to start Opaque SQL. The section should look something like this:
start:
  # Commands to run on head node
  head:
    # To run Opaque SQL locally (Scala)
    - cd /path/to/opaque-sql; build/sbt run
    # Or to run a standalone Spark cluster (Scala)
    - cd /path/to/opaque-sql; build/sbt assembly
    - cd /path/to/opaque-sql; spark-submit --class edu.berkeley.cs.rise.opaque.rpc.Listener <Spark configuration parameters> --deploy-mode client ${MC2_HOME}/target/scala-2.12/opaque-assembly-0.1.jar
    # Or to run a standalone PySpark cluster (Python)
    - cd /path/to/opaque-sql; build/sbt assembly
    - cd /path/to/opaque-sql; spark-submit <Spark configuration parameters> --deploy-mode client --jars ${MC2_HOME}/target/scala-2.12/opaque-assembly-0.1.jar --py-files ${MC2_HOME}/target/python.zip ${MC2_HOME}/target/python/listener.py
  # Commands to run on worker nodes
  workers: []
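The <Spark configuration parameters> placeholder stands for whatever spark-submit flags your cluster needs; MC2 does not mandate specific values. Purely as a hypothetical illustration (the master URL and resource settings below are made-up examples, not MC2 defaults), the standalone-cluster command might be filled in like this:

# Hypothetical filled-in spark-submit; the master URL and resource
# settings are illustrative only
cd /path/to/opaque-sql
spark-submit \
  --class edu.berkeley.cs.rise.opaque.rpc.Listener \
  --master spark://<head-node-ip>:7077 \
  --executor-memory 2G \
  --total-executor-cores 4 \
  --deploy-mode client \
  ${MC2_HOME}/target/scala-2.12/opaque-assembly-0.1.jar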
Upload
In the upload section, you should tell MC2 Client that you want to encrypt data in sql format, the format readable by Opaque SQL. Along with the data, you should specify the path to the data schema; more on the schema format can be found here. The section should look something like this:
upload:
  # Whether to upload data to Azure blob storage or disk
  # Allowed values are `blob` or `disk`
  # If `blob`, Azure CLI will be called to upload data
  # Else, `scp` will be used
  storage: disk
  # Encryption format to use
  # Options are `sql` if you want to use Opaque SQL
  # or `xgb` if you want to use Secure XGBoost
  format: sql
  # Files to encrypt and upload
  src:
    - ${MC2_CLIENT_HOME}/quickstart/data/opaquesql.csv
  # If you want to run Opaque SQL, you must also specify a schema,
  # one for each file you want to encrypt and upload
  schemas:
    - ${MC2_CLIENT_HOME}/quickstart/data/opaquesql_schema.json
  # Directory to upload data to
  dst: /mc2/data
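The linked guide defines the exact schema file format. As a rough sketch of its shape only (the "columns", "name", and "type" keys and both column entries below are hypothetical placeholders, not the authoritative MC2 schema keys), a schema for a two-column CSV might pair each column name with a type:

{
  "columns": [
    { "name": "column_1", "type": "string" },
    { "name": "column_2", "type": "integer" }
  ]
}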
Run
In the run section, you should tell MC2 Client that you're running Opaque SQL and specify an Opaque SQL script written in Scala. This section should look something like this:
run:
  # Script to run
  script: opaque_sql_demo.scala
  # Compute service you're using
  # Choices are `xgb` or `sql`
  compute: sql
  # Attestation configuration
  attestation:
    # Whether we are running in simulation mode
    # If 0 (False), we are _not_ running in simulation mode,
    # and should verify the attestation evidence
    simulation_mode: 0
    # MRENCLAVE value to check
    # MRENCLAVE is a hash of the enclave build log
    mrenclave: NULL
    # Path to MRSIGNER value to check
    # MRSIGNER is the key used to sign the built enclave
    mrsigner: ${MC2_CLIENT_HOME}/python-package/tests/keys/mc2_test_key.pub
  # The client consortium. Each username is mapped to a public key and
  # release policy
  consortium:
    - username: user1
      public_key: keys/user1.pub
      result_release: true
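The script itself uses Opaque SQL's DataFrame API. Below is a minimal sketch of what opaque_sql_demo.scala might contain, assuming a spark session is in scope (as in spark-shell); the encrypted file name under /mc2/data and the column name value are assumptions for illustration, not values fixed by MC2:

// Minimal sketch of an Opaque SQL script; the .enc path and the
// `value` column are illustrative assumptions
import org.apache.spark.sql.functions.lit
import spark.implicits._
import edu.berkeley.cs.rise.opaque.implicits._

// Register Opaque SQL's encrypted operators with this Spark session
edu.berkeley.cs.rise.opaque.Utils.initSQLContext(spark.sqlContext)

// Load the encrypted table that `upload` placed at its `dst` path
val df = spark.read
  .format("edu.berkeley.cs.rise.opaque.EncryptedSource")
  .load("/mc2/data/opaquesql.csv.enc")

// A simple query, executed over encrypted data inside the enclave
val result = df.filter($"value" > lit(100))

// Save the encrypted result where the download section's `src` expects it
result.write
  .format("edu.berkeley.cs.rise.opaque.EncryptedSource")
  .save("/mc2/opaque_sql_result")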
Download
In the download section, you should tell MC2 Client that the results you are retrieving are encrypted by Opaque SQL. This section should look something like this:
download:
  # Whether to download data from Azure blob storage or disk
  # Allowed values are `blob` or `disk`
  # If `blob`, Azure CLI will be called to download data
  # Else, `scp` will be used
  storage: disk
  # Format this data is encrypted with
  format: sql
  # Directory/file to download
  src:
    - /mc2/opaque_sql_result
  # Local directory to download data to
  dst: results/
Example
Putting it all together, the configuration file should look something like the following when running Opaque SQL.
# User configuration
user:
  # Your username - username should be specified in certificate
  username: user1
  # Path to your symmetric key - will be used for encryption/decryption
  # If you don't have a symmetric key, specify a path here
  # and run `MC2 init` to generate a key
  #
  # `MC2 init` will not overwrite anything at this path
  symmetric_key: ${MC2_CLIENT_HOME}/quickstart/keys/user1_sym.key
  # Path to your keypair and certificate
  # If you don't have a keypair / certificate, specify paths here
  # and run `MC2 init` to generate a keypair
  #
  # `MC2 init` will not overwrite anything at this path
  private_key: ${MC2_CLIENT_HOME}/quickstart/keys/user1.pem
  public_key: ${MC2_CLIENT_HOME}/quickstart/keys/user1.pub
  certificate: ${MC2_CLIENT_HOME}/quickstart/keys/user1.crt
  # Path to CA certificate and private key
  # Needed if you want to generate a certificate signed by the CA
  root_certificate: ${MC2_CLIENT_HOME}/quickstart/keys/root.crt
  root_private_key: ${MC2_CLIENT_HOME}/quickstart/keys/root.pem

# Configuration for launching cloud resources
launch:
  # Path to your Azure configuration
  # This needs to be an absolute path
  azure_config: ${MC2_CLIENT_HOME}/quickstart/azure.yaml
  # Whether to launch a cluster of VMs
  cluster: true
  # Whether to launch Azure blob storage
  storage: true
  # Whether to launch a storage container
  container: true

# Commands to start compute service
start:
  # Commands to run on head node
  # This command starts the Opaque SQL service locally on the head node
  head:
    - cd /mc2/opaque-sql; build/sbt run
  # Commands to run on worker nodes
  # For this quickstart there is only one node - no worker nodes
  workers: []

# Configuration for `MC2 upload`
upload:
  # Whether to upload data to Azure blob storage or disk
  # Allowed values are `blob` or `disk`
  # If `blob`, Azure CLI will be called to upload data
  # Else, `scp` will be used
  storage: disk
  # Encryption format to use
  # Options are `sql` if you want to use Opaque SQL
  # or `xgb` if you want to use Secure XGBoost
  format: sql
  # Files to encrypt and upload
  src:
    - ${MC2_CLIENT_HOME}/quickstart/data/opaquesql.csv
  # If you want to run Opaque SQL, you must also specify a schema,
  # one for each file you want to encrypt and upload
  schemas:
    - ${MC2_CLIENT_HOME}/quickstart/data/opaquesql_schema.json
  # Directory to upload data to
  dst: /mc2/data

# Computation configuration
run:
  # Script to run
  script: opaque_sql_demo.scala
  # Compute service you're using
  # Choices are `xgb` or `sql`
  compute: sql
  # Attestation configuration
  attestation:
    # Whether we are running in simulation mode
    # If 0 (False), we are _not_ running in simulation mode,
    # and should verify the attestation evidence
    simulation_mode: 0
    # MRENCLAVE value to check
    # MRENCLAVE is a hash of the enclave build log
    mrenclave: NULL
    # Path to MRSIGNER value to check
    # MRSIGNER is the key used to sign the built enclave
    # This key should be used for testing purposes only,
    # and is not secure for production purposes
    mrsigner: ${MC2_CLIENT_HOME}/python-package/tests/keys/mc2_test_key.pub
  # The client consortium. Each username is mapped to a public key and
  # release policy
  consortium:
    - username: user1
      public_key: keys/user1.pub
      result_release: true

# Configuration for downloading results
download:
  # Whether to download data from Azure blob storage or disk
  # Allowed values are `blob` or `disk`
  # If `blob`, Azure CLI will be called to download data
  # Else, `scp` will be used
  storage: disk
  # Format this data is encrypted with
  format: sql
  # Directory/file to download
  # FIXME: If storage is `blob` this value must be a file
  # Need to investigate whether we can use directories in Azure blob storage
  src:
    - /mc2/opaque_sql_result
  # Local directory to download data to
  dst: results/

# Configuration for stopping services
stop:

# Configuration for deleting Azure resources
teardown:
  # Whether to terminate launched VMs
  cluster: true
  # Whether to terminate created Azure blob storage
  storage: true
  # Whether to terminate created storage container
  container: true
  # Whether to terminate specified resource group
  resource_group: true
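With this configuration in place, each stage runs through the MC2 Client CLI as described in the Quickstart. The sequence below is a sketch only: it assumes the CLI entry point is invoked as mc2 and that subcommand names mirror the configuration sections above; consult the Quickstart for the exact commands.

# Sketch of the end-to-end flow; subcommand names are assumed to
# mirror the configuration sections (see the Quickstart for specifics)
mc2 init       # generate the symmetric key, keypair, and certificate
mc2 launch     # create the Azure VMs, blob storage, and container
mc2 start      # run the `start` commands to bring up Opaque SQL
mc2 upload     # encrypt opaquesql.csv and upload it to /mc2/data
mc2 run        # attest the enclave and run opaque_sql_demo.scala
mc2 download   # fetch the encrypted results and decrypt into results/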