Query submission via the Spark Driver

[Diagram: Overview of Opaque SQL running in insecure mode (https://mc2-project.github.io/opaque-sql/opaque-diagram.svg)]

Starting Opaque SQL

This page walks through running Opaque SQL with the Spark driver located on the client.

Warning

This mode should not be used in any context where the full security of hardware enclaves is required. Remote attestation is disabled, and the Spark driver has access to the key the worker enclaves use to encrypt/decrypt data. It is offered so that you can experiment with the project and explore its API.

This is Opaque SQL in insecure mode, normally used only for testing functionality.

Running the interactive shell

  1. Package Opaque into a fat JAR. We need a fat JAR because the JAR produced by build/sbt package does not include the gRPC dependencies needed for remote attestation.

    cd ${OPAQUE_HOME}
    build/sbt test:assembly
    
  2. Launch the Spark shell with Opaque.

    Scala:

    spark-shell --jars ${OPAQUE_HOME}/target/scala-2.12/opaque-assembly-0.1.jar
    

    Python:

    pyspark --py-files ${OPAQUE_HOME}/target/scala-2.12/opaque-assembly-0.1.jar  \
      --jars ${OPAQUE_HOME}/target/scala-2.12/opaque-assembly-0.1.jar
    

    (we need to specify --py-files because the Python functions are packaged inside the .jar for easier distribution)

  3. Alternatively, you can run queries locally in Scala using sbt (if the console does not set up a SparkSession for you, see the sketch after this list).

    build/sbt console
    
  4. Inside the Spark shell, import Opaque SQL’s DataFrame methods and its query planning rules.

    Scala:

    import edu.berkeley.cs.rise.opaque.implicits._
    edu.berkeley.cs.rise.opaque.Utils.initOpaqueSQL(spark, testing = true)
    

    Python:

    from opaque_sql import *
    init_opaque_sql(testing=True)
    
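If you take the sbt console route in step 3 and a SparkSession is not created for you automatically, a minimal local session can be constructed by hand before running the imports above. This is only a sketch using standard Spark APIs; the master URL and application name here are illustrative, not Opaque-specific.

    Scala:

    import org.apache.spark.sql.SparkSession

    // Minimal local session for experimenting with Opaque SQL.
    // "local[*]" runs Spark with one worker thread per core.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("OpaqueSQL")
      .getOrCreate()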

Encrypting a DataFrame with the Driver

Note

The Opaque SQL methods shown in this section are only supported in insecure mode, since the driver needs the key for encryption/decryption.

  1. Create an unencrypted DataFrame.

    Scala:

    val data = Seq(("foo", 4), ("bar", 1), ("baz", 5))
    val df = spark.createDataFrame(data).toDF("word", "count")
    

    Python:

    data = [("foo", 4), ("bar", 1), ("baz", 5)]
    df = spark.createDataFrame(data).toDF("word", "count")
    
  2. Create an encrypted DataFrame from the unencrypted version. In insecure mode, this is as easy as calling .encrypted.

    Scala:

    val dfEncrypted = df.encrypted
    

    Python:

    df_encrypted = df.encrypted()
    
  3. Perform any of the operations in the list of supported functionalities (a short example follows this list).

  4. Call .collect or .show to retrieve and automatically decrypt the results (a .collect sketch also follows this list).

    Scala:

    dfEncrypted.show
    // +-----+-----+
    // |word |count|
    // +-----+-----+
    // |  foo|    4|
    // |  bar|    1|
    // |  baz|    5|
    // +-----+-----+
    

    Python:

    df_encrypted.show()
    # +-----+-----+
    # |word |count|
    # +-----+-----+
    # |  foo|    4|
    # |  bar|    1|
    # |  baz|    5|
    # +-----+-----+
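
As an example of step 3, here is a short sketch of relational operators applied to the encrypted DataFrame built above. It assumes that filters and grouped aggregations are among the supported functionalities for your version; check that list before relying on them.

    Scala:

    import org.apache.spark.sql.functions.sum

    // Keep only rows whose count exceeds 2.
    val dfFiltered = dfEncrypted.filter(dfEncrypted("count") > 2)

    // Grouped aggregation: total count per word.
    val dfAgg = dfEncrypted.groupBy("word").agg(sum("count"))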
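
Similarly for step 4, .collect returns the decrypted rows directly to the driver, which again is only possible in insecure mode, where the driver holds the key:

    Scala:

    // Retrieve and decrypt the results as an Array[Row].
    val rows = dfEncrypted.collect()
    rows.foreach(println)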