Usage

This section describes how to save, load, and query an encrypted DataFrame in either client or insecure mode.

Saving a DataFrame

Save the encrypted DataFrame to local disk. Because the output stays encrypted, it can then be uploaded to the cloud storage of your choice.

Scala:

dfEncrypted.write.format("edu.berkeley.cs.rise.opaque.EncryptedSource").save("dfEncrypted")
// The file dfEncrypted/part-00000 now contains encrypted data

Python:

df_encrypted.write.format("edu.berkeley.cs.rise.opaque.EncryptedSource").save("df_encrypted")
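Spark writes the output as a directory rather than a single file. As a minimal sketch of the upload step, the hypothetical helper encrypted_part_files below lists the part-* data files in that directory and skips Spark's _SUCCESS marker and hidden .crc checksum files. It assumes the DataFrame was saved to a local filesystem path; the actual upload is left to whichever storage client you use.

```python
from pathlib import Path
from typing import List


def encrypted_part_files(output_dir: str) -> List[Path]:
    """Return the part-* data files Spark wrote under output_dir, sorted by name.

    Hypothetical helper (not part of Opaque). Marker files such as _SUCCESS
    and hidden checksum files such as .part-00000.crc are not matched by the
    "part-*" pattern, so only the files holding the encrypted rows are returned.
    """
    root = Path(output_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"no such output directory: {output_dir}")
    return sorted(p for p in root.glob("part-*") if p.is_file())
```

For example, after the Python save above, iterating over encrypted_part_files("df_encrypted") yields the files to hand to your upload tool.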

Using the DataFrame interface

Users can load the previously saved encrypted DataFrame back into Spark. The Scala example supplies the schema explicitly.

Scala:

import org.apache.spark.sql.types._
val dfEncrypted = (spark.read.format("edu.berkeley.cs.rise.opaque.EncryptedSource")
  .schema(StructType(Seq(StructField("word", StringType), StructField("count", IntegerType))))
  .load("dfEncrypted"))

Python:

df_encrypted = spark.read.format("edu.berkeley.cs.rise.opaque.EncryptedSource").load("df_encrypted")

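Queries on an encrypted DataFrame are written with the standard DataFrame API. Call explain to inspect the generated query plan: the optimized logical plan should show encrypted operators such as EncryptedFilter.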

Scala:

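import spark.implicits._                   // enables the $"column" syntax
import org.apache.spark.sql.functions.lit  // provides lit(...)
val result = dfEncrypted.filter($"count" > lit(3))
result.explain(true)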
// [...]
// == Optimized Logical Plan ==
// EncryptedFilter (count#6 > 3)
// +- EncryptedLocalRelation [word#5, count#6]
// [...]

Python:

result = df_encrypted.filter(df_encrypted["count"] > 3)
result.explain(True)
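To check programmatically that a query is executed by encrypted operators, the plan text can be scanned for operator names that start with "Encrypted". The hypothetical helper encrypted_operators below is a minimal sketch of that idea. It assumes the plan text has already been captured; in PySpark, explain prints from Python, so wrapping result.explain(True) in contextlib.redirect_stdout is one way to capture it (shown only as a comment, since it needs a live Spark session with Opaque).

```python
import re
from typing import List

# Operator names such as EncryptedFilter or EncryptedLocalRelation.
_ENCRYPTED_OP = re.compile(r"\bEncrypted\w+")


def encrypted_operators(plan_text: str) -> List[str]:
    """Return distinct Encrypted* operator names in order of first appearance."""
    seen: List[str] = []
    for name in _ENCRYPTED_OP.findall(plan_text):
        if name not in seen:
            seen.append(name)
    return seen


# Plan fragment copied from the Scala explain(true) output in this section.
sample_plan = """== Optimized Logical Plan ==
EncryptedFilter (count#6 > 3)
+- EncryptedLocalRelation [word#5, count#6]
"""
print(encrypted_operators(sample_plan))  # ['EncryptedFilter', 'EncryptedLocalRelation']

# Capturing the plan from a live session (not run here):
# buf = io.StringIO()
# with contextlib.redirect_stdout(buf):
#     result.explain(True)
# encrypted_operators(buf.getvalue())
```

An empty list from a captured plan would suggest that at least part of the query is not running on Opaque's encrypted operators.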

Using the SQL interface

Users can also load the previously persisted encrypted DataFrame using the SQL interface.

Scala:

spark.sql(s"""
  |CREATE TEMPORARY VIEW dfEncrypted
  |USING edu.berkeley.cs.rise.opaque.EncryptedSource
  |OPTIONS (
  |  path "dfEncrypted"
  |)""".stripMargin)
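The view in this section is created from Scala. From PySpark, the same statement can be passed to spark.sql as a plain string. The hypothetical helper create_encrypted_view_sql below is a minimal sketch that builds that statement for an arbitrary view name and path. It assumes the view name is a simple unquoted identifier, and it escapes backslashes and double quotes so the path stays a single Spark SQL string literal.

```python
def create_encrypted_view_sql(view_name: str, path: str) -> str:
    """Build a CREATE TEMPORARY VIEW statement over an EncryptedSource path.

    Hypothetical helper (not part of Opaque); mirrors the Scala statement.
    """
    # Conservative check: only accept names usable as unquoted identifiers.
    if not view_name.isidentifier():
        raise ValueError(f"not a simple identifier: {view_name!r}")
    # Escape backslashes first, then double quotes, inside the string literal.
    escaped = path.replace("\\", "\\\\").replace('"', '\\"')
    return (
        f"CREATE TEMPORARY VIEW {view_name}\n"
        "USING edu.berkeley.cs.rise.opaque.EncryptedSource\n"
        f'OPTIONS (\n  path "{escaped}"\n)'
    )
```

For example, spark.sql(create_encrypted_view_sql("df_encrypted", "df_encrypted")) would register a view over the directory written by the Python save example.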

The SQL API can be used to run the same query on the loaded data.

Scala:

val result = spark.sql(s"""
  |SELECT * FROM dfEncrypted
  |WHERE count > 3""".stripMargin)
result.show