Usage
This section goes over how to manipulate an encrypted DataFrame in either client or insecure mode.
Saving a DataFrame
Save the encrypted DataFrame to local disk. The encrypted data can then be uploaded to the cloud storage of your choice for easy access.
Scala:
dfEncrypted.write.format("edu.berkeley.cs.rise.opaque.EncryptedSource").save("dfEncrypted")
// The file dfEncrypted/part-00000 now contains encrypted data
Python:
df_encrypted.write.format("edu.berkeley.cs.rise.opaque.EncryptedSource").save("df_encrypted")
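Once written, the encrypted files can also be saved directly to remote storage by pointing the data source at a cloud path. A minimal sketch, assuming a Hadoop-compatible filesystem is already configured for the Spark session (the `s3a://my-bucket` URI is a hypothetical example, not a path from this guide):

```scala
// Hypothetical bucket; any Hadoop-compatible URI scheme that the
// Spark session is configured for works the same way as a local path.
dfEncrypted.write
  .format("edu.berkeley.cs.rise.opaque.EncryptedSource")
  .save("s3a://my-bucket/dfEncrypted")
```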
Using the DataFrame interface
Users can load the previously persisted encrypted DataFrame.
Scala:
import org.apache.spark.sql.types._

val dfEncrypted = (spark.read.format("edu.berkeley.cs.rise.opaque.EncryptedSource")
  .schema(StructType(Seq(StructField("word", StringType), StructField("count", IntegerType))))
  .load("dfEncrypted"))
Python:
df_encrypted = spark.read.format("edu.berkeley.cs.rise.opaque.EncryptedSource").load("df_encrypted")
Given an encrypted DataFrame, construct a new query. Users can use explain to see the generated query plan.
Scala:
val result = dfEncrypted.filter($"count" > lit(3))
result.explain(true)
// [...]
// == Optimized Logical Plan ==
// EncryptedFilter (count#6 > 3)
// +- EncryptedLocalRelation [word#5, count#6]
// [...]
Python:
result = df_encrypted.filter(df_encrypted["count"] > 3)
result.explain(True)
Using the SQL interface
Users can also load the previously persisted encrypted DataFrame using the SQL interface.
spark.sql(s"""
  |CREATE TEMPORARY VIEW dfEncrypted
  |USING edu.berkeley.cs.rise.opaque.EncryptedSource
  |OPTIONS (
  |  path "dfEncrypted"
  |)""".stripMargin)
The SQL API can be used to run the same query on the loaded data.
val result = spark.sql(s"""
  |SELECT * FROM dfEncrypted
  |WHERE count > 3""".stripMargin)
result.show
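Because the query runs entirely within the encrypted plan, its result can be persisted in encrypted form as well, reusing the same write path shown in the saving section. A minimal sketch (the output path "dfEncryptedFiltered" is a hypothetical name for illustration):

```scala
// Persist the filtered result back to disk, still encrypted,
// using the same EncryptedSource format as the original save.
result.write
  .format("edu.berkeley.cs.rise.opaque.EncryptedSource")
  .save("dfEncryptedFiltered")
```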