************
Benchmarking
************

Running on your own cluster
###########################

Opaque supports a command-line interface for benchmarking against plaintext Spark. The following steps show how to build benchmarking jobs and submit them to a Spark cluster.

1. Create the benchmarking data:

   .. code-block:: bash

      build/sbt data

2. Create a fat jar that contains both the source and test classes:

   .. code-block:: bash

      build/sbt test:assembly

3. To see usage information and the list of available flags, pass ``--help`` to the benchmarking class:

   .. code-block:: bash

      build/sbt 'test:runMain edu.berkeley.cs.rise.opaque.benchmark.Benchmark --help'

      # Available flags:
      # --num-partitions: specify the number of partitions the data should be split into.
      #       Default: spark.default.parallelism
      # --size: specify the size of the dataset that should be loaded into Spark.
      #       Default: sf_001
      #       Supported values: sf_001, sf_01, sf_1
      #       Note: sf_{scalefactor} indicates a dataset of {scalefactor} * 1GB.
      # --filesystem-url: optional argument specifying the filesystem master node URL.
      #       Default: file://
      # --log-operators: whether to log individual physical operators (boolean).
      #       Default: false
      #       Note: may reduce performance if set to true (forces caching of
      #       intermediate values).
      # --operations: select the operations that should be benchmarked.
      #       Default: all
      #       Available operations: logistic-regression, tpc-h
      #       Syntax: --operations logistic-regression,tpc-h
      #       Leave the --operations flag blank to run all benchmarks.

   Alternatively, you can look at `Benchmark.scala `_.

4. Submit the job to Spark:

   .. code-block:: bash

      spark-submit --class edu.berkeley.cs.rise.opaque.benchmark.Benchmark \
         \
         ${OPAQUE_HOME}/target/scala-2.12/opaque-assembly-0.1.jar \

   For more help on how to submit jobs to Spark, see `Submitting applications `_.

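
As an illustration of how the flags from step 3 plug into the ``spark-submit`` command from step 4, the hypothetical helper script below assembles a command that benchmarks only logistic regression on the ``sf_01`` dataset (about 0.1 GB, going by the scale-factor note in the ``--help`` output) with 8 partitions. The script itself, its ``SPARK_MASTER`` and ``RUN`` variables, the ``local[4]`` default master, and the flag values are assumptions chosen for this sketch, not part of Opaque; substitute your own cluster's Spark configuration parameters.

.. code-block:: bash

   #!/usr/bin/env bash
   # Hypothetical helper (not shipped with Opaque): builds a spark-submit
   # command for the Benchmark class. SPARK_MASTER, RUN, and the flag values
   # below are illustrative assumptions.
   set -euo pipefail

   # Print the spark-submit command as one shell-quoted line.
   benchmark_cmd() {
     local master="${SPARK_MASTER:-local[4]}"
     local jar="${OPAQUE_HOME:-$PWD}/target/scala-2.12/opaque-assembly-0.1.jar"
     # Benchmark flags documented by --help: small dataset, 8 partitions,
     # and a single operation.
     local -a cmd=(
       spark-submit
       --class edu.berkeley.cs.rise.opaque.benchmark.Benchmark
       --master "$master"
       "$jar"
       --size sf_01
       --num-partitions 8
       --operations logistic-regression
     )
     printf '%q ' "${cmd[@]}"
     printf '\n'
   }

   # Dry run by default: only print the command. Set RUN=1 to submit it.
   if [[ "${RUN:-0}" == 1 ]]; then
     eval "$(benchmark_cmd)"
   else
     benchmark_cmd
   fi

Running the script without ``RUN=1`` only prints the quoted command, which makes it easy to check the arguments before submitting. Keeping the arguments in a bash array means each one, including a jar path with spaces, is passed to ``spark-submit`` intact.
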

For a complete list of values possible in ````, see `Spark properties `_.

Our TPC-H results
#################

We used a 3-node cluster with 4 cores and 16GB of memory per node.

1. Our ``spark-defaults.conf``:

   .. code-block:: bash

      spark.driver.memory         3g
      spark.executor.memory       11g
      spark.executor.instances    3
      spark.default.parallelism   36
      spark.task.maxFailures      10

2. The command we used to submit the benchmark:

   .. code-block:: bash

      spark-submit --class edu.berkeley.cs.rise.opaque.benchmark.Benchmark \
         --master spark://:7077 \
         --deploy-mode client \
         ${OPAQUE_HOME}/target/scala-2.12/opaque-assembly-0.1.jar \
         --filesystem-url hdfs://:9000 \
         --size sf_1 \
         --operations tpc-h \

3. Final results:

   .. csv-table:: TPC-H Query Results
      :file: ../resources/tpch-results.csv
      :header-rows: 1
      :class: gridtable
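
The ``spark.default.parallelism`` value of 36 works out to three tasks per core on this cluster, which is in line with the Spark tuning guide's general recommendation of 2 to 3 tasks per CPU core. The per-core reading is an inference from the cluster description above rather than a stated design choice:

.. math::

   \underbrace{3}_{\text{nodes}} \times \underbrace{4}_{\text{cores per node}}
   \times \underbrace{3}_{\text{tasks per core}} = 36

When reproducing the benchmark on a cluster of a different size, scaling the value the same way (total cores times 2 to 3) is a reasonable starting point.
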