Python Package Introduction

This document gives a basic walkthrough of the Secure XGBoost Python package. A sample Jupyter notebook is also available at demo/python/jupyter/e2e-demo.ipynb.

Install Secure XGBoost

To install Secure XGBoost, follow the instructions in the Installation Guide.

To verify your installation, run the following in Python:

import securexgboost as xgb
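If the import fails, it helps to know whether the package is missing entirely or is present but broken. As a minimal sketch, the standard library can report whether a module is discoverable without importing it. The helper name `is_installed` is hypothetical and not part of Secure XGBoost.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found on the import path.

    Hypothetical helper for illustration; it only checks that the module
    is discoverable, not that it imports without errors.
    """
    return importlib.util.find_spec(module_name) is not None


if is_installed("securexgboost"):
    print("securexgboost found on the import path")
else:
    print("securexgboost not found; revisit the Installation Guide")
```

A module that is found but still fails on `import` usually points to a problem with the build or its native dependencies rather than with the Python path.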

Data Interface

The Secure XGBoost Python module can load data from:

  • LibSVM text format file

  • Comma-separated values (CSV) file

The data is stored in a DMatrix object.
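To make the two text formats concrete, the sketch below writes the same toy data set once as LibSVM text (`label index:value ...`, with zero-valued features omitted) and once as CSV with the label in column 0. The data, the helper `to_libsvm_line`, and the choice of 0 as the first feature index are assumptions made for illustration; check which index convention your own data uses.

```python
import csv
import os
import tempfile

# Hypothetical toy data: (label, [feature values]), three features per row.
rows = [(1, [0.5, 0.0, 2.0]), (0, [0.0, 1.5, 0.0])]


def to_libsvm_line(label, features, first_index=0):
    """Format one row as 'label index:value ...', skipping zero-valued features.

    first_index=0 is an assumption for this sketch; some tools expect 1.
    """
    pairs = [
        f"{i}:{v}"
        for i, v in enumerate(features, start=first_index)
        if v != 0
    ]
    return " ".join([str(label)] + pairs)


out_dir = tempfile.mkdtemp()
libsvm_path = os.path.join(out_dir, "train.svm.txt")
csv_path = os.path.join(out_dir, "train.csv")

# LibSVM text: one sparse row per line.
with open(libsvm_path, "w") as f:
    for label, features in rows:
        f.write(to_libsvm_line(label, features) + "\n")

# CSV: dense rows, label first (column 0), no header line.
with open(csv_path, "w", newline="") as f:
    writer = csv.writer(f)
    for label, features in rows:
        writer.writerow([label] + features)
```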

  • To load a LibSVM text file or a Secure XGBoost binary file into a DMatrix:

    dtrain = xgb.DMatrix('train.svm.txt')
    dtest = xgb.DMatrix('test.svm.buffer')
    
  • To load a CSV file into a DMatrix:

    # label_column specifies the index of the column containing the true label
    dtrain = xgb.DMatrix('train.csv?format=csv&label_column=0')
    dtest = xgb.DMatrix('test.csv?format=csv&label_column=0')
    

    Note

    Secure XGBoost does not support categorical features.
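The CSV examples above pass loading options as a query string appended to the file path. When the label column varies between files, a small helper can assemble that string. This is a sketch only: `csv_uri` is a hypothetical name, not a Secure XGBoost function, and it covers just the two options shown above.

```python
from urllib.parse import urlencode


def csv_uri(path, label_column=0):
    """Build a 'path?format=csv&label_column=N' string for a CSV data file.

    Hypothetical helper for illustration; only the 'format' and
    'label_column' options from the examples above are handled.
    """
    query = urlencode({"format": "csv", "label_column": label_column})
    return f"{path}?{query}"


train_uri = csv_uri("train.csv", label_column=0)
print(train_uri)
```

The resulting string can be passed wherever the examples above use a literal such as `'train.csv?format=csv&label_column=0'`.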

Setting Parameters

Secure XGBoost can use either a list of pairs or a dictionary to set parameters. For instance:

  • Booster parameters

    param = {'max_depth': 2, 'eta': 1, 'silent': 1, 'objective': 'binary:logistic'}
    param['nthread'] = 4
    param['eval_metric'] = 'auc'
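The list-of-pairs form mentioned above is not shown in the example, so here is a minimal sketch of converting a dictionary into that form. One practical difference is that a list can hold the same key more than once, for example to request several evaluation metrics. That the library accepts repeated `eval_metric` entries this way is an assumption carried over from upstream XGBoost; the name `plst` is illustrative.

```python
# Parameters as a dictionary (values mirror the example above).
param = {'max_depth': 2, 'eta': 1, 'objective': 'binary:logistic', 'nthread': 4}

# The same settings as a list of (name, value) pairs.
plst = list(param.items())

# A list may repeat a key, which a dictionary cannot do.
# Assumption: repeated 'eval_metric' entries are honoured as in upstream XGBoost.
plst += [('eval_metric', 'auc'), ('eval_metric', 'error')]

for name, value in plst:
    print(name, value)
```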
    

Training

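Training a model requires a parameter list and a data set. A list of (DMatrix, name) pairs can also be passed to monitor performance during training.

# Evaluation sets to monitor during training, as (DMatrix, name) pairs
evallist = [(dtest, 'eval'), (dtrain, 'train')]
num_round = 10
bst = xgb.train(param, dtrain, num_round, evallist)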

The update and boost methods of securexgboost.Booster are intended for internal use only. Use the wrapper function securexgboost.train instead, which performs pre-configuration such as setting up caches and other parameters.

Prediction

A model that has been trained or loaded can perform predictions on data sets.

dtest = xgb.DMatrix('test.svm.txt')
ypred = bst.predict(dtest)
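With the binary:logistic objective used earlier, predictions are probabilities for the positive class, so a threshold is needed to turn them into class labels. The sketch below assumes the predictions are already available as plain floating-point values (depending on your setup, an extra step may be needed to obtain them in that form). The sample values and the 0.5 threshold are hypothetical.

```python
# Hypothetical probabilities standing in for the values returned as ypred.
ypred = [0.91, 0.08, 0.55, 0.49]

# Assumption: 0.5 is a reasonable cut-off; tune it for imbalanced data.
threshold = 0.5

# Map each probability to a class label: 1 above the threshold, else 0.
labels = [1 if p > threshold else 0 for p in ypred]
print(labels)  # [1, 0, 1, 0]
```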