Tutorial¶
Building and training your first network¶
Let’s build our first MLP with dropout on the MNIST dataset. To run this example, just do:
cd /yadll/examples
python model_template.py
We will first import yadll and configure a basic logger.
import os
import yadll
import logging
logging.basicConfig(level=logging.DEBUG, format='%(message)s')
Then we load the MNIST dataset (or download it) and create a yadll.data.Data instance that will hold the data. We call a loader function to retrieve the data and fill the container.
# load the data
data = yadll.data.Data(yadll.data.mnist_loader())
We now create a yadll.model.Model, the class that holds the data, the network, the hyperparameters and the updates function. Since a file name is provided, the model will be saved (see Saving/loading models).
# create the model
model = yadll.model.Model(name='mlp with dropout', data=data, file='best_model.ym')
We define the hyperparameters of the model (see Hyperparameters and Grid search) and add them to our model object.
# Hyperparameters
hp = yadll.hyperparameters.Hyperparameters()
hp('batch_size', 500)
hp('n_epochs', 1000)
hp('learning_rate', 0.1)
hp('momentum', 0.5)
hp('l1_reg', 0.00)
hp('l2_reg', 0.0000)
hp('patience', 10000)
# add the hyperparameters to the model
model.hp = hp
We now create each layer of the network by instantiating yadll.layers classes.
The first layer must be a yadll.layers.InputLayer that gives the shape of the input data.
This network will be an MLP with two dense layers with rectified linear unit activation and dropout.
Each layer receives the previous layer as its incoming argument.
Each layer has a name; names are optional. If you don't provide one, it defaults to the name of the layer class followed by a space and the instantiation number (see the short sketch after the layer definitions below).
The last layer is a yadll.layers.LogisticRegression, which is a dense layer with softmax activation.
# Create connected layers
# Input layer
l_in = yadll.layers.InputLayer(shape=(hp.batch_size, 28 * 28), name='Input')
# Dropout Layer 1
l_dro1 = yadll.layers.Dropout(incoming=l_in, corruption_level=0.4, name='Dropout 1')
# Dense Layer 1
l_hid1 = yadll.layers.DenseLayer(incoming=l_dro1, n_units=500, W=yadll.init.glorot_uniform,
l1=hp.l1_reg, l2=hp.l2_reg, activation=yadll.activations.relu,
name='Hidden layer 1')
# Dropout Layer 2
l_dro2 = yadll.layers.Dropout(incoming=l_hid1, corruption_level=0.2, name='Dropout 2')
# Dense Layer 2
l_hid2 = yadll.layers.DenseLayer(incoming=l_dro2, n_units=500, W=yadll.init.glorot_uniform,
l1=hp.l1_reg, l2=hp.l2_reg, activation=yadll.activations.relu,
name='Hidden layer 2')
# Logistic regression Layer
l_out = yadll.layers.LogisticRegression(incoming=l_hid2, n_class=10, l1=hp.l1_reg,
l2=hp.l2_reg, name='Logistic regression')
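As a side note, here is a minimal sketch of the default naming rule mentioned above; accessing the name attribute directly and the exact counter value are assumptions:
# Sketch: a layer created without a name gets a default one built from the
# class name, a space and an instantiation counter
l_unnamed = yadll.layers.DenseLayer(incoming=l_in, n_units=500, W=yadll.init.glorot_uniform,
                                    activation=yadll.activations.relu)
print(l_unnamed.name)  # e.g. 'DenseLayer 3', depending on how many were already created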
We create a yadll.network.Network object and add all the layers sequentially. Order matters!
# Create network and add layers
net = yadll.network.Network('2 layers mlp with dropout')
net.add(l_in)
net.add(l_dro1)
net.add(l_hid1)
net.add(l_dro2)
net.add(l_hid2)
net.add(l_out)
We add the network and the updates function to the model, then train the model. Here we use stochastic gradient descent with Nesterov momentum.
# add the network to the model
model.network = net
# updates method
model.updates = yadll.updates.nesterov_momentum
# train the model and save it to file at each best
model.train(save_mode='each')
Here is the output when trained on an NVIDIA GeForce Titan X card:
epoch 463, minibatch 100/100, validation error 1.360 %
epoch 464, minibatch 100/100, validation error 1.410 %
epoch 465, minibatch 100/100, validation error 1.400 %
Optimization completed. Early stopped at epoch: 466
Validation score of 1.260 % obtained at iteration 23300, with test performance 1.320 %
Training mlp with dropout took 02 m 29 s
Making Prediction¶
Once the model is trained, let’s use it to make predictions:
# make prediction
# We can test it on some examples from test
test_set_x = data.test_set_x.get_value()
test_set_y = data.test_set_y.eval()
predicted_values = model.predict(test_set_x[:30])
print ("Predicted values for the first 30 examples in test set:")
print predicted_values
print test_set_y[:30]
This should give you:
Predicted values for the first 30 examples in test set:
[7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1]
[7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1]
Saving/loading models¶
Yadll provides two ways to save and load models.
Save the model¶
The first method for saving your model is to pickle the whole model. It is not recommended for long-term storage but is very convenient for handling models. All you have to do is provide your model constructor with a file name. The model will be saved after training.
model = yadll.model.Model(name='mlp with dropout', data=data, file='best_model.ym')
You can also save your model by setting the save_mode argument of the train function: set it to ‘end’ (save at the end of training) or ‘each’ (save after each new best model). If you didn’t give a file name to the constructor, one will be created as model.name + ‘_YmdHMS.ym’.
model.train(save_mode='each')
If you used ‘each’ and your system crashes, you will be able to restart the training from the last best model.
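If instead you only want a single save at the end of training and let yadll generate the file name, a sketch could look like this (it reuses the hp and net objects defined above purely for illustration):
# Sketch: no file name given to the constructor, so a name following the
# model.name + '_YmdHMS.ym' pattern is generated when the model is saved
model_end = yadll.model.Model(name='mlp with dropout', data=data)
model_end.hp = hp
model_end.network = net
model_end.updates = yadll.updates.nesterov_momentum
model_end.train(save_mode='end')  # save once, at the end of training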
To load the model, just do:
# load the saved model
model2 = yadll.model.load_model('best_model.ym')
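The reloaded model can then be used as before, for instance to repeat the prediction step from the previous section (a minimal sketch reusing test_set_x from above):
# Sketch: predict with the reloaded model
predicted_values = model2.predict(test_set_x[:30])
print(predicted_values)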
Warning
- Do not use this method for long-term storage or in a production environment.
- A model trained on GPU will not be usable on CPU.
Save the network parameters¶
This second method is more robust and can be used for long-term storage. It consists in saving (pickling) the parameters of the network.
Once the model has trained the network, you can save its parameters:
# saving network parameters
net.save_params('net_params.yp')
Now you can retrieve the model with those parameters, but first you have to recreate the model. When loading the parameters, the network name must match the name of the network whose parameters were saved.
# load network parameters
# first we recreate the network
# create the model
model3 = yadll.model.Model(name='mlp with dropout', data=data)
# Hyperparameters
hp = yadll.hyperparameters.Hyperparameters()
hp('batch_size', 500)
hp('n_epochs', 1000)
hp('learning_rate', 0.1)
hp('momentum', 0.5)
hp('l1_reg', 0.00)
hp('l2_reg', 0.0000)
hp('patience', 10000)
# add the hyperparameters to the model
model3.hp = hp
# Create connected layers
# Input layer
l_in = yadll.layers.InputLayer(shape=(hp.batch_size, 28 * 28), name='Input')
# Dropout Layer 1
l_dro1 = yadll.layers.Dropout(incoming=l_in, corruption_level=0.4, name='Dropout 1')
# Dense Layer 1
l_hid1 = yadll.layers.DenseLayer(incoming=l_dro1, n_units=500, W=yadll.init.glorot_uniform,
l1=hp.l1_reg, l2=hp.l2_reg, activation=yadll.activations.relu,
name='Hidden layer 1')
# Dropout Layer 2
l_dro2 = yadll.layers.Dropout(incoming=l_hid1, corruption_level=0.2, name='Dropout 2')
# Dense Layer 2
l_hid2 = yadll.layers.DenseLayer(incoming=l_dro2, n_units=500, W=yadll.init.glorot_uniform,
l1=hp.l1_reg, l2=hp.l2_reg, activation=yadll.activations.relu,
name='Hidden layer 2')
# Logistic regression Layer
l_out = yadll.layers.LogisticRegression(incoming=l_hid2, n_class=10, l1=hp.l1_reg,
l2=hp.l2_reg, name='Logistic regression')
# Create network and add layers
net2 = yadll.network.Network('2 layers mlp with dropout')
net2.add(l_in)
net2.add(l_dro1)
net2.add(l_hid1)
net2.add(l_dro2)
net2.add(l_hid2)
net2.add(l_out)
# load params
net2.load_params('net_params.yp') # Here we don't train the model but reload saved parameters
# add the network to the model
model3.network = net2
Save the configuration¶
Models can be saved as configuration objects or files.
# Saving configuration of the model. Model doesn't have to be trained
conf = model.to_conf() # get the configuration
model.to_conf('conf.yc') # or save it to file .yc by convention
and reloaded:
# Reconstruct the model from the configuration and load parameters
model4 = yadll.model.Model()
model4.from_conf(conf) # load from conf obj
model5 = yadll.model.Model()
model5.from_conf(file='conf.yc') # load from conf file
You can now reload parameters or train the network, as sketched below.
Networks can also be modified directly from the conf object.
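For example (a sketch using only the calls introduced above; whether the reconstructed model already holds its data or needs it reattached before training is an assumption):
# Sketch: reload the previously saved parameters into the rebuilt network
model4.network.load_params('net_params.yp')
# or attach the data and train the reconstructed model
# (assuming data can be set as an attribute, like hp and network above)
model5.data = data
model5.train()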
Note
By convention we use the .ym extension for Yadll Model files, .yp for Yadll Parameters files and .yc for configuration files, but this is not mandatory.
Run the examples¶
Yadll provides a rather exhaustive list of conventional network implementations. You will find them in the /yadll/examples/networks.py file.
Let’s try those networks on the MNIST dataset with the /yadll/examples/mnist_examples.py file:
- Logistic Regression
- Multi Layer Perceptron
- MLP with dropout
- MLP with dropconnect
- Conv Pool
- LeNet-5
- Autoencoder
- Denoising Autoencoder
- Gaussian Denoising Autoencoder
- Contractive Denoising Autoencoder
- Stacked Denoising Autoencoder
- Restricted Boltzmann Machine
- Deep Belief Network
- Recurrent Neural Networks
- Long Short-Term Memory
You can get the list of all available networks:
python mnist_examples.py --network_list
Training a model, for example lenet5:
python mnist_examples.py lenet5
Hyperparameters and Grid search¶
Yadll provides the yadll.hyperparameters.Hyperparameters class to hold the hyperparameters of the model. It also allows you to perform a grid search optimisation, as the class is iterable over all hyperparameter combinations.
Let’s first define our hyperparameters and their search space:
# Hyperparameters
from yadll.hyperparameters import Hyperparameters
from yadll.activations import tanh, sigmoid, relu
from yadll.init import glorot_uniform, glorot_normal
hps = Hyperparameters()
hps('batch_size', 500, [50, 100, 500, 1000])
hps('n_epochs', 1000)
hps('learning_rate', 0.1, [0.001, 0.01, 0.1, 1])
hps('l1_reg', 0.00, [0, 0.0001, 0.001, 0.01])
hps('l2_reg', 0.0001, [0, 0.0001, 0.001, 0.01])
hps('activation', tanh, [tanh, sigmoid, relu])
hps('initialisation', glorot_uniform, [glorot_uniform, glorot_normal])
hps('patience', 10000)
Now we will loop over each possible combination, collecting a training report for each run (see the sketch after the loop):
import yadll
from yadll.model import Model
from yadll.layers import InputLayer, DenseLayer, LogisticRegression
from yadll.network import Network
reports = []
for hp in hps:
# create the model
model = Model(name='mlp grid search', data=data)
# add the hyperparameters to the model
model.hp = hp
# Create connected layers
# Input layer
l_in = InputLayer(shape=(None, 28 * 28), name='Input')
# Dense Layer 1
l_hid1 = DenseLayer(incoming=l_in, n_units=5, W=hp.initialisation, l1=hp.l1_reg,
l2=hp.l2_reg, activation=hp.activation, name='Hidden layer 1')
# Dense Layer 2
l_hid2 = DenseLayer(incoming=l_hid1, n_units=5, W=hp.initialisation, l1=hp.l1_reg,
l2=hp.l2_reg, activation=hp.activation, name='Hidden layer 2')
# Logistic regression Layer
l_out = LogisticRegression(incoming=l_hid2, n_class=10, l1=hp.l1_reg,
l2=hp.l2_reg, name='Logistic regression')
# Create network and add layers
net = Network('mlp')
net.add(l_in)
net.add(l_hid1)
net.add(l_hid2)
net.add(l_out)
# add the network to the model
model.network = net
# updates method
model.updates = yadll.updates.sgd
reports.append((hp, model.train()))
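Each entry in reports pairs a hyperparameter combination with whatever model.train() returns; the exact content of that return value is not detailed here, so this is only a sketch of how the results might be inspected:
# Sketch: inspect the collected (hyperparameters, report) pairs
for hp, report in reports:
    print(hp.batch_size, hp.learning_rate, report)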
Warning
These hyperparameters would generate 4*4*4*4*3*2 = 1536 different combinations. Each combination has a different training time, but if each run takes 10 minutes on average, the whole optimisation would last more than 10 days!
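As a quick back-of-the-envelope check of that figure (plain arithmetic, not a yadll call):
# Sizes of the search spaces defined above: batch_size=4, n_epochs=1,
# learning_rate=4, l1_reg=4, l2_reg=4, activation=3, initialisation=2, patience=1
n_combinations = 4 * 1 * 4 * 4 * 4 * 3 * 2 * 1
print(n_combinations)                   # 1536
print(n_combinations * 10 / (60 * 24))  # ~10.7 days at 10 minutes per run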
To run this example, just do:
python hp_grid_search.py