Data

yadll.data.normalize(x)

Normalization: Scale data to [0, 1]

\[z = \frac{x - \min(x)}{\max(x) - \min(x)}\]

Parameters:	x: numpy array
Returns:	z (normalized data), min, max
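A minimal NumPy sketch of the min-max formula above, written for illustration and not taken from yadll's source. It assumes x is a 2-D array with one sample per row and that min and max are computed per column (axis=0); the per-column choice is an assumption.

```python
import numpy as np

def normalize(x):
    """Min-max scale x to [0, 1] (sketch; assumes per-column stats, axis=0)."""
    x_min = x.min(axis=0)
    x_max = x.max(axis=0)
    z = (x - x_min) / (x_max - x_min)
    return z, x_min, x_max

x = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 40.0]])
z, x_min, x_max = normalize(x)
# Each column now spans exactly [0, 1].
```

A constant column has max(x) = min(x), so this sketch would compute 0/0 there and NumPy would return nan with a RuntimeWarning.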

yadll.data.apply_normalize(x, x_min, x_max)

Apply normalization to data given min and max.

yadll.data.revert_normalize(z, x_min, x_max)

Return x given z, min and max.
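The apply/revert pair lets you reuse the min and max of one dataset (typically the training set) on other data and map results back to the original scale. The sketch below re-implements both functions in NumPy under the same per-column assumption; it is an illustration, not yadll's code. Reverting simply inverts the formula: x = z (max - min) + min.

```python
import numpy as np

def apply_normalize(x, x_min, x_max):
    # Scale with statistics computed elsewhere (e.g. on the training set).
    return (x - x_min) / (x_max - x_min)

def revert_normalize(z, x_min, x_max):
    # Invert z = (x - min) / (max - min)  =>  x = z * (max - min) + min
    return z * (x_max - x_min) + x_min

x_train = np.array([[0.0], [5.0], [10.0]])
x_test = np.array([[2.5], [12.0]])

x_min, x_max = x_train.min(axis=0), x_train.max(axis=0)
z_test = apply_normalize(x_test, x_min, x_max)
# z_test holds 0.25 and 1.2: a test value above the training max lands outside [0, 1].

x_back = revert_normalize(z_test, x_min, x_max)
```

Note that only data used to compute min and max is guaranteed to fall inside [0, 1].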

yadll.data.standardize(x, epsilon=1e-06)

Standardization: Scale to mean=0 and std=1

\[z = \frac{x - \operatorname{mean}(x)}{\operatorname{std}(x)}\]

Parameters:	x: numpy array
Returns:	z (standardized data), mean, std
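A NumPy sketch of standardization, not yadll's source. The role of epsilon is not documented here; this sketch assumes it is added to the standard deviation so that a constant column does not cause a division by zero, and it assumes per-column statistics (axis=0).

```python
import numpy as np

def standardize(x, epsilon=1e-6):
    """Scale each column to mean 0 and std 1 (sketch).

    Assumption: epsilon is added to the std to avoid dividing by zero.
    """
    x_mean = x.mean(axis=0)
    x_std = x.std(axis=0) + epsilon
    z = (x - x_mean) / x_std
    return z, x_mean, x_std

x = np.array([[1.0, 7.0],
              [2.0, 7.0],
              [3.0, 7.0]])  # the second column is constant
z, x_mean, x_std = standardize(x)
# Column 0 gets mean 0 and std close to 1; the constant column becomes all zeros
# instead of nan, because its std is epsilon rather than 0.
```

With this choice the resulting std is slightly below 1 (by a relative amount of about epsilon / std), which is negligible in practice.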

yadll.data.apply_standardize(x, x_mean, x_std)

Apply standardization to data given mean and std.

yadll.data.revert_standardize(z, x_mean, x_std)

Return x given z, mean and std.
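The sketch below shows the fit-on-training, apply-to-test pattern with illustrative NumPy re-implementations of apply_standardize and revert_standardize (not yadll's code, per-column statistics assumed). The synthetic data is made up for the example.

```python
import numpy as np

def apply_standardize(x, x_mean, x_std):
    # Scale with statistics computed elsewhere (e.g. on the training set).
    return (x - x_mean) / x_std

def revert_standardize(z, x_mean, x_std):
    # Invert z = (x - mean) / std  =>  x = z * std + mean
    return z * x_std + x_mean

rng = np.random.default_rng(0)
x_train = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
x_test = rng.normal(loc=5.0, scale=2.0, size=(10, 3))

# Statistics come from the training set only, then are reused on the test set.
x_mean, x_std = x_train.mean(axis=0), x_train.std(axis=0)
z_train = apply_standardize(x_train, x_mean, x_std)
z_test = apply_standardize(x_test, x_mean, x_std)
x_back = revert_standardize(z_test, x_mean, x_std)
```

Only z_train is guaranteed to have mean 0 and std 1 exactly; z_test is close to that only insofar as the test data resembles the training data.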

yadll.data.one_hot_encoding(arr, N=None)

One-hot encoding of a vector of integer categorical variables in the range [0, N].

You can provide the highest category N; otherwise max(arr) is used.

Parameters:

arr : numpy array

array of integers in the range [0, N]

N : int, optional

Highest category (defaults to max(arr))

Returns:

one-hot encoded array of shape (len(arr), N + 1); e.g. with N = 3 the value 1 becomes the row [0, 1, 0, 0]

Examples

>>> a = np.asarray([1, 0, 3])
>>> one_hot_encoding(a)
array([[ 0.,  1.,  0.,  0.],
       [ 1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  1.]])
>>> one_hot_encoding(a, 5)
array([[ 0.,  1.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  1.,  0.,  0.]])
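One compact way to get the behaviour shown in the examples above is to index rows of an identity matrix: row i of the (N + 1) x (N + 1) identity is the one-hot vector for label i. This is an illustrative NumPy sketch, not yadll's implementation.

```python
import numpy as np

def one_hot_encoding(arr, N=None):
    """One-hot encode integer labels in [0, N]; N defaults to max(arr) (sketch)."""
    arr = np.asarray(arr, dtype=int)
    if N is None:
        N = arr.max()
    # Fancy indexing picks one identity row per label -> shape (len(arr), N + 1).
    return np.eye(N + 1)[arr]

a = np.asarray([1, 0, 3])
enc = one_hot_encoding(a)         # 3 rows, 4 columns, float64
enc_wide = one_hot_encoding(a, 5) # 3 rows, 6 columns
```

np.eye returns floats, which matches the float output in the examples; a label larger than N raises an IndexError in this sketch.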

yadll.data.one_hot_decoding(mat)

Decoding of a one-hot matrix.

Parameters:

mat : 2-D numpy array

one-hot matrix, one row per sample

Returns:

vector of decoded values (the column index of the 1 in each row)

Examples

>>> a = np.asarray([[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1]])
>>> one_hot_decoding(a)
array([1, 0, 3])
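Decoding amounts to finding the column holding the 1 in each row, which np.argmax along axis 1 does directly. The function below is an illustrative sketch, not yadll's source, and checks that decoding inverts an identity-row encoding.

```python
import numpy as np

def one_hot_decoding(mat):
    """Return, for each row, the index of its largest entry (sketch using argmax)."""
    return np.argmax(np.asarray(mat), axis=1)

a = np.asarray([[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1]])
labels = one_hot_decoding(a)

# Round trip: encode the labels as identity rows, then decode them again.
round_trip = one_hot_decoding(np.eye(4)[labels])
```

Because argmax returns the largest entry, this sketch also turns rows of class probabilities into predicted labels; it does not check that the input is strictly one-hot.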

class yadll.data.Data(data, preprocessing=None, shared=True, borrow=True, cast_y=False)

Data container.

The data is made of train_set, valid_set and test_set, and each set unpacks as set_x, set_y = set.

Parameters:

data : string

path to the data file

shared : bool

whether to store the data as Theano shared variables

borrow : bool

borrow flag for the Theano shared variables

cast_y : bool

if True, cast y to intX

Examples

Load data

>>> yadll.data.Data('data/mnist/mnist.pkl.gz')

Methods

dataset :

Return the dataset as Theano shared variables: [(train_set_x, train_set_y), (valid_set_x, valid_set_y), (test_set_x, test_set_y)]
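The layout described for Data (train_set, valid_set, test_set, each unpacking as set_x, set_y) can be illustrated with plain NumPy arrays. The sketch below assumes the data file is a gzip-compressed pickle of that nested tuple, as in the classic mnist.pkl.gz used in the example; the toy file name toy.pkl.gz and the make_set helper are hypothetical. It does not use yadll or Theano and only shows the nested structure.

```python
import gzip
import os
import pickle
import tempfile

import numpy as np

# Hypothetical toy data with the assumed layout:
# (train_set, valid_set, test_set), each set being a tuple (set_x, set_y).
rng = np.random.default_rng(0)

def make_set(n):
    # n samples with 4 float32 features and integer labels in [0, 2].
    return rng.random((n, 4)).astype(np.float32), rng.integers(0, 3, size=n)

data = (make_set(8), make_set(2), make_set(2))
path = os.path.join(tempfile.mkdtemp(), "toy.pkl.gz")
with gzip.open(path, "wb") as f:
    pickle.dump(data, f)

# Loading: unpack the three sets, then each set into inputs and targets.
with gzip.open(path, "rb") as f:
    train_set, valid_set, test_set = pickle.load(f)

for name, current_set in [("train", train_set), ("valid", valid_set), ("test", test_set)]:
    set_x, set_y = current_set
    print(name, set_x.shape, set_y.shape)
```

The original mnist.pkl.gz was pickled under Python 2, so loading it directly under Python 3 typically requires pickle.load(f, encoding='latin1').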