The softmax function is a generalization of the logistic function that normalizes an N-dimensional vector of arbitrary real values to values in the range (0, 1) that sum to 1. It is used to obtain the posterior distribution of a class for a given sample vector x with weights w:

$$P(y = c \mid x; w) = \frac{e^{w_c^\top x}}{\sum_{c'=1}^{N} e^{w_{c'}^\top x}}$$
The softmax function can be used as an activation function for artificial neural networks and is also used in softmax regression, where we are interested in multi-class classification as opposed to the binary classification handled by logistic regression. Below I describe the difference between softmax and the commonly used sigmoid function.
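To make this concrete, here is a minimal NumPy sketch of the class posterior above; the function name, the example numbers, and the max-shift stabilization are my own illustrative choices, not part of the original text:

```python
import numpy as np

def softmax_posterior(x, W):
    """Class posteriors P(y = c | x) for a weight matrix W with one row per class."""
    scores = W @ x                 # one linear score w_c^T x per class
    scores -= scores.max()         # shift keeps exp() from overflowing (an addition, not from the text)
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()

# Hypothetical example: 3 classes, 2 features
W = np.array([[ 1.0, -0.5],
              [ 0.2,  0.3],
              [-1.0,  0.8]])
x = np.array([0.4, 1.2])
print(softmax_posterior(x, W))     # three probabilities that sum to 1
```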
Sigmoid
The function: $\sigma(x) = \frac{1}{1 + e^{-x}}$.
Its derivative: $\frac{d\sigma}{dx} = \sigma(x)\,(1 - \sigma(x))$.
The problem here is the exponential $e^{-x}$, which quickly overflows for large negative x, even though the result of σ is restricted to the interval (0, 1). The solution: the sigmoid can be expressed in terms of tanh: $\sigma(x) = \tfrac{1}{2}\bigl(1 + \tanh\tfrac{x}{2}\bigr)$.
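As a quick illustration of why the tanh form helps, here is a small NumPy sketch; the function names and test values are hypothetical:

```python
import numpy as np

def sigmoid_naive(x):
    # Direct formula: np.exp(-x) overflows for very negative x (e.g. x = -1000)
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_tanh(x):
    # Equivalent identity sigma(x) = 0.5 * (1 + tanh(x / 2)): tanh is bounded, so no overflow
    return 0.5 * (1.0 + np.tanh(0.5 * x))

x = np.array([-1000.0, -10.0, 0.0, 10.0, 1000.0])
print(sigmoid_naive(x))   # triggers an overflow warning on the first entry
print(sigmoid_tanh(x))    # same values, no warning
```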
Softmax
Softmax, which is more generally defined as $\mathrm{softmax}(u)_i = \frac{e^{u_i}}{\sum_j e^{u_j}}$ (where u is a vector), is a little more complicated than the sigmoid function when it comes to understanding it and taking its derivative. So the key here is to express softmax in terms of a log-sum function:
$\mathrm{softmax}(u)_i = \exp\!\bigl(u_i - \log \sum_j e^{u_j}\bigr)$.
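The practical payoff of the log-sum form is a numerically stable softmax. Below is a minimal sketch under the usual max-shift assumption (the shift is my addition, not stated above):

```python
import numpy as np

def log_sum_exp(u):
    # Subtracting the max before exponentiating keeps exp() finite;
    # the shift m cancels since log(exp(m) * s) = m + log(s).
    m = u.max()
    return m + np.log(np.exp(u - m).sum())

def softmax(u):
    # softmax(u)_i = exp(u_i - log_sum_exp(u))
    return np.exp(u - log_sum_exp(u))

u = np.array([1000.0, 1001.0, 1002.0])
print(softmax(u))   # ~[0.090, 0.245, 0.665], no overflow despite the huge inputs
```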
The matrix of partial derivatives, i.e. the Jacobian, of softmax is analogous to the sigmoid's derivative: $\frac{\partial\, \mathrm{softmax}(u)_i}{\partial u_j} = \mathrm{softmax}(u)_i\,(\delta_{ij} - \mathrm{softmax}(u)_j)$, which for $i = j$ reduces to $\mathrm{softmax}(u)_i\,(1 - \mathrm{softmax}(u)_i)$, mirroring $\sigma'(x) = \sigma(x)(1 - \sigma(x))$.
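For completeness, a short sketch of that Jacobian; treating the full matrix of partials (rather than only the diagonal) is my reading of the formula above:

```python
import numpy as np

def softmax(u):
    e = np.exp(u - u.max())        # max-shift for stability
    return e / e.sum()

def softmax_jacobian(u):
    # J[i, j] = s_i * (delta_ij - s_j)  ==  diag(s) - outer(s, s)
    s = softmax(u)
    return np.diag(s) - np.outer(s, s)

u = np.array([0.5, 1.0, -0.3])
J = softmax_jacobian(u)
print(np.allclose(J, J.T))               # symmetric
print(np.allclose(J.sum(axis=1), 0.0))   # each row sums to 0 (probabilities sum to 1)
```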