In this post I want to collect the basic derivatives and primitive building blocks from calculus that are required for model optimization. These are commonly used, but after some time I lose my grip on these basics, so I want to keep them in one place for future reference.

## Standard functions and their derivatives

<div> $$ \begin{array}{l}
\frac{d}{dx} \frac{1}{x} = -\frac{1}{x^{2}} \\ \\
\frac{d}{dx} \log(x) = \frac{1}{x} \\ \\
\frac{d}{dx} e^{x} = e^{x} \\ \\
\frac{d}{dx} f(x) g(x) = f(x) \frac{d}{dx} g(x) + g(x) \frac{d}{dx} f(x) \\ \\
\frac{d}{dx} \frac{f(x)}{g(x)} = \frac{f'(x) g(x) - f(x) g'(x)}{g(x)^{2}} \\ \\
\text{where } f'(x) = \frac{d}{dx} f(x) \text{ and } g'(x) = \frac{d}{dx} g(x)
\end{array} $$ </div>
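
As a quick sanity check, the rules above can be compared against a central finite difference. This is only a sketch: the helper `numerical_derivative`, the step size `h`, the test point `x = 2.0`, and the choices `f(x) = x^2` and `g(x) = e^x` are my own assumptions for illustration.

```python
import math

def numerical_derivative(func, x, h=1e-6):
    """Approximate func'(x) with a central finite difference."""
    return (func(x + h) - func(x - h)) / (2 * h)

x = 2.0  # arbitrary test point (assumption)

# Illustrative choices for the product and quotient rules (assumptions)
f, f_prime = (lambda t: t ** 2), (lambda t: 2 * t)
g, g_prime = math.exp, math.exp

# Each entry is (numerical estimate, closed-form rule from the table above)
checks = {
    "1/x": (numerical_derivative(lambda t: 1 / t, x), -1 / x ** 2),
    "log(x)": (numerical_derivative(math.log, x), 1 / x),
    "e^x": (numerical_derivative(math.exp, x), math.exp(x)),
    "product": (
        numerical_derivative(lambda t: f(t) * g(t), x),
        f(x) * g_prime(x) + g(x) * f_prime(x),
    ),
    "quotient": (
        numerical_derivative(lambda t: f(t) / g(t), x),
        (f_prime(x) * g(x) - f(x) * g_prime(x)) / g(x) ** 2,
    ),
}

for name, (approx, exact) in checks.items():
    print(f"{name:8s} numerical={approx:.6f} rule={exact:.6f}")
```

The two columns should agree to several decimal places; a mismatch usually means a sign error in the rule.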

## Sigmoid activation function
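
For reference, the sigmoid function is `sigmoid(x) = 1 / (1 + e^(-x))`, and applying the `1/x` and `e^x` rules with the chain rule gives `sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x))`. A minimal sketch in plain Python follows; the function names are my own, and the code is not hardened against overflow for very large negative inputs.

```python
import math

def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x):
    """Derivative of the sigmoid, written in terms of the sigmoid itself."""
    s = sigmoid(x)
    return s * (1.0 - s)

for x in (-2.0, 0.0, 2.0):
    print(x, sigmoid(x), sigmoid_prime(x))
```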

Sigmoid squashes its input into a value between 0 and 1, and it is used in many classification problems, e.g. logistic regression and neural networks.

## Tanh activation function
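
For reference, `tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))`, and the quotient rule gives `tanh'(x) = 1 - tanh(x)^2`. The sketch below leans on the standard library's `math.tanh` rather than the explicit formula, which is my own choice to avoid overflow for large inputs; `tanh_prime` is a name I made up.

```python
import math

def tanh_prime(x):
    """Derivative of tanh: 1 - tanh(x)^2."""
    t = math.tanh(x)
    return 1.0 - t * t

for x in (-2.0, 0.0, 2.0):
    print(x, math.tanh(x), tanh_prime(x))
```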

The special property of the tanh function is that it brings its input into the range [-1, 1]. In some cases this range is important, especially in RNNs, where the model wants to selectively forget things.

## ReLU
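
For reference, `relu(x) = max(0, x)`. Its derivative is 1 for positive inputs and 0 for negative inputs. A minimal sketch, where the function names are my own and returning 0 at exactly `x = 0` (where the function is not differentiable) is a common convention that I am assuming here:

```python
def relu(x):
    """Rectified linear unit: max(0, x)."""
    return max(0.0, x)

def relu_prime(x):
    """Derivative of ReLU; the value at x == 0 is a convention (assumption)."""
    return 1.0 if x > 0 else 0.0

for x in (-3.0, 0.0, 3.0):
    print(x, relu(x), relu_prime(x))
```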

A widely used activation function in neural networks and CNN models. This activation function helped deep learning make a jump in multiple domains, because in the early days the lack of a proper activation function really restricted complex neural network architectures.

## Leaky ReLU
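
For reference, leaky ReLU returns `x` for positive inputs and `alpha * x` otherwise, so the derivative is 1 or `alpha` instead of 1 or 0. A minimal sketch; the function names and the default `alpha = 0.01` are assumptions on my part, not values from any particular library.

```python
def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: x for x > 0, alpha * x otherwise (alpha is an assumed default)."""
    return x if x > 0 else alpha * x

def leaky_relu_prime(x, alpha=0.01):
    """Derivative of leaky ReLU; the value at x == 0 is a convention (assumption)."""
    return 1.0 if x > 0 else alpha

for x in (-3.0, 0.0, 3.0):
    print(x, leaky_relu(x), leaky_relu_prime(x))
```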

An improved version of ReLU that reduces the chance of the vanishing gradient issue by keeping a small non-zero slope for negative inputs.

## Softmax
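
For reference, `softmax(z)_i = e^(z_i) / sum_j e^(z_j)`. The sketch below subtracts the maximum logit before exponentiating, which leaves the result unchanged but avoids overflow; that stabilization step and the example logits are my own additions.

```python
import math

def softmax(logits):
    """Convert a list of real-valued scores into probabilities that sum to 1."""
    m = max(logits)  # shift by the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])  # example logits (assumption)
print(probs, sum(probs))
```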

Used mainly as the last layer in a neural network. This function converts all activations from the previous layer into probability values. Because of this property, it is used in most classification problems.

## Cross entropy loss function
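
For reference, with a target distribution `y` and predicted probabilities `p`, the cross entropy is `L = -sum_i y_i * log(p_i)`. When `p` comes from a softmax over logits and `y` sums to 1, the gradient of `L` with respect to the logits simplifies to `p - y`. A minimal sketch of both; the function names, the small `eps` guard inside the log, and the example values are assumptions of mine.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(y_true, probs, eps=1e-12):
    """L = -sum_i y_i * log(p_i); eps guards against log(0) (assumption)."""
    return -sum(y * math.log(p + eps) for y, p in zip(y_true, probs))

def grad_wrt_logits(y_true, logits):
    """Gradient of cross_entropy(softmax(logits)) w.r.t. the logits: p - y."""
    return [p - y for p, y in zip(softmax(logits), y_true)]

logits = [1.0, 2.0, 3.0]  # example scores (assumption)
y = [0.0, 0.0, 1.0]       # one-hot target (assumption)
loss = cross_entropy(y, softmax(logits))
grad = grad_wrt_logits(y, logits)
print(loss, grad)
```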

The objective function used in neural networks and other classification problems.

## Partial Derivative

These concepts are taught in undergraduate courses. All ML problems ultimately come down to a numerical optimization problem, and derivatives provide a method for optimizing the parameters of the function. We need partial derivatives to differentiate multivariate functions.

The key idea is that at every step we calculate the derivative along one dimension (one variable), keeping all other variables constant.
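
The idea above can be sketched numerically: nudge one coordinate, hold the rest fixed, and take a central difference. The helper `partial_derivative`, the example function `f(x, y) = x^2 * y + y^3`, and the evaluation point are all my own assumptions for illustration.

```python
def partial_derivative(func, point, index, h=1e-6):
    """Central-difference estimate of d func / d point[index], other coordinates fixed."""
    plus = list(point)
    minus = list(point)
    plus[index] += h
    minus[index] -= h
    return (func(*plus) - func(*minus)) / (2 * h)

def f(x, y):
    """Example function (assumption): x^2 * y + y^3."""
    return x ** 2 * y + y ** 3

point = (2.0, 3.0)
df_dx = partial_derivative(f, point, 0)  # analytic: 2*x*y = 12
df_dy = partial_derivative(f, point, 1)  # analytic: x^2 + 3*y^2 = 31
print(df_dx, df_dy)
```

Collecting the partial derivatives for every variable gives the gradient, which is what an optimizer uses to update the parameters.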

Note: The math equations were created using the https://www.mathcha.io online editor.
