Autodiff
Backpropagation involves a lot of differentiation, and implementing backprop by hand is like programming in assembly language. An autodiff library automates this differentiation, making it easy to compute derivatives of a program.
Automatic differentiation (autodiff)
Refers to a general way of taking a program which computes a value, and automatically constructing a procedure for computing derivatives of that value.
Backpropagation
It is the special case of autodiff applied to neural nets, but in machine learning "backprop" is often used synonymously with "autodiff".
An autodiff system will convert the program
into a sequence of primitive operations which have specified routines for
computing derivatives. In this representation, backprop can be done in a
completely mechanical way.
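As a tiny hand-rolled sketch (not from the original text; the function f(x) = sin(x²) is chosen purely for illustration), this is what that decomposition looks like: each primitive has a known derivative routine, and the chain rule is applied mechanically in reverse.

```python
import math

# Minimal sketch of reverse-mode differentiation over primitives.
# f(x) = sin(x**2), decomposed into the primitives "square" and "sin".
def f_and_grad(x):
    # Forward pass: evaluate each primitive and keep its output.
    v1 = x * x          # primitive: square, dv1/dx  = 2 * x
    v2 = math.sin(v1)   # primitive: sin,    dv2/dv1 = cos(v1)

    # Backward pass: apply each primitive's derivative routine in reverse,
    # accumulating the chain rule mechanically.
    dv2_dv1 = math.cos(v1)
    dv1_dx = 2 * x
    df_dx = dv2_dv1 * dv1_dx
    return v2, df_dx

value, grad = f_and_grad(1.0)
print(value, grad)   # sin(1) ≈ 0.8415, 2 * cos(1) ≈ 1.0806
```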
Autograd:
This is the engine that computes derivatives (vector-Jacobian products, to be more precise). It records all operations performed on gradient-enabled tensors in a directed acyclic graph, the so-called dynamic computational graph. The leaves of this graph are the input tensors and the roots are the output tensors.
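A small sketch of how this graph surfaces in PyTorch (tensor names are illustrative): leaves have no grad_fn, intermediate and output tensors record the operation that produced them, and backward() traverses the graph from the root back to the leaves.

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)  # leaf of the graph
w = x * 3                                          # recorded op: MulBackward0
out = w.sum()                                      # root of the graph: SumBackward0

print(x.is_leaf, x.grad_fn)   # True, None  (leaves have no grad_fn)
print(w.is_leaf, w.grad_fn)   # False, <MulBackward0 ...>
print(out.grad_fn)            # <SumBackward0 ...>

out.backward()                # walk the DAG from the root to the leaves
print(x.grad)                 # tensor([3., 3.])
```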
Autodiff in PyTorch:
In PyTorch you mark a tensor with requires_grad=True, build y from it with ordinary operations, and call y.backward() to have autograd fill in x.grad. For instance, a simple polynomial evaluated at x = 1 can give y = 6, with the derivative recovered automatically.
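A minimal sketch of such an example; the exact expression from the original snippet is not shown, so y = x² + 5x (which equals 6 at x = 1) is an assumption.

```python
import torch

x = torch.tensor(1.0, requires_grad=True)
y = x ** 2 + 5 * x      # assumed expression: y = 6 at x = 1

y.backward()            # autograd computes dy/dx = 2x + 5
print(y.item())         # 6.0
print(x.grad.item())    # 7.0
```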
Disabling gradient calculation is useful for inference, when you are sure that you will not call Tensor.backward(). It will reduce memory consumption for computations that would otherwise have requires_grad=True. In this mode, the result of every computation will have requires_grad=False, even when the inputs have requires_grad=True.
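A short sketch of that behaviour (variable names are illustrative):

```python
import torch

x = torch.tensor(2.0, requires_grad=True)

with torch.no_grad():
    y = x * 3            # not recorded in the graph
print(y.requires_grad)   # False, even though x.requires_grad is True

z = x * 3                # outside no_grad: tracked as usual
print(z.requires_grad)   # True
```

The full example below fits a third-order polynomial to sin(x): autograd computes the gradients in loss.backward(), and the manual weight updates are wrapped in torch.no_grad() so that they are not recorded in the graph.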
```python
import torch
import math

dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0")  # Uncomment this to run on GPU

# Create Tensors to hold input and outputs.
# By default, requires_grad=False, which indicates that we do not need to
# compute gradients with respect to these Tensors during the backward pass.
x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
y = torch.sin(x)

# Create random Tensors for weights. For a third order polynomial, we need
# 4 weights: y = a + b x + c x^2 + d x^3
# Setting requires_grad=True indicates that we want to compute gradients with
# respect to these Tensors during the backward pass.
a = torch.randn((), device=device, dtype=dtype, requires_grad=True)
b = torch.randn((), device=device, dtype=dtype, requires_grad=True)
c = torch.randn((), device=device, dtype=dtype, requires_grad=True)
d = torch.randn((), device=device, dtype=dtype, requires_grad=True)

learning_rate = 1e-6
for t in range(2000):
    # Forward pass: compute predicted y using operations on Tensors.
    y_pred = a + b * x + c * x ** 2 + d * x ** 3

    # Compute and print loss using operations on Tensors.
    # loss is a scalar Tensor; loss.item() gets the Python number it holds.
    loss = (y_pred - y).pow(2).sum()
    if t % 100 == 99:
        print(t, loss.item())

    # Use autograd to compute the backward pass. This call will compute the
    # gradient of loss with respect to all Tensors with requires_grad=True.
    # After this call a.grad, b.grad, c.grad and d.grad will be Tensors holding
    # the gradient of the loss with respect to a, b, c, d respectively.
    loss.backward()

    # Manually update weights using gradient descent. Wrap in torch.no_grad()
    # because weights have requires_grad=True, but we don't need to track this
    # in autograd.
    with torch.no_grad():
        a -= learning_rate * a.grad
        b -= learning_rate * b.grad
        c -= learning_rate * c.grad
        d -= learning_rate * d.grad

        # Manually zero the gradients after updating weights
        a.grad = None
        b.grad = None
        c.grad = None
        d.grad = None

print(f'Result: y = {a.item()} + {b.item()} x + {c.item()} x^2 + {d.item()} x^3')
```
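In practice, the manual update and gradient-zeroing steps are usually replaced by an optimizer such as torch.optim.SGD together with optimizer.zero_grad(), but writing them out by hand makes explicit what autograd does and does not do: it computes and accumulates gradients, while the parameter update is left to you.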