
Softmax

The softmax function transforms a vector of real numbers into a probability distribution: every output entry lies in (0, 1) and the entries sum to 1.
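For a vector $x \in \mathbb{R}^n$, the definition is:

$$
\operatorname{softmax}(x)_i = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}}
$$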



Numerical stability improvements

Rescaling exponent

Due to the exponentiation, large inputs can easily overflow to $\infty$. You can prevent this by ensuring that the largest exponent is 0, which is typically implemented by finding the maximum value and subtracting it from every element:

```python
import torch

def softmax(x, dim=-1):
    c = x.max(dim=dim, keepdim=True).values  # max along dim, not the global max
    e = torch.exp(x - c)
    return e / e.sum(dim=dim, keepdim=True)
```
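Subtracting the maximum $c$ does not change the result, because the common factor $e^{-c}$ cancels between numerator and denominator:

$$
\frac{e^{x_i - c}}{\sum_j e^{x_j - c}} = \frac{e^{-c}\, e^{x_i}}{e^{-c} \sum_j e^{x_j}} = \frac{e^{x_i}}{\sum_j e^{x_j}}
$$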

Calculating the log of the softmax

Taking the log of the softmax output directly can be numerically unstable: the softmax can underflow to 0, making the log $-\infty$. It is better to compute the log-softmax directly using the log-sum-exp trick, $\log \operatorname{softmax}(x)_i = (x_i - c) - \log \sum_j e^{x_j - c}$ with $c = \max_j x_j$.
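As a minimal sketch of the log-sum-exp trick in plain Python (lists instead of tensors, so the arithmetic is easy to follow; PyTorch provides the same thing as `torch.log_softmax`):

```python
import math

def log_softmax(x):
    # log softmax(x)_i = (x_i - c) - log(sum_j exp(x_j - c)), with c = max(x).
    # Shifting by c keeps every exponent <= 0, so exp() never overflows,
    # and the log is applied to a sum that is at least 1 (the max term).
    c = max(x)
    lse = math.log(sum(math.exp(v - c) for v in x))
    return [(v - c) - lse for v in x]
```

Even for inputs like `[1000.0, 1001.0]`, where a naive `log(softmax(x))` would overflow in the exponent, every output stays finite.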