The paper "Opening the Black Box of Deep Neural Networks via Information" uses the diffusion process and the heat equation as intuition for gradient flow between the layers of a deep network. I needed a refresher on the underlying math and found some very good sources:

"What is Laplacian" and the video that follows it give a great, insightful introduction to the Laplacian and to the heat and wave equations.

The differential equations lecture series also has a shorter explanation of diffusion, in the applied math section that leads into the "Fourier and Laplace Transformation" section.
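To make the diffusion picture concrete for myself, here is a minimal sketch of the 1D heat equation u_t = alpha * u_xx stepped forward with explicit finite differences. It is not taken from the paper or from any of the lectures; the function name `diffuse_1d`, the periodic boundary, and all parameter values are my own assumptions for illustration.

```python
import numpy as np

def diffuse_1d(u0, alpha=1.0, dx=1.0, dt=0.2, steps=100):
    """Explicit finite-difference steps of u_t = alpha * u_xx on a periodic grid.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    # This explicit scheme is only stable when alpha * dt / dx**2 <= 0.5.
    r = alpha * dt / dx**2
    if r > 0.5:
        raise ValueError("unstable step: need alpha * dt / dx**2 <= 0.5")
    u = np.asarray(u0, dtype=float).copy()
    for _ in range(steps):
        # Discrete Laplacian: both neighbours minus twice the centre value
        # (np.roll wraps around, which gives the periodic boundary).
        lap = np.roll(u, 1) + np.roll(u, -1) - 2.0 * u
        u = u + r * lap
    return u

# Start with all the "heat" concentrated at the middle grid point.
u0 = np.zeros(101)
u0[50] = 1.0
u = diffuse_1d(u0)

# The spike spreads out and flattens while the total amount stays the same.
print("total:", u.sum(), "peak:", u.max())
```

Each step replaces a point with a weighted average of itself and its neighbours, which is exactly the smoothing behaviour the Laplacian describes: the peak drops, the profile widens, and the total is conserved.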

There is also a stochastic side to the whole thing, which is nicely covered in the stochastic processes, Ito calculus, and stochastic differential equations section of the Topics in Mathematics with Applications in Finance lectures, with notes and videos.