Abstract:
Nowadays, neural networks are widely used in many applications as artificial intelligence models for learning tasks. Since typically neural networks process a very large amount of data, it is convenient to formulate them within the mean-field and kinetic theory. In this work we focus on a particular class of neural networks, i.e. the residual neural networks, assuming that each layer is characterized by the same number of neurons N, which is fixed by the dimension of the data. This assumption allows to interpret the residual neural network as a time-discretized ordinary differential equation, in analogy with neural differential equations. The mean-field description is then obtained in the limit of infinitely many input data. This leads to a Vlasov-type partial differential equation which describes the evolution of the distribution of the input data. We analyze steady states and sensitivity with respect to the parameters of the network, namely the weights and the bias. In the simple setting of a linear activation function and one-dimensional input data, the study of the moments provides insights on the choice of the parameters of the network. Furthermore, a modification of the microscopic dynamics, inspired by stochastic residual neural networks, leads to a Fokker-Planck formulation of the network, in which the concept of network training is replaced by the task of fitting distributions. The performed analysis is validated by artificial numerical simulations. In particular, results on classification and regression problems are presented.