Johannes Jahn, A new mini-batch stochastic quasi-Newton method for deep learning
DOI: 10.23952/jano.8.2026.2.01
Volume 8, Issue 2, 1 August 2026, Pages 157-179
Abstract. The numerical method presented in this paper combines a mini-batch stochastic gradient method with a simple quasi-Newton method and can be used for solving large-scale finite-sum minimization problems. Because this quasi-Newton method is designed to operate with a simplified matrix class, it is well suited to implementation on GPUs, which enables its use in deep learning. Instead of a single iteration per mini-batch, the method performs a small number of iterations. The quasi-Newton method is examined in detail with regard to superlinear convergence. Moreover, convergence results are provided for the expectation of the distance between the iterates and the solution of the optimization problem throughout the entire procedure. Even without convexity assumptions, a decrease result for the objective function values is shown. Finally, numerical tests with a very large number of variables demonstrate the performance of the new algorithm, which has great potential for applications in deep learning.
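
The abstract does not specify the simplified matrix class or the update rule, so the following Python sketch only illustrates the general scheme it describes: it assumes a positive diagonal matrix as a hypothetical stand-in for that class, refreshed by a componentwise secant ratio, with a few quasi-Newton iterations performed on each mini-batch. It is not the method of the paper, and all parameter values are illustrative choices.

import numpy as np

def minibatch_diag_qn(grad, x0, n_samples, batch_size=64, epochs=5,
                      inner_iters=3, eps=1e-8, d_min=1e-4, d_max=1e4,
                      seed=0):
    # Sketch of a mini-batch stochastic quasi-Newton loop with a
    # positive diagonal curvature estimate d (stored as a vector);
    # grad(x, idx) must return the mini-batch gradient over the
    # sample indices idx.
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    d = np.ones_like(x)                      # initial curvature estimate
    for _ in range(epochs):
        perm = rng.permutation(n_samples)    # reshuffle every epoch
        for start in range(0, n_samples, batch_size):
            idx = perm[start:start + batch_size]
            g = grad(x, idx)
            for _ in range(inner_iters):     # a few steps per mini-batch
                x_new = x - g / d            # diagonal quasi-Newton step
                g_new = grad(x_new, idx)
                s, y = x_new - x, g_new - g
                mask = np.abs(s) > eps       # avoid division by ~0
                # Componentwise secant ratio, clipped to stay positive.
                d[mask] = np.clip(y[mask] / s[mask], d_min, d_max)
                x, g = x_new, g_new
    return x

# Illustrative use on a synthetic least-squares problem.
rng = np.random.default_rng(1)
A = rng.normal(size=(1000, 50))
b = A @ np.ones(50)
grad = lambda x, idx: A[idx].T @ (A[idx] @ x - b[idx]) / len(idx)
x_hat = minibatch_diag_qn(grad, np.zeros(50), n_samples=1000)

The clipping bounds keep the diagonal strictly positive, so every inner step is a descent direction for the current mini-batch gradient; a diagonal approximation of this kind applies componentwise, which is one plausible reading of why a simplified matrix class maps well onto GPUs.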
How to Cite this Article:
J. Jahn, A new mini-batch stochastic quasi-Newton method for deep learning, J. Appl. Numer. Optim. 8 (2026), 157-179.
