Multilevel Architectures and Algorithms in Deep Learning


Project Description

The design of deep neural networks (DNNs) and their training are central issues in machine learning, and progress in these areas is one of the driving forces behind the success of these technologies. Nevertheless, tedious experimentation and human interaction are often still needed during the learning process to find an appropriate network structure and corresponding hyperparameters that yield the desired behavior of a DNN.

The strategic goal of the proposed project is to provide algorithmic means to improve this situation. Our methodological approach relies on well-established mathematical techniques: identify fundamental algorithmic quantities and construct a-posteriori estimates for them; identify and consistently exploit an appropriate topological framework for the given problem class; and establish a multilevel structure for DNNs to account for the fact that a DNN only realizes a discrete approximation of a continuous nonlinear mapping relating input to output data. Combining these ideas with novel algorithmic control strategies and preconditioning, we will establish a new class of adaptive multilevel algorithms for deep learning, which not only optimize a fixed DNN but also adaptively refine and extend the DNN architecture during the optimization loop, as sketched below. This concept is not restricted to a particular network architecture, and we will study feedforward neural networks, ResNets, and PINNs as relevant examples.
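To make the idea of architecture refinement during training concrete, the following is a minimal sketch in PyTorch, not the project's actual method: a small residual network is extended by zero-initialized residual blocks whenever training stalls, so that each insertion leaves the realized network function unchanged. The stall test (a simple loss plateau) is an assumption standing in for the sensitivity-based a-posteriori estimates developed in the project, and all names (AdaptiveResNet, insert_block) are illustrative.

# Illustrative sketch only; the insertion criterion here is a crude
# loss-plateau test, not the project's sensitivity-based estimate.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """x -> x + h * tanh(W x + b); zero-initialized, so the block
    acts as the identity at insertion time."""
    def __init__(self, width, h=1.0):
        super().__init__()
        self.h = h
        self.linear = nn.Linear(width, width)
        nn.init.zeros_(self.linear.weight)
        nn.init.zeros_(self.linear.bias)

    def forward(self, x):
        return x + self.h * torch.tanh(self.linear(x))

class AdaptiveResNet(nn.Module):
    def __init__(self, in_dim, width, out_dim):
        super().__init__()
        self.inp = nn.Linear(in_dim, width)
        self.blocks = nn.ModuleList([ResidualBlock(width)])
        self.out = nn.Linear(width, out_dim)

    def forward(self, x):
        x = torch.tanh(self.inp(x))
        for block in self.blocks:
            x = block(x)
        return self.out(x)

    def insert_block(self, position, width):
        # Refinement step: the new block starts as the identity,
        # so the network function is preserved at insertion.
        self.blocks.insert(position, ResidualBlock(width))

# Toy regression problem.
torch.manual_seed(0)
X = torch.rand(256, 2)
y = torch.sin(3 * X.sum(dim=1, keepdim=True))

model = AdaptiveResNet(in_dim=2, width=16, out_dim=1)
loss_fn = nn.MSELoss()
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

prev_loss = float("inf")
for epoch in range(200):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()
    # Every 50 epochs, check for a plateau and refine the architecture.
    if epoch % 50 == 49:
        if prev_loss - loss.item() < 1e-3:
            model.insert_block(len(model.blocks), width=16)
            # Rebuild the optimizer so it sees the new parameters.
            opt = torch.optim.SGD(model.parameters(), lr=1e-2)
        prev_loss = loss.item()

In the project's setting, the decision of when and where to insert a block, as well as the step-size control and preconditioning, would be driven by the a-posteriori estimates mentioned above rather than by a fixed schedule.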

Our integrated approach will thus be able to replace many of the current manual tuning techniques with algorithmic strategies based on a-posteriori estimates. Moreover, our algorithms will reduce the computational effort of training as well as the size of the resulting DNN compared to a manually designed counterpart, making deep learning more efficient in several respects. Finally, in the long run, our algorithmic approach has the potential to enhance the reliability and interpretability of the resulting trained DNN.

Associated Publications

  • Evelyn Herberg, Roland Herzog, Frederik Köhne, Leonie Kreis and Anton Schiela
    Sensitivity-based layer insertion for residual and feedforward neural networks, 2023
    bibtex
    @ONLINE{HerbergHerzogKoehneKreisSchiela:2023:1,
      AUTHOR = {Herberg, Evelyn and Herzog, Roland and Köhne, Frederik and Kreis, Leonie and Schiela, Anton},
      DATE = {2023-11},
      EPRINT = {2311.15995},
      EPRINTTYPE = {arXiv},
      TITLE = {Sensitivity-based layer insertion for residual and feedforward neural networks},
    }
  • Roland Herzog, Frederik Köhne, Leonie Kreis and Anton Schiela
    Frobenius-type norms and inner products of matrices and linear maps with applications to neural network training, 2023
    bibtex
    @ONLINE{HerzogKoehneKreisSchiela:2023:1,
      AUTHOR = {Herzog, Roland and Köhne, Frederik and Kreis, Leonie and Schiela, Anton},
      DATE = {2023-11},
      EPRINT = {2311.15419},
      EPRINTTYPE = {arXiv},
      TITLE = {Frobenius-type norms and inner products of matrices and linear maps with applications to neural network training},
    }
  • Frederik Köhne, Leonie Kreis, Anton Schiela and Roland Herzog
    Adaptive step sizes for preconditioned stochastic gradient descent, 2023
    bibtex
    @ONLINE{KoehneKreisSchielaHerzog:2023:1,
      AUTHOR = {Köhne, Frederik and Kreis, Leonie and Schiela, Anton and Herzog, Roland},
      DATE = {2023-11},
      EPRINT = {2311.16956},
      EPRINTTYPE = {arXiv},
      TITLE = {Adaptive step sizes for preconditioned stochastic gradient descent},
    }
  • Leonie Kreis
    Multilevel Training of Residual Neural Networks
    M.Sc. Thesis, Heidelberg University, 2022
    bibtex
    @THESIS{Kreis:2022:1,
      AUTHOR = {Kreis, Leonie},
      INSTITUTION = {Heidelberg University},
      DATE = {2022-11-29},
      TITLE = {Multilevel Training of Residual Neural Networks},
      TYPE = {M.Sc. Thesis},
    }
