Two weeks of coding have already gone by!
My progress can be tracked in this issue. One part of the project is making LAPACK routines accessible from ChainerX, with computations running on both CPU and GPU. Another part is making them differentiable.
LAPACK (Linear Algebra Package) is a standard software library for numerical linear algebra. It provides routines for solving systems of linear equations and linear least squares, eigenvalue problems, and singular value decomposition. It also includes routines to implement the associated matrix factorizations such as LU, QR, Cholesky and Schur decomposition. [wiki]
It was decided to first make everything work on the GPU using CUDA and then move to the CPU side of things. In the CUDA world there are two alternative implementations of the LAPACK routines: cuSOLVER and MAGMA. It is worth mentioning that there is also an OpenCL-based alternative named ViennaCL. The choice in this project fell on cuSOLVER since it ships with the CUDA Toolkit and does not require additional installation.
The first implemented routine is the Cholesky decomposition. It is a decomposition of a (Hermitian) positive-definite matrix into the product of a lower triangular matrix and its (conjugate) transpose $$ A = L L^* $$ The implementation is based on the potrf routine. Applications include, but are not limited to, solving linear systems, Gaussian processes, and Kalman filtering.
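To make the factorization and its use for linear systems concrete, here is a minimal sketch in plain NumPy (not the ChainerX API; the test matrix and variable names are made up for illustration). It factors a symmetric positive-definite matrix and then solves \( A x = b \) in two triangular steps.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4 * np.eye(4)    # symmetric positive-definite by construction
b = rng.standard_normal(4)

L = np.linalg.cholesky(A)      # lower triangular factor with A = L L^T

# Solve A x = b as L y = b followed by L^T x = y.
# np.linalg.solve is a general solver and does not exploit the triangular
# structure; a dedicated triangular solver would be cheaper in practice.
y = np.linalg.solve(L, b)
x = np.linalg.solve(L.T, y)

print(np.allclose(L @ L.T, A), np.allclose(A @ x, b))
```

For a real matrix the conjugate transpose \( L^* \) is simply the transpose, which is why `L.T` is used here.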
Then the QR decomposition was implemented. It is a decomposition of a matrix \( A \) into a product $$A = Q R$$ of an orthogonal matrix \( Q \) and an upper triangular matrix \( R \). The implementation is based on the geqrf and ormqr routines.
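The properties of the two factors can be checked numerically. The snippet below is only an illustration using NumPy's `numpy.linalg.qr` on a made-up tall matrix, not the ChainerX implementation. In NumPy's default reduced mode \( Q \) is not square, but its columns are still orthonormal.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))

# Default "reduced" mode: Q has shape (5, 3), R has shape (3, 3).
Q, R = np.linalg.qr(A)

print(np.allclose(Q.T @ Q, np.eye(3)))   # columns of Q are orthonormal
print(np.allclose(R, np.triu(R)))        # R is upper triangular
print(np.allclose(Q @ R, A))             # the product reproduces A
```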
SVD and the pseudo-inverse are implemented with the gesvd routine.
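The link between the two is that, given \( A = U \Sigma V^* \), the pseudo-inverse is \( A^+ = V \Sigma^+ U^* \), where \( \Sigma^+ \) inverts only the nonzero singular values. Below is a minimal NumPy sketch of that construction. The cutoff `tol` is an assumption chosen for this example, not necessarily the one used in ChainerX.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))

# Thin SVD: U is (5, 3), s holds the 3 singular values, Vt is (3, 3).
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Invert only singular values above a small cutoff (assumed tolerance).
tol = max(A.shape) * np.finfo(A.dtype).eps * s.max()
s_inv = np.zeros_like(s)
s_inv[s > tol] = 1.0 / s[s > tol]

# A^+ = V diag(s_inv) U^T; scaling the columns of V avoids building diag().
A_pinv = (Vt.T * s_inv) @ U.T

print(np.allclose(A_pinv, np.linalg.pinv(A)))
```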
Lastly, the solve and inverse functions were implemented using the LU decomposition routine getrf.
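One way to see why a single factorization serves both functions: the inverse is the solution of \( A X = I \), so it can be obtained from the same solver with the identity as the right-hand side. The sketch below shows this with NumPy, whose `numpy.linalg.solve` is documented as using the LAPACK `_gesv` routine (LU with partial pivoting). It illustrates the idea only and does not claim to mirror the ChainerX code.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 4 * np.eye(4)   # made-up, well-conditioned
b = rng.standard_normal(4)

x = np.linalg.solve(A, b)              # solve A x = b
A_inv = np.linalg.solve(A, np.eye(4))  # solve A X = I, so X is the inverse

print(np.allclose(A @ x, b), np.allclose(A @ A_inv, np.eye(4)))
```

When only the solution of a linear system is needed, calling solve directly is generally preferable to forming the inverse and multiplying.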