About: Half-precision computation refers to performing floating-point operations in a 16-bit format. While half precision has been driven largely by machine learning applications, recent algorithmic advances in numerical linear algebra have found beneficial uses for it in accelerating the solution of linear systems of equations at higher precisions. In this paper, we present a high-performance, mixed-precision linear solver (Ax = b) for symmetric positive definite systems in double precision using graphics processing units (GPUs). The solver is based on a mixed-precision Cholesky factorization that uses the high-performance tensor core units in CUDA-enabled GPUs. Since the Cholesky factors are affected by the low precision, an iterative refinement (IR) solver is required to recover the solution to double-precision accuracy. Two different types of IR solvers are evaluated on a wide range of test matrices. A preprocessing step is also developed, which scales and shifts the matrix, if necessary, to preserve its positive definiteness in lower precisions. Our experiments on the V100 GPU show performance speedups of up to 4.7× over a direct double-precision solver. However, matrix properties such as the condition number and the eigenvalue distribution can affect the convergence rate, which in turn affects the overall performance.
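
The idea of factorizing in low precision and then refining in double precision can be illustrated with a small CPU-only sketch. This is not the solver described in the abstract: it uses no GPU or tensor cores, it emulates half precision by rounding through the `struct` module's 16-bit `"e"` format, and it shows only the classical form of iterative refinement. The function names (`fp16`, `cholesky_fp16`, `solve_with_factor`, `refine`) and the 3×3 test matrix are assumptions made for illustration.

```python
import math
import struct


def fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def cholesky_fp16(a):
    """Lower Cholesky factor of a, rounding every operation to half precision."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        s = fp16(a[j][j])
        for k in range(j):
            s = fp16(s - fp16(l[j][k] * l[j][k]))
        if s <= 0.0:
            raise ValueError("matrix is not positive definite in half precision")
        l[j][j] = fp16(math.sqrt(s))
        for i in range(j + 1, n):
            s = fp16(a[i][j])
            for k in range(j):
                s = fp16(s - fp16(l[i][k] * l[j][k]))
            l[i][j] = fp16(s / l[j][j])
    return l


def solve_with_factor(l, b):
    """Solve (L L^T) x = b by forward and back substitution in double precision."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(l[i][k] * y[k] for k in range(i))) / l[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(l[k][i] * x[k] for k in range(i + 1, n))) / l[i][i]
    return x


def refine(a, b, tol=1e-12, max_iter=50):
    """Classical iterative refinement using the half-precision factor."""
    n = len(b)
    l = cholesky_fp16(a)            # cheap, inaccurate factorization
    x = solve_with_factor(l, b)     # initial low-accuracy solution
    for it in range(max_iter):
        # Residual is computed in double precision with the original matrix.
        r = [b[i] - sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
        if max(abs(v) for v in r) <= tol * max(abs(v) for v in b):
            return x, it
        d = solve_with_factor(l, r)  # correction from the same factor
        x = [x[i] + d[i] for i in range(n)]
    return x, max_iter


# Small symmetric positive definite example (hypothetical test data).
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x, iterations = refine(A, b)
print(x, iterations)
```

The factor is computed once and reused for every correction step, which is where the savings come from: the expensive factorization runs in the fast format, and only the cheaper residual and triangular solves run in double precision. For ill-conditioned matrices the loop may need many iterations or fail to converge, consistent with the abstract's remark about the condition number.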



An Entity of Type : fabio:Abstract, within Data Space : covidontheweb.inria.fr associated with source document(s)

Attributes	Values
type	abstract
Subject	Video cards; Software quality; OpenCL compute devices; Numerical linear algebra; Binary arithmetic; Floating point types
part of	Investigating the Benefit of FP16-Enabled Mixed-Precision Solvers for Symmetric Positive Definite Matrices Using GPUs
is abstract of	Investigating the Benefit of FP16-Enabled Mixed-Precision Solvers for Symmetric Positive Definite Matrices Using GPUs
is hasSource of	covid:ann/target/433d57cf3f4e2dba26611d29148d1fd1d7d84a21 covid:ann/target/6dc36a2b054335ecc456670683c3e24227473d2f covid:ann/target/1b523dcc7354b104778d0855d6cc0b192dfef139 covid:ann/target/5dc6c7eff6227a506278eeb6acd2987a52a75d80 covid:ann/target/8b7fa8a8808830c048f3686a6a3aab558da39c45 covid:ann/target/0b8ee87e5ec483c55826b70b18c412288a31dd95 covid:ann/target/ae1a307efbcbc31e9c0048fe3c5ded5acaba4413 covid:ann/target/72b3b7edd2385c64b7f6948aa45b8e4299b10d6d covid:ann/target/4d0ede86e4985f4f60088ed4e27da644fd50b73b covid:ann/target/691952ca01f6609583b1af2055808ac3553853a9 covid:ann/target/0b48d68d84252e55eadb56fc3860697eaf4fc7af covid:ann/target/30b9226f7bab3eaef10d29b01b8faaed65a7eaf6 covid:ann/target/8e3df3b50c47c4ff22597c94aeaa8bede63c06b3 covid:ann/target/e6f61889ecb9a9ed4922124f4f108d4dc7850722 covid:ann/target/4f0571968ed47e779f55941d68fa1f31330cf173 covid:ann/target/94d6f093dd749d2160aded6189f32b0ff5479687 covid:ann/target/84fd7a9af88e15a1cba72b821e74f5475876b049 covid:ann/target/6d9b7fe75d906bfc7dd3110fb1718f5c491c54f2 covid:ann/target/e9bd3fd2ec194995d8a4d0f3054cc93f03626c08 covid:ann/target/23b7eabc19104a6f1b819f305ef399aa0b1c3670 covid:ann/target/8cce7a63b1c01877ad1c412a08778cb47bd05f9e covid:ann/target/fed38e73c7b5eaab4862cc04396ba95e4ca164f4 covid:ann/target/1a55d78dafa3896fd591dd53b81ca6bcaaec0a98 covid:ann/target/96f409b21ce11d26c2143e73f419a92c72e778e5 covid:ann/target/47774968fba3612634b3f4185689205b8b564d33 covid:ann/target/de72cc9402a517e10895d0fa84bd70175cd63742 covid:ann/target/5b075f071b1cc6f88454b44a80aa3924a3d7db9c covid:ann/target/e5e0d334cf7a1b95a44aaaec6be5f342243ebf79 covid:ann/target/d85bae1f75eb597754e76369f1b6288d920599ba covid:ann/target/5e7434b9624fc20016ea81bdb827341ae5befdd5 covid:ann/target/77b24daa1ea4228877a7bd39ddeca7404623e105 covid:ann/target/111176bd9dc8387c27d9dda1155ac57ffcf65346 covid:ann/target/f3add19e5c7e24968ec20bcdb0eb8548937611e1 covid:ann/target/cd42bcb86f0428c4e6782e0c447ee365af6a0df5 
covid:ann/target/ebbed18595ac478340e4c4e6dab3d60d17f95f8d covid:ann/target/c451565ea2097c1495f3bcdffa8f3d3303d3b78e covid:ann/target/63798f6633548c1e092fdefc4ec7f2f26c4cb473 covid:ann/target/fdcc29dea764b01a6dbd180a42b969a4ec01504c covid:ann/target/21d147811d9233794ccbf981901f7df3c684fcf9 covid:ann/target/1b9ccb6c9a9e9306e60a51fdd672c1e3adaa9e43 covid:ann/target/053bd4807c4bcc18d5374a924964b2daac14a1a5 covid:ann/target/ec2c7c2c6c035f620a39e56e14cb473d6345479a covid:ann/target/0c9925d9a65b640cf6b10e7eba586bef0ac4c734 covid:ann/target/a7949a56f5f5a5deb54491d138b181c795c1547a covid:ann/target/d3e22a0a07028901878c9d98e81c5cf5c4e72614 covid:ann/target/9ef00337cf60a3741d9cd3580e086293703f0f81 covid:ann/target/716592e9cbe6a0cb836a203af04382d0d68cab20 covid:ann/target/6db579cd83c423b2ada11c6f3ae99cba545e8ee5 covid:ann/target/64578b6f4e6590eef153d67656fab3362aff6986 covid:ann/target/a3333e3ab4eef82db0960137c6a7799f891a0abe covid:ann/target/46d6bb47a96caa18b1daecef44b9ddb7435f6983 covid:ann/target/008ea61527f0da22527e9a9d1dcc02b681cf6ae5 covid:ann/target/d2907aa346fc999c29dab678cb2d90d3931cb169 covid:ann/target/18d9f149b92aea6481ba7a69b1e903310b681b2b covid:ann/target/c8ecc21a59adb99ea28380bf07d39809cfd3c521 covid:ann/target/d82b0ea8441430c663ed95d7d256828181d40fcf covid:ann/target/398d7843795650d0c83f34f7eb8d2553eae940df covid:ann/target/76da8c48af176cc3aa30b1ab622ab0cab7884942 covid:ann/target/16f9500fb2a78d90ca23d49d02135de589041a27 covid:ann/target/40f9746c0b10d8671dc35a9c66a53089828ff1bd covid:ann/target/cad7244715cf57e25568821662c0feb6c30faa40 covid:ann/target/eed18c1d2aa04b89b5867a75f545e2e3c487b798 covid:ann/target/fd4146aa5728c063bddbe58cc407675854a788ac covid:ann/target/45cd02e9198dfb03121766b73320ab2b844b29d2 covid:ann/target/9cd36126799448c5382217ef4564f8f4883c46a0 covid:ann/target/23826be782967176ab37a5d583bed5307827e6c7 covid:ann/target/4104f0fd1699bec19994154a5d6e6e6b7ce3dadc covid:ann/target/8e9d2eaa6f815dfd41ef3e66226ddaeceb96e088 
covid:ann/target/a78a7d3789c3779906fa1256fbff584b7bbea61f covid:ann/target/b847b6706ca9040982a47b976ac91b615b4935d5 covid:ann/target/c3b998bd9206f72e2d59338efa84a61718540e27 covid:ann/target/ecc3ec87cfa9cc5f948548f1c7c27168a329ee43 covid:ann/target/ed9fa7c053bf1b8fc80c2d10e1c9a3ecc86ee749 covid:ann/target/a0701c4427a286eeb0fc9935f015e5bfda7cb0dc covid:ann/target/b1e34b73e6dc2fc3c889faa235afb93035f6b005 covid:ann/target/7b127c9f1e3747b43ca462ae47743d426a6b63ef covid:ann/target/07a9a5062209f681f61b5c1357da24b90db23bac covid:ann/target/3d6d58f5288cab3780f0411b964ece8ab0751c1f covid:ann/target/7b11ea51cba3d16f17358914bc63b97deca82a91 covid:ann/target/f0ee783f1d6f2330d8d1708f60d5012fac6a6d47 covid:ann/target/fb57bb443fac347bfdb3d2a9029df2832e3e557f covid:ann/target/0757c3a95e78bcb0f1c47f274df1487ea4917f99 covid:ann/target/16ae7932aab1948430be5e5c0c1a7a1f7c05585c covid:ann/target/93312938ca4330697a60e8e64930094f5527777b covid:ann/target/9a61295c9c0909119760892fdec3e31077af37e0 covid:ann/target/db29b7bbb0231868f1a68591e04ccdc28f085462 covid:ann/target/f5b2121263c27686c577406f30bff1374dee639d covid:ann/target/515468f212916ee438bc19adcc0f1869708a8033 covid:ann/target/08cf1fffd6f81f4d8f7a05c3a401bcf1ebbff8a2 covid:ann/target/9efdf902c0838e2b731d25375caca1ba8c27aeaf
