About: GPU computing kernels are relatively simple to write when peak performance is not the top priority. Tuning and optimizing a kernel for the highest performance, however, quickly becomes a far more daunting task because of GPUs’ massive parallelism, complex memory hierarchy, fine-grained synchronization, and long memory access latency. Users must therefore carry out the complex tasks of profiling, analyzing, and tuning to reduce performance bottlenecks. Today’s GPUs can generate hundreds of performance events that comprehensively quantify the behavior of a kernel. Instead of relying on experts’ manual analysis, this paper uses machine learning methods to generalize from GPU performance counter data and determine the characteristics of a GPU kernel, since those characteristics reveal possible reasons for low performance. We choose a set of problem-independent counters as inputs, then design and compare three machine learning methods that automatically classify the execution behavior of a kernel. Experimental results on stencil computing kernels and sparse matrix multiplications show that the models achieve good accuracy and demonstrate a feasible approach that classifies a kernel’s characteristics and suggests changes to a skilled user, who can subsequently improve kernel performance with less guessing.
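
As a rough illustration of the idea in the abstract, the sketch below classifies a kernel run from a few size-independent counter ratios. Everything in it is an assumption made for illustration: the abstract does not name the three machine learning methods or the counters, so a nearest-centroid classifier stands in for them, the feature names (`dram_utilization`, `l2_hit_rate`, `warp_execution_efficiency`) are hypothetical, and the training values and labels are invented rather than measured.

```python
from math import dist

# Hypothetical, size-independent features per kernel run (all in [0, 1]):
# (dram_utilization, l2_hit_rate, warp_execution_efficiency)
# The values and labels below are invented for illustration only.
TRAINING = [
    ((0.90, 0.30, 0.95), "memory-bound"),
    ((0.85, 0.40, 0.90), "memory-bound"),
    ((0.20, 0.90, 0.95), "compute-bound"),
    ((0.25, 0.85, 0.92), "compute-bound"),
    ((0.30, 0.70, 0.40), "divergent"),
    ((0.35, 0.75, 0.45), "divergent"),
]


def fit_centroids(samples):
    """Average the feature vectors of each label into one centroid."""
    grouped = {}
    for features, label in samples:
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(sum(col) / len(rows) for col in zip(*rows))
        for label, rows in grouped.items()
    }


def classify(centroids, features):
    """Return the label whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


CENTROIDS = fit_centroids(TRAINING)

if __name__ == "__main__":
    # A previously unseen run with high memory traffic and a low cache hit rate.
    unseen = (0.88, 0.35, 0.93)
    print(classify(CENTROIDS, unseen))
```

Ratios are used here instead of raw event counts so that the features do not scale with problem size, which is one plausible reading of "problem-independent counters". A real study would use measured profiler data and a validated model.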


An Entity of Type : fabio:Abstract, within Data Space : covidontheweb.inria.fr associated with source document(s)

Attributes	Values
type	abstract
subject	Artificial intelligence; Virtual reality; Australian inventions; OpenCL compute devices; Operating system kernels; GPGPU; Hardware acceleration; Graphics hardware; Application-specific integrated circuits; Graphics processing units
part of	Utilizing GPU Performance Counters to Characterize GPU Kernels via Machine Learning
is abstract of	Utilizing GPU Performance Counters to Characterize GPU Kernels via Machine Learning
