HEIDI: Oden, Lena: Analyzing GPU-controlled communication with dynamic parallelism in terms of performance and energy

Status: Bibliographic entry

Location: ---
Copies: ---

	Online resource
Authors:	Oden, Lena [author]
	Klenk, Benjamin [author]
	Fröning, Holger [author]
Title:	Analyzing GPU-controlled communication with dynamic parallelism in terms of performance and energy
Statement of responsibility:	Lena Oden, Benjamin Klenk, Holger Fröning
Extent:	10 pp.
Notes:	Viewed on 3 July 2017
Source title:	Contained in: Parallel computing
Source year:	2016
Source volume/issue:	57 (2016), pp. 125-134
Source ISSN:	1872-7336
Abstract:	Graphics Processing Units (GPUs) are widely used in high-performance computing due to their high computational power and high performance per Watt. However, one of the main bottlenecks of GPU-accelerated cluster computing is the data transfer between distributed GPUs. This not only affects performance, but also power consumption. The most common way to utilize a GPU cluster is a hybrid model, in which the GPU is used to accelerate the computation, while the CPU is responsible for the communication. This approach always requires a dedicated CPU thread, which consumes additional CPU cycles and therefore increases the power consumption of the complete application. In recent work we have shown that the GPU is able to control the communication independently of the CPU. However, there are several problems with GPU-controlled communication. The main problem is intra-GPU synchronization, since GPU blocks are non-preemptive. Therefore, the use of communication requests within a GPU can easily result in a deadlock. In this work we show how dynamic parallelism solves this problem. GPU-controlled communication in combination with dynamic parallelism allows keeping the control flow of multi-GPU applications on the GPU and bypassing the CPU completely. Using other in-kernel synchronization methods results in massive performance losses, due to the forced serialization of the GPU thread blocks. Although the performance of applications using GPU-controlled communication is still slightly worse than the performance of hybrid applications, we will show that performance per Watt increases by up to 10% while still using commodity hardware.
DOI:	doi:10.1016/j.parco.2016.02.005
URL:	Please note: This is a bibliographic entry. Full-text access for members of the university is available here only if there is a subscription to the journal or edited volume in question, or if it is an open-access title. Publisher: http://dx.doi.org/10.1016/j.parco.2016.02.005
	Publisher: http://www.sciencedirect.com/science/article/pii/S0167819116300011
	DOI: https://doi.org/10.1016/j.parco.2016.02.005
Medium:	Online resource
Language:	eng
K10plus-PPN:	1560390727
Links:	→ Journal

Permanent link to this title (bookmarkable): https://katalog.ub.uni-heidelberg.de/titel/68133278
