In situ neighborhood sampling for large-scale GNN training
Date
2024-06-09
Authors
Song, Yuhang
Chen, Po Hao
Lu, Yuchen
Abrar, Naima
Kalavri, Vasiliki
Citation
Yuhang Song, Po Hao Chen, Yuchen Lu, Naima Abrar, and Vasiliki Kalavri. 2024. In situ neighborhood sampling for large-scale GNN training. In Proceedings of the 20th International Workshop on Data Management on New Hardware (DaMoN '24). Association for Computing Machinery, New York, NY, USA, Article 11, 1–5. https://doi.org/10.1145/3662010.3663443
Abstract
Graph Neural Network (GNN) training algorithms commonly perform neighborhood sampling to construct fixed-size mini-batches for weight aggregation on GPUs. State-of-the-art disk-based GNN frameworks compute sampling on the CPU, transferring edge partitions from disk to memory for every mini-batch. We argue that this design wastes significant PCIe bandwidth, as entire neighborhoods are transferred to main memory only to be discarded after sampling. In this paper, we take a first step toward an inherently different approach that harnesses near-storage compute technology to achieve efficient large-scale GNN training. We target a single machine with one or more SmartSSD devices and develop a high-throughput, epoch-wide sampling FPGA kernel that enables pipelining across epochs. Compared to a baseline random-access sampling kernel, our solution achieves up to 4.26× lower sampling time per epoch.
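To make the abstract's premise concrete, the following is a minimal sketch of the general fixed-fanout neighborhood sampling that GNN mini-batch training performs. This is an illustrative in-memory version of the technique, not the paper's SmartSSD/FPGA kernel; the function name, adjacency-list representation, and `fanout` parameter are hypothetical.

```python
import random

def sample_neighborhood(adj, seed_nodes, fanout):
    """Uniformly sample up to `fanout` neighbors per seed node.

    `adj` maps a node id to its neighbor list; the result is a
    fixed-size frontier for one mini-batch (hypothetical sketch).
    """
    sampled = {}
    for v in seed_nodes:
        neighbors = adj.get(v, [])
        if len(neighbors) <= fanout:
            # Fewer neighbors than the fanout: keep them all.
            sampled[v] = list(neighbors)
        else:
            # Draw `fanout` neighbors uniformly without replacement.
            sampled[v] = random.sample(neighbors, fanout)
    return sampled

# Toy graph as adjacency lists.
adj = {0: [1, 2, 3, 4], 1: [0], 2: [0, 3]}
batch = sample_neighborhood(adj, seed_nodes=[0, 2], fanout=2)
```

Note that a disk-based framework must first load node 0's full neighbor list into memory before discarding half of it here; this discarded traffic is the PCIe waste the paper targets by sampling near storage.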
License
© 2024 Copyright held by the owner/author(s). This work is licensed under a Creative Commons Attribution 4.0 International License. This article has been published under a Read & Publish Transformative Open Access (OA) Agreement with ACM.