
Abstract

The limitations:

  • GNNs need fast inference.

  • Existing works focus on accelerating specific target GNNs, such as GCN.

  • Many works rely on pre-processing, which is not suitable for real-time applications.

This paper's work:

  • Proposes FlowGNN, which is generalizable to the majority of message-passing GNNs.

In more detail:

  1. A novel and scalable dataflow architecture (speeds up message passing)
  2. Real-time GNN inference without any pre-processing
  3. Verification of the proposed architecture on an FPGA board (Xilinx Alveo U50)

Related work and motivation

The paper points to a survey: Computing Graph Neural Networks: A Survey from Algorithms to Accelerators.

Limitation

The most significant limitation is that advanced GNNs cannot be reduced to matrix multiplications (i.e., SpMM and GEMM), and edge embeddings cannot be ignored.

Edge embedding

Edge embeddings are used to represent important edge attributes, such as chemical bonds in a molecule.

This makes SpMM and GEMM inapplicable.


Without edge embeddings, the message function is $\phi (x_i^l)$, which can be optimized with GEMM and SpMM. However, with edge embeddings it becomes $\phi (x_i^l,e_{i,j}^l)$, and the dimensions of the node representation and the edge representation are not the same.
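A small C++ sketch of why this matters (the types, dimensions, and the toy $\phi$ are mine, for illustration only): without edge embeddings, every out-edge of a node carries the same precomputable row $\phi(x_j)$, so aggregation collapses into one SpMM; with edge embeddings, every edge needs its own computation.

```cpp
#include <array>
#include <vector>
#include <cstddef>

constexpr std::size_t ND = 16;  // node embedding dim (assumed)
constexpr std::size_t ED = 4;   // edge embedding dim (assumed, != ND)

struct Edge { int src, dst; std::array<float, ED> emb; };

// Without edge embeddings: the message m_{j->i} = phi(x_j) is identical for
// every out-edge of j, so aggregation is exactly SpMM(A, phi(X)).
void aggregate_spmm(const std::vector<Edge>& edges,
                    const std::vector<std::array<float, ND>>& phi_x,
                    std::vector<std::array<float, ND>>& out) {
    for (const Edge& e : edges)
        for (std::size_t k = 0; k < ND; ++k)
            out[e.dst][k] += phi_x[e.src][k];  // rows of phi(X) are reused
}

// With edge embeddings: m_{j->i} = phi(x_j, e_ij) differs on every edge
// (and x and e even have different dimensions), so phi(X) cannot be
// precomputed once and the SpMM/GEMM pattern no longer applies.
void aggregate_edge_aware(const std::vector<Edge>& edges,
                          const std::vector<std::array<float, ND>>& x,
                          std::vector<std::array<float, ND>>& out) {
    for (const Edge& e : edges) {
        std::array<float, ND> m{};               // per-edge message
        for (std::size_t k = 0; k < ND; ++k)
            m[k] = x[e.src][k] + e.emb[k % ED];  // toy phi(x_j, e_ij)
        for (std::size_t k = 0; k < ND; ++k)
            out[e.dst][k] += m[k];
    }
}
```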


Existing optimizations become invalid

The SOTA accelerator I-GCN merges nodes that share the same neighbors into one node as an optimization.


However, once edge embeddings are considered, the edge embeddings $e_{a,c},e_{a,d}$ are not the same as $e_{b,c},e_{b,d}$, so we cannot merge nodes $a$ and $b$.

Non-trivial aggregation

GEMM and SpMM optimizations rely on a fixed computation pattern. However, in advanced GNNs the aggregation coefficients are dynamic, computed at runtime.

Anisotropic GNNs

For example, GAT computes an attention coefficient from $x_i^{l}$ and $x_j^l$. Because this coefficient is dynamic, GAT cannot be expressed as static matrix multiplications for optimization.
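A sketch of a single-head GAT attention logit (dimensions and parameter names here are illustrative, not from the paper): the score is recomputed per edge from runtime features, so it cannot be baked into a static sparse matrix ahead of time.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t D = 8;  // transformed feature dim (assumed)

// Unnormalized GAT attention logit for edge (i, j):
//   e_ij = LeakyReLU(a^T [W x_i || W x_j])
// Both Wx_i and Wx_j are runtime values, so e_ij (and the softmax-normalized
// alpha_ij) change with every input graph -- they are not static coefficients.
float attention_logit(const std::array<float, D>& Wx_i,
                      const std::array<float, D>& Wx_j,
                      const std::array<float, 2 * D>& a) {
    float s = 0.0f;
    for (std::size_t k = 0; k < D; ++k) s += a[k] * Wx_i[k];
    for (std::size_t k = 0; k < D; ++k) s += a[D + k] * Wx_j[k];
    return s > 0.0f ? s : 0.2f * s;  // LeakyReLU, slope 0.2
}
```

A softmax over node $i$'s incoming edges then normalizes these logits into $\alpha_{ij}$, which scale the messages during aggregation; since $\alpha$ depends on the input, it cannot be folded into a precomputed matrix.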

Pre-processing

Pre-processing is not applicable in real-time applications.

Motivations and Innovations

Innovations:

  • Explicit message passing for generality
  • Multi-queue multi-level parallelism for performance

GENERIC ARCHITECTURE

Message Passing Mechanism

$$
\begin{aligned}
\text{MP scatter:}\quad & m_{1\to 0}^{l-1}=\phi (x_1^{l-1},e_{0,1}^{l-1}) \\
\text{MP gather:}\quad & m_0^{l}=\mathcal{A}(m_{1\to 0}^{l-1}) \\
 & m_1^{l}=\mathcal{A}(m_{0\to 1}^{l-1},m_{2\to 1}^{l-1},m_{3\to 1}^{l-1}) \\
\text{NT:}\quad & x_1^{l}=\gamma (m_1^{l},x_1^{l-1})
\end{aligned}
$$

3 phases:

  1. Message passing (gather):

    Fetch the messages $m$ from the message buffer and aggregate them: $\mathcal{A}(m_{2\to1}^l, m_{3\to1}^l, m_{4\to1}^l)$.

    The aggregation function $\mathcal{A}(\cdot)$ can be sum, max, mean, or standard deviation.

  2. Node transformation:

    $x_1^l=\gamma (m_1^l,x_1^{l-1})$.

    The transformation $\gamma(\cdot)$ can be a fully-connected layer or an MLP.

  3. Message passing (scatter):

    Compute $m_{1\to 2}^{l+1}=\phi (x_1^{l+1},e_{1,2}^{l+1})$ and send it to the message buffer for the next layer's gather phase (see the sketch after this list).
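A minimal C++ sketch of one layer in this explicit message-passing form. The dense data structures and the toy $\phi$, $\gamma$, and sum-$\mathcal{A}$ below are my stand-ins, not FlowGNN's actual HLS implementation.

```cpp
#include <algorithm>
#include <array>
#include <vector>
#include <cstddef>

constexpr std::size_t D = 8;            // embedding dim (assumed)
using Vec = std::array<float, D>;
struct Edge { int src, dst; Vec emb; };
struct Msg  { int dst; Vec val; };

// One GNN layer in the explicit message-passing form.
void layer(const std::vector<Edge>& edges,
           std::vector<Vec>& x,          // node embeddings, updated in place
           std::vector<Msg>& buf)        // message buffer (in: prev layer, out: next layer)
{
    // 1) MP gather: m_i = A({m_{j->i}}); here A = sum.
    std::vector<Vec> m(x.size(), Vec{});
    for (const Msg& msg : buf)
        for (std::size_t k = 0; k < D; ++k)
            m[msg.dst][k] += msg.val[k];

    // 2) Node transformation: x_i = gamma(m_i, x_i); toy gamma = ReLU(m + x),
    //    standing in for a fully-connected layer or MLP.
    for (std::size_t i = 0; i < x.size(); ++i)
        for (std::size_t k = 0; k < D; ++k)
            x[i][k] = std::max(0.0f, m[i][k] + x[i][k]);

    // 3) MP scatter: emit m_{i->j} = phi(x_i, e_ij) into the buffer for the
    //    next layer's gather; toy phi = x + e.
    buf.clear();
    for (const Edge& e : edges) {
        Msg msg{e.dst, {}};
        for (std::size_t k = 0; k < D; ++k)
            msg.val[k] = x[e.src][k] + e.emb[k];
        buf.push_back(msg);
    }
}
```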

Baseline Dataflow Architecture


The baseline uses ping-pong buffers: while one buffer is read by the current layer's gather stage, the other is written by the scatter stage, and the two swap roles at each layer boundary.

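A plain-C++ sketch of the ping-pong idea (the real design uses on-chip buffers; the `float` payload and `std::vector` storage here are placeholders):

```cpp
#include <array>
#include <vector>

// Two message buffers that swap roles every layer: while the current layer's
// gather stage reads buf[read_sel], the scatter stage writes buf[1 - read_sel].
struct PingPong {
    std::array<std::vector<float>, 2> buf;
    int read_sel = 0;

    std::vector<float>&       write_buf() { return buf[1 - read_sel]; }
    const std::vector<float>& read_buf()  { return buf[read_sel]; }

    // Called at a layer boundary: the writer's output becomes the next
    // reader's input, so gather and scatter never touch the same buffer.
    void swap_roles() { read_sel = 1 - read_sel; }
};
```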

Proposed FlowGNN Architecture

image-20231022144445441