
A single layer of GNN

There are many different GNNs: GCN, GraphSAGE, GAT…

The differences between them mainly lie in:

  • message send
  • message aggregation

(figure: overview of a single GNN layer)

message send

$$
m_u^{(l)}=\mathrm{MSG}^{(l)}\left(h_u^{(l-1)}\right)
$$

Mostly, the message function is linear:
$$
m_u^{(l)}=W^{(l)}h_u^{(l-1)}
$$
(figure: linear message function)
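As a minimal sketch (NumPy, with illustrative toy shapes), the linear message for one neighbor $u$ is just a matrix-vector product:

```python
import numpy as np

# Toy shapes: 4-dim input embedding, 3-dim message (both illustrative).
W = np.random.randn(3, 4)   # W^(l), a learned weight matrix in practice
h_u = np.random.randn(4)    # h_u^(l-1), the neighbor's previous embedding

m_u = W @ h_u               # m_u^(l) = W^(l) h_u^(l-1)
```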

message aggregation

The aggregation function could be sum, average, or max pooling…

And the node's own message from the previous layer should also be considered, so:

(figures: aggregation over neighbor messages, including the node's own message)
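One common way to write this (a sketch following the notation above; the figures themselves are not reproduced) is to aggregate the neighbor messages and then concatenate the node's own message:
$$
h_v^{(l)}=\mathrm{CONCAT}\left(\mathrm{AGG}\left(\left\{m_u^{(l)},\,u\in N(v)\right\}\right),\,m_v^{(l)}\right)
$$
A nonlinearity (e.g. ReLU) is usually applied afterwards.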

GCN

Let's look at a single GCN layer:

(figure: GCN single-layer formula)

  • message send: a linear transform with a normalization factor $\frac{1}{|N(v)|}$
  • message aggregation: use $\mathrm{SUM}(\cdot)$ as the aggregation function (see the sketch below)
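Putting the two together, the single GCN layer described above is commonly written as (a standard form, not a reproduction of the figure):
$$
h_v^{(l)}=\sigma\left(\sum_{u\in N(v)}W^{(l)}\frac{h_u^{(l-1)}}{|N(v)|}\right)
$$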

GraphSAGE

Single layer

(figure: GraphSAGE single-layer formula)

It can select $\mathrm{SUM}(\cdot)$, $\mathrm{MAX}(\cdot)$… as its AGG.

And the linear transform is applied after AGG, to the concatenation of the node's own embedding with the aggregated messages.

What's more, the AGG can also be:

(figure: other choices of AGG)
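For reference, the standard GraphSAGE update matching this description (a sketch, not a reproduction of the figures):
$$
h_v^{(l)}=\sigma\left(W^{(l)}\cdot\mathrm{CONCAT}\left(h_v^{(l-1)},\,\mathrm{AGG}\left(\left\{h_u^{(l-1)},\,\forall u\in N(v)\right\}\right)\right)\right)
$$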

GAT

(figure: GAT single-layer formula with attention weights)

Intuition: assign different attention coefficients to different neighbors.

For GCN, the coefficient is fixed at $\frac{1}{|N(v)|}$.

And for GAT, how do we calculate the attention?

The importance of $u$ to $v$ is:

(figure: attention score $e_{vu}$)

The score comes from a linear attention mechanism:

(figure: the attention mechanism as a linear layer)

And for a node $v$, its attention coefficients are:

(figure: normalized attention coefficients $\alpha_{vu}$)
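Putting the three steps together, the standard GAT formulation is (a sketch; the figures are not reproduced):
$$
e_{vu}=a\!\left(W^{(l)}h_u^{(l-1)},\,W^{(l)}h_v^{(l-1)}\right),\qquad
\alpha_{vu}=\frac{\exp(e_{vu})}{\sum_{k\in N(v)}\exp(e_{vk})},\qquad
h_v^{(l)}=\sigma\!\left(\sum_{u\in N(v)}\alpha_{vu}\,W^{(l)}h_u^{(l-1)}\right)
$$
where $a$ is the linear attention mechanism mentioned above.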

Multi-head attention

(figure: multi-head attention)

The reason: it helps the attention coefficients avoid getting trapped in bad local optima.
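Concretely, each head $k$ uses its own parameters, and the head outputs are aggregated at the end (e.g. by concatenation or averaging); a standard sketch:
$$
h_v^{(l)}[k]=\sigma\!\left(\sum_{u\in N(v)}\alpha_{vu}^{k}\,W_k^{(l)}h_u^{(l-1)}\right),\qquad
h_v^{(l)}=\mathrm{AGG}\left(h_v^{(l)}[1],\dots,h_v^{(l)}[K]\right)
$$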

Deep learning modules

(figure: deep learning modules inside a GNN layer)

batch normalization

Goal: stabilize the training stage

Idea:

  • Re-center the node embeddings to mean 0
  • Re-scale the variance to 1

Setup:

  • Re-center and re-scale:

    (figure: compute the batch mean and variance)

  • use trainable parameters $\gamma$ and $\beta$ to re-scale and re-shift the normalized values (see the sketch below)

    (figure: normalize, then scale and shift with $\gamma$ and $\beta$)
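The usual batch-norm computation over $N$ node embeddings $X\in\mathbb{R}^{N\times D}$, per feature dimension $j$ (a sketch of the standard formulation):
$$
\mu_j=\frac{1}{N}\sum_{i=1}^{N}X_{i,j},\qquad
\sigma_j^2=\frac{1}{N}\sum_{i=1}^{N}\left(X_{i,j}-\mu_j\right)^2,\qquad
\hat X_{i,j}=\frac{X_{i,j}-\mu_j}{\sqrt{\sigma_j^2+\epsilon}},\qquad
Y_{i,j}=\gamma_j\hat X_{i,j}+\beta_j
$$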

Dropout

Goal: prevent overfitting

Idea:

  • When training, randomly set some neurons to 0 (each with probability $p$)
  • When testing, use all neurons for computation

In an MLP:

(figure: dropout in an MLP)

In a GNN, dropout is applied in the message-send stage, inside the linear layer:

(figure: dropout inside the GNN linear message layer)
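A minimal sketch of dropout applied to the linear message transform (inverted dropout; NumPy, all names illustrative):

```python
import numpy as np

def message_with_dropout(W, H, p=0.5, training=True):
    """Linear message W h with (inverted) dropout on the inputs.

    H: (num_nodes, d_in) node embeddings, W: (d_out, d_in) weight matrix.
    During training each entry is zeroed with probability p and the
    survivors are scaled by 1/(1-p); at test time all units are used.
    """
    if training:
        mask = (np.random.rand(*H.shape) >= p) / (1.0 - p)
        H = H * mask
    return H @ W.T
```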

Activation

(figure: common activation functions)
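For reference, some common activation functions (the figure is not reproduced; these are just the usual choices):
$$
\mathrm{ReLU}(x)=\max(x,0),\qquad
\sigma(x)=\frac{1}{1+e^{-x}},\qquad
\mathrm{PReLU}(x)=\max(x,0)+a\,\min(x,0)
$$
where $a$ in PReLU is a trainable parameter.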

Stacking layers of a GNN

How do we stack single GNN layers?

  • stack layers sequentially
  • add skip connections

stack layers sequentially

Let's look at a 3-layer GNN:

(figure: a 3-layer GNN stacked sequentially)

What's the problem?

The model might suffer from over-smoothing.

The notion is: by the $i$-th layer (say the 4th or 5th), all node embeddings converge to the same value. This means that in the last layer all nodes look the same, while our objective is to distinguish different nodes.

Let's talk about another notion: the receptive field.

It means: the set of nodes that determine the embedding of a node of interest.

And in a $K$-layer GNN, the receptive field of a node is its $K$-hop neighborhood.

(figure: the receptive field grows with the number of hops)
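A minimal sketch of computing a node's receptive field as its $K$-hop neighborhood (NumPy adjacency matrix; names illustrative):

```python
import numpy as np

def k_hop_neighborhood(A, v, K):
    """Receptive field of node v in a K-layer GNN: all nodes within K hops.

    A: (N, N) 0/1 adjacency matrix, v: node index, K: number of hops/layers.
    """
    frontier = {v}
    reached = {v}
    for _ in range(K):
        nxt = set()
        for u in frontier:
            nxt.update(np.nonzero(A[u])[0].tolist())
        frontier = nxt - reached
        reached |= nxt
    return reached
```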

And the over-smoothing problem can be explained through the receptive field: with many layers, the receptive fields of different nodes overlap heavily, so their embeddings converge to similar values.

(figure: overlapping receptive fields as the number of layers grows)

So the lesson is: we should choose the number of layers cautiously.

A rule of thumb from experience: the number of GNN layers is often set to the required receptive field of the nodes plus one.

enhance the expressive power

The next problem is: how can we enhance the expressive power of a GNN?

  1. Add layers that do not pass messages (see the sketch after this list).

    (figure: adding layers without message passing)

  2. Increase the expressive power within each layer.

    (figure: more expressive modules within each layer)
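A minimal sketch of option 1, assuming the layers without message passing are simple per-node MLP layers applied before and after the GNN layers (NumPy; all names and shapes are illustrative):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def mlp_layer(H, W):
    """A per-node linear + ReLU layer that does NOT pass messages between nodes."""
    return relu(H @ W.T)

def gnn_layer(A, H, W):
    """A toy message-passing layer: mean-aggregate neighbors, linear, then ReLU."""
    deg = A.sum(axis=1, keepdims=True).clip(min=1)
    return relu((A @ H / deg) @ W.T)

def forward(A, X, W_pre, W_gnn_list, W_post):
    H = mlp_layer(X, W_pre)        # pre-processing layer (no message passing)
    for W in W_gnn_list:
        H = gnn_layer(A, H, W)     # message-passing layers
    return mlp_layer(H, W_post)    # post-processing layer (no message passing)
```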

skip connection

The problem is: what if some downstream task still needs many GNN layers?

The intuition comes from ResNet.

We can add skip connections to GNNs.

One way:

(figure: a skip connection within a layer, $F(x)+x$)
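A minimal sketch of this form, $\sigma(F(x)+x)$, assuming the input and output dimensions match (NumPy; names illustrative):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def gcn_layer_with_skip(A_norm, H, W):
    """One GCN-style layer with a skip connection.

    A_norm: (N, N) normalized adjacency, H: (N, d) embeddings, W: (d, d).
    The skip adds the layer input back before the nonlinearity, so the
    input and output dimensions must match (W is square here).
    """
    F = A_norm @ H @ W    # the usual GCN transformation F(H)
    return relu(F + H)    # skip connection: sigma(F(H) + H)
```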

Another:

(figure: an alternative skip-connection design)

Why is it useful?

A model with $N$ skip connections implicitly creates a mixture of $2^N$ shallow models (one for every possible path).

(figure: skip connections as a mixture of shallow models)

The difference between GCN and GCN with a skip connection:

(figure: a GCN layer vs. a GCN layer with a skip connection)
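In formulas (a sketch reusing the GCN form from earlier; the figure is not reproduced):
$$
\text{GCN:}\quad h_v^{(l)}=\sigma\!\left(\sum_{u\in N(v)}W^{(l)}\frac{h_u^{(l-1)}}{|N(v)|}\right),\qquad
\text{GCN + skip:}\quad h_v^{(l)}=\sigma\!\left(\sum_{u\in N(v)}W^{(l)}\frac{h_u^{(l-1)}}{|N(v)|}+h_v^{(l-1)}\right)
$$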