
In node embedding, we use an encoder to map each node of the original network into an embedding space. The simplest (shallow) encoder is just an embedding lookup:
$$
z_v=Z\cdot v
$$
where $Z\in\mathbb{R}^{d\times|V|}$ is the matrix we need to learn and $v$ is the one-hot indicator vector of the node.
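
A minimal numpy sketch of this lookup (the sizes $d$ and $|V|$ below are toy values):

```python
import numpy as np

num_nodes, dim = 5, 3                 # |V| nodes, d-dimensional embeddings (toy sizes)
Z = np.random.randn(dim, num_nodes)   # embedding matrix Z, learned during training

v = np.zeros(num_nodes)
v[2] = 1.0                            # one-hot indicator vector of node 2

z_v = Z @ v                           # z_v = Z · v, i.e. column 2 of Z
assert np.allclose(z_v, Z[:, 2])
```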

Now we use a deep learning-based method as our encoder, which is composed of multiple layers of non-linear transformations.
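
For example, a toy two-layer deep encoder (the layer sizes here are arbitrary):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def deep_encoder(x, W1, W2):
    # two stacked non-linear transformations
    return relu(W2 @ relu(W1 @ x))

x = np.random.randn(8)                 # raw node feature vector (toy size)
W1, W2 = np.random.randn(16, 8), np.random.randn(4, 16)
z = deep_encoder(x, W1, W2)            # 4-dimensional embedding
```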


What makes graph representation learning so difficult?


  • no fixed graph structure (arbitrary size, no fixed node ordering)
  • often dynamic, with no fixed node features

Basics of DL

We usually view the task as an optimization problem: we need to minimize an objective function (loss function).

For regression tasks, use the L2 loss:
$$
L(y,f(x))=\|y-f(x)\|_2
$$
or the L1 loss.
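
A quick sketch of both losses in numpy, with made-up values:

```python
import numpy as np

y, y_hat = np.array([1.0, 2.0]), np.array([0.5, 2.5])  # target y and prediction f(x)

l2_loss = np.linalg.norm(y - y_hat)    # ||y - f(x)||_2
l1_loss = np.abs(y - y_hat).sum()      # ||y - f(x)||_1
```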

For classification problems, we use the cross-entropy (CE) loss:
$$
CE(y,f(x))=-\sum_{i=1}^C(y_i\log f(x)_i)
$$

$$
f(x)=\mathrm{softmax}(g(x))
$$

where $g$ is the model and $C$ is the number of classes.
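
A minimal sketch with made-up logits, where $g(x)$ stands for any model output:

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max())  # shift by max for numerical stability
    return e / e.sum()

g_x = np.array([2.0, 0.5, -1.0])       # g(x): raw logits from the model
y = np.array([1.0, 0.0, 0.0])          # one-hot label, C = 3 classes

f_x = softmax(g_x)                     # f(x) = softmax(g(x))
ce = -np.sum(y * np.log(f_x))          # CE(y, f(x))
```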

Deep Learning for Graphs

What if we directly feed the graph's adjacency matrix and feature set into a deep neural network?


Problems:

  • $O(|V|)$ parameters (more precisely, $O(|V|+N_{\text{feature}})$)
  • only applicable to graphs of one fixed size
  • sensitive to node ordering

What about convolutions?


Problems:

  • no notion of a sliding window or stride on a graph
  • permutation invariance is required (the output must not change when the nodes are reordered)


Consider two different node orderings of the same graph (order plan 1 and order plan 2): the graph-level and node-level representations should be the same for both.

We need a permutation-invariant function!

MLPs and CNNs cannot guarantee this.
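
A small numpy check of this point, on a made-up toy graph: sum aggregation over neighbors is permutation invariant at the graph level, while a flattened adjacency matrix (an MLP-style input) is not:

```python
import numpy as np

A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)     # toy adjacency matrix
X = np.array([[1.0], [2.0], [3.0]])        # one feature per node

P = np.eye(3)[[2, 0, 1]]                   # a permutation matrix (reorders the nodes)
A_p, X_p = P @ A @ P.T, P @ X              # the same graph under another ordering

# neighbor-sum aggregation: per-node outputs are permuted consistently
assert np.allclose(P @ (A @ X), A_p @ X_p)
# ... so a graph-level readout (e.g. the sum) is identical for both orderings
assert np.isclose((A @ X).sum(), (A_p @ X_p).sum())

# but the flattened adjacency matrix changes under reordering
assert not np.allclose(A.flatten(), A_p.flatten())
```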

Graph Convolutional Networks

The main problems are:

  1. too many parameters

  2. order invariance

First we solve problem 1!

Let's consider the step from MLP $\to$ CNN:

the convolution kernel shares parameters and aggregates (spatially) local information.

Can we apply the same idea to a GNN?

Idea: for a node $v$, treat it as the center of a kernel that aggregates information from $N(v)$; thus, a node's neighborhood defines its computation graph.

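
A sketch of how the neighborhood unrolls into a computation graph, on a made-up toy graph stored as adjacency lists:

```python
# Unroll the depth-k computation graph rooted at a node: each layer
# aggregates from the neighbors at the previous hop (nodes may repeat).
adj = {0: [1, 2], 1: [0], 2: [0]}          # toy undirected graph

def computation_graph(v, depth):
    if depth == 0:
        return v
    return (v, [computation_graph(u, depth - 1) for u in adj[v]])

print(computation_graph(0, 2))
# (0, [(1, [0]), (2, [0])])  -> node 0 aggregates from nodes 1 and 2,
# each of which aggregates from node 0 one layer below
```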

Now that problem 1 is solved, how do we keep order invariance?


The approach: average the information from the neighbors and then apply a neural network.

Now we can write down the formulation of the GNN (GCN):

$$
h_v^{(0)}=x_v,\qquad
h_v^{(l+1)}=\sigma\Big(W_l\sum_{u\in N(v)}\frac{h_u^{(l)}}{|N(v)|}+B_l h_v^{(l)}\Big),\qquad
z_v=h_v^{(L)}
$$

and the matrix formulation:

$$
H^{(l+1)}=\sigma\big(D^{-1}AH^{(l)}W_l^{\top}+H^{(l)}B_l^{\top}\big)
$$

where $H^{(l)}=\big[h_1^{(l)},\dots,h_{|V|}^{(l)}\big]^{\top}$ and $D$ is the diagonal degree matrix with $D_{vv}=|N(v)|$.

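
A minimal numpy sketch of one layer under this matrix formulation (the toy graph and random weights are made up, and $\sigma=$ ReLU is an assumption; a real implementation would use a GNN library):

```python
import numpy as np

def gcn_layer(A, H, W, B):
    # average neighbor messages, transform them, and add the self term
    D_inv = np.diag(1.0 / A.sum(axis=1))        # D^{-1}; assumes no isolated nodes
    return np.maximum(D_inv @ A @ H @ W.T + H @ B.T, 0.0)  # sigma = ReLU

A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)          # toy adjacency matrix
H0 = np.random.randn(3, 4)                      # H^{(0)} = X: 3 nodes, 4 features
W0, B0 = np.random.randn(8, 4), np.random.randn(8, 4)

H1 = gcn_layer(A, H0, W0, B0)                   # node embeddings after one layer
Z = H1                                          # z_v = h_v^{(L)} with L = 1
```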

Training GNNs

Supervised learning

$$
\min _{\theta}\mathcal{L}(y,f(z_v))
$$

e.g., drug classification (toxic or not).


Here $f(z_v)=\sigma(z_v^{\top}\theta)$, where $\theta$ is the classification weight vector.
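
A sketch of this supervised loss for a single node, assuming binary labels and a sigmoid $\sigma$ (all values below are made up):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

z_v = np.random.randn(8)               # node embedding from the GNN encoder
theta = np.random.randn(8)             # classification weights, learned jointly
y = 1.0                                # label: toxic (1) or not (0)

f = sigmoid(z_v @ theta)               # f(z_v) = sigma(z_v^T theta)
loss = -(y * np.log(f) + (1 - y) * np.log(1 - f))   # binary cross entropy
```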

Unsupervised learning

$$
\mathcal{L}=\sum_{z_u,z_v}\mathrm{CE}(y_{u,v},\mathrm{DEC}(z_u,z_v))
$$

If nodes $u$ and $v$ are similar (e.g., they co-occur on random walks), then $y_{u,v}=1$.
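
A sketch of this unsupervised loss with an inner-product decoder, i.e. $\mathrm{DEC}(z_u,z_v)=\sigma(z_u^{\top}z_v)$ (the decoder choice and the toy pairs are assumptions for illustration):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

Z = np.random.randn(4, 8)              # embeddings for 4 nodes
pairs = [(0, 1, 1.0), (0, 3, 0.0)]     # (u, v, y_uv): similar or dissimilar pairs

loss = 0.0
for u, v, y in pairs:
    p = sigmoid(Z[u] @ Z[v])           # DEC(z_u, z_v) = sigma(z_u^T z_v)
    loss += -(y * np.log(p) + (1 - y) * np.log(1 - p))  # CE(y_uv, DEC(z_u, z_v))
```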