Graph Convolutional Network — DGL 1.1.3 documentation (2024)

Author: Qi Huang, Minjie Wang, Yu Gai, Quan Gan, Zheng Zhang

Warning

The tutorial aims at gaining insights into the paper, with code as a means of explanation. The implementation is thus NOT optimized for running efficiency. For a recommended implementation, please refer to the official examples.

This is a gentle introduction to using DGL to implement Graph Convolutional Networks (Kipf & Welling, Semi-Supervised Classification with Graph Convolutional Networks). We explain what is under the hood of the GraphConv module. The reader is expected to learn how to define a new GNN layer using DGL's message passing APIs.

Model Overview

GCN from the perspective of message passing

We describe a layer of graph convolutional neural network from a message passing perspective; the math can be found here. It boils down to the following steps, for each node \(u\):

1) Aggregate neighbors' representations \(h_{v}\) to produce an intermediate representation \(\hat{h}_u\).

2) Transform the aggregated representation \(\hat{h}_{u}\) with a linear projection followed by a non-linearity: \(h_{u} = f(W_{u} \hat{h}_u)\).

We will implement step 1 with DGL message passing, and step 2 with PyTorch's nn.Module.

GCN implementation with DGL

We first define the message and reduce functions as usual. Since the aggregation on a node \(u\) only involves summing over the neighbors' representations \(h_v\), we can simply use builtin functions:

import os

os.environ["DGLBACKEND"] = "pytorch"
import dgl
import dgl.function as fn
import torch as th
import torch.nn as nn
import torch.nn.functional as F
from dgl import DGLGraph

gcn_msg = fn.copy_u(u="h", out="m")
gcn_reduce = fn.sum(msg="m", out="h")
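For intuition, the builtin pair above behaves roughly like the following user-defined functions (a sketch for illustration only; the names are ours, and the builtins should be preferred because DGL fuses them into optimized kernels):

# Hypothetical UDF equivalents of the two builtins above, for illustration.
def gcn_msg_udf(edges):
    # Copy each source node's feature 'h' into the message 'm'.
    return {"m": edges.src["h"]}

def gcn_reduce_udf(nodes):
    # Sum the incoming messages along the neighbor dimension.
    return {"h": th.sum(nodes.mailbox["m"], dim=1)}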

We then proceed to define the GCNLayer module. A GCNLayer essentially performs message passing on all the nodes and then applies a fully-connected layer.

Note

This shows how to implement a GCN from scratch. DGL provides a more efficient builtin GCN layer module.
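For comparison, a sketch of the same layer using the builtin module might look like the following (note that GraphConv also applies by default the symmetric degree normalization discussed at the end of this tutorial, so it is not numerically identical to the simplified layer below):

# A sketch using DGL's builtin layer instead of the scratch version below.
from dgl.nn import GraphConv

builtin_layer = GraphConv(in_feats=1433, out_feats=16)
# h = builtin_layer(g, features)  # same (graph, feature) calling convention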

class GCNLayer(nn.Module):
    def __init__(self, in_feats, out_feats):
        super(GCNLayer, self).__init__()
        self.linear = nn.Linear(in_feats, out_feats)

    def forward(self, g, feature):
        # Creating a local scope so that all the stored ndata and edata
        # (such as the `'h'` ndata below) are automatically popped out
        # when the scope exits.
        with g.local_scope():
            g.ndata["h"] = feature
            g.update_all(gcn_msg, gcn_reduce)
            h = g.ndata["h"]
            return self.linear(h)

The forward function is essentially the same as that of any other commonly seen NN model in PyTorch. We can initialize GCN like any nn.Module. For example, let's define a simple neural network consisting of two GCN layers. Suppose we are training the classifier for the cora dataset (the input feature size is 1433 and the number of classes is 7). The last GCN layer computes node embeddings, so the last layer in general does not apply an activation.

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.layer1 = GCNLayer(1433, 16)
        self.layer2 = GCNLayer(16, 7)

    def forward(self, g, features):
        x = F.relu(self.layer1(g, features))
        x = self.layer2(g, x)
        return x

net = Net()
print(net)

Out:

Net(
  (layer1): GCNLayer(
    (linear): Linear(in_features=1433, out_features=16, bias=True)
  )
  (layer2): GCNLayer(
    (linear): Linear(in_features=16, out_features=7, bias=True)
  )
)

We load the cora dataset using DGL’s built-in data module.

from dgl.data import CoraGraphDataset

def load_cora_data():
    dataset = CoraGraphDataset()
    g = dataset[0]
    features = g.ndata["feat"]
    labels = g.ndata["label"]
    train_mask = g.ndata["train_mask"]
    test_mask = g.ndata["test_mask"]
    return g, features, labels, train_mask, test_mask

After the model is trained, we can use the following method to evaluate the performance of the model on the test dataset:

def evaluate(model, g, features, labels, mask):
    model.eval()
    with th.no_grad():
        logits = model(g, features)
        logits = logits[mask]
        labels = labels[mask]
        _, indices = th.max(logits, dim=1)
        correct = th.sum(indices == labels)
        return correct.item() * 1.0 / len(labels)

We then train the network as follows:

import time

import numpy as np

g, features, labels, train_mask, test_mask = load_cora_data()
# Add edges between each node and itself to preserve old node representations
g.add_edges(g.nodes(), g.nodes())
optimizer = th.optim.Adam(net.parameters(), lr=1e-2)
dur = []
for epoch in range(50):
    if epoch >= 3:
        t0 = time.time()

    net.train()
    logits = net(g, features)
    logp = F.log_softmax(logits, 1)
    loss = F.nll_loss(logp[train_mask], labels[train_mask])

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if epoch >= 3:
        dur.append(time.time() - t0)

    acc = evaluate(net, g, features, labels, test_mask)
    print(
        "Epoch {:05d} | Loss {:.4f} | Test Acc {:.4f} | Time(s) {:.4f}".format(
            epoch, loss.item(), acc, np.mean(dur)
        )
    )

Out:

  NumNodes: 2708
  NumEdges: 10556
  NumFeats: 1433
  NumClasses: 7
  NumTrainingSamples: 140
  NumValidationSamples: 500
  NumTestSamples: 1000
Done loading data from cached files.
/home/ubuntu/prod-doc/readthedocs.org/user_builds/dgl/envs/1.1.x/lib/python3.7/site-packages/numpy/core/fromnumeric.py:3441: RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)
/home/ubuntu/prod-doc/readthedocs.org/user_builds/dgl/envs/1.1.x/lib/python3.7/site-packages/numpy/core/_methods.py:189: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
Epoch 00000 | Loss 1.9440 | Test Acc 0.3760 | Time(s) nan
Epoch 00001 | Loss 1.7760 | Test Acc 0.5380 | Time(s) nan
Epoch 00002 | Loss 1.5871 | Test Acc 0.5690 | Time(s) nan
Epoch 00003 | Loss 1.4662 | Test Acc 0.5960 | Time(s) 0.0191
Epoch 00004 | Loss 1.3598 | Test Acc 0.6260 | Time(s) 0.0187
Epoch 00005 | Loss 1.2603 | Test Acc 0.6680 | Time(s) 0.0188
Epoch 00006 | Loss 1.1644 | Test Acc 0.7060 | Time(s) 0.0187
Epoch 00007 | Loss 1.0717 | Test Acc 0.7270 | Time(s) 0.0187
Epoch 00008 | Loss 0.9821 | Test Acc 0.7340 | Time(s) 0.0187
Epoch 00009 | Loss 0.8975 | Test Acc 0.7190 | Time(s) 0.0187
Epoch 00010 | Loss 0.8202 | Test Acc 0.7120 | Time(s) 0.0186
Epoch 00011 | Loss 0.7505 | Test Acc 0.6960 | Time(s) 0.0186
Epoch 00012 | Loss 0.6873 | Test Acc 0.6880 | Time(s) 0.0187
Epoch 00013 | Loss 0.6274 | Test Acc 0.6900 | Time(s) 0.0187
Epoch 00014 | Loss 0.5708 | Test Acc 0.6930 | Time(s) 0.0187
Epoch 00015 | Loss 0.5187 | Test Acc 0.7070 | Time(s) 0.0187
Epoch 00016 | Loss 0.4710 | Test Acc 0.7100 | Time(s) 0.0188
Epoch 00017 | Loss 0.4275 | Test Acc 0.7150 | Time(s) 0.0188
Epoch 00018 | Loss 0.3881 | Test Acc 0.7250 | Time(s) 0.0187
Epoch 00019 | Loss 0.3527 | Test Acc 0.7330 | Time(s) 0.0186
Epoch 00020 | Loss 0.3205 | Test Acc 0.7420 | Time(s) 0.0186
Epoch 00021 | Loss 0.2909 | Test Acc 0.7450 | Time(s) 0.0186
Epoch 00022 | Loss 0.2643 | Test Acc 0.7450 | Time(s) 0.0186
Epoch 00023 | Loss 0.2406 | Test Acc 0.7490 | Time(s) 0.0187
Epoch 00024 | Loss 0.2192 | Test Acc 0.7490 | Time(s) 0.0186
Epoch 00025 | Loss 0.1998 | Test Acc 0.7500 | Time(s) 0.0187
Epoch 00026 | Loss 0.1820 | Test Acc 0.7500 | Time(s) 0.0186
Epoch 00027 | Loss 0.1657 | Test Acc 0.7480 | Time(s) 0.0186
Epoch 00028 | Loss 0.1508 | Test Acc 0.7500 | Time(s) 0.0186
Epoch 00029 | Loss 0.1372 | Test Acc 0.7500 | Time(s) 0.0186
Epoch 00030 | Loss 0.1249 | Test Acc 0.7510 | Time(s) 0.0186
Epoch 00031 | Loss 0.1138 | Test Acc 0.7510 | Time(s) 0.0186
Epoch 00032 | Loss 0.1036 | Test Acc 0.7540 | Time(s) 0.0187
Epoch 00033 | Loss 0.0944 | Test Acc 0.7530 | Time(s) 0.0187
Epoch 00034 | Loss 0.0862 | Test Acc 0.7520 | Time(s) 0.0187
Epoch 00035 | Loss 0.0787 | Test Acc 0.7520 | Time(s) 0.0187
Epoch 00036 | Loss 0.0720 | Test Acc 0.7540 | Time(s) 0.0187
Epoch 00037 | Loss 0.0659 | Test Acc 0.7530 | Time(s) 0.0187
Epoch 00038 | Loss 0.0605 | Test Acc 0.7500 | Time(s) 0.0186
Epoch 00039 | Loss 0.0555 | Test Acc 0.7500 | Time(s) 0.0186
Epoch 00040 | Loss 0.0511 | Test Acc 0.7500 | Time(s) 0.0186
Epoch 00041 | Loss 0.0471 | Test Acc 0.7510 | Time(s) 0.0186
Epoch 00042 | Loss 0.0434 | Test Acc 0.7490 | Time(s) 0.0186
Epoch 00043 | Loss 0.0401 | Test Acc 0.7460 | Time(s) 0.0186
Epoch 00044 | Loss 0.0371 | Test Acc 0.7440 | Time(s) 0.0187
Epoch 00045 | Loss 0.0344 | Test Acc 0.7430 | Time(s) 0.0186
Epoch 00046 | Loss 0.0320 | Test Acc 0.7440 | Time(s) 0.0186
Epoch 00047 | Loss 0.0298 | Test Acc 0.7410 | Time(s) 0.0186
Epoch 00048 | Loss 0.0277 | Test Acc 0.7400 | Time(s) 0.0186
Epoch 00049 | Loss 0.0259 | Test Acc 0.7390 | Time(s) 0.0186

GCN in one formula

Mathematically, the GCN model follows this formula:

\(H^{(l+1)} = \sigma(\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}H^{(l)}W^{(l)})\)

Here, \(H^{(l)}\) denotes the node representations at the \(l^{th}\) layer of the network, \(\sigma\) is the non-linearity, and \(W^{(l)}\) is the weight matrix for this layer. \(\tilde{D}\) and \(\tilde{A}\) are respectively the degree and adjacency matrices of the graph. With the superscript ~, we are referring to the variant where we add additional edges between each node and itself to preserve its old representation in graph convolutions. The shape of the input \(H^{(0)}\) is \(N \times D\), where \(N\) is the number of nodes and \(D\) is the number of input features. We can chain up multiple layers as such to produce a node-level representation output with shape \(N \times F\), where \(F\) is the dimension of the output node feature vector.

The equation can be efficiently implemented using sparse matrix multiplication kernels (such as Kipf's pygcn code). The above DGL implementation in fact has already used this trick due to the use of builtin functions.
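As a minimal sketch of that trick outside DGL, assuming adj is an N x N scipy.sparse adjacency matrix with self-loops already added (the helper name is ours, for illustration):

# A minimal sketch of one GCN layer as sparse matrix products.
# Assumes `adj` already contains self-loops, so every degree is >= 1.
import numpy as np
import scipy.sparse as sp

def gcn_layer_spmm(adj, H, W):
    deg = np.asarray(adj.sum(axis=1)).flatten()    # diagonal of \tilde{D}
    d_inv_sqrt = sp.diags(np.power(deg, -0.5))     # \tilde{D}^{-1/2}
    A_hat = d_inv_sqrt @ adj @ d_inv_sqrt          # normalized adjacency
    return np.maximum(A_hat @ H @ W, 0)            # sigma chosen as ReLU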

Note that the tutorial code implements a simplified version of GCN where we replace \(\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}\) with \(\tilde{A}\). For a full implementation, see our example here.
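For a flavor of what the full version changes, here is one way the symmetric normalization could be folded into the scratch GCNLayer above (a sketch, not the official example; it scales features by \(\tilde{D}^{-\frac{1}{2}}\) before and after the neighbor sum):

# A sketch of GCNLayer.forward with symmetric normalization folded in,
# assuming self-loops have already been added to `g`.
def forward(self, g, feature):
    with g.local_scope():
        degs = g.in_degrees().float().clamp(min=1)  # diagonal of \tilde{D}
        norm = th.pow(degs, -0.5).unsqueeze(1)      # \tilde{D}^{-1/2}, broadcastable
        g.ndata["h"] = feature * norm               # scale before aggregation
        g.update_all(gcn_msg, gcn_reduce)
        h = g.ndata["h"] * norm                     # scale after aggregation
        return self.linear(h)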

Total running time of the script: ( 0 minutes 1.634 seconds)

Download Python source code: 1_gcn.py

Download Jupyter notebook: 1_gcn.ipynb

Gallery generated by Sphinx-Gallery
