[NMI] A geometric deep learning approach to predict binding conformations of bioactive molecules 解读

Title: A geometric deep learning approach to predict binding conformations of bioactive molecules



发表周期: Received: 18 May 2021; Accepted: 28 September 2021; Published online: 2 December 2021


One of the difficulties arises from the fact that only a small portion of the large chemical space will bind to a specific biological target and result in a therapeutic effect.

这篇文章中作者应用深度学习模型DeepDock捕获protein-ligand complex中的原子级微环境特征学习其中的结合构象。


The model reported in this study was trained using the general set of the PDBbind database (v.2019)30, which contains a collection of 17,679 protein–ligand structures with their respective potency (half-maximum inhibitory concentration, dissociation constant and so on). From these, we removed those complexes that are included in the CASF-2016 benchmark and those that failed during the preprocessing step, leaving a total of 16,367 protein–ligand complexes, which were randomly divided in a training set containing 15,000 complexes and a test set with 1,367.


The PDBbind database is a comprehensive collection of experimentally measured binding affinity data (Kd, Ki, and IC50) for the protein-ligand complexes deposited in the Protein Data Bank (PDB).[1][2] It thus provides a link between energetic and structural information of protein-ligand complexes, which is of great value to various studies on molecular recognition occurred in biological systems.


Every node is represented by a one-hot vector that indicates the atom type among 28 possibilities (Be, B, C, N, O, F, Mg, Si, P, S, Cl, V, Fe, Co, Cu, Zn, As, Se, Br, Ru, Rh, Sb, I, Re, Os, Ir, Pt and Hg).


As in MaSIF, protein surfaces were triangulated using MSMS37 with a density of 3.0 nodes per Å^2 and a probe radius of 1.5 Å. The resulting meshes were downsampled to a resolution of 1 Å and processed using PyMesh.




Our approach directly uses the molecular surface of the binding site in the form of a polygon mesh.

Both the target mesh and the ligand graph are processed by independent residual GNNs. Through this procedure, the processed node fea- tures not only contain information of an individual atom or point in the molecular surface, but also have information about the other nodes around them. In other words, the processed atom features encode all the atomic environment around a specific atom, whereas the target features encode a patch of the molecular surface around a specific point.

GNN从ligand和target中分别提取出来的特征在特征向量上简单拼接后被送入mixture density network (MDN)。

by using this probability density function we can estimate the likelihood of finding ligand node i separated from a target node j by any distance $d_{ij}$.


作者通过在CASF-2016 benchmark预测

  1. Docking power
  2. Forward screening power
  3. Reverse screening power


Docking power

The evaluation of docking power measures the ability of a scor- ing function to identify native ligand binding poses among a set of decoys.

Screening power

The evaluation of screening power in CASF-2016 is designed to measure the ability of a scoring function to identify true binders of a specific target from a pool of random compounds.



  1. 特征提取
  2. 特征融合
  3. MDN

First, the node and edge features are projected to a 128D embedding using a linear layer.

We then use a sequence of three GNNs to update each node and edge based on their neighbouring nodes and the type of edges connecting them.

After the initial processing by the GNNs, the node and edge features were processed by 10 residual GNN blocks.

The MDN uses an MLP to create a hidden representation $h_{r,s}$ that combines the concatenated target and ligand node information.

All MLPs used are composed of a linear layer followed by batch normalization and an exponential linear unit (ELU) as activation function.