小康文章阅读笔记

小康文章阅读笔记

GROMACS蛋白配体复合物模拟

2025-03-27

gromacs小分子的配体的电荷问题一直时gromacs模拟蛋白-配体复合物中的难点与重点~网上有许多相关方面的教程,但是都没有成为系统,应李继存老师之邀,把最近摸索的方法过程总结出来,分享给大家~感谢网络上的朋友对我的帮助~

一、配体文件构建

1. 高斯输入文件建立:

antechamber -i lig.mol2 -fi mol2 -o lig.gjf -fo gcrt -pf y -gm "%mem=32GB" -gn "%nproc=12" -nc 0 -gk "#HF/6-31G*SCF=tight Test Pop=MK iop(6/33=2) iop(6/42=6) iop(6/50=1) opt nosymm" -ge lig.gesp -gv 1 -at gaff2

当然也可以进行基组的更换:

antechamber -i lig.mol2 -fi mol2 -o lig.gjf -fo gcrt -pf y -gm "%mem=32GB" -gn "%nproc=12" -nc 0 -gk "#b3lyp/6-31G(d) SCF=tight Test Pop=MK iop(6/33=2) iop(6/42=6) iop(6/50=1) opt nosymm" -ge lig.gesp -gv 1 -at gaff2

高斯运行:

g16 lig.gjf

当然还有一种方法进行计算

antechamber -i lig2.pdb -fi pdb -o LIG.mol2 -fo mol2 -pf y -c bcc -s 2 -rn LIG -at gaff2

2.拟合成resp文件

antechamber -i lig.gesp -fi gesp -o LIG.mol2 -fo mol2 -pf y -c resp -rn LIG -at gaff2

3.生成缺失参数文件

parmchk2 -i LIG.mol2 -f mol2 -o lig.frcmod -s gaff2

生成AMBER参数文件以及坐标文件,由于现在最新的ff19sb力场在gromacs里面使用还是不那么完善,所以目前我还是用的
ff14sb

source leaprc.protein.ff14SB
 source leaprc.gaff2
 loadamberparams lig.frcmod
 lig=loadmol2 LIG.mol2
 check lig
 saveamberparm lig lig.prmtop lig.inpcrd
 quit

命名为leap.in

tleap -f leap.in

4. 力场转化

安装acpype,当然也可以用卢老师的sobtop,更加方便

conda create -n acpypemd python=3.8
conda activate acpypemd
conda install -c conda-forge acpype

力场转化:

acpype -p lig.prmtop -x lig.inpcrd -d

二、蛋白模拟

1. 蛋白力场生成

由于用的ff14sb,需要下载力场文件:
https://github.com/intbio/gromacs_ff

进行蛋白的模拟:

gmx_mpi pdb2gmx -f re.pdb -o protein_processed.gro -water tip3p -ignh

如果报错可以尝试使用pdbfixer进行修复一下:

conda install -c conda-forge pdbfixer

2. 复合物力场构建

复制一份protein_processed.gro并重新命名为complex.gro
将得到的LIG_GMX.top中顶部的[defaults],[ atomtypes ],底部的[ system ],[ molecules ]删除,改名为LIG.itp。记得[ moleculetype ]需要被保留!!
将LIG_GMX.gro中内容放入complex.gro中:
示例如下:

   1  LIG   C1    1  17.821  15.972  15.867
   1  LIG   C2    2  17.782  15.845  15.831
   1  LIG   C3    3  17.651  15.821  15.787
   1  LIG   C4    4  17.559  15.924  15.780
   1  LIG   C5    5  17.596  16.054  15.819
   1  LIG   C6    6  17.730  16.076  15.858
   1  LIG   C7    7  17.423  15.887  15.725
   1  LIG   O1    8  17.376  16.146  15.829
   1  LIG   C8    9  17.505  16.172  15.821
   1  LIG   C9   10  17.105  15.994  15.670
   1  LIG  C10   11  17.139  16.145  15.664
   1  LIG  C11   12  17.146  16.210  15.804
   1  LIG  C12   13  17.285  16.258  15.843
   1  LIG  C13   14  17.401  15.932  15.581
   1  LIG  C14   15  17.274  15.892  15.511
   1  LIG  C15   16  17.149  15.918  15.547
   1  LIG   O2   17  17.546  16.285  15.822
   1  LIG  C16   18  17.289  16.310  15.986
   1  LIG   O3   19  17.484  15.992  15.520
   1  LIG   O4   20  17.872  15.747  15.840
   1  LIG   H1   21  17.835  15.664  15.810
   1  LIG   O5   22  17.778  16.195  15.893
   1  LIG   H2   23  17.714  16.264  15.872
   1  LIG   H3   24  17.921  15.992  15.899
   1  LIG   H4   25  17.624  15.722  15.756
   1  LIG   H5   26  17.344  15.924  15.788
   1  LIG   H6   27  17.413  15.778  15.724
   1  LIG   H7   28  16.998  15.982  15.683
   1  LIG   H8   29  17.150  15.951  15.758
   1  LIG   H9   30  17.065  16.195  15.603
   1  LIG  H10   31  17.234  16.158  15.613
   1  LIG  H11   32  17.110  16.141  15.880
   1  LIG  H12   33  17.081  16.297  15.809
   1  LIG  H13   34  17.318  16.335  15.775
   1  LIG  H14   35  17.292  15.844  15.416
   1  LIG  H15   36  17.072  15.888  15.479
   1  LIG  H16   37  17.387  16.347  16.012
   1  LIG  H17   38  17.218  16.391  15.998
   1  LIG  H18   39  17.262  16.231  16.056

并且需要修改顶部的原子数目,改为两个gro文件之和

在topol文件中添加小分子的位置限制:

; Include Position restraint file
#ifdef POSRES
#include "posre.itp"
#endif

; Include ligand topology 
#include "LIG.itp" 

; Include water topology
#include "amber99sb.ff/tip3p.itp"

[ molecules ]
; Compound        #mols
Protein             1
LIG                 1

[注:增加了itp文件,以及molecules增加了小分子,即:

; Include ligand topology 
#include "LIG.itp"
LIG                 1

将[ atomtypes ]内的内容写入top文件中
位于[ moleculetype ]上部,例如:

;新增部分如下
[ atomtypes ]
;name   bond_type     mass     charge   ptype   sigma         epsilon       Amb
 ca       ca          0.00000  0.00000   A     3.31521e-01   4.13379e-01 ; 1.86  0.0988
 c3       c3          0.00000  0.00000   A     3.39771e-01   4.51035e-01 ; 1.91  0.1078
 os       os          0.00000  0.00000   A     3.15610e-01   3.03758e-01 ; 1.77  0.0726
 c        c           0.00000  0.00000   A     3.31521e-01   4.13379e-01 ; 1.86  0.0988
 ce       ce          0.00000  0.00000   A     3.31521e-01   4.13379e-01 ; 1.86  0.0988
 c2       c2          0.00000  0.00000   A     3.31521e-01   4.13379e-01 ; 1.86  0.0988
 o        o           0.00000  0.00000   A     3.04812e-01   6.12119e-01 ; 1.71  0.1463
 oh       oh          0.00000  0.00000   A     3.24287e-01   3.89112e-01 ; 1.82  0.0930
 ho       ho          0.00000  0.00000   A     5.37925e-02   1.96648e-02 ; 0.30  0.0047
 ha       ha          0.00000  0.00000   A     2.62548e-01   6.73624e-02 ; 1.47  0.0161
 hc       hc          0.00000  0.00000   A     2.60018e-01   8.70272e-02 ; 1.46  0.0208
 h1       h1          0.00000  0.00000   A     2.42200e-01   8.70272e-02 ; 1.36  0.0208

[ moleculetype ]
; Name            nrexcl
Protein_chain_A     3

3.定义单元盒子

gmx_mpi editconf -f complex.gro -o newbox.gro -bt cubic -d 1.0

4.加水

gmx_mpi solvate -cp newbox.gro -cs spc216.gro -p topol.top -o solv.gro

5.添加离子

ions.mdp文件如下:

; LINES STARTING WITH ';' ARE COMMENTS
title		    = Minimization	; Title of run

; Parameters describing what to do, when to stop and what to save
integrator	    = steep		; Algorithm (steep = steepest descent minimization)
emtol		    = 1000.0  	; Stop minimization when the maximum force < 10.0 kJ/mol
emstep          = 0.01      ; Energy step size
nsteps		    = 50000	  	; Maximum number of (minimization) steps to perform

; Parameters describing how to find the neighbors of each atom and how to calculate the interactions
nstlist		    = 1		    ; Frequency to update the neighbor list and long range forces
cutoff-scheme   = Verlet
ns_type		    = grid		; Method to determine neighbor list (simple, grid)
rlist		    = 1.0		; Cut-off for making neighbor list (short range forces)
coulombtype	    = cutoff	; Treatment of long range electrostatic interactions
rcoulomb	    = 1.0		; long range electrostatic cut-off
rvdw		    = 1.0		; long range Van der Waals cut-off
pbc             = xyz 		; Periodic Boundary Conditions

离子添加

gmx_mpi grompp -f ions.mdp -c solv.gro -p topol.top -o ions.tpr
gmx_mpi genion -s ions.tpr -o solv_ions.gro -p topol.top -pname NA -nname CL -neutral

替换 15 SOL内的元素

6.能量最小化

em.mdp文件内容如下:

; LINES STARTING WITH ';' ARE COMMENTS
title		    = Minimization	; Title of run

; Parameters describing what to do, when to stop and what to save
integrator	    = steep		; Algorithm (steep = steepest descent minimization)
emtol		    = 1000.0  	; Stop minimization when the maximum force < 10.0 kJ/mol
emstep          = 0.01      ; Energy step size
nsteps		    = 50000	  	; Maximum number of (minimization) steps to perform

; Parameters describing how to find the neighbors of each atom and how to calculate the interactions
nstlist		    = 1		        ; Frequency to update the neighbor list and long range forces
cutoff-scheme   = Verlet
ns_type		    = grid		    ; Method to determine neighbor list (simple, grid)
rlist		    = 1.2		    ; Cut-off for making neighbor list (short range forces)
coulombtype	    = PME		    ; Treatment of long range electrostatic interactions
rcoulomb	    = 1.2		    ; long range electrostatic cut-off
vdwtype         = cutoff
vdw-modifier    = force-switch
rvdw-switch     = 1.0
rvdw		    = 1.2		    ; long range Van der Waals cut-off
pbc             = xyz 		    ; Periodic Boundary Conditions
DispCorr        = no

可以根据进行修改例如emtol等等的修改

gmx_mpi grompp -f em.mdp -c solv_ions.gro -p topol.top -o em.tpr
gmx_mpi  mdrun -v -deffnm em

7. NVT

首先对配体进行限制以免爆开

gmx_mpi make_ndx -f LIG.gro -o index_lig.ndx

选择所有的非氢数据

> 0 & ! a H*
> q
gmx_mpi genrestr -f LIG.gro -n index_lig.ndx -o posre_lig.itp -fc 1000 1000 1000

增加配体约束在topol文件中的Include water topology之前

; Ligand position restraints
#ifdef POSRES
#include "posre_lig.itp"
#endif

例如:


; Include ligand topology 
#include "LIG.itp" 

; Ligand position restraints
#ifdef POSRES
#include "posre_lig.itp"
#endif

; Include water topology
#include "./amber14sb_parmbsc1_cufix.ff/tip3p.itp"

构建索引:

gmx_mpi make_ndx -f em.gro -o index.ndx

选择蛋白和配体

> 1 | 13
> q

构建 nvt.mdp文件例如如下:

title                   = Protein-ligand complex NVT equilibration 
define                  = -DPOSRES  ; position restrain the protein and ligand
; Run parameters
integrator              = md        ; leap-frog integrator
nsteps                  = 50000     ; 2 * 50000 = 100 ps
dt                      = 0.002     ; 2 fs
; Output control
nstenergy               = 500   ; save energies every 1.0 ps
nstlog                  = 500   ; update log file every 1.0 ps
nstxout-compressed      = 500   ; save coordinates every 1.0 ps
; Bond parameters
continuation            = no        ; first dynamics run
constraint_algorithm    = lincs     ; holonomic constraints 
constraints             = h-bonds   ; bonds to H are constrained 
lincs_iter              = 1         ; accuracy of LINCS
lincs_order             = 4         ; also related to accuracy
; Neighbor searching and vdW
cutoff-scheme           = Verlet
ns_type                 = grid      ; search neighboring grid cells
nstlist                 = 20        ; largely irrelevant with Verlet
rlist                   = 1.2
vdwtype                 = cutoff
vdw-modifier            = force-switch
rvdw-switch             = 1.0
rvdw                    = 1.2       ; short-range van der Waals cutoff (in nm)
; Electrostatics
coulombtype             = PME       ; Particle Mesh Ewald for long-range electrostatics
rcoulomb                = 1.2       ; short-range electrostatic cutoff (in nm)
pme_order               = 4         ; cubic interpolation
fourierspacing          = 0.16      ; grid spacing for FFT
; Temperature coupling
tcoupl                  = V-rescale                     ; modified Berendsen thermostat
tc-grps                 = Protein_LIG Water_and_ions    ; two coupling groups - more accurate
tau_t                   = 0.1   0.1                     ; time constant, in ps
ref_t                   = 300   300                     ; reference temperature, one for each group, in K
; Pressure coupling
pcoupl                  = no        ; no pressure coupling in NVT
; Periodic boundary conditions
pbc                     = xyz       ; 3-D PBC
; Dispersion correction is not used for proteins with the C36 additive FF
DispCorr                = no 
; Velocity generation
gen_vel                 = yes       ; assign velocities from Maxwell distribution
gen_temp                = 300       ; temperature for Maxwell distribution
gen_seed                = -1        ; generate a random seed

其中nvt的时间可以修改,主要是看是否平衡,此处我未进行修改

执行nvt:

gmx_mpi grompp -f nvt.mdp -c em.gro -r em.gro -p topol.top -n index.ndx -o nvt.tpr
gmx_mpi mdrun -v -deffnm nvt 

8. NPT

npt模拟的mdp如下:

title                   = Protein-ligand complex NPT equilibration 
define                  = -DPOSRES  ; position restrain the protein and ligand
; Run parameters
integrator              = md        ; leap-frog integrator
nsteps                  = 250000     ; 2 * 250000 = 500 ps
dt                      = 0.002     ; 2 fs
; Output control
nstenergy               = 500       ; save energies every 1.0 ps
nstlog                  = 500       ; update log file every 1.0 ps
nstxout-compressed      = 500       ; save coordinates every 1.0 ps
; Bond parameters
continuation            = yes       ; continuing from NVT 
constraint_algorithm    = lincs     ; holonomic constraints 
constraints             = h-bonds   ; bonds to H are constrained 
lincs_iter              = 1         ; accuracy of LINCS
lincs_order             = 4         ; also related to accuracy
; Neighbor searching and vdW
cutoff-scheme           = Verlet
ns_type                 = grid      ; search neighboring grid cells
nstlist                 = 20        ; largely irrelevant with Verlet
rlist                   = 1.2
vdwtype                 = cutoff
vdw-modifier            = force-switch
rvdw-switch             = 1.0
rvdw                    = 1.2       ; short-range van der Waals cutoff (in nm)
; Electrostatics
coulombtype             = PME       ; Particle Mesh Ewald for long-range electrostatics
rcoulomb                = 1.2
pme_order               = 4         ; cubic interpolation
fourierspacing          = 0.16      ; grid spacing for FFT
; Temperature coupling
tcoupl                  = V-rescale                     ; modified Berendsen thermostat
tc-grps                 = Protein_LIG Water_and_ions    ; two coupling groups - more accurate
tau_t                   = 0.1   0.1                     ; time constant, in ps
ref_t                   = 300   300                     ; reference temperature, one for each group, in K
; Pressure coupling
pcoupl                  = C-rescale                     ; pressure coupling is on for NPT
pcoupltype              = isotropic                     ; uniform scaling of box vectors
tau_p                   = 2.0                           ; time constant, in ps
ref_p                   = 1.0                           ; reference pressure, in bar
compressibility         = 4.5e-5                        ; isothermal compressibility of water, bar^-1
refcoord_scaling        = com
; Periodic boundary conditions
pbc                     = xyz       ; 3-D PBC
; Dispersion correction is not used for proteins with the C36 additive FF
DispCorr                = no 
; Velocity generation
gen_vel                 = no        ; velocity generation off after NVT 

其中npt的时间可以修改,主要是看是否平衡,此处我未进行修改

执行nvt:

gmx_mpi grompp -f npt.mdp -c nvt.gro -t nvt.cpt -r nvt.gro -p topol.top -n index.ndx -o npt.tpr
gmx_mpi mdrun -v -deffnm npt 

9. MD 正式模拟

md的mdp文件如下:

title                   = Protein-ligand complex MD simulation 
; Run parameters
integrator              = md        ; leap-frog integrator
nsteps                  = 5000000   ; 2 * 5000000 = 10000 ps (10 ns)
dt                      = 0.002     ; 2 fs
; Output control
nstenergy               = 5000      ; save energies every 10.0 ps
nstlog                  = 5000      ; update log file every 10.0 ps
nstxout-compressed      = 5000      ; save coordinates every 10.0 ps
; Bond parameters
continuation            = yes       ; continuing from NPT 
constraint_algorithm    = lincs     ; holonomic constraints 
constraints             = h-bonds   ; bonds to H are constrained
lincs_iter              = 1         ; accuracy of LINCS
lincs_order             = 4         ; also related to accuracy
; Neighbor searching and vdW
cutoff-scheme           = Verlet
ns_type                 = grid      ; search neighboring grid cells
nstlist                 = 20        ; largely irrelevant with Verlet
rlist                   = 1.2
vdwtype                 = cutoff
vdw-modifier            = force-switch
rvdw-switch             = 1.0
rvdw                    = 1.2       ; short-range van der Waals cutoff (in nm)
; Electrostatics
coulombtype             = PME       ; Particle Mesh Ewald for long-range electrostatics
rcoulomb                = 1.2
pme_order               = 4         ; cubic interpolation
fourierspacing          = 0.16      ; grid spacing for FFT
; Temperature coupling
tcoupl                  = V-rescale                     ; modified Berendsen thermostat
tc-grps                 = Protein_LIG Water_and_ions    ; two coupling groups - more accurate
tau_t                   = 0.1   0.1                     ; time constant, in ps
ref_t                   = 300   300                     ; reference temperature, one for each group, in K
; Pressure coupling 
pcoupl                  = Parrinello-Rahman             ; pressure coupling is on for NPT
pcoupltype              = isotropic                     ; uniform scaling of box vectors
tau_p                   = 2.0                           ; time constant, in ps
ref_p                   = 1.0                           ; reference pressure, in bar
compressibility         = 4.5e-5                        ; isothermal compressibility of water, bar^-1
; Periodic boundary conditions
pbc                     = xyz       ; 3-D PBC
; Dispersion correction is not used for proteins with the C36 additive FF
DispCorr                = no 
; Velocity generation
gen_vel                 = no        ; continuing from NPT equilibration 

正式模拟:

gmx_mpi grompp -f md.mdp -c npt.gro -t npt.cpt -p topol.top -n index.ndx -o md_0_10.tpr

gmx_mpi mdrun -v -deffnm md_0_10

三、数据分析