Automatization of Tinker for conformational search

Tinker is a freely available, open source collection of molecular mechanics 
programs, which supports a variety of force fields, including MMFF, 
specifically designed for small organic molecules. One of the programs 
in Tinker is the scan conformers generator. The feature of this generator
is the thoroughness of conformational search, which clearly distinguishes it
from its contemporary analogues aimed at quick work with huge databases of 
drug-like compounds rather than targeting completeness of the conformational 
space.

Some our recent examples indicating this are
http://limor1.nioch.nsc.ru/quant/conformers/POXTRD/
http://limor1.nioch.nsc.ru/quant/conformers/1w96/
http://limor1.nioch.nsc.ru/quant/conformers/Verticillatriene/
http://limor1.nioch.nsc.ru/quant/conformers/tentoxin/

However, Tinker does not have scripts to automatically assign the atom types
needed to run the force field calculations, e.g. MMFF94, the most reliable 
force field for small organic molecules.
Fortunately, sdf2xyz2sdf can automatically generate Tinker inputs with MMFF94 
atom types.
Here, we propose an algorithm to automate an application of Tinker for 
conformational search. The algorithm is implemented in an operable bash 
script, see below. All explanations are given in the comments.

#!/bin/bash 

# It's the public domain software

# Dependencies:
# 1) scan program from Tinker package, http://dasher.wustl.edu/tinker/
# Based on our expertise, we recommend the scan program from Tinker v.7 as 
# scan subroutine from Tinker v.8 (last release) works less reliable.
# 2) sdf2tinkerxyz to be downloaded from sdf2xyz2sdf, http://sdf2xyz2sdf.sourceforge.net/
# 3) obabel to be downloaded from Open Babel, http://openbabel.org/wiki/Main_Page

# Useful link about Tinker+sdf2xyz2sdf:
# https://www.wildcardconsulting.dk/useful-information/scripting-molecular-mechanics-calculations-using-tinker-and-sdf2xyz2sdf/

TINKER_DIR="/usr/local/lib/tinker"
scan="$TINKER_DIR/bin/scan"

obabel="/usr/local/bin/obabel"

sdf2tinkerxyz="$TINKER_DIR/bin/sdf2tinkerxyz"
#tinkerxyz2sdf="$TINKER_DIR/bin/tinkerxyz2sdf"

# Let's exemplified an application of this algorithm by 
# conformational search in cyclodecene.
# You may draw 2D cyclodecene i.e. in MarvinSketch, add hydrogenes, 
# transform it in 3D and save it as sdf  OR
# generate from smile with Open Babel
echo 'Generate 3D structure'
$obabel -:"C1CCCC=CCCCC1" -O cyclodecene.sdf --gen3D
#$obabel -:"C(=O)(O)C(C)(C)Oc1ccc(cc1)CC(=O)Nc1cc(cc(c1)C)C" -O 1g9w.sdf --gen3D
echo
# Likewise, any other format can be converted to sdf, i.e.
#$obabel cyclodecene.xyz -O cyclodecene.sdf
# But please check bonds in sdf if initial format doesn't contain connectivity.


#for file in "$@"; do  # For tinker_scan *.sdf
for file in cyclodecene.sdf; do
  mol=`basename $file .sdf`

  # Tinker scan needs "good" numeration of atoms to work automatically
  # For example, cyclodecene drawn in Marvin Sketch 
  # has "bad" numeration and scan fails with error
  # 'INITROT  --  Rotation about     1     2 occurs more than once in Z-matrix'
  # Molden can renumerate atoms in "good" manner, but it isn't scriptable.
  # We found that canonical numeration produced by Open Babel is "good" for scan.
  echo 'Renumerate structure'
  $obabel $mol.sdf -O $mol.sdf --canonical --title $mol
  echo
  # Explicit title (1st line of sdf file) is needed for sdf2tinkerxyz

  # Convert SDF files into TINKER XYZ files performing automatic assignment 
  # of MMFF94 atom types, bond types and charges.
  # http://dx.doi.org/10.1007/s00894-011-1111-7
  echo "Convert SDF into TINKER XYZ"
  [ -e $mol.xyz ] && mv $mol.xyz $mol.xyzSA
  $sdf2tinkerxyz < $mol.sdf
  # title.xyz and title.key files are generated (title is 1st line of sdf file)
  mv $mol.xyz $mol.txyz
  [ -e $mol.xyzSA ] && mv $mol.xyzSA $mol.xyz
  echo

  # MMFF94 parameters
  echo "PARAMETERS $TINKER_DIR/params/mmff" >> $mol.key

  # To keep the chirality found at chiral tetravalent centers
  echo "ENFORCE-CHIRALITY" >> $mol.key

  ############ This paragraph does not apply to scan, but is relevant for 
  ############ other tinker programs that compute the single point energy.
  # When dealing with the pi-sistems, sdf2tinkerxyz adds lines of type 
  # 'mmff-pibond     2     3' into key file. Depending on the numbering of atoms, 
  # this line may have an inverted order of numbers: 'mmff-pibond     3     2'. 
  # This does not affect the optimized structure and its energy, but only the 1st  
  # case gives correct single point energy. Although this has not been thoroughly 
  # checked, it is possible that sorting the numbers in ascending order is the 
  # solution to the problem. Our perl-script 'conformers' does this just in case.
  ############

  # !!!!! RUN GENERATION OF CONFORMERS !!!!!!
  # Detailed description of Tinker's scan algorithm:
  # http://dasher.wustl.edu/ponder/papers/ccb-report-1998-01.pdf
  echo "Run Tinker scan for $mol"
  $scan $mol.txyz 0 5 20 0.0001 | tee $mol.out | \
  grep -e 'Number of Torsions' -e 'Potential Surface Map'
  # Parameters '0 5 20 0.0001' for scan:
  # 0  - Automatic Selection of Torsional Angles
  # 5  - Number Search Directions for Local Search
  # 20 - Energy Threshold for Local Minima
  # 0.0001 - RMS Gradient per Atom Criterion
  # We recommend to try to increase 2nd and 3rd parameters for more full
  # conformational space (but sometimes incteasing of Search Directions
  # results in less space)

  # Extract from out file lines like
  # `    Potential Surface Map       Minimum     54               36.9909`
  # Resulting line is '054  36.9909'
  echo "Extract energies from out file"
  perl -ane '/Potential Surface Map/ && printf "%03d  %s\n", @F[4,5]' \
  $mol.out > $mol.energy
  echo

  # Tinker format isn't common.
  # Unfortunately, program tinkerxyz2sdf from sdf2xyz2sdf doesn't work 
  # well and obabel doesn't recognize Tinker's MMFF atoms.
  # Therefore we will convert Tinker format to XMol xyz format manually, 
  # numbering the files according to the energy and 
  # placing the energy in the comments (the 2nd line).
  # It takes a lot of time, perl script 'conformers' completes it more faster.
  echo "Sort conformers by energy and convert Tinker format into XMol xyz"
  count=1
  sort -n -k 2 $mol.energy | while read N E; do
    fcount=`printf %03d $count`
    echo "$mol.$N => $mol.$fcount.xyz"
    (read N_atoms _
    echo " $N_atoms" > $mol.$fcount.xyz
    echo " Energy $E" >> $mol.$fcount.xyz
    while read _ at x y z _; do
      at=`echo "$at" | perl -ne 's/^W+//; s/ZINC/ZN/;
                                 /^(C[LAU]|NA|SI|FE|[HCONSPFK]|BR|LI|ZN|MG)/ && 
                                 print ucfirst lc $1'`
      echo " $at   $x   $y   $z" >> $mol.$fcount.xyz
    done) < $mol.$N
    let count++
  done
  echo
  # !!!!! END GENERATION OF CONFORMERS !!!!!!
  
  # The same work conformers perl-script can do
  # http://limor1.nioch.nsc.ru/quant/program/conformers/
  #conformers -tinker -only_gen *.sdf
  # -only_gen eliminates additional optimization and removal of duplicates.

  # Often Tinker scan provides already optimized and only unique conformers,
  # therefore post-processing of the conformers is not required.
  # But sometimes removing of duplicates is required, i.e. by conformers script:
  # cat $mol.[0-9][0-9][0-9]*.xyz > t.xyz; conformers -no_mopac -no_ret t.xyz

  # Tinker scan does'n keep cis/trans configuration of double bonds. Is's bad.
  # As workaround, we use clusterization of conformers by smiles.
  echo "Clusterization of conformers by smiles"
  $obabel $mol.[0-9][0-9][0-9]*.xyz -o smi - | perl -lane \
  '($smi,$name) = /(\S+)\s+(.+)/; push @{$h{$smi}}, $name; 
  END {while (($k,$v) = each %h) {print "$k ",$#$v+1; print "@$v"}}'
  # In case of cyclodecene, result is
  # C\1=C\CCCCCCCC1: 32 conformers with cis double bond
  # C\1=C/CCCCCCCC1:  9 trans
  # [CH]1[CH]CCCCCCCC1: 13 also trans with strained double bond 
  # which isn't recognized by obabel as double bond.
  # Perl script 'sort_by_smi' produce more readable and compact ouput
  # http://limor1.nioch.nsc.ru/quant/program/sort_by_smi/
  
  # Delete conformers in Tinker format
  rm -f $mol.{[0-9][0-9][0-9],[0-9][0-9][0-9][0-9],[0-9][0-9][0-9][0-9][0-9]}
  rm -f $mol.{txyz,key,out,energy}
done