CAS, SMILES, InChl and Fingerprints

1 minute read

Published:

An introduction to CAS, SMILES, InChl and Fingerprints of drugs or chemicals

CAS:

  • unique numerical identifier assigned by the Chemical Abstracts Service (CAS) to every chemical substance.
  • Example: caffeine: 58-08-2

SMILES strings:

  • Simplified Molecular Input Line Entry System.
  • using short ASCII strings.
  • it is very common that there are a lot of SMILES strings that represent the same structure.
  • unique SMILES string is also called the “canonical SMILES”.
  • Example: CN1C=NC2=C1C(=O)N(C(=O)N2C)C

InChI identifiers:

  • IUPAC Chemical Identifier
  • a unique textual label for any chemical substance.
  • comprise different layers and sub‐layers of information separated by slashes (/).
  • example: caffeine: InChl=1S/C8H10N4O2/c1–10–4–9–6–5(10)7(13)12(3)8(14)11(6)2/h4H,1–3H3 img

InChI Keys

  • a hashed version of the InChI img
  • The hash function is a one-way conversion img

hashed fingerprints:

  • use fixed-length binary values (0/1 bits) to encode molecular features, used to evaluation the similarity or diversity of chemicals.
  • one molecule -> one fingerprint; but different molecules may have the same fingerprint.
  • cannot confirm the presence of a substructure in a molecule, but confirm the absence of a substructure.
  • the structure of a chemical cannot be induced from the fingerprint.

References:

Updated: 1st Dec. 2021