Data Scientist – Cheminformatics
Data scientists specializing in cheminformatics apply statistical and machine learning models to chemical data to support drug discovery, compound optimization, and virtual screening. They work closely with medicinal chemists and computational biologists to predict the properties and behavior of small molecules.
Responsibilities include:
Developing predictive models – QSAR, physicochemical property prediction, and ADMET modeling, alongside structure-based methods such as molecular docking.
Managing large compound libraries – Using cheminformatics tools like RDKit, Open Babel, or KNIME for structure handling and descriptor generation.
Visualizing and analyzing SAR data – Supporting lead optimization through clustering, PCA, or similarity analyses.
Contributing to compound prioritization strategies – Helping R&D teams focus efforts on the most promising drug candidates using data-driven scoring systems.
Maintaining clean, structured data environments – Using SQL, Python, and cloud-based data warehousing to handle high-volume structure–activity datasets.
Ideal candidates bring a strong background in chemistry or computational biology, paired with Python or R expertise, and familiarity with chemical structure notations such as SMILES and InChI and with file formats such as SDF.