About Me

I am a computational scientist with 3+ years of experience building scalable, robust AI/ML models, bioinformatics pipelines, and data analysis tools. My work focuses on advancing drug discovery, precision medicine, target identification, clinical development, and therapeutic research through cross-functional collaboration and data-driven solutions. I am passionate about solving complex biomedical problems at the intersection of AI, omics, and healthcare.

Gmail | LinkedIn | GitHub | Medium

Cross-Cutting Capabilities

AI/ML Model Development
Pipeline Automation
Data Integration
Knowledge Engineering
Predictive Analytics

Technical Skills

Machine Learning | Graph Neural Networks | Deep Learning | LSTM | NLP | LLM | Knowledge Graphs | GraphRAG | GenAI | Real-World Data | WGS | RNA-Seq | scRNA-Seq | GWAS | Cromwell | Nextflow | Docker | RESTful API

Expertise

Multidisciplinary capabilities across research, therapeutics, and technology development

Therapeutic Development

1. Target Identification

Discovering and validating therapeutic targets

GWAS | Knowledge Graphs | Systems Modeling | scRNA-Seq | RNA-Seq

2. Lead Discovery

Designing potential therapeutic candidates

Antibody Design | Molecule Generation | QSPR Models

3. Preclinical

Evaluating safety and efficacy

In Silico KO

Research & Precision Medicine

1. Multi-Omics Analysis

Extracting biological insights from complex data

WGS | scRNA-Seq | RNA-Seq | Proteomics

2. Disease Modeling

Understanding disease mechanisms

Knowledge Graphs | Boolean Networks | Simulation | Biomarker Identification

3. Clinical Translation

Bridging research to patient care

Temporal KGs | Real-World Evidence | Decision Support

Bioinformatics Pipeline Development

1. Data Acquisition

Ingesting and standardizing biological data

WGS/RNA-Seq/scRNA-Seq Data Wrangling

2. Analysis Pipeline

Building robust analysis workflows

Nextflow | Docker | Cromwell

3. Knowledge Extraction

Deriving actionable insights

ML Models | Biomarker Identification | Visualization

Translational AI/ML Applications

1. Data Preparation

Structuring biomedical data for AI

Feature Engineering | Embeddings | Data Augmentation

2. Model Development

Building domain-specific AI solutions

Applied ML | NLP | LLM | RAG

3. Deployment

Implementing solutions in practice

AWS | LangChain | REST API

Work Experience

AI-driven antibody sequence generation

Developed an AI-driven pipeline for antibody sequence generation, leveraging deep learning and generative models trained on large-scale antibody datasets (OAS and SAbDab). The pipeline integrates sequence optimization and structural validation tools to design high-affinity, developable antibodies with therapeutic potential.

AntiFold | AbLang | DiffAb | BioPhi | LLM fine-tuning

Knowledge Graph

Built a knowledge graph (KG) from open-source biological, real-world, and clinical data, harmonizing each entity type with controlled vocabularies. Applications included drug repurposing, target identification, and safety assessment (toxicity and organ-wise stratification), reducing months of work to weeks.

Neo4j | Link prediction | Node classification | Community Detection
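Link prediction over a KG can be illustrated with a classic neighborhood heuristic. The sketch below is a minimal, dependency-free stand-in, not the model used on the Neo4j graph: it assumes a small hypothetical graph (drugA, diseaseD, geneX and the other names are made up) and scores candidate drug-disease links with the Adamic-Adar index, which rewards shared neighbors that are themselves sparsely connected.

```python
import math
from collections import defaultdict

# Hypothetical toy KG: drug-gene and disease-gene edges, treated as undirected.
EDGES = [
    ("drugA", "geneX"), ("drugA", "geneY"),
    ("drugB", "geneZ"), ("drugC", "geneW"),
    ("diseaseD", "geneX"), ("diseaseD", "geneY"), ("diseaseD", "geneZ"),
]


def build_adjacency(edges):
    """Map each node to the set of its neighbors (undirected)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def adamic_adar(adj, u, v):
    """Adamic-Adar index: sum of 1 / log(degree) over neighbors shared by u and v.

    For distinct u and v, a shared neighbor has degree >= 2, so log(degree) > 0.
    """
    shared = adj.get(u, set()) & adj.get(v, set())
    return sum(1.0 / math.log(len(adj[w])) for w in shared)


def rank_candidates(adj, target, candidates):
    """Score candidate nodes for a new link to `target`, skipping existing links and self-pairs."""
    existing = adj.get(target, set())
    scored = [
        (c, adamic_adar(adj, c, target))
        for c in candidates
        if c != target and c not in existing
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


adj = build_adjacency(EDGES)
print(rank_candidates(adj, "diseaseD", ["drugA", "drugB", "drugC"]))
```

Here drugA ranks first because it shares two gene neighbors with diseaseD, drugB shares one, and drugC shares none.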

Automated QSPR model building for property prediction

Developed an automated ML pipeline that builds Quantitative Structure-Property Relationship (QSPR) models for drug property prediction, reducing dependency on data scientists for model building and extending modeling capabilities across departments.

RDKit | Morgan fingerprints | Optuna (Bayesian optimization)

Machine learning-driven biomarker identification

Developed a computational workflow for biomarker identification that trains classical ML models on RNA-Seq data to identify diagnostic and prognostic signatures, enabling patient stratification for precision medicine applications.

DEG | WGCNA | Applied ML | Feature importance (SHAP) | Kaplan-Meier
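To illustrate the survival side of such a workflow, here is a minimal pure-Python Kaplan-Meier estimator, the kind of curve used to check whether a prognostic signature separates patient groups. It is a sketch, not the workflow's actual code: the follow-up times and event flags are hypothetical, and a tested survival library would normally be used in practice.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times  : follow-up time for each patient
    events : 1 if the event (e.g. death) was observed, 0 if the patient was censored
    Returns (time, survival probability) pairs at each distinct event time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Process every patient that shares this time point together.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set after t.
        at_risk -= removed
    return curve


# Hypothetical follow-up (months) for patients a signature labels "high risk".
high_risk_curve = kaplan_meier([5, 8, 8, 12, 15], [1, 1, 0, 1, 0])
print(high_risk_curve)
```

Computing one curve per signature-defined group and comparing them (for example with a log-rank test) is the usual way to judge whether the stratification is prognostic.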

Network analysis: a systems approach to target identification and prioritization

1. High-throughput drug combination prioritization using network proximity analysis

2. Network vulnerability analysis for target prioritization

Network proximity | Network attributes | STRING
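The "closest" distance is one common way to define network proximity between a drug and a disease: for each drug target, take the shortest-path distance to the nearest disease gene, then average over targets. This is a minimal sketch assuming an unweighted, undirected interactome (for example, STRING interactions above some confidence cutoff) loaded as a dict of neighbor sets; the gene names are hypothetical.

```python
from collections import deque


def to_adjacency(edges):
    """Build an undirected adjacency map from (gene, gene) interaction pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, source):
    """Unweighted shortest-path length from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def closest_proximity(adj, drug_targets, disease_genes):
    """Mean, over drug targets, of the distance to the nearest disease gene.

    Targets that cannot reach any disease gene are skipped; returns inf if none can.
    """
    disease = set(disease_genes)
    nearest = []
    for target in drug_targets:
        dist = bfs_distances(adj, target)
        reachable = [dist[g] for g in disease if g in dist]
        if reachable:
            nearest.append(min(reachable))
    return sum(nearest) / len(nearest) if nearest else float("inf")


# Hypothetical interactome: a simple path A-B-C-D-E.
PPI = to_adjacency([("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")])
print(closest_proximity(PPI, ["A"], ["C", "E"]))       # single target
print(closest_proximity(PPI, ["A", "D"], ["C", "E"]))  # two targets
```

Proximity analyses typically convert this raw distance into a z-score against degree-matched random gene sets; that step is omitted here for brevity.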

GWAS - WGS

1. GenomeIndia cohort: quality control and SNP association studies.

2. All of Us cohort: developed a Cromwell (WDL) pipeline to scale association studies across hundreds of EHR-derived traits and identify statistically significant biomarkers. Association analyses included dosage sensitivity and gene burden tests.

Association statistics | Dosage sensitivity | Burden test
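Gene burden tests aggregate rare variants within a gene instead of testing each variant separately. The simplest version collapses each sample to carrier or non-carrier and compares cases with controls. The sketch below assumes that collapsing scheme plus a one-sided Fisher's exact test; it is not the pipeline's actual statistic, it ignores covariates such as ancestry and sex, and the genotypes are hypothetical.

```python
from math import comb


def carrier_table(genotypes, is_case):
    """Collapse a gene's qualifying variants into carrier status and build a 2x2 table.

    genotypes: per-sample lists of alternate-allele dosages (0, 1, 2), one entry per variant.
    is_case:   per-sample booleans (True = case, False = control).
    Returns (case carriers, case non-carriers, control carriers, control non-carriers).
    """
    a = b = c = d = 0
    for dosages, case in zip(genotypes, is_case):
        carrier = any(g > 0 for g in dosages)
        if case and carrier:
            a += 1
        elif case:
            b += 1
        elif carrier:
            c += 1
        else:
            d += 1
    return a, b, c, d


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact p-value for carriers being enriched among cases.

    Sums hypergeometric probabilities P(X >= a), X = number of carriers among cases.
    """
    n_cases, n_carriers, n = a + b, a + c, a + b + c + d
    tail = sum(
        comb(n_carriers, x) * comb(n - n_carriers, n_cases - x)
        for x in range(a, min(n_cases, n_carriers) + 1)
    )
    return tail / comb(n, n_cases)


genotypes = [[0, 1], [1, 0], [0, 2], [0, 0],  # cases
             [0, 0], [0, 0], [0, 0], [0, 0]]  # controls
is_case = [True] * 4 + [False] * 4
table = carrier_table(genotypes, is_case)
print(table, fisher_one_sided(*table))
```

At biobank scale, regression-based burden tests that adjust for covariates are the more common choice; the collapsing idea is the same.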

In silico knockout/perturbation

Developed a high-throughput Boolean model simulation pipeline for in silico gene knockout/perturbation experiments, using RNA-Seq data to initialize system states. This supports data-driven therapeutic development and improves precision in target prioritization.

Attractors | Trap spaces | Logical gates | KEGG
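The knockout logic can be sketched with a synchronous Boolean network: binarize expression to set initial states, clamp the knocked-out gene to OFF, and iterate until a state repeats (an attractor). This is a minimal illustration under stated assumptions: the four-node cascade, its rules, and the ON/OFF threshold are hypothetical (not KEGG-derived), and only synchronous updates are handled, not trap-space analysis.

```python
# Hypothetical 4-node signalling cascade; rules are illustrative only.
RULES = {
    "GF": lambda s: s["GF"],       # external input, held at its initial value
    "RAS": lambda s: s["GF"],
    "ERK": lambda s: s["RAS"],
    "PROLIF": lambda s: s["ERK"],
}
NODES = list(RULES)


def binarize(expression, threshold=1.0):
    """Map expression values (e.g. TPM) to ON/OFF initial states; the threshold is an assumption."""
    return {gene: value > threshold for gene, value in expression.items()}


def find_attractor(rules, nodes, init, knockout=(), max_steps=1000):
    """Synchronously update all nodes until a state repeats; return the attractor cycle.

    Knocked-out nodes are clamped to False at every step, including the initial state.
    A finite network must revisit a state within 2**len(nodes) steps.
    """
    ko = set(knockout)
    state = tuple(False if n in ko else init[n] for n in nodes)
    first_seen = {}
    trajectory = []
    for step in range(max_steps):
        if state in first_seen:
            return trajectory[first_seen[state]:]
        first_seen[state] = step
        trajectory.append(state)
        current = dict(zip(nodes, state))
        state = tuple(False if n in ko else bool(rules[n](current)) for n in nodes)
    raise RuntimeError("no attractor reached within max_steps")


# Hypothetical expression profile used to initialize the network.
EXPR = {"GF": 12.0, "RAS": 0.2, "ERK": 0.5, "PROLIF": 0.0}
init = binarize(EXPR)
print("wild type:", find_attractor(RULES, NODES, init))
print("RAS KO:  ", find_attractor(RULES, NODES, init, knockout=["RAS"]))
```

Comparing the wild-type and knockout attractors (here, PROLIF switching from ON to OFF) is what turns a simulation into a target-prioritization signal.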

Structure-based druggability prediction

Developed a structure-based druggability prediction pipeline that uses parallel processing to accelerate searches across a database of known binding pockets, enabling rapid identification of similar sites to assess the druggability of target proteins.

Graph Analytics | Structural bioinformatics | Multiprocessing
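A similarity search over known pockets parallelizes naturally because each database entry is scored independently. This sketch assumes pockets are represented as hypothetical sets of lining-residue types compared by Tanimoto similarity, a simplification of real pocket descriptors; the druggability estimate is a similarity-weighted vote over the top hits.

```python
from concurrent.futures import ProcessPoolExecutor
from functools import partial


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def _score_entry(query, entry):
    """Score one database entry; module-level so worker processes can pickle it."""
    pocket_id, (features, druggable) = entry
    return pocket_id, tanimoto(query, features), druggable


def search_pockets(query, database, top_k=3, workers=1):
    """Return the top_k most similar known pockets as (id, similarity, druggable) tuples."""
    score = partial(_score_entry, frozenset(query))
    entries = list(database.items())
    if workers == 1:
        results = [score(e) for e in entries]
    else:
        with ProcessPoolExecutor(max_workers=workers) as pool:
            results = list(pool.map(score, entries, chunksize=256))
    results.sort(key=lambda r: r[1], reverse=True)
    return results[:top_k]


def druggability_score(hits):
    """Similarity-weighted fraction of druggable pockets among the hits (0 to 1)."""
    total = sum(sim for _pid, sim, _drug in hits)
    if total == 0:
        return 0.0
    return sum(sim for _pid, sim, drug in hits if drug) / total


# Hypothetical pocket database: residue-type sets with a known druggability label.
POCKETS = {
    "P1": (frozenset({"LEU", "VAL", "PHE", "HIS"}), True),
    "P2": (frozenset({"LEU", "VAL", "PHE"}), True),
    "P3": (frozenset({"ASP", "GLU", "LYS"}), False),
    "P4": (frozenset({"LEU", "ASP"}), False),
}
hits = search_pockets({"LEU", "VAL", "PHE"}, POCKETS)
print(hits, druggability_score(hits))
```

With `workers` greater than 1 the scoring runs in a `ProcessPoolExecutor`; on platforms that start workers by spawning, call it from under an `if __name__ == "__main__":` guard.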

Personal Projects

CAMDA challenge

  • Constructed a temporal knowledge graph (TKG) from diabetes patient records in Neo4j, enabling disease progression (trajectory) analysis, exploration of complication pathways, and early risk prediction
  • Integrating Llama 3 with the TKG to set up a GraphRAG workflow for AI-driven medical applications
View on GitHub | Read on Medium
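Trajectory analysis on a temporal KG comes down to ordering each patient's timestamped condition edges and counting which transitions recur. Below is an in-memory sketch of that idea, standing in for Cypher queries against Neo4j; the patients, conditions, and dates are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical (patient, condition, diagnosis date) edges, deliberately unordered.
EVENTS = [
    ("p1", "T2D", date(2015, 3, 1)),
    ("p1", "CKD", date(2019, 1, 20)),
    ("p1", "Hypertension", date(2016, 7, 9)),
    ("p2", "T2D", date(2014, 5, 2)),
    ("p2", "Neuropathy", date(2017, 11, 30)),
    ("p3", "T2D", date(2012, 8, 14)),
    ("p3", "Hypertension", date(2013, 2, 5)),
    ("p3", "Retinopathy", date(2018, 9, 23)),
]


def trajectories(events):
    """Order each patient's conditions by date (same-day ties fall back to name order)."""
    by_patient = defaultdict(list)
    for patient, condition, when in events:
        by_patient[patient].append((when, condition))
    return {p: [c for _, c in sorted(evts)] for p, evts in by_patient.items()}


def transition_counts(trajs):
    """Count consecutive condition pairs (A -> B) across all patient trajectories."""
    counts = Counter()
    for seq in trajs.values():
        counts.update(zip(seq, seq[1:]))
    return counts


trajs = trajectories(EVENTS)
print(trajs["p1"])
print(transition_counts(trajs).most_common(3))
```

Frequent transitions such as T2D followed by hypertension are candidate complication pathways; the same counts can seed simple next-condition risk estimates.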

Biomedical research assistant

  • Developed an AI-powered biomedical research assistant that streamlines literature exploration. The tool retrieves top PubMed papers (via the PubMed API), summarizes them with a RAG-based pipeline powered by Llama 3 and ChromaDB, and includes a chatbot for natural-language Q&A. It runs entirely locally using Ollama, ensuring privacy and full control.
View on GitHub | Read on Medium
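The retrieval step of a RAG pipeline ranks stored passages by similarity to the question and packs the best ones into the prompt. As a dependency-free sketch, bag-of-words cosine similarity stands in here for the dense embeddings a vector store such as ChromaDB would use; the abstracts and IDs are hypothetical, and the final prompt would be passed to the local model rather than printed.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, docs, k=2):
    """Return IDs of the k documents most similar to the question (zero-score documents dropped)."""
    q = Counter(tokenize(question))
    scored = sorted(
        ((cosine(q, Counter(tokenize(text))), doc_id) for doc_id, text in docs.items()),
        reverse=True,
    )
    return [doc_id for score, doc_id in scored[:k] if score > 0]


def build_prompt(question, docs, doc_ids):
    """Assemble a grounded prompt from the retrieved passages."""
    context = "\n\n".join(f"[{i}] {docs[i]}" for i in doc_ids)
    return (
        "Answer using only the context below and cite sources by their IDs.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


# Hypothetical abstracts keyed by placeholder IDs (not real PubMed records).
DOCS = {
    "doc1": "Metformin reduces hepatic glucose production in type 2 diabetes.",
    "doc2": "CRISPR screens identify essential genes in cancer cell lines.",
    "doc3": "Long-term metformin use and cancer risk in diabetes patients.",
}
question = "metformin diabetes"
print(build_prompt(question, DOCS, retrieve(question, DOCS)))
```

Keeping retrieval and prompt assembly as separate functions makes it easy to swap the toy similarity for a real embedding store without touching the prompt logic.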

NeurIPS - Polymer Prediction

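A Machine Learning Approach to Polymer Property Prediction: Interpretable Models. Predictive models for glass transition temperature (Tg), thermal conductivity (Tc), radius of gyration (Rg), fractional free volume (FFV), and density of polymers.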

View on GitHub | Read on Medium

Protein Classification Models

  • Built a deep learning model in PyTorch that uses protein sequence embeddings for protein classification
  • Developing a Graph Attention Network (GAT) model for protein structure classification (work in progress)
View on GitHub

IDA-BRCA

Integrative data analysis (IDA) of breast cancer (BRCA) datasets for predictive modeling.

  • A multi-output classification model using iTRAQ proteome profiles of 77 cancer samples from TCGA.
  • Classification models built on the METABRIC dataset (mRNA expression z-scores) to predict breast cancer type, tumor stage, and overall survival.
View on GitHub

Natural Language Processing

  • Autism Spectrum Disorder (ASD) classification from caregiver-written behavioral text
  • Sentiment analysis: mental health classification
  • Disaster tweet classification
View on GitHub

Kaggle-projects

  • Chronic kidney disease prediction using electronic health records.
  • Pancreatic cancer prediction using urinary biomarkers.
  • Identified protein biomarkers that discriminate between different experimental classes of mice with Down syndrome.
  • Built a plant health prediction model from biosensor data, deployed it with Streamlit, and connected it to a REST API for real-time prediction.
View on GitHub

My Articles

Fingerprinting Polymers for Interpretable Machine Learning

A comprehensive approach using cheminformatics and XGBoost for QSPR modelling

Read on Medium
CAMDA challenge

Mapping Diabetes Trajectories with Temporal Knowledge Graphs & GraphRAG

Read on Medium
Biomedical Research Assistant

Biomedical Research Assistant built using RAG, LangChain, and LLaMA3

Read on Medium
Temporal Knowledge Graphs

Time Aware Intelligence in Healthcare

Read on Medium
Evaluating Lexical and Semantic Representations for Autism Detection from Caregiver Remarks

Autism Spectrum Disorder (ASD) classification from caregiver-written behavioral text

Read on Medium