About Me

I am an AI/ML and bioinformatics professional with 3 years of experience, specializing in building scalable, robust AI/ML models, bioinformatics pipelines, and data analysis tools. My work focuses on advancing drug discovery, precision medicine, target identification, clinical development, and therapeutic research through cross-functional collaboration and data-driven solutions. I am passionate about solving complex biomedical problems at the intersection of AI, omics, and healthcare.

Gmail | LinkedIn | GitHub

Cross-Cutting Capabilities

AI/ML Model Development
Pipeline Automation
Data Integration
Knowledge Engineering
Predictive Analytics

Technical Skills

Machine Learning | Deep Learning | Graph Neural Networks | Knowledge Graphs | RAG | LLM | GenAI | GCP | AWS | WGS | RNA-Seq | Single-Cell Seq | GWAS | Systems Modeling | Nextflow | Docker | R Shiny | RESTful API

Life Science Expertise

Multidisciplinary capabilities across research, therapeutics, and technology development

Therapeutic Development

1. Target Identification

Discovering and validating therapeutic targets

GWAS | Knowledge Graphs | Systems Modeling | scRNA-Seq | RNA-Seq

2. Lead Discovery

Designing potential therapeutic candidates

Antibody Design | QSPR Models

3. Preclinical

Evaluating safety and efficacy

In Silico KO | PK/PD Modeling

Research & Precision Medicine

1. Multi-Omics Analysis

Extracting biological insights from complex data

WGS | scRNA-Seq | RNA-Seq | Proteomics

2. Disease Modeling

Understanding disease mechanisms

Knowledge Graphs | Boolean Networks | Simulation | GWAS | Biomarker Identification

3. Clinical Translation

Bridging research to patient care

RAG Systems | Temporal KGs | Real World Evidence | Decision Support

Bioinformatics Pipeline Development

1. Data Acquisition

Ingesting and standardizing biological data

WGS/RNA-Seq/scRNA-Seq | API Integration | Data Wrangling

2. Analysis Pipeline

Building robust analysis workflows

Nextflow | Docker | GATK | Scanpy

3. Knowledge Extraction

Deriving actionable insights

ML Models | Biomarker Identification | Visualization

Translational AI/ML Applications

1. Data Preparation

Structuring biomedical data for AI

Feature Engineering | Embeddings | Data Augmentation

2. Model Development

Building domain-specific AI solutions

GNN | LLM | Bayesian Methods

3. Deployment

Implementing solutions in practice

RAG | LangChain | REST API | AWS

Work Experience

AI-driven antibody sequence generation

Developed an AI-driven pipeline for antibody sequence generation, leveraging deep learning and generative models trained on the large-scale OAS and SAbDab antibody datasets. Integrated the pipeline with sequence optimization and structural validation tools to design high-affinity, developable antibodies with therapeutic potential.

AntiFold | AbLang | diffAb | BioPhi

Knowledge Graph

Built a knowledge graph (KG) from open-source biological, real-world, and clinical data, harmonized with controlled vocabularies for each entity. Applications included drug repurposing, target identification, and safety assessment for toxicity and organ-wise stratification, reducing months of work to weeks.

Neo4j | Link prediction | Node classification | Community Detection

Automated QSPR model building for property prediction

Developed an automated ML pipeline for building Quantitative Structure-Property Relationship (QSPR) models for drug property prediction, reducing dependency on data scientists for model building and extending modeling capabilities across departments.

RDKit | Morgan fingerprint | Optuna - Bayesian Optimization

Machine learning-driven biomarker identification

Developed a computational workflow for biomarker identification that trains ensemble ML models on omics data (RNA-Seq, scRNA-Seq) to identify diagnostic and prognostic signatures, enabling patient stratification for precision medicine applications.

XGBoost | Feature importance | ROC | Pathway enrichment

GWAS - WGS

1. GenomeIndia cohort: Quality control and SNP association studies.

2. All of Us cohort: Developed a workflow for loss-of-function association studies to identify biomarkers using recorded EHR data.

Association statistics | Dosage sensitivity | Burden test
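As an illustration of the burden-test idea listed above, here is a minimal sketch, not the workflow used on the cohort. It assumes one common form of the test: collapse all loss-of-function variants in a gene into a per-sample carrier flag, then compare carriers between cases and controls with a two-sided Fisher's exact test. The genotypes, phenotype labels, and function names are all made up for the example.

```python
from math import comb

# Made-up allele counts per sample for two hypothetical LoF variants in one gene.
GENOTYPES = {
    "s1": [1, 0], "s2": [0, 1], "s3": [1, 1], "s4": [2, 0], "s5": [0, 0],
    "s6": [0, 0], "s7": [0, 0], "s8": [0, 0], "s9": [0, 0], "s10": [0, 0],
}
# Made-up phenotype labels, standing in for a case definition derived from EHR data.
IS_CASE = {f"s{i}": i <= 5 for i in range(1, 11)}


def collapse_carriers(genotypes):
    """Collapse variants: a sample is a carrier if it has any LoF allele in the gene."""
    return {sample: any(count > 0 for count in calls) for sample, calls in genotypes.items()}


def burden_table(carriers, is_case):
    """Count the 2x2 table (carrier/non-carrier by case/control) as a, b, c, d."""
    a = sum(carriers[s] and is_case[s] for s in carriers)
    b = sum(carriers[s] and not is_case[s] for s in carriers)
    c = sum(not carriers[s] and is_case[s] for s in carriers)
    d = sum(not carriers[s] and not is_case[s] for s in carriers)
    return a, b, c, d


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x carriers among the cases.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    observed = prob(a)
    lowest, highest = max(0, row1 - (n - col1)), min(row1, col1)
    # Sum every table that is at most as likely as the observed one.
    return sum(p for p in map(prob, range(lowest, highest + 1)) if p <= observed * (1 + 1e-9))


table = burden_table(collapse_carriers(GENOTYPES), IS_CASE)
p_value = fisher_exact_two_sided(*table)
print(table, round(p_value, 4))
```

Real analyses typically use regression-based burden tests with covariates (ancestry, age, sex); the exact test here only keeps the example dependency-free.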

In silico knockout/perturbation

Developed a high-throughput Boolean model simulation pipeline for in silico gene knockout/perturbation experiments, using RNA-Seq data to initialize the system states. The pipeline supports data-driven therapeutics and improves precision in target prioritization.

Attractors | Trap spaces | Logical gates | KEGG
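The knockout idea can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the production pipeline: the three-gene network, its logic rules, the expression values, and the threshold used to binarize them are all hypothetical, and updates are synchronous. A knockout clamps a gene to OFF, and the attractor reached from the same initial state is compared with the wild type.

```python
# Hypothetical three-gene network; each rule maps the current state to a gene's next value.
RULES = {
    "A": lambda s: s["A"],                 # A holds its value (input node)
    "B": lambda s: s["A"] and not s["C"],  # B is activated by A and inhibited by C
    "C": lambda s: s["B"],                 # C follows B
}


def binarize(expression, threshold):
    """Turn expression values into an initial Boolean state (assumed thresholding scheme)."""
    return {gene: value > threshold for gene, value in expression.items()}


def step(state, rules, knockouts=frozenset()):
    """Synchronously update every gene; knocked-out genes are clamped to False."""
    return {
        gene: False if gene in knockouts else bool(rule(state))
        for gene, rule in rules.items()
    }


def find_attractor(initial, rules, knockouts=frozenset(), max_steps=1000):
    """Iterate from `initial` until a state repeats; return the repeating cycle of states."""
    state = {g: (False if g in knockouts else bool(v)) for g, v in initial.items()}
    seen = {}
    trajectory = []
    for i in range(max_steps):
        key = tuple(sorted(state.items()))
        if key in seen:
            return trajectory[seen[key]:]
        seen[key] = i
        trajectory.append(state)
        state = step(state, rules, knockouts)
    raise RuntimeError("no attractor found within max_steps")


# Made-up expression values standing in for RNA-Seq-derived initial conditions.
initial = binarize({"A": 8.2, "B": 0.4, "C": 0.1}, threshold=1.0)
wild_type = find_attractor(initial, RULES)
knockout_c = find_attractor(initial, RULES, knockouts=frozenset({"C"}))
print(len(wild_type), knockout_c)
```

In this toy network the wild type settles into an oscillation, while knocking out C removes the negative feedback and leaves a single fixed point with B switched on.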

Structure-based druggability prediction

Developed a structure-based druggability prediction pipeline that uses parallel processing to accelerate searches across a database of known binding pockets, enabling rapid identification of similar sites to assess target protein druggability.

Graph Analytics | Structural bioinformatics | Multiprocessing

Personal Projects

CAMDA challenge

  • Constructed a Temporal Knowledge Graph (TKG) from diabetes patient records using Neo4j, enabling disease progression (trajectory) analysis, complication pathway analysis, and early risk prediction
  • Integrating Llama 3 with the TKG to set up a RAG workflow for AI-driven medical applications (work in progress)
View on GitHub
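To illustrate what trajectory analysis on a temporal knowledge graph means, here is a small in-memory sketch. It does not use Neo4j and is not the project's code: the edge schema, the `DIAGNOSED_WITH` relation name, and the patient records are invented for the example. Each edge carries a timestamp, so a patient's diagnoses can be ordered in time and the transitions between consecutive diagnoses counted across patients.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TemporalEdge:
    """One time-stamped fact: (patient) -[relation @ when]-> (target)."""
    patient: str
    relation: str
    target: str
    when: date


# Invented records for two hypothetical patients.
EDGES = [
    TemporalEdge("p1", "DIAGNOSED_WITH", "retinopathy", date(2019, 6, 12)),
    TemporalEdge("p1", "DIAGNOSED_WITH", "type 2 diabetes", date(2015, 3, 1)),
    TemporalEdge("p1", "DIAGNOSED_WITH", "neuropathy", date(2021, 1, 20)),
    TemporalEdge("p2", "DIAGNOSED_WITH", "type 2 diabetes", date(2016, 8, 5)),
    TemporalEdge("p2", "DIAGNOSED_WITH", "retinopathy", date(2020, 2, 14)),
]


def trajectories(edges, relation="DIAGNOSED_WITH"):
    """Return each patient's targets for `relation`, ordered by timestamp."""
    by_patient = defaultdict(list)
    for edge in sorted(edges, key=lambda e: e.when):
        if edge.relation == relation:
            by_patient[edge.patient].append(edge.target)
    return dict(by_patient)


def transition_counts(paths):
    """Count how often one diagnosis is directly followed by another."""
    counts = Counter()
    for path in paths.values():
        counts.update(zip(path, path[1:]))
    return counts


paths = trajectories(EDGES)
transitions = transition_counts(paths)
print(paths["p1"])
print(transitions.most_common(1))
```

Frequent transitions of this kind are one simple starting point for complication pathways; a graph database makes the same query practical at cohort scale.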


IDA-BRCA

Integrative data analysis (IDA) of breast cancer (BRCA) datasets for predictive modeling.

  • A multi-output classification model using iTRAQ proteome profiles of 77 cancer samples from TCGA.
  • Classification models trained on METABRIC mRNA-level z-scores to predict breast cancer type, tumor stage, and overall survival.
View on GitHub


Kaggle-projects

  • Chronic kidney disease prediction using electronic health records.
  • Pancreatic cancer prediction using urinary biomarkers.
  • Identified protein biomarkers that discriminate between different experimental classes of mice with Down syndrome.
  • Built a plant health prediction model using biosensor data, deployed it with Streamlit, and connected it to a REST API for real-time prediction.
View on GitHub


Biomedical research assistant

  • Developed a biomedical research assistant that streamlines literature exploration using AI. The tool retrieves and summarizes top PubMed papers (via the PubMed API) with a RAG-based pipeline powered by Llama 3 and ChromaDB, and includes a chatbot for natural language Q&A. It runs entirely locally using Ollama, ensuring privacy and full control.
View on GitHub


NGS Workflows

  • Variant calling using GATK
  • RNA-Seq analysis
  • Single-cell RNA-Seq analysis using Scanpy for clustering and scVelo for RNA velocity analysis
View on GitHub


TOX24 Challenge

Predictive models for drug toxicity using data from the Tox24 challenge

  • QSPR-based model
  • Fingerprint-based deep learning model
  • Graph Attention Network model
View on GitHub


Protein Classification Models

  • Built a deep learning model in PyTorch that uses protein sequence embeddings for protein classification
  • Developing a Graph Attention Network (GAT) model for protein structure classification (work in progress)
View on GitHub


Image Classification

MedMNIST datasets were used to build CNN classification models for different imaging modalities

  • Predicting patients with breast cancer (breast ultrasound). Dataset: BreastMNIST
  • Predicting patients with pneumonia (chest X-ray). Dataset: PneumoniaMNIST
  • Predicting patients with common blinding retinal diseases (optical coherence tomography). Dataset: OCTMNIST
View on GitHub


Natural Language Processing

  • Sentiment analysis: Mental health classification
View on GitHub


Target Safety Agent

  • Target safety assessment is a crucial step in drug discovery, ensuring the prioritization of viable therapeutic targets while minimizing risks. This assessment aids in refining target selection and exploring alternative therapeutic strategies to mitigate safety risks.
  • The agent leverages data from open databases and Llama 3 to generate a safety report for the target protein.

Coming soon!

My Articles

Biomedical Research Assistant

Biomedical Research Assistant built using RAG, LangChain, and LLaMA3

Read on Medium
Temporal Knowledge Graphs

Time Aware Intelligence in Healthcare

Read on Medium