Fingerprinting Polymers for Interpretable Machine Learning
A comprehensive approach using cheminformatics and XGBoost for QSPR modelling
Read on Medium
A bioinformatics professional with 3+ years of experience in AI/ML model development, biomarker identification, and knowledge graph applications. Skilled in GenAI, LLMs, knowledge graph analytics, and core bioinformatics solutions to drive data-driven insights, enhance decision-making, and deliver scalable solutions through cross-functional collaboration across diverse domains. Focused on advancing computational research that integrates AI and data-driven science to solve complex problems across biology, healthcare, and beyond.
Gmail | LinkedIn | GitHub | Medium
Bridging research, therapeutic innovation, and technology development through multidisciplinary capabilities.
Discovering and validating therapeutic targets
Designing potential therapeutic candidates
Evaluating safety and efficacy
Extracting biological insights from complex data
Understanding disease mechanisms
Bridging research to patient care
Ingesting and standardizing biological data
Building robust analysis workflows
Deriving actionable insights
Structuring biomedical data for AI
Building domain-specific AI solutions
Implementing solutions in practice
Developed an AI-driven pipeline for antibody sequence generation, leveraging deep learning and generative models trained on the large-scale OAS and SAbDab antibody datasets. Integrated it with sequence optimization and structural validation tools to design high-affinity, developable antibodies with therapeutic potential.
AntiFold | AbLang | diffAb | BioPhi | LLM fine-tuning
Built a knowledge graph (KG) from open-source biological, real-world, and clinical data, harmonized with controlled vocabularies for each entity. Applications included drug repurposing, target identification, and safety assessment for toxicity with organ-wise stratification, reducing months of work to weeks.
Neo4j | Link prediction | Node classification | Community Detection
Developed an automated ML pipeline to build Quantitative Structure-Property Relationship (QSPR) models for drug property prediction, which reduced dependency on data scientists for model building and extended modelling capabilities across departments.
RDKit | Morgan fingerprint | Optuna - Bayesian Optimization
Developed a computational pipeline for automated biomarker identification from the TCGA database, training classical ML models on RNA-seq data to identify diagnostic and prognostic signatures in BRCA, LIHC, LUAD, and PRAD, enabling faster patient stratification for precision medicine applications.
DEG analysis | WGCNA | Applied ML | SHAP | Kaplan-Meier
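As an illustration of the Kaplan-Meier step listed above, here is a minimal sketch of the product-limit estimator in plain Python. It is not the pipeline's actual code; the function name `kaplan_meier` and the toy follow-up data are hypothetical, chosen only to show how censored patients stay in the risk set until they leave the study.

```python
from collections import Counter

def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time for each patient
    events:    1 if the event was observed, 0 if the patient was censored
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)   # everyone leaving the risk set at time t
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit update: multiply by the fraction surviving at t
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]      # remove both events and censored patients
    return curve

# Toy cohort (hypothetical values): five patients, two of them censored
durations = [5, 6, 6, 8, 10]
events = [1, 0, 1, 1, 0]
km_curve = kaplan_meier(durations, events)
for time, prob in km_curve:
    print(f"t={time}: S(t)={prob:.2f}")
```

In a prognostic-signature setting, a curve like this would typically be computed separately for the high-risk and low-risk groups and the two curves compared.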
1. High-throughput drug combination (synergy) assessment using network proximity analysis
2. Network vulnerability analysis for target prioritization
Network proximity | Network attributes | STRING
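To make the idea of network proximity concrete, the sketch below computes one common distance between two sets of drug targets: the average shortest-path length from each target of one drug to the nearest target of the other. This is an illustrative assumption about the kind of measure involved, not the project's implementation; the toy graph, node names, and function names are hypothetical. Published proximity measures typically also compare the observed distance against randomly sampled node sets to obtain a z-score, which is omitted here.

```python
from collections import deque

def bfs_distances(graph, source):
    """Shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def closest_distance(graph, targets_a, targets_b):
    """Mean, over targets_a, of the distance to the nearest node in targets_b."""
    total = 0
    for a in targets_a:
        dist = bfs_distances(graph, a)
        reachable = [dist[b] for b in targets_b if b in dist]
        if not reachable:
            raise ValueError(f"{a} cannot reach any node in targets_b")
        total += min(reachable)
    return total / len(targets_a)

# Toy undirected interaction network (hypothetical): a path P1-P2-P3-P4-P5
graph = {
    "P1": ["P2"],
    "P2": ["P1", "P3"],
    "P3": ["P2", "P4"],
    "P4": ["P3", "P5"],
    "P5": ["P4"],
}
drug_a_targets = ["P1", "P2"]
drug_b_targets = ["P4"]
proximity = closest_distance(graph, drug_a_targets, drug_b_targets)
print(proximity)
```

Note that this distance is not symmetric: swapping the two target sets can give a different value, so a combination screen would need to fix a convention or average both directions.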
1. GenomeIndia cohort: Performed quality control and variant association analysis, followed by cross-population studies.
2. All of Us cohort: Developed a Cromwell (WDL) pipeline to scale up association studies across 3000+ traits using EHRs to identify statistically significant biomarkers. Association statistics for the study included dosage sensitivity and gene burden tests.
Association statistics | Dosage sensitivity | Burden test
Developed a high-throughput Boolean model simulation pipeline for in silico gene knockout and perturbation experiments, using RNA-seq data to initialize the system states. The pipeline supports data-driven therapeutics by improving precision in target prioritization.
Attractors | Trap spaces | Logical gates | KEGG
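The sketch below illustrates what an in silico knockout on a Boolean model looks like: each gene follows a logical rule, the network is updated synchronously until a state repeats, and a knockout clamps a gene to OFF. The three-rule network, gene names, and hand-set initial state are hypothetical stand-ins (in the pipeline described above, initial states come from RNA-seq data), and synchronous updating is an assumption made for simplicity.

```python
def simulate(rules, state, knockouts=()):
    """Update synchronously until a state repeats; return the attractor
    as a list of states (length 1 means a fixed point)."""
    # Knocked-out genes are forced OFF from the start
    state = {g: (False if g in knockouts else v) for g, v in state.items()}
    seen = {}
    trajectory = []
    while True:
        key = tuple(sorted(state.items()))
        if key in seen:
            # Everything from the first visit of this state onward is the cycle
            return trajectory[seen[key]:]
        seen[key] = len(trajectory)
        trajectory.append(dict(state))
        state = {
            g: (False if g in knockouts else rule(state))
            for g, rule in rules.items()
        }

# Hypothetical toy network: growth factor -> kinase -> transcription factor
rules = {
    "GF": lambda s: s["GF"],                     # external input, held constant
    "INH": lambda s: s["INH"],                   # inhibitor, held constant
    "KIN": lambda s: s["GF"] and not s["INH"],   # AND / NOT gate
    "TF": lambda s: s["KIN"],
}
initial = {"GF": True, "INH": False, "KIN": False, "TF": False}

wild_type = simulate(rules, initial)
knockout = simulate(rules, initial, knockouts={"KIN"})
print("wild type TF:", wild_type[0]["TF"])
print("KIN knockout TF:", knockout[0]["TF"])
```

Comparing the attractor reached with and without the knockout is what indicates whether a gene is a plausible intervention point; here, removing the kinase keeps the transcription factor switched off.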
Developed a structure-based druggability prediction pipeline that uses parallel processing to accelerate searches across a database of known binding pockets, enabling rapid identification of similar sites to assess target protein druggability.
Graph Analytics | Structural bioinformatics | Multiprocessing
Developed an automated compound identification pipeline leveraging spectral data, matched against a processed in-house reference compound database.
Spectral data processing | Automation
A Machine Learning Approach to Polymer Property Prediction: Interpretable Models. Predictive models for Tc, Tg, Rg, FFV and density of polymers.
View on GitHub | Read on Medium
Built a QSPR model to predict DILI from SMILES, generating descriptors and fingerprints with RDKit, training the model, and elucidating mechanistic insights from SHAP scores.
View on GitHub
Integrative data analysis (IDA) of a breast cancer (BRCA) dataset for predictive modelling.
Mapping Diabetes Trajectories with Temporal Knowledge Graphs & GraphRAG
Read on Medium
Biomedical Research Assistant built using RAG, LangChain, and LLaMA3
Read on Medium
Autism Spectrum Disorder (ASD) classification from caregiver-written behavioral text
Read on Medium