AI Systems Engineer & Researcher

Aditya
Raj

I build high-scale AI systems — from prosody-disentangled deepfake detectors to multi-modal video intelligence engines. Published researcher. Startup founder. Available for complex AI engineering contracts.

6+ Published Papers
45K+ Video Minutes Processed
1K+ Users Served
// 01 — Profile

Engineer who ships
at research depth.

I'm an AI systems engineer and researcher with 3+ years building production ML systems at the intersection of signal processing, multimodal AI, and high-throughput backends. My work spans published speech/audio ML research, two AI product startups, and enterprise-grade infra at Arcesium.

My stack goes deep: I don't just call APIs — I design adversarial training frameworks, build streaming inference pipelines, and architect the backends that make AI products actually scale. I've processed 45K+ minutes of video, deployed on Kubernetes with Kafka/SQS, and shipped models to production edge devices.

I'm now targeting ambitious freelance AI projects — if you're building something where the AI part is the hard part, let's talk.

PyTorch / State Space Models
Signal Processing & Audio ML
LLM Agents & RAG Systems
Streaming Inference / Edge AI
Multimodal Fusion
FastAPI / Docker / K8s
Kafka / SQS / Redis
Wav2Vec2 / Transformers
TCN / ResNet / Mamba SSM
AWS EC2 / S3 / KEDA
// 02 — Selected Work

Systems built,
not prototyped.

Active Build
02 — Longevity AI Platform

Tessera — AI-Driven Longevity & Biomarker Platform

A deep-tech longevity platform that builds personalised health-extension programs from a 56-marker biomarker panel. The AI engine ingests blood work (metabolic, hormonal, inflammatory, haematological, cardiac markers) to calculate a client's PhenoAge biological age using the Levine formula, then outputs a four-lever protocol — nutrition, training, recovery, and supplementation — calibrated to their specific aging trajectory. Integrates an AI + doctor hybrid workflow: the AI handles intake (127 structured questions), biomarker interpretation, and protocol design; a licensed physician reviews and approves. Tiered programs (Foundations / Performance / Continuum) with 90-day tracking cycles and re-assessment loops.
PhenoAge Algorithm Biomarker Intelligence 56-Marker Panel AI Protocol Engine LLM Interpretation Doctor-AI Workflow Longitudinal Tracking Hyderabad Launch
Platform Rulebook
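The PhenoAge step described above can be sketched as follows. This is a minimal illustration, not Tessera's engine: the coefficients and units are the values commonly reproduced from Levine et al. (2018) and should be checked against the paper before any real use, and the dictionary keys and the `pheno_age` name are hypothetical.

```python
import math

# Assumed coefficients (commonly reproduced from Levine et al., 2018).
# Units are part of the model: the key names spell them out.
_COEFFS = {
    "albumin_g_l": -0.0336,
    "creatinine_umol_l": 0.0095,
    "glucose_mmol_l": 0.1953,
    "ln_crp_mg_dl": 0.0954,
    "lymphocyte_pct": -0.0120,
    "mcv_fl": 0.0268,
    "rdw_pct": 0.3306,
    "alp_u_l": 0.00188,
    "wbc_1000_per_ul": 0.0554,
    "age_years": 0.0804,
}
_INTERCEPT = -19.907
_GAMMA = 0.0076927


def pheno_age(markers: dict) -> float:
    """Return PhenoAge in years from nine blood markers plus chronological age."""
    values = dict(markers)
    # CRP enters the model on a natural-log scale.
    values["ln_crp_mg_dl"] = math.log(values.pop("crp_mg_dl"))
    xb = _INTERCEPT + sum(coef * values[name] for name, coef in _COEFFS.items())
    # 120-month mortality risk from a Gompertz-type model.
    mortality = 1.0 - math.exp(-math.exp(xb) * (math.exp(120 * _GAMMA) - 1) / _GAMMA)
    # Map the mortality risk back onto an age scale.
    return 141.50225 + math.log(-0.00553 * math.log(1.0 - mortality)) / 0.090165


if __name__ == "__main__":
    sample = {
        "albumin_g_l": 45.0, "creatinine_umol_l": 80.0, "glucose_mmol_l": 5.0,
        "crp_mg_dl": 0.1, "lymphocyte_pct": 30.0, "mcv_fl": 90.0,
        "rdw_pct": 13.0, "alp_u_l": 70.0, "wbc_1000_per_ul": 6.0,
        "age_years": 40.0,
    }
    print(round(pheno_age(sample), 1))
```

Only nine of the 56 markers feed this formula; how the remaining markers shape the four-lever protocol is outside the sketch.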
03 — Audio AI

Prosody-Content Disentangled Audio Deepfake Detector

Audio deepfake detection architecture targeting ICASSP/INTERSPEECH-level results. The core insight: existing detectors overfit to per-generator acoustic artifacts that change with every new spoofing model. This system trains a lightweight TCN/ResNet prosody encoder against a frozen Wav2Vec2 teacher via adversarial disentanglement: a Gradient Reversal Layer penalizes the encoder for encoding speech content, forcing generalization through prosodic cues alone. Trained on audio from 119+ spoofing models (MLAAD), interleaved 50/50 with bona fide speech from LibriSpeech. At inference, the heavy Wav2Vec2 is discarded; only the compact, edge-deployable encoder remains.
Wav2Vec2 TCN/ResNet Adversarial Training GRL MLAAD Dataset EER Optimization
GitHub Repository
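The Gradient Reversal Layer mentioned above can be sketched in PyTorch as below. This is a generic illustration of the technique, not the repository's code: the `DisentangledDetector` class, its linear stand-in for the TCN/ResNet encoder, the feature sizes, and the idea of regressing onto 768-dimensional teacher embeddings are all assumptions made for the example.

```python
import torch
from torch import nn


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; scales gradients by -lam in the backward pass."""

    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # One return value per forward input; lam receives no gradient.
        return -ctx.lam * grad_output, None


class DisentangledDetector(nn.Module):
    """Toy model: a shared encoder, a spoof head, and an adversarial content head."""

    def __init__(self, feat_dim=64, emb_dim=32, content_dim=768):
        super().__init__()
        # Stand-in for the prosody encoder (assumption: a single linear layer).
        self.encoder = nn.Sequential(nn.Linear(feat_dim, emb_dim), nn.ReLU())
        self.spoof_head = nn.Linear(emb_dim, 1)
        # Tries to predict the frozen teacher's content embedding.
        self.content_head = nn.Linear(emb_dim, content_dim)

    def forward(self, x, lam=1.0):
        z = self.encoder(x)
        spoof_logit = self.spoof_head(z).squeeze(-1)
        # The content head learns normally, but the encoder sees the flipped
        # gradient, so it is pushed to drop content information.
        content_pred = self.content_head(GradReverse.apply(z, lam))
        return spoof_logit, content_pred
```

In training, a spoof loss on `spoof_logit` and a content loss on `content_pred` would be summed; only the encoder and spoof head would be needed at inference.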
04 — AI Video Engine

VidSimplify — Manim AI Animation Engine

15K+ Minutes Generated
1K+ Users
4 Enterprise Clients
Full-scale AI video production engine that converts natural language prompts into polished Manim animations. Built on a Reflexion-style multi-step LLM reasoning loop: scene decomposition → code synthesis → self-critique → validation → render. Uses a cascade of models (DeepSeek for code generation, smaller specialist models for scene planning and error correction). Async GPU-backed rendering pipeline with job resumption, rate-limiting, and intelligent caching. Real-time editing capability on generated animations. Scaled to 4 enterprise clients and 15,000+ processed minutes.
Reflexion Agent DeepSeek Manim LLM Orchestration Async GPU Pipeline Production Scale
VidSimplify.com GitHub Repository
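The synthesis, self-critique, and validation steps of the loop described above can be sketched as a retry loop that feeds each failure back into the next attempt. This is a minimal sketch under stated assumptions: `synthesize` and `critique` stand in for LLM calls, their signatures are hypothetical, and validation here is only a Python syntax check rather than a real Manim render.

```python
import ast
from typing import Callable, List, Optional


def validate(code: str) -> Optional[str]:
    """Return an error message if the code is not valid Python, else None."""
    try:
        ast.parse(code)
    except SyntaxError as exc:
        return f"SyntaxError: {exc.msg} (line {exc.lineno})"
    return None


def generate_scene_code(
    prompt: str,
    synthesize: Callable[[str, List[str]], str],
    critique: Callable[[str], Optional[str]],
    max_attempts: int = 3,
) -> str:
    """Generate code, feeding critique and validation errors back each round."""
    feedback: List[str] = []
    for _ in range(max_attempts):
        code = synthesize(prompt, feedback)
        # Self-critique first, then a mechanical validity check.
        problem = critique(code) or validate(code)
        if problem is None:
            return code  # ready to hand to the renderer
        feedback.append(problem)
    raise RuntimeError(f"no valid code after {max_attempts} attempts: {feedback}")
```

Keeping the accumulated feedback as a list lets the generator see every earlier failure, which is the part of the Reflexion pattern this sketch tries to capture.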
05 — High-Scale Video AI

AI Viral Clip Intelligence Engine

30K+ Minutes Processed
Production video intelligence backend that ingests hour-long videos and surfaces the highest-engagement clips using ML virality scoring. The pipeline: audio-visual feature extraction → engagement signal modeling → clip boundary detection → automated subtitle generation → multilingual dubbing with voice synthesis. Transcription engine handles speaker diarization and timestamp alignment. Built for scale — async job queuing, distributed workers, and persistent state management across long-running GPU tasks. Served 30,000+ minutes of processed content in production.
Virality Scoring Whisper / ASR Multilingual TTS Video Segmentation Distributed Workers Async GPU
GitHub Repository
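One way to turn engagement scores into clip boundaries, as in the pipeline above, is to score every fixed-length window and keep the best non-overlapping ones. The sketch below assumes per-second scores from some virality model and a fixed clip length; the `top_clips` name and the greedy selection are illustrative choices, not the production engine's method.

```python
def top_clips(scores, clip_len, k):
    """Pick up to k non-overlapping windows of clip_len seconds with the highest totals.

    scores: per-second engagement scores (hypothetical model output).
    Returns (start_second, end_second_exclusive, total_score) tuples sorted by start.
    """
    n = len(scores)
    if k <= 0 or clip_len <= 0 or clip_len > n:
        return []
    # Sliding-window sums in O(n) using a running total.
    window = sum(scores[:clip_len])
    sums = [window]
    for i in range(clip_len, n):
        window += scores[i] - scores[i - clip_len]
        sums.append(window)
    # Greedy heuristic: best windows first, skipping any that overlap a chosen one.
    order = sorted(range(len(sums)), key=lambda s: sums[s], reverse=True)
    chosen = []
    for start in order:
        if all(abs(start - c) >= clip_len for c in chosen):
            chosen.append(start)
            if len(chosen) == k:
                break
    return sorted((s, s + clip_len, sums[s]) for s in chosen)
```

The greedy pass is simple and fast but not guaranteed to maximize the combined score; a real system would likely also snap boundaries to sentence or shot changes.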
06 — AI Identity Synthesis

VidSimplify Cloner — AI Video Personality Engine

End-to-end AI video transformation platform enabling complete personality cloning, lip-sync dubbing, and multi-language localization. The system fuses three synchronized AI tracks: voice cloning (zero-shot speaker embedding extraction → neural vocoder synthesis), facial replication (identity-preserving face re-enactment with landmark-driven motion transfer), and lip synchronization (phoneme-to-viseme mapping with video-grade temporal alignment). Designed for professional content localization workflows — a single source video can be dubbed and visually cloned into any target language with no manual intervention.
Voice Cloning Lip Sync Face Re-enactment Zero-Shot TTS Global Dubbing Video Synthesis
GitHub Repository
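The phoneme-to-viseme step named above can be illustrated with a small lookup that converts timed phonemes into frame-indexed mouth shapes. This is a toy sketch: the phoneme subset, the viseme class names, the 25 fps default, and the `to_viseme_track` function are all assumptions for the example, and a real lip-sync model would use a much richer mapping.

```python
# Illustrative mapping only: a few ARPAbet-style phonemes to made-up viseme classes.
PHONEME_TO_VISEME = {
    "P": "closed", "B": "closed", "M": "closed",      # bilabials: lips together
    "F": "lip_teeth", "V": "lip_teeth",               # labiodentals
    "AA": "open", "AE": "open",                       # open vowels
    "UW": "round", "OW": "round",                     # rounded vowels
}


def to_viseme_track(phonemes, fps=25.0, default="neutral"):
    """Convert (phoneme, start_s, end_s) tuples into (viseme, start_frame, end_frame).

    Consecutive phonemes that share a viseme are merged into one span.
    """
    track = []
    for phoneme, start, end in phonemes:
        viseme = PHONEME_TO_VISEME.get(phoneme, default)
        first, last = round(start * fps), round(end * fps)
        if track and track[-1][0] == viseme:
            # Extend the previous span instead of emitting a duplicate.
            track[-1] = (viseme, track[-1][1], last)
        else:
            track.append((viseme, first, last))
    return track
```

Working in frame indices rather than seconds keeps the mouth-shape track aligned with the video frames it will drive.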
07 — EdTech AI Platform

Knewbit Max — Adaptive AI Learning Platform

Next-generation personalized learning platform powered by Google Gemini LearnLM. Implements Socratic tutoring methodology — the AI guides through questions rather than direct answers, with dynamic cognitive load management. Multi-modal learning stack: multilingual video dubbing pipeline for course content, auto-generated adaptive flashcards and quizzes from course material, skill-graph-aware course recommendation engine. The recommendation system infers learning trajectories from enrollment history to serve personalized learning paths. Built with H.264-optimized async video processing, YouTube URL ingestion, and full progress analytics.
Gemini LearnLM Socratic AI Multilingual Dubbing Recommendation Engine Adaptive Learning EdTech
GitHub Repository
08 — Medical AI

Handwritten Prescription OCR & Clinical AI

Production-grade OCR pipeline targeting the notoriously difficult domain of handwritten Indian doctor prescriptions. Systematic multi-model benchmarking across MiniCPM-V, Gemini Vision, and LLaMA vision variants to identify optimal accuracy-latency tradeoffs. The pipeline extracts structured JSON (drug name, dosage, frequency, route) from low-quality scans and photographs. Extended into a full clinical intelligence layer: integrated HuaTuo-GPT for radiology report interpretation, lab value analysis, and diagnostic support. Evolved into Docmate — a full multimodal medical chatbot handling prescriptions, lab reports, and imaging.
MiniCPM-V Gemini Vision LLaMA OCR HuaTuo-GPT Clinical NLP Medical AI
Docmate — Vimeo Demo GitHub Repository
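Because vision models do not always return well-formed output, the structured JSON step above usually needs a validation layer. The sketch below shows one possible shape, assuming the model is asked for a JSON array of objects; the field names mirror the four fields listed above, while `PrescriptionItem` and `parse_prescription` are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import List

REQUIRED_FIELDS = ("drug_name", "dosage", "frequency", "route")


@dataclass(frozen=True)
class PrescriptionItem:
    drug_name: str
    dosage: str
    frequency: str
    route: str


def parse_prescription(raw: str) -> List[PrescriptionItem]:
    """Parse a model's JSON reply into validated items; raise ValueError on bad shape."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model reply is not valid JSON: {exc}") from exc
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of prescription items")
    items = []
    for i, entry in enumerate(data):
        if not isinstance(entry, dict):
            raise ValueError(f"item {i} is not a JSON object")
        # A field counts as missing if absent, non-string, or blank.
        missing = [
            f for f in REQUIRED_FIELDS
            if not isinstance(entry.get(f), str) or not entry[f].strip()
        ]
        if missing:
            raise ValueError(f"item {i} is missing fields: {missing}")
        items.append(PrescriptionItem(**{f: entry[f].strip() for f in REQUIRED_FIELDS}))
    return items
```

Rejecting incomplete items, rather than guessing a dosage or route, keeps uncertain reads visible so they can be routed to a human reviewer.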
// 03 — Research

Published work in
speech & signal ML.

2025
ASVSpoof 2021: Detecting Spoofed Utterances Through Hybrid Features
APSIPA Transactions on Signal and Information Processing, Vol. 14, No. 2
2024
Audio Deepfakes: Feature Extraction and Model Evaluation for Detection
INCET 2024 — 5th International Conference for Emerging Technology
2024
Secure Federated Learning for Gate-Level IP Hardware Trojan Detection Using Homomorphic Encryption
CCIS 2024 — International Conference on Communication, Control, and Intelligent Systems
2022
Vocal Biomarker Based COVID-19 Detection Using Convolutional and Deep Neural Networks
IEEE UPCON 2022
2022
Automatic Speaker Verification Spoof Detection Using Gaussian Mixture Model
IEEE UPCON 2022
2022
Vocal Biomarker Based COVID-19 Detection Using DNN and Transfer Learning ResNet50
IEEE UPCON 2022
// 04 — Experience

Where the systems
went live.

Senior Software Engineer
Arcesium, Hyderabad
  • Designed and scaled compliance infra systems, reducing query latency 10x via backend re-architecture and data structure optimization.
  • Led AI adoption across compliance products — built reusable infra patterns now deployed across multiple product lines.
  • Built config-driven form infrastructure, AOP-based auth, and audit logging tools used across all compliance domains.
  • Resolved critical production incidents across Kubernetes, PostgreSQL, and SSL layers.
Jul 2025 – Present
Founder — Kalman Labs / VidSimplify
Hyderabad, India
  • Built and shipped VidSimplify.com — AI animation engine processing 15,000+ minutes for 1,000+ users and 4 enterprise clients.
  • Architected Docmate: multimodal medical AI chatbot for clinical summarization, prescription OCR, and imaging analysis.
  • Built cell-state simulation engine using Mamba-style SSMs + neural operators for drug perturbation trajectory prediction.
  • Designed and delivered AI Legal Intelligence Platform for US law firms — dual-product system covering document comparison and multi-agent legal assistant workflows.
  • Currently building Tessera — a deep-tech AI longevity platform using biomarker intelligence to personalise biological age reversal protocols.
Nov 2024 – Oct 2025
Software Engineer
Arcesium, Hyderabad
  • Owned development of major UI + backend features serving 100K+ records at scale.
  • Deployed tooling integrated with Kafka, SQS, KEDA, EC2, and role-based access systems.
Jul 2023 – Jun 2025
Deep Learning Researcher
Pucho Digital Health Inc., Remote
  • Researched Private AI in healthcare: Federated Learning + Homomorphic Encryption. Built signal-image preprocessing pipeline and deployed FastAPI backend on AWS EC2.
Jan – May 2022
// 05 — Work Together

Got a hard
AI problem?

I take on freelance AI engineering projects where the technical complexity is the point.
Audio/video AI, ML systems, multimodal pipelines, agent architectures — let's build it right.

adityaraj20008@gmail.com