Rohan Surana

Master's Student in Data Science @ UCSD

I am a Master's student in Data Science at the University of California, San Diego (UCSD), advised by Prof. Julian McAuley. I also actively collaborate with Adobe and Netflix. I previously earned my B.S. in Software Engineering, summa cum laude, from San Jose State University.

My research spans preference optimization, information retrieval, and multimodal learning. I develop methods that help LLMs learn from sparse, structured feedback: pairwise preferences, ranked lists, and in-context signals. Recent work includes multi-negative DPO with principled selection strategies, in-context ranking objectives for retrieval and recommendation, and multimodal extensions that reduce hallucinations. I also build practical systems like active dialogue synthesis for low-resource domains and benchmarks for audio-centric recommendation.

What's New

Research Topics

MASS-DPO
MASS-DPO: Multi-Negative Active Sample Selection for Direct Policy Optimization
Rohan Surana*, J. Wu*, X. Li, Y. Shen, C. Wang, T. Yu, P. Ammanabrolu, J. Shang, J. McAuley
Under Review

We cast multi-negative DPO under the Plackett-Luce model as a D-optimal design problem and develop a greedy, theoretically grounded selection strategy for informative negatives. This improves alignment efficiency and performance.
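
To illustrate the general idea of greedy D-optimal selection mentioned above, here is a minimal generic sketch. It is not the MASS-DPO algorithm itself: the feature vectors, the `ridge` regularizer, and the function name `greedy_d_optimal` are assumptions made for illustration. Each step picks the candidate that most increases the log-determinant of the regularized design matrix, using a rank-one (Sherman-Morrison) update of its inverse.

```python
import math


def greedy_d_optimal(features, k, ridge=1.0):
    """Greedily pick k candidate indices that increase
    log det(ridge * I + sum of x x^T over chosen x) the most at each step.

    `features` is a list of equal-length float lists (hypothetical
    per-negative feature vectors). Returns (chosen indices, total log-det gain).
    """
    d = len(features[0])
    # Inverse of the regularized design matrix, starting from (ridge * I)^-1.
    a_inv = [[(1.0 / ridge) if i == j else 0.0 for j in range(d)] for i in range(d)]
    chosen = []
    log_det_gain = 0.0
    for _ in range(min(k, len(features))):
        best_idx, best_score, best_ax = None, -1.0, None
        for idx, x in enumerate(features):
            if idx in chosen:
                continue
            # ax = A^-1 x, and score = x^T A^-1 x. By the matrix determinant
            # lemma, adding x raises the log-determinant by log(1 + score).
            ax = [sum(a_inv[i][j] * x[j] for j in range(d)) for i in range(d)]
            score = sum(x[i] * ax[i] for i in range(d))
            if score > best_score:
                best_idx, best_score, best_ax = idx, score, ax
        chosen.append(best_idx)
        log_det_gain += math.log1p(best_score)
        # Sherman-Morrison update of A^-1 after adding x x^T to A.
        denom = 1.0 + best_score
        for i in range(d):
            for j in range(d):
                a_inv[i][j] -= best_ax[i] * best_ax[j] / denom
    return chosen, log_det_gain


# Toy candidates: the greedy rule favors large, mutually non-redundant directions.
candidates = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [2.0, 0.0]]
picked, gain = greedy_d_optimal(candidates, k=2)
```

In this toy run the second pick is orthogonal to the first, since a direction already covered by the design matrix adds little to the determinant.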

IRPO
In-context Ranking Preference Optimization (IRPO)
J. Wu*, Rohan Surana*, Z. Xie, Y. Shen, Y. Xia, T. Yu, R. Rossi, P. Ammanabrolu, J. McAuley
COLM 2025

We extend Direct Preference Optimization to ranking, allowing LLMs to learn from sparse, in-context listwise feedback and directly optimize differentiable surrogates of ranking metrics. This connects preference optimization with practical IR settings like conversational recommendation and generative retrieval.

MISP-DPO
Importance Sampling for Multi-Negative Multimodal Direct Preference Optimization (MISP-DPO)
X. Li, C. Wang, J. Wu, Rohan Surana, T. Yu, J. McAuley, J. Shang
Under Review

We extend multi-negative DPO to the multimodal setting, using a CLIP+SAE-based sampler and importance sampling under a Plackett–Luce objective to select semantically diverse visual negatives. This substantially improves multimodal alignment and reduces hallucinations compared to single-negative multimodal DPO baselines.

WS-GRPO
WS-GRPO: Weakly-Supervised Group-Relative Policy Optimization
G. Mundada*, Rohan Surana*, J. Y. Zhang, X. Li, T. Yu, L. Yao, J. Shang, J. McAuley, J. Wu
Under Review

We address GRPO's dependence on dense step-wise rewards by learning to extract dense preference signals from sparse outcome supervision. WS-GRPO trains a preference model on trajectory-level outcomes, then leverages it to provide step-wise weakly-supervised rewards combined with terminal rewards during group-relative policy optimization. This enables effective reasoning model training without expensive step-by-step annotations.
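
As a rough illustration of the group-relative step described above, the sketch below normalizes per-trajectory rewards within a sampled group. It is a simplified sketch, not the WS-GRPO implementation: the helper `combine_rewards`, its `step_weight` parameter, and the weighted-sum combination rule are hypothetical, and using the population standard deviation is an assumption.

```python
import statistics


def combine_rewards(step_rewards, terminal_reward, step_weight=0.5):
    """Hypothetical combination rule: terminal reward plus a weighted
    mean of step-wise rewards. The actual rule in the paper may differ."""
    if not step_rewards:
        return terminal_reward
    return terminal_reward + step_weight * statistics.fmean(step_rewards)


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of sampled trajectories:
    advantage_i = (r_i - mean(r)) / (std(r) + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std (an assumption)
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled trajectories for one prompt: (step-wise rewards, terminal reward).
group = [
    ([0.8, 0.9], 1.0),
    ([0.2, 0.1], 0.0),
    ([0.8, 0.9], 1.0),
    ([0.2, 0.1], 0.0),
]
totals = [combine_rewards(steps, terminal) for steps, terminal in group]
advantages = group_relative_advantages(totals)
```

Trajectories that score above the group mean receive positive advantages and the rest negative ones, so no separate value network is needed to form a baseline.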

RLVR Survey
From Verifiable Rewards to Policy Learning: A Survey of Reinforcement Learning from Verifiable Rewards
G. Mundada*, Rohan Surana*, S. Yu, J. Y. Zhang, Z. Huang, Y. Xiong, X. Li, Y. Xia, R. Jain, C. Huang, N. L. Kuang, T. Yu, R. A. Rossi, D. Zhou, L. Yao, J. Shang, J. McAuley, J. Wu
Survey Paper

We provide the first comprehensive survey of Reinforcement Learning from Verifiable Rewards (RLVR), systematizing methods that train language models using verifier feedback. We introduce taxonomies organizing approaches by verification type, reward computation, and policy learning, establishing unified terminology for mathematical reasoning, code generation, and instruction following.

CRS Synthesis
From Reviews to Dialogues: Active Synthesis for Zero-Shot LLM-based Conversational Recommender System
Rohan Surana*, J. Wu*, Z. Xie*, Y. Xia, H. Steck, D. Liang, N. Kallus, J. McAuley
Preprint

Joint work with Netflix in which we design an active data augmentation framework that turns static domain data (reviews, metadata, collaborative signals) into synthetic conversations using black-box LLMs, enabling smaller in-house CRS models to operate in true zero-/low-resource settings.

MusiCRS
MusiCRS: Benchmarking Audio-Centric Conversational Recommendation
Rohan Surana*, A. Namburi*, G. Mundada*, A. Lal*, Z. Novack, J. McAuley, J. Wu
Under Review

We build the first benchmark that ties real conversational queries to actual music tracks, enabling evaluation across audio-only, text-only, and audio+text settings. This exposes how current CRS models over-rely on text and struggle with nuanced audio reasoning.

Traceable MLLM
Traceable and Explainable Multimodal Large Language Models: An Information-Theoretic View
Z. Huang, J. Wu, Rohan Surana, R. Jain, T. Yu, R. Addanki, D. Arbour, S. Kim, J. McAuley
COLM 2025

We propose an information-theoretic framework (via a concept bottleneck and mutual-information-style measures) to make MLLMs more traceable: quantifying how much visual information is retained, transformed, or discarded as it flows through the model under different textual instructions.

Image Diff Captioning
Image Difference Captioning via Adversarial Preference Optimization
Z. Huang, J. Wu, Rohan Surana, T. Yu, D. Arbour, R. Sinha, J. McAuley
EMNLP 2025

We formulate image difference captioning as a preference-optimization problem and introduce an adversarial hard-negative retriever plus DPO-style training to better capture fine-grained visual differences. This combines multimodal reasoning with preference-based training.

AMPS
AMPS: Adaptive Modality Preference Steering via Functional Entropy
Z. Huang, X. Li, J. Wu, Rohan Surana, T. Yu, R. Wang, J. McAuley, J. Shang
Under Review

We study modality preference in MLLMs (e.g., over-reliance on text vs. images) and propose an entropy-based diagnostic plus a sample-wise steering mechanism that adjusts steering intensity per input to avoid generation collapse while still shifting modality usage in a controlled way.

Experience

Dell Technologies

Jul 2022 - Sept 2025

June 2025 - Sept 2025

AI Research Intern | Hopkinton, MA

Built production-grade multi-agent LLM systems with 40% latency reduction, scalable RAG pipelines, and autonomous monitoring infrastructure.

Mar 2024 - Aug 2024

Software Engineer II | Santa Clara, CA

Architected TOSCA-based orchestration framework and real-time infrastructure digital twin with graph databases, achieving 30% faster provisioning and 25% response time improvement.

Jul 2022 - Mar 2024

Software Engineer I | Santa Clara, CA

Automated GPU-accelerated ML infrastructure with Kubernetes operators (20% faster deployment), led a 4-person team in redesigning gRPC services, and expanded observability coverage by 30%.

Aug 2021 – May 2022

Developed real-time vehicle detection and tracking system using YOLOv5 and BiLSTM for autonomous transportation, achieving 0.45 RMSE.

Feb 2021 – Mar 2021

Contributed to Apache Tika, enhancing file detection and content analysis capabilities.

May 2020 – Aug 2020

Built data mining modules using Spark, Scala, and GNNs, improving pipeline throughput by 25% and reducing job runtimes by 45% for healthcare and finance clients.

Teaching

Graduate Teaching Assistant
University of California, San Diego
Mar 2025 – Present
  • CSE 258: Web Mining and Recommender Systems (Sep 2025 - Present)
  • CSE 153: Machine Learning for Music (Mar 2025 - Sep 2025)
Teaching Assistant & Tutor
San Jose State University
Jan 2020 – May 2022
  • CS Peer Tutor: SJSU Peer Connections (Jan 2022 - May 2022)
  • CS46B: Introduction to Data Structures (Aug 2021 - May 2022)
  • CS149: Operating Systems (Aug 2021 - Jan 2022)
  • Math Workshop Facilitator: Calculus II (Jan 2020 - Aug 2021)

Education

University of California, San Diego
Sept. 2024 - Mar. 2026 (Expected)

Master of Science in Data Science

GPA: 3.88/4.00

Advisor: Prof. Julian McAuley

San Jose State University
Aug. 2018 - May 2022

Bachelor of Science in Software Engineering

GPA: 3.87/4.00 (Summa Cum Laude)

Projects

Privacy-Preserving LLM Training with Synthetic Data
Sept 2024 – Dec 2024
CrewAI, Unsloth, PyTorch, Llama-2

Designed a multi-agent framework for PII-safe data generation and fine-tuned two open-source LLMs, achieving 99% PII removal with a 270-question evaluation benchmark.

Autonomous Transportation Computer Vision System
Jan 2022 – May 2022
YOLOv5, TensorFlow, OpenCV

Built real-time vehicle detection and tracking system with YOLOv5 and BiLSTM achieving 0.45 RMSE, optimized for Raspberry Pi deployment.

Technical Skills

Languages
Python, Golang, Java, Scala, SQL
ML / Deep Learning
PyTorch, TensorFlow, scikit-learn, Transformers, OpenCV, NumPy, Pandas
LLM / NLP
LangGraph, LangChain, vLLM, Unsloth, CrewAI, Hugging Face
Infrastructure
Kubernetes, Docker, FastAPI, Spark, gRPC, Git, MLflow, Ray, AWS, GCP
Databases
ChromaDB, MySQL, PostgreSQL, Dgraph, Neo4j, MongoDB