LLM Serving from Scratch: The Systems Behind Fast Inference
How LLMs are efficiently served in production — from KV cache management and PagedAttention to speculative decoding and prefill-decode disaggregation.