Distributed Hyperparameter Tuning on Kubernetes with Ray Tune (1)

Renee LIN
4 min read · Apr 7, 2024

This work consists of four posts:

  1. How to use Ray Tune
  2. How to use Ray Tune with Weights & Biases
  3. How to use Kubernetes on Azure
  4. How to train the model on Kubernetes

My motivation

Tuning hyperparameters is a nightmare for me. I love studying, but the tuning process killed all my interest. Still, I have to finish my work. I am working on the third project of my studies, a route choice problem with inverse reinforcement learning. The model is unstable and difficult to converge. I tried several combinations and assumed some hyperparameters might be influential, so I set up a random search for the optimal combination. But I just couldn’t get it.

I thought that with several virtual machines running, training could be faster. I could fire up some cloud compute instances, transfer my code over, and set up the environment to run. But thinking of the tedious work of provisioning and configuring the nodes, I hesitated.

Then I thought of Kubernetes. I was curious about it around four years ago in a cloud computing course, but never had the chance to try it (lazy). This must be how engineers train their models in real life, since the models are huge and we hear they use thousands of GPUs.

Then I searched “Kubernetes hyperparameters” and found Ray Tune.

Ray Tune
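Before diving into the details in the next post, here is a minimal sketch of what a Ray Tune experiment looks like with the Ray 2.x Tuner API. The objective function and the hyperparameter names (lr, momentum) are just placeholders for illustration, not my actual IRL model.

```python
from ray import tune


def objective(config):
    # Toy stand-in for a real training run: score depends on the sampled hyperparameters.
    score = (config["lr"] - 0.1) ** 2 + config["momentum"]
    return {"score": score}


# Define the search space: Ray Tune samples from these distributions for each trial.
search_space = {
    "lr": tune.loguniform(1e-4, 1e-1),
    "momentum": tune.uniform(0.1, 0.9),
}

tuner = tune.Tuner(
    objective,
    param_space=search_space,
    tune_config=tune.TuneConfig(metric="score", mode="min", num_samples=10),
)
results = tuner.fit()

# Best hyperparameter combination found across the 10 trials.
print(results.get_best_result().config)
```

The appeal for my use case is that the same script can run trials in parallel across a Ray cluster, for example one deployed on Kubernetes, which is what the later posts build toward.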
