Introduction to MLOps: Hyperparameter Tuning
What are Hyperparameters in Machine Learning?

Hyperparameters are configuration values that you set before training, such as the learning rate, the batch size, or the number of epochs. Unlike model parameters (e.g., the weights of a neural network), they are not learned from the data.
What is Hyperparameter Optimization in Machine Learning?

Hyperparameter optimization (also called hyperparameter tuning) is the process of systematically searching for the combination of hyperparameter values that maximizes your model’s performance on a chosen metric, such as validation accuracy.
How Do You Optimize Hyperparameters?

To optimize hyperparameters, you need the following inputs:

- A set of hyperparameters you want to optimize (e.g., learning rate)
- A search space for each hyperparameter, either as specific values (e.g., 1e-3, 1e-4, and 1e-5) or as a value range (e.g., between 1e-5 and 1e-3)
- A performance metric to optimize (e.g., validation accuracy)
- The number of trial runs (depending on the type of hyperparameter optimization, this can be implicit instead of explicit)

The general procedure then looks like this (a minimal code sketch of this loop follows the list):

- Select a set of hyperparameter values to evaluate
- Run an ML experiment for the selected set of hyperparameters and their values, and evaluate and log its performance metric
- Repeat for the specified number of trial runs or until you are happy with the model’s performance
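The sketch below illustrates this loop in plain Python. The train_and_evaluate function is a hypothetical stand-in for training your model and returning its validation accuracy, and the listed trial configurations are made up for illustration.

```python
import math


def train_and_evaluate(learning_rate, epochs):
    # Placeholder: replace with real model training and evaluation
    # that returns the validation accuracy (val_acc).
    return 0.99 - abs(math.log10(learning_rate) + 4) * 0.01 - 1.0 / epochs


# Sets of hyperparameter values to evaluate (illustrative only).
trials = [
    {"learning_rate": 1e-3, "epochs": 10},
    {"learning_rate": 1e-4, "epochs": 20},
    {"learning_rate": 1e-5, "epochs": 30},
]

results = []
for params in trials:                       # select a set of hyperparameter values
    val_acc = train_and_evaluate(**params)  # run the experiment
    results.append((val_acc, params))       # evaluate and log the performance metric

best_val_acc, best_params = max(results, key=lambda r: r[0])
print(f"Best val_acc: {best_val_acc:.4f} with {best_params}")
```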
Methods for Automated Hyperparameter Optimization
- Grid Search
- Random Search
- Bayesian Optimization
Grid Search
Inputs
- A set of hyperparameters you want to optimize
- A discretized search space for each hyperparameter as specific values
- A performance metric to optimize
- (Implicit number of runs: because the search space is a fixed set of values, you don’t have to specify the number of experiments to run)

Steps
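Grid search follows the general procedure above, evaluating every combination in the discretized grid. Below is a minimal sketch, assuming the same hypothetical train_and_evaluate stand-in as before and illustrative grid values:

```python
import itertools
import math


def train_and_evaluate(learning_rate, epochs):
    # Placeholder: replace with real model training and evaluation.
    return 0.99 - abs(math.log10(learning_rate) + 4) * 0.01 - 1.0 / epochs


# Discretized search space (illustrative values).
grid = {
    "learning_rate": [1e-3, 1e-4, 1e-5],
    "epochs": [10, 20, 30],
}

best_val_acc, best_params = float("-inf"), None
for values in itertools.product(*grid.values()):  # iterate over every combination
    params = dict(zip(grid.keys(), values))
    val_acc = train_and_evaluate(**params)
    if val_acc > best_val_acc:
        best_val_acc, best_params = val_acc, params

print(f"Best val_acc: {best_val_acc:.4f} with {best_params}")
```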

Output
- The grid search algorithm iterates over the grid of hyperparameter sets as specified.
- Since grid search is an uninformed search algorithm, the resulting performance doesn’t show a trend over the runs.
- The best val_acc score is 0.9902.

Advantages
- Simple to implement
- Can be parallelized, because the hyperparameter sets can be evaluated independently
Disadvantages
- Not suitable for models with many hyperparameters, because the computational cost grows exponentially with the number of hyperparameters (e.g., 4 candidate values for each of 5 hyperparameters already means 4^5 = 1,024 runs)
- Uninformed search: knowledge from previous experiments is not leveraged, so you may want to run the grid search algorithm several times with a fine-tuned search space to achieve good results
Random Search
Inputs
- A set of hyperparameters you want to optimize
- A continuous search space for each hyperparameter as a value range
- A performance metric to optimize
- Explicit number of runs: because the search space is continuous, you must manually stop the search or define a maximum number of runs

Steps
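Random search also follows the general procedure above, but instead of iterating over a grid, it samples each hyperparameter from its value range for a fixed number of trials. A minimal sketch, again assuming a hypothetical train_and_evaluate stand-in and illustrative ranges:

```python
import math
import random


def train_and_evaluate(learning_rate, epochs):
    # Placeholder: replace with real model training and evaluation.
    return 0.99 - abs(math.log10(learning_rate) + 4) * 0.01 - 1.0 / epochs


random.seed(0)
n_trials = 10  # explicit number of runs

best_val_acc, best_params = float("-inf"), None
for _ in range(n_trials):
    params = {
        # Sample the learning rate log-uniformly between 1e-5 and 1e-3.
        "learning_rate": 10 ** random.uniform(-5, -3),
        # Sample the number of epochs uniformly between 5 and 50.
        "epochs": random.randint(5, 50),
    }
    val_acc = train_and_evaluate(**params)
    if val_acc > best_val_acc:
        best_val_acc, best_params = val_acc, params

print(f"Best val_acc: {best_val_acc:.4f} with {best_params}")
```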

Output
- While random search samples values from the full search space for the hyperparameter epochs, it doesn’t explore the full search space for the hyperparameter learning_rate within the first few experiments.
- Since random search is an uninformed search algorithm, the resulting performance doesn’t show a trend over the runs.
- The best val_acc score is 0.9868, which is worse than the best val_acc score achieved with grid search (0.9902). The main reason is presumably that the learning_rate has a large impact on the model’s performance and was not sampled well in this example.

Advantages
- Simple to implement
- Can be parallelized, because the hyperparameter sets can be evaluated independently
- Suitable for models with many hyperparameters: random search has been shown to be more effective than grid search for models with many hyperparameters of which only a small number affect the model’s performance [1]
Disadvantages
- Uninformed search: knowledge from previous experiments is not leveraged, so you may want to run the random search algorithm several times with a fine-tuned search space to achieve good results
Bayesian Optimization
Inputs
- A set of hyperparameters you want to optimize
- A continuous search space for each hyperparameter as a value range
- A performance metric to optimize
- Explicit number of runs: because the search space is continuous, you must manually stop the search or define a maximum number of runs

Steps
- Step 1: Build a probabilistic model of the objective function. This probabilistic model is called a surrogate function; it is based on a Gaussian process [2] and estimates your ML model’s performance for different sets of hyperparameters.
- Step 2: Choose the next set of hyperparameters based on where the surrogate function expects the best performance within the specified search space.
- Step 3: Run an ML experiment for the selected set of hyperparameters and their values, and evaluate and log its performance metric.
- Step 4: After the experiment, update the surrogate function with the last experiment’s results.
- Step 5: Repeat steps 2–4 for the specified number of trial runs (a code sketch of this loop follows the list).
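The sketch below shows one possible implementation of this loop using the scikit-optimize package, whose gp_minimize function handles the Gaussian-process surrogate and the selection of the next hyperparameter set. The train_and_evaluate function and the search-space bounds are hypothetical placeholders:

```python
import math

from skopt import gp_minimize
from skopt.space import Integer, Real
from skopt.utils import use_named_args


def train_and_evaluate(learning_rate, epochs):
    # Placeholder: replace with real model training and evaluation.
    return 0.99 - abs(math.log10(learning_rate) + 4) * 0.01 - 1.0 / epochs


# Continuous search space as value ranges (illustrative bounds).
search_space = [
    Real(1e-5, 1e-3, prior="log-uniform", name="learning_rate"),
    Integer(5, 50, name="epochs"),
]


@use_named_args(search_space)
def objective(**params):
    # gp_minimize minimizes its objective, so return the negated accuracy.
    return -train_and_evaluate(**params)


# Explicit number of runs via n_calls; the surrogate is updated after each run.
result = gp_minimize(objective, search_space, n_calls=20, random_state=0)
print(f"Best val_acc: {-result.fun:.4f} with {result.x}")
```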
Output
- While the Bayesian optimization algorithm samples values from the full search space for the hyperparameter epochs, it doesn’t explore the full search space for the hyperparameter learning_rate within the first few experiments.
- Since Bayesian optimization is an informed search algorithm, the resulting performance shows improvements over the runs.
- The best val_acc score is 0.9852, which is worse than the best val_acc scores achieved with grid search (0.9902) and random search (0.9868). The main reason is presumably that the learning_rate has a large impact on the model’s performance and was not sampled well in this example. However, the algorithm has already begun to decrease the learning_rate to achieve better results; given more runs, Bayesian optimization could find hyperparameters that lead to better performance.

Advantages
- Suitable for models with many hyperparameters
- Informed search: takes advantage of knowledge from previous experiments and thus can converge faster to good hyperparameter values
Disadvantages
- Difficult to implement
- Can’t be parallelized, because the next set of hyperparameters to be evaluated depends on the previous experiment’s results
Conclusion
- Bayesian optimization is more difficult to implement than grid search and random search.
- Because grid search is a brute-force approach, it is unsuitable for models with many hyperparameters, unlike random search and Bayesian optimization.
- In contrast to grid search and random search, Bayesian optimization is an informed search method and doesn’t have to be repeated with a fine-tuned search space to achieve good results.
- But because Bayesian optimization is an informed search, it cannot be parallelized, unlike the other two methods.