Learning an Objective Function through Interaction with a Human Chemist#

An example notebook using Gaussian process preference learning [1] to model the latent utility function of a human medicinal chemist from pairwise preference data. The human medicinal chemist is presented with pairwise observations of molecules \((m_1, m_2)\) and asked to indicate their preference \(r(m_1, m_2) \in \{0, 1\}\). The preference Gaussian process then models the latent utility function \(g(m)\) via a fit to the pairwise preferences.

In this tutorial we use the pairwise GP with Laplace approximation introduced in [1] as our preference GP. It is also possible to use a skew GP model as in [2].

[1]:
"""Library imports"""

import warnings
warnings.filterwarnings("ignore") # Turn off Graphein warnings

from itertools import combinations
from botorch import fit_gpytorch_model
from botorch.models.pairwise_gp import PairwiseGP, PairwiseLaplaceMarginalLogLikelihood
import gpytorch
import numpy as np
from sklearn.model_selection import train_test_split
import torch
from scipy.stats import kendalltau
from matplotlib import pyplot as plt
from gauche.dataloader import MolPropLoader
from gauche.dataloader.data_utils import transform_data
from gauche.kernels.fingerprint_kernels.tanimoto_kernel import TanimotoKernel

%matplotlib inline

We use the photoswitch dataset for purposes of illustration.

[2]:
"""Load the Photoswitch dataset"""

loader = MolPropLoader()
loader.load_benchmark("Photoswitch")

# Featurise the molecules.

# We use the fragprints representations (a concatenation of Morgan fingerprints and RDKit fragment features)

loader.featurize('ecfp_fragprints')
X_fragprints = loader.features
y = loader.labels
Found 13 invalid labels [nan nan nan nan nan nan nan nan nan nan nan nan nan] at indices [41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 158]
To turn validation off, use dataloader.read_csv(..., validate=False).
2023.09.2

We implement a utility function for generating ground truth pairwise comparison data i.e. preferences voiced by a simulated medicinal chemist.

[3]:
def generate_comparisons(y, n_comp, noise=0.0, replace=False):
    """Function simulating the preferences of a human chemist.

    Args:
        y: 1D NumPy array of training data labels
        n_comp: Int indicating the number of pairwise comparisons to generate
        noise: Float indicating the level of noise in the chemist's decisions
        replace: Bool indicating whether to generate comparisons with replacement

    Returns:
        comp_pairs: A NumPy array of comparison pairs of the form (m1, m2)

    """
    # generate all possible pairs of elements in y
    all_pairs = np.array(list(combinations(range(y.shape[0]), 2)))
    # randomly select n_comp pairs from all_pairs
    comp_pairs = all_pairs[
        np.random.choice(range(len(all_pairs)), n_comp, replace=replace)
    ]
    # add gaussian noise to the latent y values
    c0 = y[comp_pairs[:, 0]] + np.random.standard_normal(len(comp_pairs)) * noise
    c1 = y[comp_pairs[:, 1]] + np.random.standard_normal(len(comp_pairs)) * noise
    reverse_comp = (c0 < c1)
    comp_pairs[reverse_comp, :] = np.flip(comp_pairs[reverse_comp, :], 1)
    comp_pairs = torch.tensor(comp_pairs).long()

    return comp_pairs

As a performance metric we use the Kendall-Tau rank correlation. Importantly, when a GP is learned on pairwise comparisons, the absolute scale of the predictions will be unavailable but the rank order will be available. As such, we can use a rank-based performance metric.

[4]:
"""Utility function for computing the Kendall-Tau rank correlation between the predictions and the ground truth."""

def eval_kt_cor(model, test_X, test_y):
    """Kendall-Tau rank correlation
    Args:
        model: Instance of pairwise GP
        test_X: n x d Tensor of test input locations
        test_y: n x 1 Tensor of test labels

    Returns:
        The Kendall-Tau rank correlation, a number between 0 and 1.
    """
    pred_y = model.posterior(test_X).mean.squeeze().detach().numpy()
    return kendalltau(pred_y, test_y).correlation
[5]:
"""Utility function for model evaluation"""

# Experiment parameters
n_trials = 20
test_set_size = 0.2 # train/test split
m = 500
noise = 0 # simulate a noiseless oracle

def evaluate_model(X, y):
    """Helper function for model evaluation
    Args:
        X: n x d NumPy array of the full set of inputs. Typically some molecular representation such as fragprints, framgents or fingerprints
        y: n x d NumPy array of the full set of output labels
    Returns:
        Mean KT correlation on the train set, mean KT correlation on the test set.
    """

    # initialise performance metric lists
    kt_list_train = []
    kt_list_test = []

    print('\nBeginning training loop...')

    for i in range(0, n_trials):

        print(f'Starting trial {i}')

        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_set_size, random_state=i)

        train_comp = torch.tensor(generate_comparisons(y_train.squeeze(-1), m, noise=noise))

        # Convert numpy arrays to PyTorch tensors and flatten the label vectors
        X_train = torch.tensor(X_train.astype(np.float64))
        X_test = torch.tensor(X_test.astype(np.float64))
        y_train = torch.tensor(y_train).flatten()
        y_test = torch.tensor(y_test).flatten()

        # initialise pairwise GP model
        model = PairwiseGP(X_train, train_comp, covar_module=gpytorch.kernels.ScaleKernel(TanimotoKernel()))
        # Find optimal model hyperparameters
        mll = PairwiseLaplaceMarginalLogLikelihood(model.likelihood, model)

        # Use the BoTorch utility for fitting GPs in order to use the LBFGS-B optimiser (recommended)
        fit_gpytorch_model(mll)

        # Get into evaluation (predictive posterior) mode
        model.eval()

        # To compute metrics and detach gradients. Must unsqueeze dimension
        y_test = y_test.detach().unsqueeze(dim=1)

        # Compute Kendall-Tau rank correlation
        train_score = eval_kt_cor(model, X_train, y_train)
        test_score = eval_kt_cor(model, X_test, y_test)

        kt_list_train.append(train_score)
        kt_list_test.append(test_score)


    kt_list_train = np.array(kt_list_train)
    kt_list_test = np.array(kt_list_test)

    print("\nmean train KT: {:.4f} +- {:.4f}".format(np.mean(kt_list_train), np.std(kt_list_train)/np.sqrt(len(kt_list_train))))
    print("\nmean test KT: {:.4f} +- {:.4f}".format(np.mean(kt_list_test), np.std(kt_list_test)/np.sqrt(len(kt_list_test))))
[6]:
evaluate_model(X_fragprints, y)

Beginning training loop...
Starting trial 0
Starting trial 1
Starting trial 2
Starting trial 3
Starting trial 4
Starting trial 5
Starting trial 6
Starting trial 7
Starting trial 8
Starting trial 9
Starting trial 10
Starting trial 11
Starting trial 12
Starting trial 13
Starting trial 14
Starting trial 15
Starting trial 16
Starting trial 17
Starting trial 18
Starting trial 19

mean train KT: 0.7969 +- 0.0023

mean test KT: 0.7201 +- 0.0067

References#

[1] Chu, W. and Ghahramani, Z., Preference learning with Gaussian processes. ICML, 2005.

[2] Benavoli, A., Azzimonti, D. and Piga, D., 2021, July. Preferential Bayesian optimisation with Skew Gaussian Processes. In Proceedings of the Genetic and Evolutionary Computation Conference Companion (pp. 1842-1850).