Learning an Objective Function through Interaction with a Human Chemist#
An example notebook using Gaussian process preference learning [1] to model the latent utility function of a human medicinal chemist from pairwise preference data. The human medicinal chemist is presented with pairwise observations of molecules \((m_1, m_2)\) and asked to indicate their preference \(r(m_1, m_2) \in \{0, 1\}\). The preference Gaussian process then models the latent utility function \(g(m)\) via a fit to the pairwise preferences.
In this tutorial we use the pairwise GP with Laplace approximation introduced in [1] as our preference GP. It is also possible to use a skew GP model as in [2].
[1]:
"""Library imports"""
import warnings
warnings.filterwarnings("ignore") # Turn off Graphein warnings
from itertools import combinations
from botorch import fit_gpytorch_model
from botorch.models.pairwise_gp import PairwiseGP, PairwiseLaplaceMarginalLogLikelihood
import gpytorch
import numpy as np
from sklearn.model_selection import train_test_split
import torch
from scipy.stats import kendalltau
from matplotlib import pyplot as plt
from gauche.dataloader import MolPropLoader
from gauche.dataloader.data_utils import transform_data
from gauche.kernels.fingerprint_kernels.tanimoto_kernel import TanimotoKernel
%matplotlib inline
We use the photoswitch dataset for purposes of illustration.
[2]:
"""Load the Photoswitch dataset"""
loader = MolPropLoader()
loader.load_benchmark("Photoswitch")
# Featurise the molecules.
# We use the fragprints representations (a concatenation of Morgan fingerprints and RDKit fragment features)
loader.featurize('ecfp_fragprints')
X_fragprints = loader.features
y = loader.labels
Found 13 invalid labels [nan nan nan nan nan nan nan nan nan nan nan nan nan] at indices [41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 158]
To turn validation off, use dataloader.read_csv(..., validate=False).
2023.09.2
We implement a utility function for generating ground truth pairwise comparison data i.e. preferences voiced by a simulated medicinal chemist.
[3]:
def generate_comparisons(y, n_comp, noise=0.0, replace=False):
"""Function simulating the preferences of a human chemist.
Args:
y: 1D NumPy array of training data labels
n_comp: Int indicating the number of pairwise comparisons to generate
noise: Float indicating the level of noise in the chemist's decisions
replace: Bool indicating whether to generate comparisons with replacement
Returns:
comp_pairs: A NumPy array of comparison pairs of the form (m1, m2)
"""
# generate all possible pairs of elements in y
all_pairs = np.array(list(combinations(range(y.shape[0]), 2)))
# randomly select n_comp pairs from all_pairs
comp_pairs = all_pairs[
np.random.choice(range(len(all_pairs)), n_comp, replace=replace)
]
# add gaussian noise to the latent y values
c0 = y[comp_pairs[:, 0]] + np.random.standard_normal(len(comp_pairs)) * noise
c1 = y[comp_pairs[:, 1]] + np.random.standard_normal(len(comp_pairs)) * noise
reverse_comp = (c0 < c1)
comp_pairs[reverse_comp, :] = np.flip(comp_pairs[reverse_comp, :], 1)
comp_pairs = torch.tensor(comp_pairs).long()
return comp_pairs
As a performance metric we use the Kendall-Tau rank correlation. Importantly, when a GP is learned on pairwise comparisons, the absolute scale of the predictions will be unavailable but the rank order will be available. As such, we can use a rank-based performance metric.
[4]:
"""Utility function for computing the Kendall-Tau rank correlation between the predictions and the ground truth."""
def eval_kt_cor(model, test_X, test_y):
"""Kendall-Tau rank correlation
Args:
model: Instance of pairwise GP
test_X: n x d Tensor of test input locations
test_y: n x 1 Tensor of test labels
Returns:
The Kendall-Tau rank correlation, a number between 0 and 1.
"""
pred_y = model.posterior(test_X).mean.squeeze().detach().numpy()
return kendalltau(pred_y, test_y).correlation
[5]:
"""Utility function for model evaluation"""
# Experiment parameters
n_trials = 20
test_set_size = 0.2 # train/test split
m = 500
noise = 0 # simulate a noiseless oracle
def evaluate_model(X, y):
"""Helper function for model evaluation
Args:
X: n x d NumPy array of the full set of inputs. Typically some molecular representation such as fragprints, framgents or fingerprints
y: n x d NumPy array of the full set of output labels
Returns:
Mean KT correlation on the train set, mean KT correlation on the test set.
"""
# initialise performance metric lists
kt_list_train = []
kt_list_test = []
print('\nBeginning training loop...')
for i in range(0, n_trials):
print(f'Starting trial {i}')
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_set_size, random_state=i)
train_comp = torch.tensor(generate_comparisons(y_train.squeeze(-1), m, noise=noise))
# Convert numpy arrays to PyTorch tensors and flatten the label vectors
X_train = torch.tensor(X_train.astype(np.float64))
X_test = torch.tensor(X_test.astype(np.float64))
y_train = torch.tensor(y_train).flatten()
y_test = torch.tensor(y_test).flatten()
# initialise pairwise GP model
model = PairwiseGP(X_train, train_comp, covar_module=gpytorch.kernels.ScaleKernel(TanimotoKernel()))
# Find optimal model hyperparameters
mll = PairwiseLaplaceMarginalLogLikelihood(model.likelihood, model)
# Use the BoTorch utility for fitting GPs in order to use the LBFGS-B optimiser (recommended)
fit_gpytorch_model(mll)
# Get into evaluation (predictive posterior) mode
model.eval()
# To compute metrics and detach gradients. Must unsqueeze dimension
y_test = y_test.detach().unsqueeze(dim=1)
# Compute Kendall-Tau rank correlation
train_score = eval_kt_cor(model, X_train, y_train)
test_score = eval_kt_cor(model, X_test, y_test)
kt_list_train.append(train_score)
kt_list_test.append(test_score)
kt_list_train = np.array(kt_list_train)
kt_list_test = np.array(kt_list_test)
print("\nmean train KT: {:.4f} +- {:.4f}".format(np.mean(kt_list_train), np.std(kt_list_train)/np.sqrt(len(kt_list_train))))
print("\nmean test KT: {:.4f} +- {:.4f}".format(np.mean(kt_list_test), np.std(kt_list_test)/np.sqrt(len(kt_list_test))))
[6]:
evaluate_model(X_fragprints, y)
Beginning training loop...
Starting trial 0
Starting trial 1
Starting trial 2
Starting trial 3
Starting trial 4
Starting trial 5
Starting trial 6
Starting trial 7
Starting trial 8
Starting trial 9
Starting trial 10
Starting trial 11
Starting trial 12
Starting trial 13
Starting trial 14
Starting trial 15
Starting trial 16
Starting trial 17
Starting trial 18
Starting trial 19
mean train KT: 0.7969 +- 0.0023
mean test KT: 0.7201 +- 0.0067
References#
[1] Chu, W. and Ghahramani, Z., Preference learning with Gaussian processes. ICML, 2005.
[2] Benavoli, A., Azzimonti, D. and Piga, D., 2021, July. Preferential Bayesian optimisation with Skew Gaussian Processes. In Proceedings of the Genetic and Evolutionary Computation Conference Companion (pp. 1842-1850).