StackGP.rmse#

rmse is a StackGP function that computes the root mean squared error (RMSE) of a GP model's predictions against a ground-truth response.
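
For model predictions \(\hat{y}\) and ground-truth response \(y\) over \(n\) samples, this is

\[\displaystyle \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}\]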

The function expects three arguments: model, inputData, and response.

The arguments are described below:

  • model: A StackGP model.

  • inputData: A numpy array containing the input data used to evaluate the model.

  • response: A numpy array containing the ground-truth response values to compare the model's predictions against.
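
A typical call, as in the examples below, returns a single float:

sgp.rmse(model, inputData, response)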


First, we load the necessary packages:

import StackGP as sgp
import numpy as np

Overview#

Computing RMSE for a random model#

Here we generate a random model with up to 4 variables, the default operator set, the default constant set, and a maxSize of 10.

randomModel = sgp.generateRandomModel(4, sgp.defaultOps(), sgp.defaultConst(), 10)

We can display the random model below:

sgp.printGPModel(randomModel)
\[\displaystyle x_{0}^{2} + 12.886958230384\]

Now we generate a random input dataset and a ground truth response vector.

inputData = np.random.rand(4, 100) 
response = np.sin(inputData[0]) + np.cos(inputData[1]) + inputData[2]**2 - inputData[3]**3

Now we can compute the RMSE between the model’s response and the ground truth response.

sgp.rmse(randomModel, inputData, response)
11.885763870944336
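
As a sanity check, this value agrees with computing the RMSE by hand from the model's predictions; a minimal sketch using sgp.evaluateGPModel (shown later on this page) to obtain the predictions:

predictions = sgp.evaluateGPModel(randomModel, inputData)  # model predictions on the input data
np.sqrt(np.mean((predictions - response)**2))              # RMSE computed by hand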


Examples#

This section provides a few examples demonstrating how rmse can be used.


Checking error of best evolved model#

Often we are interested in measuring the quality of the best evolved model on a held-out test set.

Let's start by generating a training set and a test set, each with 4 features.

trainInputData = np.random.rand(4, 100)
trainResponse = 5*trainInputData[0] + 2*trainInputData[1] + trainInputData[2]**2
testInputData = np.random.rand(4, 100)
testResponse = 5*testInputData[0] + 2*testInputData[1] + testInputData[2]**2

Now let's evolve a model population using the training data.

models = sgp.evolve(trainInputData, trainResponse, tracking=True, generations=200, elitismRate=10, tourneySize=30)
[Output: evolution tracking plot]

Now let's pick the best model from the evolved population.

bestModel = models[0]
sgp.printGPModel(bestModel)
\[\displaystyle 4.05907675740214 + \frac{8.93984953470558 \left(x_{0} \left(x_{2}^{2} + 2.22349023607508\right) + x_{1} - 1.87464002672662\right)}{e^{x_{2}^{2}} + 3.10786161984353}\]

Now let's evaluate the RMSE of the best model on the test set.

sgp.rmse(bestModel, testInputData, testResponse)
0.06950955185265639

We can compare that to the RMSE on the training set to check whether the model overfit the training data or performs similarly on both sets.

sgp.rmse(bestModel, trainInputData, trainResponse)
0.06994713153756658
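
Since both values come straight from sgp.rmse, a quick overfitting check is to compare them directly; a minimal sketch:

trainRMSE = sgp.rmse(bestModel, trainInputData, trainResponse)
testRMSE = sgp.rmse(bestModel, testInputData, testResponse)
print("test/train RMSE ratio:", testRMSE / trainRMSE)  # a ratio near 1 suggests little overfitting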

Using RMSE as a Fitness Objective#

While RMSE has been shown to be less effective than correlation as a training objective, it may still be useful for benchmarking or other purposes.

We will use a randomly generated model to produce the response, so we start by generating a random model.

randomModel = sgp.generateRandomModel(4, sgp.defaultOps(), sgp.defaultConst(), 20)
sgp.printGPModel(randomModel)
\[\displaystyle \frac{e^{x_{2}}}{\sqrt{e^{- x_{2}}}}\]

Now we can generate some random input data and use the model above to produce the response.

inputData = np.random.rand(4, 100)
response = sgp.evaluateGPModel(randomModel, inputData)

Now we can evolve models to fit the generated training data, setting rmse as the fitness objective rather than the default simplicity and correlation objectives.

models = sgp.evolve(inputData, response, tracking=True, modelEvaluationMetrics=[sgp.rmse])
[Output: evolution tracking plot]
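
Since modelEvaluationMetrics takes a list of metric functions, it should be possible to combine objectives as well; a hedged sketch, assuming sgp.stackGPModelComplexity (used later on this page) is accepted here too, with modelsMulti as an illustrative name:

modelsMulti = sgp.evolve(inputData, response, modelEvaluationMetrics=[sgp.rmse, sgp.stackGPModelComplexity])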

Now we can look at the best model generated.

sgp.printGPModel(models[0])
\[\displaystyle 0.999512900120274 e^{1.50038757722228 x_{2}} + 0.000571183205158241\]

Now we can check how well it generalizes to test data. First we need to generate the test data.

testInputData = np.random.rand(4, 100)
testResponse = sgp.evaluateGPModel(randomModel, testInputData)

Now we can compute the RMSE on the test data.

sgp.rmse(models[0], testInputData, testResponse)
4.5955382363536954e-05

We can also visualize the quality of predictions on the test data.

sgp.plotModelResponseComparison(models[0], testInputData, testResponse)
[Output: model response comparison plot]

To make it easier to see how close the predictions are to the test values, we can plot the residuals.

sgp.plotModelResiduals(models[0], testInputData, testResponse)
[Output: residual plot]
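
The residuals themselves can also be computed directly; a minimal sketch using sgp.evaluateGPModel:

residuals = testResponse - sgp.evaluateGPModel(models[0], testInputData)  # ground truth minus predictions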

Updating Embedded Objectives in Evolved Models#

We may be interested in evolving a model population using one set of fitness objectives and then evaluating and storing another set of objectives after evolution. We can easily do this with the setModelQuality function.

First we need to generate some models and data.

inputData = np.random.rand(5, 100)
randomModel = sgp.generateRandomModel(5, sgp.defaultOps(), sgp.defaultConst(), 20)
response = sgp.evaluateGPModel(randomModel, inputData)
sgp.printGPModel(randomModel)
\[\displaystyle \frac{x_{1}^{2}}{x_{2}} - x_{4}\]

Now let's train a model using the standard search objectives.

models = sgp.evolve(inputData, response, tracking=True)
[Output: evolution tracking plot]

Now we can generate a test set.

testInputData = np.random.rand(5, 100)
testResponse = sgp.evaluateGPModel(randomModel, testInputData)

Now we can update the models' embedded quality objectives to include RMSE with respect to the test set.

for model in models: 
    sgp.setModelQuality(model, testInputData, testResponse, [sgp.fitness, sgp.stackGPModelComplexity, sgp.rmse])

Now we can look at the fitness vector of a model and see that it contains three values (\(1-r^2\), complexity, and RMSE).

display(sgp.printGPModel(models[0]))
print("Fitness vector: ", models[0][-1])
\[\displaystyle \frac{1.0 x_{1}^{2}}{x_{2}} - 1.0 x_{4} + 3.89067216665765 \cdot 10^{-15}\]
Fitness vector:  [0.0, 10, 6.365517961489426e-15]
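
Since each model now stores its quality vector in its last element (as shown above with models[0][-1]), we can, for example, rank the population by the stored test-set RMSE; a minimal sketch assuming that layout:

modelsByTestRMSE = sorted(models, key=lambda m: m[-1][2])  # third objective: RMSE on the test set
sgp.printGPModel(modelsByTestRMSE[0])                      # model with the lowest stored test-set RMSE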