StackGP.setModelQuality#
setModelQuality is a StackGP function that computes the fitness objectives for a model and updates the model in place.
The function expects three required arguments: model, inputData, and response.
The required arguments are described below:
model: A StackGP model.
inputData: A numpy array containing data to evaluate the model at.
response: A numpy array containing the ground truth response vector.
The function has 1 optional argument:
modelEvaluationMetrics: A list of objectives used to evaluate the model. If none is supplied, the default of [fitness, stackGPModelComplexity] will be used.
First, we need to load the necessary packages:
import StackGP as sgp
import numpy as np
Overview#
Setting model quality for a random model#
Here we generate a random model with up to 4 variables, the default operator set, the default constant set, and a maxSize of 10.
randomModel = sgp.generateRandomModel(4, sgp.defaultOps(), sgp.defaultConst(), 10)
We can display the random model below:
sgp.printGPModel(randomModel)
Now we can generate some random data with which to evaluate the model. Since the data is random, the result isn't really a measure of model quality; it just reflects how similar the model response is to the random data.
randomInput = np.random.rand(100, 4)
randomResponse = np.random.rand(100)
sgp.setModelQuality(randomModel, randomInput, randomResponse)
print("Model's fitness vector: ",randomModel[-1])
Model's fitness vector: [1, 9]
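The objectives can also be overridden through the optional argument. As a minimal sketch, we reuse the modelEvaluationMetrics keyword and the sgp.rmse metric that appear in the last example below:
# Override the default objectives with RMSE only
sgp.setModelQuality(randomModel, randomInput, randomResponse, modelEvaluationMetrics=[sgp.rmse])
print("Model's quality vector using RMSE: ", randomModel[-1])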
Examples#
This section provides some interesting examples to demonstrate how setModelQuality can be used.
Evaluating Best Model on Test Set#
Once we have trained a model population, we are likely interested in the quality of the best model on a test set.
Let's start by generating a training set with 4 features, using a random model to produce the training response.
trainInputData = np.random.rand(4, 100)
randomModel = sgp.generateRandomModel(4, sgp.defaultOps(), sgp.defaultConst(), 10)
display(sgp.printGPModel(randomModel))
trainResponse = sgp.evaluateGPModel(randomModel, trainInputData)
Now let's evolve a model population using the training data.
models = sgp.evolve(trainInputData, trainResponse, tracking=True)
Now let's pick the best model from the evolved population.
bestModel = models[0]
sgp.printGPModel(bestModel)
Now let's generate a test set.
testInputData = np.random.rand(4, 100)
testResponse = sgp.evaluateGPModel(randomModel, testInputData)
Now we can update the best model's fitness vector to determine its performance on the test set.
sgp.setModelQuality(bestModel, testInputData, testResponse)
print("Best model's test fitness vector: ", bestModel[-1])
Best model's test fitness vector: [1.3322676295501878e-15, 5]
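For comparison, we can recompute the quality on the training data. Since setModelQuality updates the model in place, note that this overwrites the test fitness vector:
# Re-score the best model on the training data (overwrites the test fitness vector)
sgp.setModelQuality(bestModel, trainInputData, trainResponse)
print("Best model's training fitness vector: ", bestModel[-1])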
Updating Model Population on Test Set#
We may be interested in updating the fitness vectors of the whole population using a test set.
Let's start by generating a random model, which we will use to generate the training set.
randomModel = sgp.generateRandomModel(4, sgp.defaultOps(), sgp.defaultConst(), 20)
sgp.printGPModel(randomModel)
Now we can generate some random input data and use the above random model to produce the response.
inputData = np.random.rand(4, 100)
response = sgp.evaluateGPModel(randomModel, inputData)
Now we can evolve models to fit the generated training data.
models = sgp.evolve(inputData, response, tracking=True)
Let's first look at the Pareto front plot with respect to the training data.
sgp.plotModels(models)
We can also view the model accuracy distribution plot with respect to the training data.
sgp.plotModelAccuracyDistribution(models)
Now let's generate some test data.
testInputData = np.random.rand(4, 100)
testResponse = sgp.evaluateGPModel(randomModel, testInputData)
Now we can update the model quality using the test data.
for model in models:
    sgp.setModelQuality(model, testInputData, testResponse)
Now we can view the Pareto front plot with respect to the test data.
sgp.plotModels(models)
We can also view the model accuracy distribution plot with respect to the test data.
sgp.plotModelAccuracyDistribution(models)
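Note that because setModelQuality updates each model in place, the loop above overwrote the training fitness vectors. If we wanted to keep both views of the population, we could instead have scored deep copies of the models; a minimal sketch, assuming the models can be duplicated with copy.deepcopy:
import copy
# Score copies on the test data so the originals keep their training fitness vectors
testModels = copy.deepcopy(models)
for model in testModels:
    sgp.setModelQuality(model, testInputData, testResponse)
sgp.plotModels(testModels)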
Evaluating on a Different Set of Objectives#
There may be cases where we want to train using one set of objectives and then analyze the models using another. In this case, we will train using the default objectives and then evaluate the models on the test set using RMSE.
First, let's generate random input data and a random model to produce the target response data.
inputData = np.random.rand(100, 8)
randomModel = sgp.generateRandomModel(8, sgp.defaultOps(), sgp.defaultConst(), 20)
display(sgp.printGPModel(randomModel))
response = sgp.evaluateGPModel(randomModel, inputData)
Now let's train a model population using the generated training data. To make it interesting, let's set the generation count to 200, the elitism rate to 10%, and the tournament size to 30.
models = sgp.evolve(inputData, response, tracking=True, generations=200, elitismRate=10, tourneySize=30)
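As before, the best model sits at the front of the returned population and its fitness vector is stored in the model's last element, so we can check the training quality directly:
# Training fitness vector of the best evolved model
print("Best model's training fitness vector: ", models[0][-1])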
Now let's view the accuracy distribution plot with respect to the training data and the original fitness objectives.
sgp.plotModelAccuracyDistribution(models)
Now let's generate some test data.
testInputData = np.random.rand(100, 8)
testResponse = sgp.evaluateGPModel(randomModel, testInputData)
Now, rather than just updating the model quality using the test data, let's change the evaluation objective to RMSE.
for model in models:
    sgp.setModelQuality(model, testInputData, testResponse, modelEvaluationMetrics=[sgp.rmse])
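Before plotting, we can pull a quick numeric summary from the updated quality vectors; this sketch assumes, as in the earlier examples, that each model's quality vector is stored in its last element with the error term (here RMSE) first:
# Collect the test RMSE of every model in the population
testRMSEs = [model[-1][0] for model in models]
print("Best test RMSE: ", min(testRMSEs))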
Now let's visualize the model accuracy distribution plot. In this case, we see the plot with respect to RMSE rather than fitness.
sgp.plotModelAccuracyDistribution(models)
That plot may not have been very informative, so it may be useful to filter out models with very large errors. Here we select the 100 most accurate models and then display the accuracy distribution plot again.
qualityModels = sgp.selectModels(models, selectionSize=100)
sgp.plotModelAccuracyDistribution(qualityModels)
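Alternatively, instead of keeping a fixed number of models, we could filter by an explicit error threshold; this sketch again assumes the RMSE is the first entry of each model's quality vector and uses an arbitrary cutoff of 1.0:
# Keep only the models whose test RMSE is below the cutoff
lowErrorModels = [model for model in models if model[-1][0] < 1.0]
sgp.plotModelAccuracyDistribution(lowErrorModels)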