How to take Your Trained Machine Learning Models to GPU for Predictions in 2 Minutes

Tanveer Khan
Published in AI For Real
5 min read · Oct 8, 2020

Figure 1

Problem

We have all faced the situation shown above, where the inference (prediction) time of a Machine Learning model is high. This happens especially with complex ensemble models such as Random Forest and Gradient Boosting. Prediction time also grows with the number of features, so larger models have higher inference times.

The overall result is poor response time for an API serving the model, or long batch cycles. We try to improve performance with techniques such as scaling up the server, adding a load balancer, or running multiple model instances in parallel.

In the worst case, we train a new model that is faster at inference, or retrain the model on a GPU, since some implementations such as XGBoost support GPU-based training.

Nowadays most people and projects have access to a GPU, but they cannot use it directly to improve the inference time of an existing trained scikit-learn model.

We need a different approach to solve this.

Solution

In this post, we will discuss Hummingbird, a library from Microsoft Research that converts trained scikit-learn models into tensor computations that can run on a GPU, yielding faster prediction (inference) times.

Hummingbird also lets users leverage modern neural network frameworks such as PyTorch to accelerate traditional Machine Learning models.

A few benefits of using Hummingbird are:

  1. Latest improvements: because converted models run on modern, evolving frameworks, they benefit from current and future optimizations in those frameworks.
  2. Traditional models can run on GPUs, gaining native hardware-based acceleration.
  3. Performance improvements (reduced inference time) during model serving.
  4. The converted model is meant to produce the same predictions as the original model, so model performance stays the same. We verify this with a sanity check later in the post.
  5. Since most companies are already investing in GPUs, a single infrastructure can run both types of models. This leads to simpler and more efficient Machine Learning operations.
  6. Converting existing models to tensor computations avoids the need to retrain them with GPU support, which saves a lot of effort and cost.
  7. Faster turnaround time: converting a model to tensor computations takes a single line of code.
  8. Hummingbird's API is consistent with scikit-learn's, so the model can be swapped under the hood with no change in the calling code. The conversion is fully abstracted.

So the overall conversion process looks like this:

Figure 2

The only thing we need to do is replace the machine learning model in Figure 1 with the converted model in Figure 2. No change in the code at all. Amazing!

Figure 3

After replacing the model file you are good to go: the model can now use the GPU's CUDA cores, high memory bandwidth, and large number of registers, which lead to faster computation.
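To make the "no change in code" point concrete, here is a minimal sketch in Python. The names serve_batch and UniformStandIn are hypothetical and are not part of Hummingbird or scikit-learn. The only assumption is that both the original and the converted model expose a scikit-learn style predict_proba method, so the serving code never needs to know which one it was given.

```python
import numpy as np


def serve_batch(model, batch):
    """Serving code that depends only on a scikit-learn style predict_proba."""
    proba = model.predict_proba(batch)
    # Pick the most likely class for each row.
    return np.argmax(proba, axis=1)


class UniformStandIn:
    """Hypothetical stand-in for any model object exposing predict_proba."""

    def __init__(self, num_classes):
        self.num_classes = num_classes

    def predict_proba(self, batch):
        # Every class gets the same probability for every row.
        return np.full((len(batch), self.num_classes), 1.0 / self.num_classes)


batch = np.random.rand(4, 50)
labels = serve_batch(UniformStandIn(num_classes=3), batch)
print(labels.shape)  # (4,)
```

In a real service, the object passed to serve_batch would be either the original scikit-learn model or its converted counterpart, and the function body would stay the same.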

Now let's look at the code, which is very simple. I am releasing the Google Colab notebook for it.

First, train a scikit-learn model for a classification problem with 3 classes. In the code below we create fake data and fit a classifier.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Create random data: 100,000 rows, 50 features, 3 classes
num_classes = 3
X = np.random.rand(100000, 50)
y = np.random.randint(num_classes, size=100000)

# Train a small random forest on the CPU
randomForestCPU = RandomForestClassifier(n_estimators=10, max_depth=10)
randomForestCPU.fit(X, y)

Let's run predictions with this model on the CPU and measure how long they take.

%%timeit is an IPython cell magic that measures how long a cell takes to execute.

%%timeit
randomForestCPU.predict_proba(X)

The time taken to make predictions on the CPU is:

Inference Time Taken By Model On CPU

Now we will install the Hummingbird library (pip install hummingbird-ml) and convert our model in TWO LINES OF CODE. Magic, isn't it?

The first line converts the model to its PyTorch equivalent, and the second sets the model's target execution device to the GPU (to('cuda')).

from hummingbird.ml import convert
# Use Hummingbird to convert the model to PyTorch
randomForestGPU = convert(randomForestCPU, 'pytorch')
randomForestGPU.to('cuda')
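If the notebook might also run on a machine without a GPU, the device can be chosen at run time. The sketch below is an assumption layered on top of the two lines above, not something from the original notebook: pick_device and convert_for_inference are hypothetical helper names, and the GPU check relies on torch.cuda.is_available() from PyTorch.

```python
def pick_device():
    """Return 'cuda' when PyTorch can see a GPU, otherwise 'cpu'."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so fall back to the CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def convert_for_inference(sklearn_model):
    """Hypothetical wrapper around the two conversion lines shown above."""
    # Imported here so that pick_device() works even without Hummingbird.
    from hummingbird.ml import convert  # pip install hummingbird-ml

    hb_model = convert(sklearn_model, "pytorch")
    hb_model.to(pick_device())
    return hb_model


print(pick_device())
```

With this in place, the conversion step would read randomForestGPU = convert_for_inference(randomForestCPU), and the same notebook would run unchanged on a CPU-only machine, just without the speed-up.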

Let's check the inference time of the GPU-based random forest that we created in the step above:

Inference Time Taken By Model On GPU
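The GPU timing cell itself is not shown here; presumably it mirrors the CPU cell, with %%timeit wrapped around randomForestGPU.predict_proba(X). Outside a notebook, where %%timeit is not available, a rough equivalent can be sketched with the standard library. best_time_ms is a hypothetical helper name, and its numbers will not match %%timeit exactly.

```python
import time


def best_time_ms(fn, repeats=5):
    """Call fn() `repeats` times and return the fastest run in milliseconds."""
    fn()  # warm-up call, not timed (first calls on a GPU often include setup cost)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append((time.perf_counter() - start) * 1000.0)
    return min(timings)


# Stand-in workload so the sketch runs anywhere. In the notebook you would pass
# lambda: randomForestGPU.predict_proba(X), and the CPU model likewise.
elapsed = best_time_ms(lambda: sum(range(100_000)))
print(f"best of 5: {elapsed:.3f} ms")
```

Timing both models with the same helper and the same X keeps the comparison fair, since each measurement then includes the same overheads.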

Voila! The inference time for the same number of records is 19.8 ms on the GPU and 99.1 ms on the CPU. This is a 5X reduction in inference time (99.1 / 19.8 ≈ 5.0), which is a huge gain.

We also need a sanity check that both models produce exactly the same results; otherwise our comparison would not be valid. We will use the following code to compare the class predictions.

y_gpu = randomForestGPU.predict(X)
y_cpu = randomForestCPU.predict(X)
# Compare the predicted class labels element by element
if np.array_equal(y_gpu, y_cpu):
    print("prediction matched")
else:
    print("prediction mismatched")
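Class labels can match even when the underlying probabilities differ slightly, so a stricter check is to compare the predict_proba outputs within a tolerance. The sketch below is an addition to the notebook's check, not part of it: probabilities_match is a hypothetical helper, and the tolerances are assumptions you may need to adjust, since the two models may round floating-point values differently.

```python
import numpy as np


def probabilities_match(proba_a, proba_b, rtol=1e-5, atol=1e-5):
    """True if two probability arrays agree within a floating-point tolerance."""
    proba_a = np.asarray(proba_a, dtype=np.float64)
    proba_b = np.asarray(proba_b, dtype=np.float64)
    if proba_a.shape != proba_b.shape:
        return False
    return bool(np.allclose(proba_a, proba_b, rtol=rtol, atol=atol))


# Made-up probabilities standing in for the CPU and GPU outputs.
cpu_like = np.array([[0.2, 0.3, 0.5], [0.6, 0.3, 0.1]])
gpu_like = cpu_like + 1e-7  # tiny rounding-style difference
print(probabilities_match(cpu_like, gpu_like))        # True
print(probabilities_match(cpu_like, cpu_like[::-1]))  # False
```

In the notebook, the call would be probabilities_match(randomForestCPU.predict_proba(X), randomForestGPU.predict_proba(X)).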

Results are:

We can see that, despite the reduction in inference time, both models produced exactly the same results!

Lo and behold, all this without any retraining, without any significant effort, without any accuracy loss, and with ONLY two lines of code. And no change to the existing code.

Summary

So we have seen a significant improvement in inference time after using the Hummingbird-converted model. The time reduction is large, and it comes with almost no effort and without touching any of the existing code. We also saw that the original model and the converted model produce the same predictions, so their accuracy is the same. So basically there is very little effort or cost associated with the process.

However, these time reductions will be most significant for large models. If you use this approach for small, simple models such as Linear Regression or Logistic Regression, the improvement will be smaller. We ran an experiment, and the finding below shows there is a 3X reduction in run time.

We hope you find this post informative and useful. Please drop your suggestions in the comment box.

Happy Learning !!

The code shown in this post is available in the GitHub repo below:

Reference Link For Git Repo For Hummingbird

Tanveer Khan
AI For Real

Sr. Data Scientist with strong hands-on experience in building real-world Artificial Intelligence based solutions using NLP, Computer Vision and Edge Devices.