Einstein summation is a convention for simplifying expressions that involve summation over vectors, matrices, or tensors in general. *Remember that a scalar is a rank-zero tensor, a vector is a rank-one tensor, and a matrix is a rank-two tensor. In other words, scalars, vectors, and matrices are all tensors that differ only in rank.*

Three rules must be followed to write an expression in Einstein summation notation:

- Values along repeated indices (axes) are multiplied and then implicitly summed over. (If an index (axis) is repeated in an expression, it is implicitly summed over, i.e. …
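The repeated-index rule above can be sketched with NumPy's `np.einsum`. The excerpt does not name a library, so the choice of NumPy and the helper names `matmul_einsum` and `dot_einsum` are assumptions made purely for illustration.

```python
import numpy as np


def matmul_einsum(a, b):
    # "ij,jk->ik": the index j appears in both inputs, so the products
    # a[i, j] * b[j, k] are summed over j (ordinary matrix multiplication).
    return np.einsum("ij,jk->ik", a, b)


def dot_einsum(u, v):
    # "i,i->": the index i is repeated and does not appear in the output,
    # so the element-wise products are summed into a single scalar.
    return np.einsum("i,i->", u, v)


A = np.arange(6).reshape(2, 3)
B = np.arange(12).reshape(3, 4)
C = matmul_einsum(A, B)  # C has shape (2, 4)
```

Here `C` should match `A @ B`, since the subscript string spells out the same sum over the shared index `j` that matrix multiplication performs.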

Containerization is a buzzword. Everyone talks about Docker and containerization, and everyone wants their projects containerized because of the associated benefits. The challenge is that very few people actually understand what containers are and how they can be used in Artificial Intelligence projects.

I have attended many sessions about Docker, but I couldn’t understand much of it or, more importantly, how it would help me in my role as a Data Scientist.

The objective of this blog is to cover what you need to know about containers: what they are, why they are useful, and how to use them as…

**Problem**

We all encounter the above situation when the inference (prediction) time of our Machine Learning model is high. This happens especially with complicated ensemble models such as Random Forest and Gradient Boosting. Prediction time also increases with the number of features, so larger models have higher inference times.

Overall, this results in poor response times for the API serving the model, or in long batch cycles. We try to improve performance with techniques such as scaling up the server, using a load balancer, or running multiple models in parallel.

In the worst-case scenario, we look at training a new model that…

Data exploration is the first and most fundamental task that Data Scientists perform as soon as they receive the data.

Data exploration, even in a basic sense, often takes a lot of time. Although some of the metrics Data Scientists want to look at are common across tasks, they usually don’t have a single code base for running them. Most of the time, if not every time, they need to rewrite the code, fix errors, and so on, which costs a lot of time.

There are various reasons for doing data exploration:

- Understanding distribution…

We often want to experiment with Spark but get stuck because we don’t have a Spark environment. In this post we will discuss how to set up a Spark environment inside Google Colab with a few lines of code, so that we can start using Spark there within a few minutes.

Below are the steps for installing Spark inside Google Colab:

1. Spark requires Java, so we need to install Java before setting up Spark in Colab.

*!apt-get install openjdk-8-jdk-headless*

2. We will see a message once Java is installed. We can then check the Java version.

!java -version

Many of us are unaware of the relationship between Cosine Similarity and Euclidean Distance. Knowing this relationship is extremely helpful when we need to use one in place of the other indirectly. One application is converting the K-Means clustering algorithm into the Spherical K-Means clustering algorithm, where cosine similarity is used as the measure for clustering data.
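As a minimal sketch of one such relationship: for vectors scaled to unit length, the squared Euclidean distance equals 2 × (1 − cosine similarity). The excerpt does not say which form of the relationship the post derives, so this standard identity and the helper names below are assumptions made for illustration.

```python
import math


def normalize(v):
    # Scale the vector to unit (L2) length.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_similarity(a, b):
    # Dot product divided by the product of the two L2 norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def squared_euclidean(a, b):
    # Sum of squared coordinate differences.
    return sum((x - y) ** 2 for x, y in zip(a, b))


a = [3.0, 4.0, 0.0]
b = [1.0, 2.0, 2.0]

# Squared distance between the unit-length versions of a and b ...
lhs = squared_euclidean(normalize(a), normalize(b))
# ... compared with 2 * (1 - cosine similarity) of the original vectors.
rhs = 2.0 * (1.0 - cosine_similarity(a, b))
```

The two values agree up to floating-point error, which suggests why running a Euclidean-distance algorithm on unit-normalized vectors can act as a stand-in for clustering by cosine similarity.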

We often want to cluster text documents to discover certain patterns. K-Means clustering is a natural first choice for a clustering use case. The K-Means implementation in scikit-learn uses “Euclidean Distance” to cluster similar data points.

It is also well known that…

Sr. Data Scientist with strong hands-on experience in building Real World Artificial Intelligence Based Solutions using NLP, Computer Vision and Edge Devices.