When you start your journey towards data science or data analysis, one thing is certain: a major task in both roles is handling missing values, whether you use Python, R, or any other platform or language. It’s said that almost 75 to 80% of the time, a data scientist or data analyst […]

## Dimension Reduction with Principal Component Analysis (PCA)

In this post, we’ll see how to perform dimension reduction with Principal Component Analysis (PCA) using the scikit-learn (sklearn) library, and cover the basic ideas behind dimension reduction and PCA. Data is everywhere. Whatever you do in your day-to-day life generates a tremendous amount of data that can be used by […]

## Applying Linear Regression to Boston Housing Dataset

In this post, we will apply linear regression to the Boston Housing dataset using all available features. In our previous post, we applied linear regression to predict the price from a single feature of the dataset, RM (the average number of rooms). We are going to use the Boston Housing dataset, which contains information […]

## How to Split Data for Machine Learning with scikit-learn

In this post, we will see how to split data for machine learning with scikit-learn (sklearn), as it is always best practice to split your data into training and test sets. In our previous post, we defined machine learning as the art and science of giving machines, especially computers, the ability to learn to make […]
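As a rough illustration of the train/test idea, here is a minimal sketch that uses only Python's standard library. It is a stand-in for the concept, not scikit-learn's own splitting utility, and the function name `split_train_test`, its parameters, and the sample data are all hypothetical.

```python
import random


def split_train_test(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of `rows` and return (train, test) lists.

    Hypothetical helper for illustration only; it is not part of scikit-learn.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle keeps the split reproducible
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


samples = list(range(20))  # made-up data: 20 sample identifiers
train, test = split_train_test(samples, test_fraction=0.25)
print(len(train), len(test))  # 15 5
```

Fixing the seed makes the split repeatable, which helps when comparing models on the same held-out rows.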

## Building simple Linear Regression model using Python’s Sci-kit library

In this post, we will build a simple linear regression model using Python’s scikit-learn (sklearn) library. Machine learning can be defined as the art and science of giving machines, especially computers, the ability to learn to make decisions from data without being explicitly programmed. The […]
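To make "simple linear regression" concrete, the sketch below fits a line y = slope * x + intercept by ordinary least squares using the closed-form formulas. It is written in plain Python rather than with scikit-learn, so it shows the underlying calculation and not the library's API. The function name and the toy data are assumptions chosen for illustration.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points.

    Hypothetical helper for illustration; not a scikit-learn function.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and sum of co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data lying exactly on y = 2x + 1 (made up for this example).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]
slope, intercept = fit_simple_linear_regression(xs, ys)
print(slope, intercept)  # 2.0 1.0
```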

## Top 20 Advanced Excel formulas for Data Analysis

The recently released Microsoft Excel 2016 has 484 functions, of which 360 existed prior to Excel 2010. In this post, I have gathered the most-used advanced Excel formulas for data analysis in industry. Since its release in 1987, and particularly after version 5 was released in 1993, Microsoft Excel became the widely applied spreadsheet […]

## Creating Time Series with Line Charts using Python’s Matplotlib library

In this post, we will see how to create time series with line charts using Python’s Matplotlib library. In data visualization, time series charts are one of the important ways to analyse data over time. In general, any chart that shows a trend over time is a time series chart, and usually […]

## Plotting multiple histograms with different length using Python’s Matplotlib library

In my previous posts, we saw how to plot a stacked histogram (filled) and a stacked step histogram (unfilled). In this post, we will see how to plot multiple histograms of different lengths on the same axis using Python’s Matplotlib library. A histogram is a graphical representation of the frequency distribution of numerical data and it’s a […]