ML Time

Improving the Aftermath Management of an Earthquake

Happy Valentine's Day from MATLAB ❤️ Code ⬇️

50+ Statistics Interview Questions and Answers for Data Scientists for 2021

DIY- Multi-Level Dendrogram

How to use ML to perform RF modulation recognition?

Tools for building robust, state-of-the-art machine learning models

English Audio Speech-to-Text Transcript with Hugging Face | Python NLP

Neural Re-rendering for Full-frame Video Stabilization

AI-ML Cheat Sheets

Q4 2020 Food Delivery & Rideshare Sales Report


Microservices Architecture at Netflix!

10 Awesome Data Science Courses to make you an Awesome Data Scientist

Yann LeCun’s Deep Learning Course

Analyzing seasonality with Fourier transforms using Python & SciPy

An open-source text annotation tool for humans

PORORO: Platform Of neuRal mOdels for natuRal language prOcessing

Indian Flag with Turtle using Python

This Republic Day, I tried to do something creative and made an Indian flag with Turtle using Python. Turtle is a module that ships with Python's standard library. It lets users create pictures and shapes by giving them a virtual canvas to draw on.

You can also refer to the YouTube video tutorial for better understanding:

GitHub Repo:

Blog link:

The Shiny AWS Book

10 Most Stable Linux Distros In 2021

Python’s Pandas vs. R’s dplyr

Top Ten Kaggle Notebooks For Data Science Enthusiasts In 2021

An R package to build your resume or CV

Applied Data Science: Free E-Book

"The Data Engineering Cookbook" by Andreas Kretz!!

Data Cleaning

Wind Forecast

Fundamentals of Python Programming

Mistakes to avoid as a Data Scientist

Fake AI generated blog

ML Interview Questions


1-Introduction to Time Series and Forecasting

2-Statistics and Analysis of Scientific Data

3-Linear Algebra Done Right

4-Linear Algebra


6-Understanding Analysis

7-Understanding Statistics Using R

8-An Introduction to Statistical Learning

9-Statistical Learning from a Regression Perspective


11-Regression Modeling Strategies

12-A Modern Introduction to Probability and Statistics

13-The Python Workbook

14-Machine Learning in Medicine — a Complete Overview

15-Introduction to Data Science

16-Applied Predictive Modeling

  1. Data Science Process

  2. Data Visualization in Business

  3. Know Machine Learning Key Terminology

  4. Understand Machine Learning Implementation

  5. Machine Learning Applications on Marketing and Retail

TOP 10 SQL Concepts for Job Interview

  1. Aggregate Functions (SUM/AVG)

  2. Group By and Order By

  3. JOINs (Inner/Left/Right)

  4. Union and Union All

  5. Date and Time processing

  6. String processing

  7. Window Functions (Partition by)

  8. Subquery

  9. View and Index

  10. Common Table Expression (CTE)

TOP 10 Statistics Concepts for Job Interview

  1. Sampling

  2. Experiments (A/B tests)

  3. Descriptive Statistics

  4. p-value

  5. Probability Distributions

  6. t-test

  7. ANOVA

  8. Correlation

  9. Linear Regression

  10. Logistic Regression

TOP 10 Python Concepts for Job Interview

  1. Reading data from file/table

  2. Writing data to file/table

  3. Data Types

  4. Function

  5. Data Preprocessing (numpy/pandas)

  6. Data Visualisation (Matplotlib/seaborn/bokeh)

  7. Machine Learning (sklearn)

  8. Deep Learning (Tensorflow/Keras/PyTorch)

  9. Distributed Processing (PySpark)

  10. Functional and Object Oriented Programming

Data Science Topics:

  1. CRISP-DM - Project Management Methodology

  2. Exploratory Data Analytics (EDA) / Descriptive Analytics

  3. Statistical Data Business Intelligence and Data Visualization

  4. Plots & Inferential Statistics

  5. Probability Distributions (Continuous & Discrete)

  6. Hypothesis Testing - The ‘4’ Must Know Hypothesis Tests

  7. Data Mining Supervised Learning – Linear Regression, OLS

  8. Predictive Modelling – Multiple Linear Regression

  9. Lasso and Ridge Regressions

  10. Logistic Regression – Binary Value Prediction, MLE

  11. Multinomial Regression

  12. Advanced Regression for Count Data

  13. Data Mining Unsupervised Learning - Clustering

  14. Data Mining Unsupervised Learning - Dimension Reduction (PCA)

  15. Data Mining Unsupervised Learning - Association Rules

  16. Recommendation Engine

  17. Network Analytics

  18. Machine Learning - k-NN Classifier

  19. Decision Tree & Random Forest

  20. Ensemble Techniques - Bagging and Boosting

  21. AdaBoost & Extreme Gradient Boosting

  22. Text Mining & Natural Language Processing (NLP)

  23. Machine Learning Classifier Technique - Naive Bayes

  24. Introduction to Perceptron, Multilayer Perceptron

  25. Building Blocks of Neural Network

  26. Deep Learning Black Box Technique - Neural Network

  27. Deep Learning Black Box Technique - SVM

  28. Survival Analytics

  29. Forecasting/Time Series – Model Driven Algorithms

  30. Forecasting/Time Series – Data Driven Algorithms

Group Normalization

Contents

  1. Introduction
  2. Drawback of Batch Normalization
  3. Introduction to Group Normalization
  4. Other Normalization Techniques
  5. Group Normalization in detail and comparison to other normalization techniques
  6. Group Normalization Explained
  7. Benefits of Group Normalization over other techniques
  8. Number of Groups hyperparameter in Group Normalization
  9. Group Division Experiments Explained
  10. Effect of Group Normalization on deeper models
  11. Implementation of GroupNorm
  12. Does GroupNorm really work in practice?
  13. Conclusion
  14. References
  15. Credits

Introduction

In this blog post, we will look at the Group Normalization research paper and also cover:

  - The drawback of Batch Normalization for smaller batch sizes
  - Introduction to Group Normalization as an alternative to BN
  - Other normalization techniques available and how Group Normalization compares to them
  - Benefits of Group Normalization over other normalization techniques
  - The optimal number of groups as a hyperparameter in GN
  - The effect of Group Normalization on deeper models (e.g. ResNet-101)
  - Implementing Group Normalization in PyTorch and TensorFlow
  - Implementing ResNet-50 with [GroupNorm + Weight Standardization] on the Pets dataset and comparing its performance to vanilla ResNet-50 with BatchNorm layers

Batch Normalization (BN) is used in most state-of-the-art computer vision models to stabilise training. BN normalizes the features based on the mean and variance in a mini-batch. This has helped improve model performance, reduce training time and make very deep models converge. But the technique also has a drawback: if the batch size is too small, training with BN becomes unstable.

The aim of this blog post is not to study BN (many other wonderful posts have been written on that) but to look at alternatives such as Group Normalization (GN). Through this post, I hope to introduce GN as an alternative to BN and help the reader develop an intuition for the cases where GN could perform better than BN.

Drawback of Batch Normalization

Knowingly or unknowingly, we have all used BN in our experiments when training a deep learning network. If you have trained a ResNet model or pretty much any other CV model using PyTorch or TensorFlow, you have made use of BN to normalize the network.

From the Group Normalization research paper:

We all know that BN has been established as a very effective component in deep learning. BN normalizes the features by the mean and variance computed within a batch. But despite its great success, BN exhibits drawbacks that are also caused by its distinct behavior of normalizing along the batch dimension. In particular, it is required for BN to work with sufficiently large batch size. A small batch size leads to inaccurate estimation of the batch statistics and reducing BN’s batch size increases the model error dramatically.

Essentially, this means that BN is not very effective when the batch size is too small. For CV applications other than image classification, such as object detection, segmentation and video classification, the restrictions on batch size are more demanding and it is difficult to use larger batches. In such cases, GN can be used as a strong alternative to BN. There could also be cases where you want to try a bigger-capacity model, which leaves less GPU memory for a large batch. In such cases as well, you might want to try GN as an alternative.

Introduction to Group Normalization

In the paper, the authors introduce GN as a simple alternative to BN. From the paper:

GN divides the channels into groups and computes within each group the mean and variance for normalization. GN’s computation is independent of batch sizes, and its accuracy is stable in a wide range of batch sizes.

Essentially, GN takes away the dependence on batch size for normalization and in doing so mitigates the problem suffered by BN.
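
The batch-size independence described above can be checked with a small NumPy sketch. It is only an illustration under stated assumptions: the helper names `batch_norm` and `group_norm`, the tensor shape and the choice of 2 groups are made up for this example, BN is shown in its training-mode form (batch statistics only), and the learnable scale and offset are left out.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # x: (N, C, H, W); one mean/variance per channel over the (N, H, W) axes
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def group_norm(x, num_groups, eps=1e-5):
    # x: (N, C, H, W); one mean/variance per sample and per group of channels
    n, c, h, w = x.shape
    g = x.reshape(n, num_groups, c // num_groups, h, w)
    mean = g.mean(axis=(2, 3, 4), keepdims=True)
    var = g.var(axis=(2, 3, 4), keepdims=True)
    return ((g - mean) / np.sqrt(var + eps)).reshape(n, c, h, w)

rng = np.random.default_rng(0)
batch = rng.normal(size=(8, 4, 5, 5))  # hypothetical batch of 8 samples
first = batch[:1]                      # the same first sample as a batch of 1

# GN: the first sample is normalized identically in both batches
gn_same = np.allclose(group_norm(batch, 2)[:1], group_norm(first, 2))
# BN: the first sample's output depends on the rest of the batch
bn_same = np.allclose(batch_norm(batch)[:1], batch_norm(first))
print(gn_same, bn_same)
```

Here `gn_same` compares the GN output for one sample inside a batch of 8 against the same sample normalized alone, and `bn_same` does the same for BN; only the GN comparison is expected to match.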

Other techniques have also been proposed to avoid the batch dimension, but we will discuss them later. For now, it is essential for the reader to realize that instead of normalizing across the batch dimension, GN normalizes across groups of channels. This is explained in more depth later in this post.

First, let’s look at how GN compares to BN when training across various batch sizes, keeping all else the same. As can be seen in the image above, because GN does not depend on the batch size, the validation classification error (when the model is normalized using GN) is stable across various batch sizes compared to BN. The same trend as in fig-1 can also be observed in fig-2, where the validation error is consistent across various batch sizes for GN as opposed to BN.

Another key thing to note is that the validation error for GN, as reported in the research paper, is very similar to that for BN. GN can therefore be considered a strong alternative to BN. The validation errors (from the research paper) for various batch sizes are presented in table-1. While BN performs slightly better than GN for batch size 32, GN performs better for all lower batch sizes.

Other Normalization Techniques

Group Normalization isn’t the first technique that was proposed to overcome the drawback of BN. There are several others, such as Layer Normalization and Instance Normalization, listed in the references of this blog post. But GN is the first of these to achieve validation error rates comparable to BN. In this section we look at the most popular normalization techniques, namely Layer Normalization (LN), Instance Normalization (IN), Batch Normalization (BN) and Group Normalization (GN).

Group Normalization in detail and comparison to other normalization techniques

The above image, presented in the research paper, is one of the best ways to compare the various normalization techniques and get an intuitive understanding of GN. Let’s consider a batch of dimension (N, C, H, W) that needs to be normalized. Here:

  - N: batch size
  - C: number of channels
  - H: height of the feature map
  - W: width of the feature map

In BN, the pixels sharing the same channel index are normalized together. That is, for each channel, BN computes the mean and standard deviation along the (N, H, W) axes. As we can see, these statistics depend on N, the batch size.

In LN, the mean and standard deviation are computed for each sample along the (C, H, W) axes. Therefore, the calculations are independent of the batch size.

In IN, the mean and standard deviation are computed for each sample and each channel along the (H, W) axes. Again, the calculations are independent of the batch size.

Group Normalization Explained

Finally, for group norm, the channels are first divided into groups (32 by default, discussed later). The batch with dimension (N, C, H, W) is reshaped to (N, G, C//G, H, W), where G is the number of groups. The mean and standard deviation are then calculated within each group, that is, along the (C//G, H, W) axes for every sample and every group. This is also illustrated very well in fig-4.

One key thing to note here: if C == G, that is, the number of groups equals the number of channels (one channel per group), then GN becomes IN. And if G == 1, that is, the number of groups is set to 1, GN becomes LN. I would like the reader to take a minute here and make sure they understand the differences between the normalization techniques mentioned above.
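
The axes described above, and the two special cases of GN, can be sketched in a few lines of NumPy. This is a rough illustration rather than a reference implementation: the helper names, the tensor shape and the omission of the learnable scale and offset are all assumptions made for the example.

```python
import numpy as np

def normalize(x, axes, eps=1e-5):
    # Subtract the mean and divide by the std computed over the given axes
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def batch_norm(x):
    return normalize(x, (0, 2, 3))   # per channel, over (N, H, W)

def layer_norm(x):
    return normalize(x, (1, 2, 3))   # per sample, over (C, H, W)

def instance_norm(x):
    return normalize(x, (2, 3))      # per sample and channel, over (H, W)

def group_norm(x, num_groups):
    # Reshape (N, C, H, W) -> (N, G, C // G, H, W), normalize within each group
    n, c, h, w = x.shape
    grouped = x.reshape(n, num_groups, c // num_groups, h, w)
    return normalize(grouped, (2, 3, 4)).reshape(n, c, h, w)

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 6, 4, 4))    # hypothetical (N, C, H, W) activations

gn_is_ln = np.allclose(group_norm(x, 1), layer_norm(x))      # G == 1
gn_is_in = np.allclose(group_norm(x, 6), instance_norm(x))   # G == C
print(gn_is_ln, gn_is_in)
```

With one group, the reshape puts all channels of a sample into a single group, which is the LN computation; with as many groups as channels, every group holds one channel, which is the IN computation.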

Benefits of Group Normalization over other techniques

It is important to note that GN is less restricted than LN. LN assumes that all channels in a layer make “equal contributions”, whereas in GN only each group of channels (instead of all of them) is assumed to share a mean and variance, so the model still has the flexibility to learn a different distribution for each group.

GN is also slightly better than IN, because IN normalizes each channel of each sample separately and therefore, unlike GN, misses the opportunity to exploit the dependence between channels.

For the reasons discussed above, the validation and training errors for GN are lower than those for LN and IN.

Number of Groups hyperparameter in Group Normalization

One key hyperparameter in Group Normalization is the number of groups to divide the channels into. The authors of the research paper ran an experiment training a ResNet-50 model on the ImageNet dataset with various numbers of groups. As can be seen in table-2, setting the number of groups to 32 achieves the lowest validation error.

In the bottom part of table-2, the authors instead fix the number of channels per group. Since each layer in a deep learning model can have a different number of channels, this means the number of groups varies from layer to layer. Setting 16 channels per group achieved the lowest validation error.

Group Division Experiments Explained

Let’s understand what’s going on with the help of VGGNet. As can be seen, different layers of VGGNet have different numbers of channels (this is also the case for other deep learning models such as ResNet, DenseNet etc.). In the first experiment, the authors divide each layer into 32 groups. Thus layer 2 of VGGNet, with 128 channels, has 128//32, that is, 4 channels per group when the number of groups is set to 32.
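
The arithmetic behind the two group-division schemes can be written out as a short pure-Python sketch. The list of channel widths is a hypothetical stand-in for successive VGG-style layers, and the function names are made up for the example; only the 32-groups and 16-channels-per-group settings come from the text above.

```python
# Hypothetical channel widths for successive VGG-style layers (illustrative only)
layer_channels = [64, 128, 256, 512]

def channels_per_group(num_channels, num_groups):
    # Scheme 1: fixed number of groups, channels per group varies by layer
    if num_channels % num_groups != 0:
        raise ValueError("num_channels must be divisible by num_groups")
    return num_channels // num_groups

def groups_per_layer(num_channels, group_width):
    # Scheme 2: fixed channels per group, number of groups varies by layer
    if num_channels % group_width != 0:
        raise ValueError("num_channels must be divisible by group_width")
    return num_channels // group_width

fixed_groups = {c: channels_per_group(c, 32) for c in layer_channels}
fixed_width = {c: groups_per_layer(c, 16) for c in layer_channels}

print(fixed_groups)  # {64: 2, 128: 4, 256: 8, 512: 16}
print(fixed_width)   # {64: 4, 128: 8, 256: 16, 512: 32}
```

Both schemes are the same integer division read in two directions: fixing the number of groups lets the group width grow with the layer, while fixing the group width lets the number of groups grow instead.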

The authors ran this experiment for varying numbers of groups and found that setting the number of groups to 32 gave the lowest validation error.

For the second experiment, the authors fixed the number of channels per group. For example, with 16 channels per group, the second layer with 128 channels had 128//16, that is, 8 groups, the third layer with 256 channels had 256//16, that is, 16 groups, and so on. The authors found that 16 channels per group gave the lowest validation error.

Effect of Group Normalization on deeper models

The authors also trained the ResNet-101 architecture with batch size 32 and compared the validation errors of the BN and GN implementations. They found the BN baseline to have 22.0% validation error and the GN counterpart to have 22.4% validation error. For batch size 2, they found the GN error to be 23.0%, which is still a very decent result considering the very small batch size. Thus, from the results of this experiment, I think it is safe to say that GN with smaller batch sizes also works for larger models.

Implementation of GroupNorm

Finally, we are now ready to look at the implementation of GN. The following snippet of code has been provided in the research paper:

    import tensorflow as tf

    def GroupNorm(x, gamma, beta, G, eps=1e-5):
        # x: input features with shape [N,C,H,W]
        # gamma, beta: scale and offset, with shape [1,C,1,1]
        # G: number of groups for GN
        N, C, H, W = x.shape
        x = tf.reshape(x, [N, G, C // G, H, W])
        # keepdims is the TF 2.x spelling (keep_dims in TF 1.x)
        mean, var = tf.nn.moments(x, [2, 3, 4], keepdims=True)
        x = (x - mean) / tf.sqrt(var + eps)
        x = tf.reshape(x, [N, C, H, W])
        return x * gamma + beta

Essentially, the authors reshape the batch and divide it into groups with C // G channels per group, where:

  - C: number of channels
  - G: number of groups

Finally, as discussed in the Group Normalization Explained section, the authors normalize along the (C//G, H, W) dimensions and return the result after reshaping the batch back to (N, C, H, W).

I hope that by this time the implementation is clear to the reader. If it isn’t, I kindly ask the reader to go back to the Group Normalization Explained section for a quick re-read.

Finally, we could rewrite GN in PyTorch like so:

    import torch
    import torch.nn as nn

    class GroupNorm(nn.Module):
        def __init__(self, num_features, num_groups=32, eps=1e-5):
            super(GroupNorm, self).__init__()
            # learnable per-channel scale and offset
            self.gamma = nn.Parameter(torch.ones(1, num_features, 1, 1))
            self.beta = nn.Parameter(torch.zeros(1, num_features, 1, 1))
            self.num_groups = num_groups
            self.eps = eps

        def forward(self, x):
            N, C, H, W = x.size()
            # flatten each group's (C // G, H, W) values into one axis
            x = x.view(N, self.num_groups, -1)
            mean = x.mean(-1, keepdim=True)
            # biased variance, matching tf.nn.moments and nn.GroupNorm
            var = x.var(-1, unbiased=False, keepdim=True)
            # normalize
            x = (x - mean) / (var + self.eps).sqrt()
            x = x.view(N, C, H, W)
            return x * self.gamma + self.beta

PyTorch also supports GroupNorm natively through nn.GroupNorm.

Having implemented GN in PyTorch and TensorFlow, we are now ready to run our own experiments and see the results for ourselves in the next section.

Does GroupNorm really work in practice?

Personally, I wanted to try a little experiment of my own to compare GN with BN and corroborate the findings in the GN research paper. You can find the experiment in this notebook. In the experiment, I trained two ResNet-34 architectures on the Pets dataset, one with BN and the other with GN. To my surprise, I found that simply replacing BatchNorm with GroupNorm led to sub-optimal results: the model with GroupNorm as the normalization layer performed much worse than the model with BatchNorm, even for a very small batch size of 4. This was very different from the results reported in fig-1.

Thanks to Sunil Kumar, who pointed me to the Big Transfer (BiT): General Visual Representation Learning research paper, I noticed that the researchers there used a combination of Weight Standardization and GN to achieve SOTA results.
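
For readers unfamiliar with Weight Standardization (WS), the core idea as described in the Weight Standardization paper is to rescale each convolution filter so that its weights have zero mean and unit standard deviation before the convolution is applied. The NumPy sketch below shows only that standardization step, under stated assumptions: the function name `standardize_weight`, the `eps` value and the weight shape are illustrative and are not taken from the official repository.

```python
import numpy as np

def standardize_weight(weight, eps=1e-5):
    # weight: (out_channels, in_channels, kH, kW)
    # Statistics are computed separately for each output filter
    mean = weight.mean(axis=(1, 2, 3), keepdims=True)
    std = weight.std(axis=(1, 2, 3), keepdims=True)
    return (weight - mean) / (std + eps)

rng = np.random.default_rng(0)
w = rng.normal(loc=0.5, scale=2.0, size=(8, 3, 3, 3))  # hypothetical conv weights
w_hat = standardize_weight(w)

per_filter_mean = w_hat.mean(axis=(1, 2, 3))
per_filter_std = w_hat.std(axis=(1, 2, 3))
print(per_filter_mean.round(6), per_filter_std.round(3))
```

In a real layer, the standardized weights would be recomputed from the learnable weights on every forward pass and then used in the convolution, with GN applied to the resulting activations.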

So I tried this out with the implementation of Weight Standardization from the official repository, and very quickly I was able to replicate the results, with GN + WS performing significantly better than BN for a batch size of 1.

Conclusion

I hope that I have been clear in my explanation of Group Normalization and that, through my experiments, I have provided a way for you to implement GN in PyTorch and TensorFlow and run experiments of your own.

As always, constructive feedback is welcome at @amaarora. Also, feel free to subscribe to my blog to receive regular updates about new blog posts. Thanks for reading!

References

  1. Group Normalization, by Wu and He
  2. Batch Normalization, by Ioffe et al.
  3. Instance Normalization: The Missing Ingredient for Fast Stylization
  4. Layer Normalization
  5. Weight Standardization
  6. Implementation of Weight Standardization from the official repository
  7. Deep Residual Learning for Image Recognition

Credits

Thanks to @AryMob for pointing out errata in this post.

Machine Learning with Graphs, Leskovec

1-The Elements of Statistical Learning

2-Introductory Time Series with R

3-A Beginner’s Guide to R

4-Data Structures and Algorithms with Python

5-Introduction to Statistics and Data Analysis

6-Principles of Data Mining

7-Computer Vision

8-Data Mining

9-Robotics, Vision and Control

10-Statistical Analysis and Data Display

11-Statistics and Data Analysis for Financial Engineering

12-Stochastic Processes and Calculus

13-Statistical Analysis of Clinical Data on a Pocket Calculator

14-Clinical Data Analysis on a Pocket Calculator

15-The Data Science Design Manual

16-An Introduction to Machine Learning

The Laplace Transform: A Generalized Fourier Transform

‘Introduction to Statistics for Data Science, Exploratory Data Analysis in Python’


Probability Mass Functions

Cumulative distribution functions

Modeling distributions

Probability density functions

Relationships between variables


Hypothesis testing

Linear least squares


Time series analysis

Survival analysis

Analytic methods

Credit: Allen B. Downey

XGBoost Tutorials:

1-Introduction to Boosted Trees

2-Distributed XGBoost with AWS YARN

3-Distributed XGBoost with XGBoost4J-Spark

4-DART booster

5-Monotonic Constraints

6-Random Forests in XGBoost

7-Feature Interaction Constraints

8-Text Input Format of DMatrix

9-Notes on Parameter Tuning

10-Using XGBoost External Memory Version (beta)