Finding the sampling distribution of a statistic is the main step in statistical inference.

In classical statistics, statisticians mostly limit their attention to inference, a complex procedure for drawing a conclusion about a population from a limited number of samples. For example, suppose we want to know what the gender pay gap is in Australia. Strictly speaking, what is the average pay gap among Australians? To answer such questions, we collect data, a process called sampling, from a larger data set known as the population, which is usually theoretical or imaginary in the sense that we do not have access to the whole data…

Drawing a valid conclusion depends heavily on how you collect data.

In many statistical analyses and data-driven decisions, we need to draw an actionable conclusion, supported by data that has already been collected or still needs to be collected. **However, the quality of the conclusion derived from the data depends heavily on the quality of the data you collected.** During data collection, you usually sample data from a larger data set. In most cases, you do not have the luxury of collecting data from all cases and hence measuring the intended metrics exactly. However, you can infer some characteristics of the larger dataset…

Explore how the data is distributed overall.

In the first part, we laid the foundation for exploratory data analysis (EDA) with a focus on summary statistics. In this part, we will look exclusively at the data distribution and investigate different approaches to representing the data.

Using summary statistics, we gain insight into the data in condensed form, through numerical values such as location and variability. This is particularly important in univariate analysis, where only a single variable is of interest. …

The first step in any data science project is exploratory data analysis (EDA)!

It’s been around 60 years since the eminent American statistician **John W. Tukey** (1915–2000) published a revolutionary paper called “**The Future of Data Analysis**”, which proposed a new scientific discipline called data analysis. In classical statistics, statisticians mostly limit their attention to inference, a complex procedure for drawing a conclusion about a population from a limited number of samples. John Tukey established a link between statistics and computational sciences such as computer science and engineering. …

Confounding can be increasingly misleading!

We, as humans, infer many rules and associations between two or more phenomena based on the observations we perceive through our senses. **Confounding might cause an illusion of understanding, since humans are used to jumping to quick conclusions based on very limited observations.** So, what is the confounding effect? We are going to answer this question and also propose some approaches to lessen its effect on our decision process.

Suppose that you’d like to infer if there is any causal…

Canonical correlation analysis (CCA) is a statistical technique for deriving the relationship between two sets of variables. One way to understand CCA is through the concept of multiple regression. In multiple regression, the relationship between a single dependent variable and a set of independent variables is investigated. In CCA, we extend the multiple regression concept to more than one dependent variable. In some applications, we are confronted with more than one dependent variable, and these variables are inter-correlated, so it is not sensible to ignore the dependency. For example, in a depression study, the Centre for Epidemiological Studies Depression (CESD) and health status are inter-correlated…

A common language is the pillar of effective communication. When a set of common languages leads to a set of common behaviours, the *protocol* is born. In the *New Oxford American Dictionary*, protocol is defined as

“The official procedure or system of rules governing affairs of state or diplomatic occasions”.

At a higher level, a collection of related protocols is called a protocol suite. Today’s Internet is based on the TCP/IP suite, which draws its origin from the *ARPANET* reference model (ARM), an early model that was part of a DARPA project. **The core objective of the ARM was…**

The boxplot is a powerful visual tool that gives a statistical summary of the underlying distribution, such as location, scale, skew and tails. It can also serve to identify potential outliers, which need to be treated differently from the majority of the data. Although the boxplot itself does not make any assumption about the sample distribution, many analysts are not aware of the common assumptions behind identifying possible outliers from the boxplot visualization. In Tukey’s boxplot, it is assumed that the underlying distribution is symmetric with light tails on both sides, much like a normal distribution. So, it gives the same…
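As a rough illustration of how a Tukey-style boxplot flags potential outliers, here is a minimal sketch. It assumes the conventional 1.5 × IQR fences, which the excerpt above does not spell out, and the function names and sample values are hypothetical.

```python
import statistics

def tukey_fences(data, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR.

    k=1.5 is the conventional multiplier and is an assumption here.
    """
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def flag_outliers(data, k=1.5):
    """Return the values that fall outside the Tukey fences."""
    lower, upper = tukey_fences(data, k)
    return [x for x in data if x < lower or x > upper]

# Hypothetical sample, chosen only for illustration.
boxplot_samples = [1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 15.0]
fences = tukey_fences(boxplot_samples)
flagged = flag_outliers(boxplot_samples)
print(fences, flagged)
```

Because the same multiplier is applied on both sides of the box, the rule treats both tails alike, which is where the symmetry assumption enters.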

One of the main goals of statistical analysis is to find the location and scale parameters for a statistical distribution. The location parameter specifies the typical value, i.e., the central value of the distribution while the scale parameter is used to measure the dispersion or variation of the distribution.
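To make the two ideas concrete, the sketch below computes a couple of common location and scale estimates with Python's standard `statistics` module. The sample values and the helper name `location_and_scale` are made up for illustration, and the choice of standard deviation and interquartile range as scale measures is an assumption, not something taken from the text above.

```python
import statistics

def location_and_scale(data):
    """Return common location and scale estimates for a list of numbers."""
    # Quartile cut points (default 'exclusive' method of statistics.quantiles).
    q1, _, q3 = statistics.quantiles(data, n=4)
    return {
        "mean": statistics.mean(data),      # location: arithmetic average
        "median": statistics.median(data),  # location: middle data point
        "stdev": statistics.stdev(data),    # scale: sample standard deviation
        "iqr": q3 - q1,                     # scale: interquartile range
    }

# Hypothetical sample; the last value is deliberately extreme.
samples = [2.0, 3.0, 3.0, 4.0, 5.0, 6.0, 40.0]
summary = location_and_scale(samples)
print(summary)
```

With one extreme value in the sample, the mean and standard deviation move much further than the median and interquartile range do.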

For the **location parameter**, three common definitions can be used:

1. **Mean value:** this is the arithmetic mean of the data samples, usually referred to as the average of the data samples. The mean value is easily affected by extreme values in the tails.

2. **Median value:** this value represents the middle data point…

Have you ever faced a situation where you set a goal and resolution for the new year but when it came to action, you gave up? Have you ever felt that you are not determined enough in sticking to your action plan?

If you are concerned about situations like these, you are lacking self-discipline. This is a short summary of the book “Self Discipline Mindset: Why Self Discipline Is Lacking In Most And How To Unleash It Now” by Curtis Leone. I found this book’s material useful in achieving self-discipline. I hope you will agree with me after reading this note.

…

Technology enthusiast, Futuristic, Telecommunications, Machine learning and AI savvy, work at Dolby Inc.