# Histogram & 9 steps to implement a histogram

## Histogram definition: What is a histogram?

A histogram is a statistical tool that is mostly used to analyze process behavior. It looks similar to a bar graph, but there is an important difference: a histogram shows continuous data, grouped into consecutive class intervals, on the X-axis, whereas a bar graph shows separate categories that do not need to be continuous. In this blog, we will learn about the types of histograms and 9 steps to implement a histogram.

## Bar Graph

If the data falls into categories, a bar graph is the best option, because a bar graph does not require class intervals. The following bar graph shows the rent of a 1,000-square-foot home; no class interval is shown between the columns.

## When to use a histogram (Applications of a histogram)

1. To understand process behavior, for example whether the process is a ‘normally distributed process’ or not.
2. To analyze whether the process output meets customer requirements.
3. To analyze whether any data points fall beyond the specification or control limits.
4. To check whether the process drifts or changes over time due to environmental or other input parameters.
5. To analyze special causes in the process.
6. To support verification and validation of the process.
7. To help determine the key root causes of process failures/defects, together with problem-solving tools such as the fishbone (Ishikawa) diagram, 5-Why analysis, or CAPA.

## What is the meaning of a ‘normally distributed process’?

1. In a normally distributed process, the data points form a bell-shaped curve.
2. The highest point of the curve lies on its center line and corresponds to the average (mean).
3. The center line divides the bell-shaped curve into two symmetrical halves.
4. Most of the data points lie near the average.
5. Values far from the average, toward the maximum and minimum, occur less frequently.
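To see what "most of the data points lie near the average" means in numbers, the sketch below uses Python's standard-library `statistics.NormalDist` to compute the share of a normal distribution that lies within 1, 2 and 3 standard deviations of the mean. The mean of 29.5 and standard deviation of 0.25 are hypothetical values chosen only for illustration; the shares themselves (about 68%, 95% and 99.7%) do not depend on them.

```python
from statistics import NormalDist

# Hypothetical process: mean 29.5, standard deviation 0.25 (illustration only).
process = NormalDist(mu=29.5, sigma=0.25)


def share_within(dist: NormalDist, k: float) -> float:
    """Fraction of a normal distribution within k standard deviations of its mean."""
    lower = dist.mean - k * dist.stdev
    upper = dist.mean + k * dist.stdev
    return dist.cdf(upper) - dist.cdf(lower)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(process, k):.4f}")

# The curve peaks at the mean, and half of the data lies on each side of it.
print(process.pdf(process.mean) > process.pdf(process.mean + 2 * process.stdev))
print(process.cdf(process.mean))
```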

## How to make a histogram (Steps for drafting a histogram)

Step 1 Data collection: Collect data from the process you are planning to analyze. For better results, collect 50 to 150 or more data points.

Step 2 Calculate the number of data points ‘P’. In the example data sheet:

Number of rows = 11 (A to K)

Number of columns = 6 (1 to 6)

Number of data points ‘P’ = Number of rows (11) × Number of columns (6) = 66
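The count in Step 2 can be sketched in Python as follows. The measurements of the example data sheet are not reproduced in this post, so the sketch fills each of the 11 × 6 cells with a placeholder (`None`); with real data you would store the measured values instead.

```python
# Hypothetical stand-in for the example data sheet: rows A to K, columns 1 to 6.
ROWS = "ABCDEFGHIJK"
COLUMNS = range(1, 7)
sheet = {row: [None for _ in COLUMNS] for row in ROWS}


def count_data_points(sheet: dict) -> int:
    """P = total number of cells across all rows (works even if rows differ in length)."""
    return sum(len(cells) for cells in sheet.values())


P = count_data_points(sheet)
print(len(sheet), len(COLUMNS), P)  # 11 6 66
```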

Step 3 Calculate the total range ‘R’:

Range R = Highest value - Lowest value

R = 30.5 - 28.5 = 2
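Step 3 as a small Python function. Only the extreme values 28.5 and 30.5 come from the example; the other readings in the list are hypothetical.

```python
def data_range(values: list[float]) -> float:
    """R = highest value - lowest value."""
    if not values:
        raise ValueError("need at least one value")
    return max(values) - min(values)


# Hypothetical readings; only the extremes 28.5 and 30.5 come from the example.
readings = [29.1, 28.5, 30.5, 29.8, 29.4]
R = data_range(readings)
print(R)  # 2.0
```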

Step 4 Choose the number of columns/bins: The shape of the histogram depends on the number of columns. As a rule of thumb, the number of columns should be neither too large nor too small. A common practice is to choose approximately the square root of the number of data points (P).

Number of columns/bins = √P = √66 ≈ 8.12

*Use 8 columns for the histogram.
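A sketch of the square-root rule of Step 4. Rounding to the nearest whole number reproduces the 8 columns used above; rounding up instead (with `math.ceil`) would give 9, so treat the result as a starting point rather than a fixed answer.

```python
import math


def number_of_columns(p: int) -> int:
    """Square-root rule of thumb: about sqrt(P) columns, and at least one."""
    return max(1, round(math.sqrt(p)))


print(round(math.sqrt(66), 2))  # 8.12
print(number_of_columns(66))    # 8
```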

Step 5 Calculate the column/bin width with the following formula:

Column width = Range (2) / Number of columns (8)

Column width = 0.25
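Step 5 in code, using the range and the number of columns from the example:

```python
def column_width(data_range: float, columns: int) -> float:
    """Column width = range / number of columns."""
    if columns <= 0:
        raise ValueError("number of columns must be positive")
    return data_range / columns


width = column_width(2.0, 8)
print(width)  # 0.25
```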

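Step 6 Calculate the column/bin intervals. The first interval starts at the smallest value, and the upper limit of each interval is its lower limit plus the column width:

Upper limit of interval = Lower limit of interval + Column width

E.g., first interval: 28.5 to 28.75 (28.5 + 0.25); the next interval starts at 28.75, and so on until the last interval, 30.25 to 30.5.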
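The intervals of Step 6 can be generated in Python as below. Each limit is computed as `lowest + i * width` rather than by repeatedly adding the width, which avoids accumulating floating-point rounding errors over many columns.

```python
def column_intervals(lowest: float, width: float, columns: int) -> list[tuple[float, float]]:
    """(lower, upper) limits of each column; each upper limit = lower limit + width."""
    return [(lowest + i * width, lowest + (i + 1) * width) for i in range(columns)]


intervals = column_intervals(28.5, 0.25, 8)
for lower, upper in intervals:
    print(f"{lower:.2f} to {upper:.2f}")
```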

Step 7 Prepare a histogram count sheet, as shown in the following table, by tallying how many data points fall into each interval.
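A sketch of the tally behind the count sheet in Step 7. It assumes that each interval includes its lower limit and excludes its upper limit, except the last interval, which also includes the highest value (30.5 in the example); this boundary convention is an assumption, since the post does not specify one. The readings are hypothetical.

```python
def count_sheet(values: list[float], lowest: float, width: float, columns: int) -> list[int]:
    """Tally how many values fall into each column of the histogram."""
    highest = lowest + columns * width
    counts = [0] * columns
    for v in values:
        if not lowest <= v <= highest:
            raise ValueError(f"{v} lies outside {lowest} to {highest}")
        # Lower limit inclusive, upper limit exclusive; the highest value goes in the last column.
        index = min(int((v - lowest) // width), columns - 1)
        counts[index] += 1
    return counts


# Hypothetical readings (not the post's data sheet).
readings = [28.5, 28.6, 28.9, 29.1, 29.4, 29.6, 29.6, 29.9, 30.1, 30.4, 30.5]
counts = count_sheet(readings, lowest=28.5, width=0.25, columns=8)
print(counts)  # [2, 1, 1, 1, 2, 1, 1, 2]
```

With a width such as 0.1 that is not exact in binary floating point, values lying exactly on an interval limit may need to be rounded before tallying.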

Step 8 Draw and label the X- and Y-axes: Put the measured characteristic on the X-axis and the frequency on the Y-axis. Then draw the histogram based on the histogram count table.
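Charts are usually drawn with a plotting library such as matplotlib. To keep this sketch dependency-free, it prints a text histogram instead, rotated sideways: the intervals (the characteristic) run down the left, and the bar lengths show the frequency. The counts are hypothetical values for the 8 intervals of the example.

```python
def text_histogram(intervals: list[tuple[float, float]], counts: list[int], symbol: str = "*") -> list[str]:
    """One line per column: interval label, then a bar whose length equals the frequency."""
    lines = []
    for (lower, upper), count in zip(intervals, counts):
        lines.append(f"{lower:5.2f}-{upper:5.2f} | {symbol * count} ({count})")
    return lines


# The 8 intervals of width 0.25 from the example, with hypothetical frequencies (total 66).
intervals = [(28.5 + i * 0.25, 28.5 + (i + 1) * 0.25) for i in range(8)]
counts = [2, 5, 9, 14, 16, 10, 6, 4]
for line in text_histogram(intervals, counts):
    print(line)
```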

Step 9 Connect the midpoints of the tops of the columns/bins with a curve, and determine whether the histogram forms a bell-shaped curve.
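Step 9 can be sketched as two small functions: one returns the (midpoint, frequency) points that the curve passes through, and one performs a rough bell-shape check. The check is a hypothetical heuristic, not a normality test: it only confirms a single peak away from the edges, with frequencies rising before it and falling after it, and it does not test symmetry. The counts are hypothetical.

```python
def curve_points(intervals: list[tuple[float, float]], counts: list[int]) -> list[tuple[float, int]]:
    """Pair each column's midpoint with its frequency; the curve joins these points."""
    return [((lower + upper) / 2, count) for (lower, upper), count in zip(intervals, counts)]


def looks_bell_shaped(counts: list[int]) -> bool:
    """Hypothetical heuristic: one interior peak, non-decreasing before it, non-increasing after it."""
    peak = counts.index(max(counts))
    rising = all(a <= b for a, b in zip(counts[:peak], counts[1:peak + 1]))
    falling = all(a >= b for a, b in zip(counts[peak:], counts[peak + 1:]))
    return 0 < peak < len(counts) - 1 and rising and falling


intervals = [(28.5 + i * 0.25, 28.5 + (i + 1) * 0.25) for i in range(8)]
counts = [2, 5, 9, 14, 16, 10, 6, 4]  # hypothetical frequencies
print(curve_points(intervals, counts)[0])  # (28.625, 2)
print(looks_bell_shaped(counts))           # True
```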

Histogram shapes and their analysis: So far we have covered how to draft a histogram. Now let’s understand the different types of histograms and how to analyze them.

## Skewed Distribution:

There are two types of skewed distribution: right-skewed and left-skewed. If the tail extends toward the left side of the histogram, the distribution is negatively skewed. If the tail extends toward the right side, it is positively skewed. A skewed distribution indicates uneven quality/output of the process, so if skewness exists, the process capability must be verified. Skewness to the left or right indicates that the process may go out of control, and without corrective action, the process output/products will have defects.

• Left Side / Negative Skew – Mean is less than the median
• Right Side / Positive Skew – Mean is greater than the median
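The mean-versus-median rule above can be checked with Python's `statistics` module. As a cross-check, the sketch also computes the moment coefficient of skewness, g1 = m3 / m2^1.5, whose sign is negative for a left tail and positive for a right tail. Both are rules of thumb; for some distributions, for example multimodal ones, the mean-median comparison and g1 can disagree. The sample lists are hypothetical.

```python
from statistics import mean, median


def skew_direction(values: list[float]) -> str:
    """Classify skew with the mean-versus-median rule of thumb."""
    m, md = mean(values), median(values)
    if m > md:
        return "right / positive skew"
    if m < md:
        return "left / negative skew"
    return "no skew detected by this rule"


def moment_skewness(values: list[float]) -> float:
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (population moments).

    Raises ZeroDivisionError if all values are identical (m2 == 0).
    """
    m = mean(values)
    n = len(values)
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


right_tail = [1, 2, 2, 3, 10]  # hypothetical: long tail toward high values
left_tail = [1, 8, 9, 9, 10]   # hypothetical: long tail toward low values
print(skew_direction(right_tail), moment_skewness(right_tail) > 0)
print(skew_direction(left_tail), moment_skewness(left_tail) < 0)
```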

## Histogram with special causes:

If one or two columns in the histogram show a sudden shift or spike in frequency, it is known as a histogram with special causes. To obtain a normally distributed process, the special causes in the process must be eliminated.
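One way to flag candidate spike columns automatically is sketched below. This is a hypothetical heuristic, not a standard statistical test: it flags a column whose frequency is at least `factor` times the average of its immediate neighbours, and the default factor of 2 is an arbitrary assumption. Flagged columns are only candidates for investigation of special causes. The counts are hypothetical.

```python
def spike_columns(counts: list[int], factor: float = 2.0) -> list[int]:
    """Return indexes of columns whose frequency is at least `factor` times
    the average frequency of their immediate neighbours (hypothetical heuristic)."""
    if len(counts) < 2:
        return []
    flagged = []
    for i, count in enumerate(counts):
        # Edge columns have one neighbour; interior columns have two.
        neighbours = counts[max(i - 1, 0):i] + counts[i + 1:i + 2]
        baseline = sum(neighbours) / len(neighbours)
        if count > 0 and count >= factor * baseline:
            flagged.append(i)
    return flagged


print(spike_columns([2, 5, 9, 14, 16, 10, 6, 4]))   # []
print(spike_columns([2, 5, 9, 14, 16, 10, 25, 4]))  # [6]
```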