## Histogram definition: **What is a histogram?**

A histogram is a statistical tool that shows how often the values of a continuous variable fall within consecutive intervals. Histograms are mostly used to analyze process behavior. A histogram looks similar to a bar chart, but the two are different: a histogram shows continuous data on the X-axis, whereas a bar graph shows separate categories and does not need continuous data. In this blog, we will learn about the types of histograms and the 9 steps to draw a histogram.

## Bar Graph

If the data falls into categories, a bar graph is the best option. A bar graph requires no class interval. For example, a bar graph showing the rent of a 1000 square-foot home has no class interval between two columns.

## Histogram Graph Vs. Bar Graph

| Histogram | Bar Graph |
| --- | --- |
| A histogram indicates the frequency of occurrences. | A bar graph indicates different data categories. |
| A class interval is required in a histogram. | No class interval is required in a bar graph. |
| In a histogram, columns/blocks cannot be rearranged. | In a bar graph, columns/blocks can be rearranged, for example in ascending or descending order. |
| In a histogram, column width may vary. | In a bar graph, column width always remains the same. |
| A histogram is useful for calculating process capability. | A bar graph is useful for comparing different data categories in one graph. |

## When to use a histogram (**Application of Histogram**)

- To understand process behavior, for example whether the process is normally distributed.
- To analyze whether the process meets output/customer requirements.
- To analyze whether any data points fall beyond the control limits.
- To check whether the process drifts or changes over time due to environmental or other input parameters.
- To analyze special causes in the process.
- To establish verification and validation of the process.
- To help determine the key root causes of process failures/defects, together with problem-solving tools such as the fishbone (Ishikawa) diagram, 5-Why analysis, or CAPA.

**Types of Histograms** (histogram shapes)

1. Bar Graph
2. Column Graph
3. Normal Histogram
4. Bimodal Histogram (Polymodal)
5. Negatively Skewed Distribution
6. Truncated Histogram

**What is the meaning of ‘normally distributed process’?**

- In a normally distributed process, the data points form a bell-shaped curve.
- The highest point of the curve lies on the center line, which marks the average.
- The center line divides the bell-shaped curve into two symmetrical halves.
- Most of the data points appear near the average.
- The maximum and minimum values occur least frequently.
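These properties can be checked numerically. The following is a minimal Python sketch, assuming a hypothetical process with a target of 30.0 and a standard deviation of 0.5 (both values are made up for illustration). It simulates measurements, then compares the mean with the median and counts how many points lie close to the average.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical process: target 30.0, standard deviation 0.5 (assumed values)
samples = [random.gauss(30.0, 0.5) for _ in range(10_000)]

mean = statistics.fmean(samples)
median = statistics.median(samples)
stdev = statistics.stdev(samples)

# Share of points that lie within one standard deviation of the mean
near = sum(1 for x in samples if abs(x - mean) <= stdev) / len(samples)

print(f"mean={mean:.2f}  median={median:.2f}")
print(f"share within one standard deviation: {near:.0%}")
```

For a symmetric bell-shaped curve, the mean and the median nearly coincide, and roughly two thirds of the points fall within one standard deviation of the average.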

## How to make a histogram (**Steps for drafting a histogram**)

**Step 1 Data collection:** Collect data from the process you are planning to analyze. For better results, collect at least 50 to 150 data points.

|   | 1 | 2 | 3 | 4 | 5 | 6 |
| --- | --- | --- | --- | --- | --- | --- |
| A | 30.2 | 30.1 | 28.5 | 29.5 | 29.4 | 30.3 |
| B | 29.8 | 30.3 | 29.8 | 30.1 | 29.1 | 28.7 |
| C | 30.1 | 29.4 | 29.5 | 30.3 | 29.9 | 30.3 |
| D | 28.7 | 30.2 | 29.8 | 30.1 | 28.5 | 28.8 |
| E | 29.4 | 29.9 | 29.9 | 29.4 | 28.8 | 30.3 |
| F | 30.1 | 30.1 | 30.3 | 28.9 | 29.1 | 29.8 |
| G | 30.1 | 30.1 | 29.8 | 29.8 | 28.9 | 29.2 |
| H | 30.3 | 28.8 | 30.1 | 29.0 | 28.5 | 30.3 |
| I | 29.0 | 28.5 | 29.1 | 29.6 | 29.3 | 29.8 |
| J | 30.0 | 29.6 | 30.4 | 29.7 | 29.9 | 30.0 |
| K | 30.4 | 29.4 | 30.0 | 29.1 | 29.1 | 29.9 |

**Histogram – Data Collection**

**Step 2 Calculate the number of data points ‘P’** as follows:

Number of rows = 11 (A to K)

Number of columns = 6 (1 to 6)

Number of data points ‘P’ = Number of rows (11) **×** Number of columns (6) = 66
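As a quick check, the count can be reproduced from the data table of Step 1. This is a minimal Python sketch; the variable name `table` is our own choice for illustration.

```python
# Sample data from Step 1: rows A to K, columns 1 to 6
table = {
    "A": [30.2, 30.1, 28.5, 29.5, 29.4, 30.3],
    "B": [29.8, 30.3, 29.8, 30.1, 29.1, 28.7],
    "C": [30.1, 29.4, 29.5, 30.3, 29.9, 30.3],
    "D": [28.7, 30.2, 29.8, 30.1, 28.5, 28.8],
    "E": [29.4, 29.9, 29.9, 29.4, 28.8, 30.3],
    "F": [30.1, 30.1, 30.3, 28.9, 29.1, 29.8],
    "G": [30.1, 30.1, 29.8, 29.8, 28.9, 29.2],
    "H": [30.3, 28.8, 30.1, 29.0, 28.5, 30.3],
    "I": [29.0, 28.5, 29.1, 29.6, 29.3, 29.8],
    "J": [30.0, 29.6, 30.4, 29.7, 29.9, 30.0],
    "K": [30.4, 29.4, 30.0, 29.1, 29.1, 29.9],
}

rows = len(table)                          # number of rows (A to K)
cols = len(next(iter(table.values())))     # number of columns (1 to 6)
p = sum(len(row) for row in table.values())  # total number of data points

print(f"rows={rows} columns={cols} P={p}")
```

Counting the values directly, rather than multiplying rows by columns, also works when some rows have missing readings.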

**Step 3 Calculate the total range ‘R’** as follows:

Range = Highest value – Lowest value

Range = 30.5 – 28.5 = **2**

**Step 4 Choose the number of bins/columns:** The shape of the histogram depends on the number of columns. As a rule of thumb, the number of columns/bins should be neither too large nor too small. It is a best practice to choose a number of columns approximately equal to the square root of the number of data points (P).

Number of bins/columns = Square root of ‘P’ (66) = 8.12

Round this to 8 columns for the histogram.
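The square-root rule of thumb is easy to apply to any data set. Here is a minimal Python sketch; the function name `suggested_bins` is hypothetical, and rounding to the nearest whole number is our assumption about how to turn 8.12 into a column count.

```python
import math


def suggested_bins(p: int) -> int:
    """Square-root rule of thumb: about sqrt(P) columns, at least one."""
    return max(1, round(math.sqrt(p)))


p = 66
print(f"square root of {p} = {math.sqrt(p):.2f}")
print(f"suggested number of columns = {suggested_bins(p)}")
```

Other rules for choosing the number of bins exist; the square-root rule is simply the one used in this blog.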

**Step 5 Calculate column/bin width:** Calculate the column width as per the following formula

Column width = Range (2) / Number of columns (8)

Column width = 0.25

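**Step 6 Calculate column/bin intervals** as per the following formula

Upper limit of the first interval = Smallest value + Column width = 28.5 + 0.25 = 28.75

E.g. 28.5 – 28.75

Each subsequent interval begins just above the previous upper limit.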
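The width and interval limits can also be generated in a few lines. The following Python sketch uses the highest and lowest values stated in Step 3 and the 8 columns from Step 4. It assumes evenly spaced intervals that share their limits (the upper limit of one column is the lower limit of the next), which is a simplification of our own.

```python
lowest = 28.5    # smallest value, from Step 3
highest = 30.5   # highest value, as stated in Step 3
bins = 8         # number of columns, from Step 4

width = (highest - lowest) / bins

# One more edge than columns: edge i is the lower limit of column i + 1
edges = [lowest + i * width for i in range(bins + 1)]

for number, (lo, hi) in enumerate(zip(edges, edges[1:]), start=1):
    print(f"column {number}: {lo:.2f} to {hi:.2f}")
```

With shared limits, a rule is needed for values that land exactly on a limit; a common choice is to put such a value in the higher column.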

**Step 7 Draft a histogram count sheet** as shown in the following table.

| Column | Column interval | Tally / Count |
| --- | --- | --- |
| 1 | 28.5 – 28.75 | |
| 2 | 28.76 – 29.1 | |
| 3 | 29.2 – 29.45 | |
| 4 | 29.46 – 29.71 | |
| 5 | 29.72 – 29.97 | |
| 6 | 29.98 – 30.23 | |
| 7 | 30.24 – 30.49 | |
| 8 | 30.5 | |

**Histogram Count Sheet**
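Filling in the tally column by hand is error-prone, so it can help to cross-check it with a short script. The Python sketch below tallies the 66 sample values from Step 1. Note the assumption: it uses evenly spaced columns of width 0.25 starting at 28.5, where a value on a limit goes into the higher column, so its interval limits differ slightly from those printed in the count sheet.

```python
# The 66 sample values from Step 1, row by row (A to K)
values = [
    30.2, 30.1, 28.5, 29.5, 29.4, 30.3,
    29.8, 30.3, 29.8, 30.1, 29.1, 28.7,
    30.1, 29.4, 29.5, 30.3, 29.9, 30.3,
    28.7, 30.2, 29.8, 30.1, 28.5, 28.8,
    29.4, 29.9, 29.9, 29.4, 28.8, 30.3,
    30.1, 30.1, 30.3, 28.9, 29.1, 29.8,
    30.1, 30.1, 29.8, 29.8, 28.9, 29.2,
    30.3, 28.8, 30.1, 29.0, 28.5, 30.3,
    29.0, 28.5, 29.1, 29.6, 29.3, 29.8,
    30.0, 29.6, 30.4, 29.7, 29.9, 30.0,
    30.4, 29.4, 30.0, 29.1, 29.1, 29.9,
]

lowest, width, bins = 28.5, 0.25, 8  # from Steps 3 to 5

counts = [0] * bins
for x in values:
    # Column index for x; a value equal to the top limit goes in the last column
    index = min(int((x - lowest) / width), bins - 1)
    counts[index] += 1

for number, count in enumerate(counts, start=1):
    lo = lowest + (number - 1) * width
    print(f"{number} | {lo:.2f} to {lo + width:.2f} | {count}")
```

The counts should add up to the number of data points ‘P’; if they do not, a value has fallen outside the column intervals.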

**Step 8 Draw and label X & Y-axis:** Put the measured characteristic on the X-axis and the frequency on the Y-axis. Draw the histogram based on the histogram count sheet.

**Step 9** Connect the midpoints of the tops of the columns/bins with a curve. Determine whether the histogram forms a bell-shaped curve or not.
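Before drawing the chart on paper or in a spreadsheet, a quick text version can show the shape. This Python sketch prints one row per column, labelled with the column midpoint used in Step 9. The counts are an assumption: they come from tallying the Step 1 sample data into evenly spaced columns of width 0.25 starting at 28.5.

```python
lowest, width = 28.5, 0.25

# Counts per column, assuming evenly spaced columns (see the note above)
counts = [6, 5, 8, 6, 5, 12, 14, 10]

# Midpoint of each column: the X position of the curve point in Step 9
midpoints = [lowest + (i + 0.5) * width for i in range(len(counts))]

for mid, count in zip(midpoints, counts):
    print(f"{mid:7.3f} | {'#' * count} {count}")

# Midpoint of the tallest column
peak = midpoints[counts.index(max(counts))]
print(f"tallest column is centered at {peak}")
```

In this text version the chart is rotated: the characteristic runs down the page and the frequency runs across. With these counts the tallest column sits near the upper end of the range rather than in the middle, so the shape is not a symmetric bell.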

**Histogram shapes and their analysis**: So far we have covered the first phase, drafting a histogram. Now let’s understand the different histogram shapes and how to analyze them.

**Skewed Distribution**:

There are two types of skewed distribution: right-skewed and left-skewed. If the tail extends towards the left side of the histogram, it is a negatively skewed distribution. If the tail extends towards the right side of the histogram, it is a positively skewed distribution. A skewed distribution indicates uneven quality/output of the process. If skewness exists in the process, the process capability must be verified. Skewness to the left or right indicates that the process may go out of control. Without corrective action, process output/products will have defects.

- **Left Side / Negative Skew**: Mean is less than the median
- **Right Side / Positive Skew**: Mean is greater than the median
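The mean-versus-median rule can be turned into a quick check. Below is a minimal Python sketch; the function name `skew_direction` and both data sets are hypothetical, chosen only to show one long left tail and one long right tail. Treat the result as a rule of thumb, not a formal statistical test.

```python
import statistics


def skew_direction(values):
    """Compare mean and median to guess the direction of skew."""
    mean = statistics.fmean(values)
    median = statistics.median(values)
    if mean < median:
        return "negative (left) skew"
    if mean > median:
        return "positive (right) skew"
    return "no skew indicated"


left_tailed = [1, 7, 8, 8, 9, 9, 10]   # hypothetical: one low outlier
right_tailed = [1, 2, 2, 3, 3, 4, 10]  # hypothetical: one high outlier

print(skew_direction(left_tailed))
print(skew_direction(right_tailed))
```

The low outlier pulls the mean below the median in the first data set, and the high outlier pulls it above the median in the second.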

**Histogram with special causes:**

If one or two columns in the histogram show a shift or spike in frequency, it is known as a histogram with special causes. To obtain a normally distributed process, the special causes in the process must be eliminated.

**Bimodal Histogram:**

This histogram has a distinctive shape: two peaks separated by one valley. A bimodal histogram is useful for measuring machine performance/output across different shifts, or under slight variations in input parameters.
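A simple way to spot two peaks in a count sheet is to look for columns that are taller than both of their neighbours. The Python sketch below does this; the function name `find_peaks` and the two-shift counts are hypothetical. It is a naive check: small random ups and downs in real data also show up as peaks, so the result still needs a visual review.

```python
def find_peaks(counts):
    """Return the indexes of columns strictly taller than their neighbours."""
    peaks = []
    for i, count in enumerate(counts):
        left = counts[i - 1] if i > 0 else -1
        right = counts[i + 1] if i < len(counts) - 1 else -1
        if count > left and count > right:
            peaks.append(i)
    return peaks


# Hypothetical counts for output measured over two shifts
two_shift_counts = [2, 6, 11, 6, 3, 7, 12, 5, 1]
single_peak_counts = [1, 4, 9, 4, 1]

print(find_peaks(two_shift_counts))    # indexes of the two peaks
print(find_peaks(single_peak_counts))  # index of the single peak
```

Two peak indexes with a lower column between them correspond to the two peaks and one valley described above.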

We hope you have gained knowledge from this blog. If you liked this content or would like to suggest improvements, please comment below. We are happy to improve this blog with your valuable suggestions.
