Analyzing and Visualizing Data

Selecting a Graph

Pie Charts

Compare a certain sector to the total.

Useful when there are only two sectors, for example yes/no or queued/finished.

Instant understanding of proportions when few sectors are used as dimensions.

With 10 sectors or fewer, the pie chart keeps its visual efficiency.
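As a rough illustration of comparing a sector to the total, the sketch below converts category counts into sector angles. The function name `sector_angles` and the job-status data are hypothetical, chosen only to mirror the queued/finished example above.

```python
from collections import Counter

def sector_angles(labels):
    """Return the angle in degrees of each pie sector (share of total x 360)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: 360 * count / total for label, count in counts.items()}

# Hypothetical two-sector data set (finished vs. queued jobs).
statuses = ["finished", "queued", "finished", "finished"]
angles = sector_angles(statuses)
print(angles)  # {'finished': 270.0, 'queued': 90.0}
```

With only two sectors, the proportions (three quarters versus one quarter) are readable at a glance, which is the case where a pie chart works best.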

Bar Charts/Plots

Ordinal and nominal data sets

Compare values across different groups or track changes over time

When measuring change over time, bar graphs work best when the changes are large

Display and compare the number, frequency or other measure (e.g. mean) for different discrete categories of data

A flexible chart type with several variations of the standard bar chart, including horizontal bar charts, grouped or component charts, and stacked bar charts.

Frequency for each category of a categorical variable

Relative frequency (%) for each category
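The frequency and relative frequency that a bar chart displays can be sketched as follows. The function name `frequency_table` and the survey responses are hypothetical, and the text bars only stand in for a real chart.

```python
from collections import Counter

def frequency_table(values):
    """Return (category, frequency, relative frequency in %) rows, largest first."""
    counts = Counter(values)
    total = len(values)
    return [(cat, n, 100 * n / total) for cat, n in counts.most_common()]

# Hypothetical nominal data: answers to a survey question.
responses = ["agree", "neutral", "agree", "disagree",
             "agree", "neutral", "agree", "agree"]
table = frequency_table(responses)

# Draw one horizontal text bar per category; bar length is the frequency.
for category, count, percent in table:
    print(f"{category:<10}{'#' * count} {count} ({percent:.1f}%)")
```

Each row maps to one bar: the bar length can encode either the raw frequency or the percentage, depending on which comparison matters.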


Histograms

A histogram plots how often data values fall within certain ranges.

Underlying frequency distribution (shape) of a set of continuous data.

Reveals the underlying distribution (e.g., normal distribution), outliers, skewness, etc.

Plot the frequency of score occurrences in a continuous data set that has been divided into classes
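Dividing a continuous data set into classes and counting the scores in each can be sketched as below. The function name `histogram_counts`, the class width of 10, and the score data are assumptions made for illustration.

```python
def histogram_counts(data, bin_width, start):
    """Count values per class [lower, lower + bin_width), keyed by lower bound."""
    counts = {}
    for x in data:
        k = int((x - start) // bin_width)   # index of the class containing x
        lower = start + k * bin_width       # lower bound of that class
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))

# Hypothetical test scores, grouped into classes 50-59, 60-69, and so on.
scores = [52, 58, 61, 64, 67, 69, 71, 73, 75, 78, 84, 95]
counts = histogram_counts(scores, bin_width=10, start=50)
print(counts)  # {50: 2, 60: 4, 70: 4, 80: 1, 90: 1}
```

The counts form the bar heights of the histogram; here most scores fall between 60 and 79, with a thinner tail toward higher scores. Classes with no values are simply absent from the result.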

Box plots

Distribution of a continuous measure by some grouping variable

They show the spread of the data, much as the standard deviation does, but based on quartiles rather than the mean.

The line in the middle of the box is the median.

The box itself represents the middle 50% of the data.

The box edges are the 25th and 75th percentiles.

The vertical size of each box is the interquartile range, or IQR.

The tops and bottoms of the boxes are referred to as “hinges”.

Whiskers: They represent the reasonable extremes of the data. That is, they mark the minimum and maximum values that lie within a certain distance of the middle 50% of the data (a common choice is 1.5 times the IQR beyond the box edges).

If no points exceed that distance, then the whiskers are simply the minimum and maximum values.

Outliers: data points that are too big or too small compared to the rest of the data. They fall beyond the whiskers and are drawn as individual points.
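The parts of a box plot described above can be computed with a short sketch. It assumes the common convention of placing the whisker limit 1.5 times the IQR beyond the box edges, and it uses the "inclusive" quartile method of Python's `statistics.quantiles`; other tools may use slightly different quartile definitions. The function name `box_plot_stats` and the data are hypothetical.

```python
from statistics import quantiles

def box_plot_stats(data, whisker_factor=1.5):
    """Compute the median, hinges, IQR, whisker ends, and outliers of a data set."""
    values = sorted(data)
    # Hinges (25th and 75th percentiles) and the median.
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed convention: points farther than whisker_factor * IQR are outliers.
    low_fence = q1 - whisker_factor * iqr
    high_fence = q3 + whisker_factor * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": outliers,
    }

# Hypothetical measurements with one unusually large value.
sample = [2, 4, 5, 6, 7, 8, 9, 11, 25]
stats = box_plot_stats(sample)
print(stats)
```

For this sample the box spans 5 to 9 with the median at 7, the whiskers end at 2 and 11, and 25 is reported as an outlier. If no point lay beyond the fences, the whiskers would simply be the minimum and maximum values.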

Scatter plots

View the potential relationship of two continuous variables

Graphical view of the relationship between two sets of numbers.

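Scatter plots show how strongly one variable is associated with another. An association alone does not show that one variable causes the other.

The strength and direction of the relationship between two variables is measured by their correlation.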

Use them to find potential relationships between values and to spot outliers in data sets.
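One way to put a number on the relationship a scatter plot shows is the Pearson correlation coefficient, sketched below. Choosing Pearson's coefficient is an assumption, since the text does not name a specific measure; the function name `pearson_r` and the data are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical paired measurements (x, y) for five observations.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
r = pearson_r(hours, scores)
print(round(r, 2))  # 0.77
```

Values near 1 or -1 indicate points that lie close to a rising or falling straight line, and values near 0 indicate little linear relationship. The coefficient only captures linear association, so the scatter plot itself remains the better tool for spotting curved patterns and outliers.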