ggplot2 tutorial: Box Plots

Web Reference: ggplot2 is a system for declaratively creating graphics, based on "The Grammar of Graphics". You provide the data, tell ggplot2 how to map variables to aesthetics and which graphical primitives to use, and it takes care of the details. ggplot2 is an R package for producing visualizations of data. Unlike many graphics packages, ggplot2 uses a conceptual framework based on the grammar of graphics. This allows you to ‘speak’ a graph from composable elements, instead of being limited to a predefined set of charts. More complete information about how to use ggplot2 can be found in the book, but here you’ll find a brief ... All ggplot2 plots begin with a call to ggplot(), supplying default data and aesthetic mappings, specified by aes(). You then add layers, scales, coords and facets with +. To save a plot to disk, use ggsave().
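
The workflow described above (ggplot() with aes(), layers added with +, then ggsave()) can be sketched roughly as follows. This is a minimal illustration, not taken from the original tutorial: the choice of the built-in mtcars data set, the axis labels, and the output file name are all assumptions made for the example.

```r
library(ggplot2)

# mtcars ships with base R; converting cyl to a factor gives one box per group.
# (The data set and variables are illustrative assumptions.)
p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot() +
  labs(x = "Cylinders", y = "Miles per gallon")

# ggsave() infers the graphics device from the file extension.
# The file name and dimensions here are arbitrary.
out_file <- file.path(tempdir(), "mpg_by_cyl.pdf")
ggsave(out_file, plot = p, width = 6, height = 4)
```

Printing p in an interactive session draws the plot; ggsave() is only needed when the plot should be written to a file.
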
YouTube Excerpt: Learn more about ggplot2 in R at: https://www.datacamp.com/courses/data-visualization-with-ggplot2-part-3

In this chapter we'll take a look at our first class of specialised plots, what I like to call statistical plots. Actually, all the plots we've made so far can be called statistical plots, since data visualisation operates at the intersection of design and statistics. However, in contrast to the plots we've seen previously, these are better suited to an academic audience. We're going to look at two very common plot types for visualising the distribution of continuous data: box plots and density plots. We'll return to box plots in the case study, where we develop a new stats layer to produce a two-dimensional box plot.

Box plots were originally described by John Tukey in his 1977 classic text "Exploratory Data Analysis", so actually they are not that old. He described them as a way of visualising the so-called 5-number summary.

Let's begin with a continuous variable. Typical descriptive statistics for the location and spread are the mean and standard deviation, as shown here. This is OK if our data is normally distributed, which we dealt with in the best practices chapter of the last course. Recall that the mean and standard deviation may not be the best representation of the data, in particular if we have extreme values that result in a skewed distribution. That's because the mean and standard deviation are not robust, so they'll be influenced by extreme values.

The robust measures of location and spread are the median, which is the second quartile, and the inter-quartile range (IQR), which is the difference between the third and first quartiles. This is what is shown in a box plot, which you can think of as the robust equivalent of showing the mean and standard deviation. The difference is that here our summary consists of 5 values, so we get a better impression of the distribution of the data set.
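
To make the contrast between robust and non-robust summaries concrete, here is a small sketch in base R. The sample values are made up for illustration and are not the data set used in the video.

```r
# Made-up sample with one extreme value (illustrative data only).
x <- c(2, 3, 3, 4, 5, 5, 6, 7, 40)
x_trimmed <- x[x < 40]  # the same sample without the extreme value

# Non-robust summaries: the single extreme value pulls the mean upwards
# and inflates the standard deviation.
mean_with    <- mean(x)          # 75 / 9, about 8.33
mean_without <- mean(x_trimmed)  # 35 / 8 = 4.375
sd_with      <- sd(x)
sd_without   <- sd(x_trimmed)

# Robust summaries: the median and IQR barely move.
median_with    <- median(x)          # 5
median_without <- median(x_trimmed)  # 4.5
iqr_with       <- IQR(x)             # Q3 - Q1 = 6 - 3 = 3

# The 5-number summary: minimum, lower hinge, median, upper hinge, maximum.
five <- fivenum(x)  # 2 3 5 6 40
```

Note that fivenum() uses Tukey's hinges while quantile() and IQR() use a different default definition of the quartiles; for this sample the two agree, but on other data they can differ slightly.
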
We get an impression of the skew because each of the four segments, the two outer whiskers and the two parts of the inner box, reflects about 25% of the data.

There is a further subtle point to box plots, and that is the ability to show extreme values as distinct features. An extreme value is defined as one that falls more than 1.5 times the IQR below Q1 or above Q3. This limit is called the fence and is shown here as a dotted blue line. This is one definition of an extreme value. We can also change this threshold, although typically there is no reason to do so. However, you should be aware that the fences can be adjusted and that the default depends on the software.

To understand how the fence works, we'll use the same data set we've been working on so far, but we'll pull the maximum value, coloured blue on our dot plot, further and further away from its starting position, causing the data set to slowly become positively skewed. Any values that surpass the fence will be drawn as a dot. The box plot is also updated: the whisker is drawn up to the highest observation within the fence. It doesn't matter how many values are outside the fence, or how far away they are. They all get represented as dots. The whiskers never extend beyond the fence.

If you're not clear about how the plot is drawn, you can actually get a very wrong view of the data's distribution. Many people who commonly use or encounter box plots are not aware that the mid-line is the median and not the mean. Nor are they aware that the dots represent extreme values that are part of the data, and therefore should not be disregarded. So use these plots with caution!

OK, let's take a look at the box plot geom in ggplot2 and some other problems you may encounter with box plots in the exercises.
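
The fence rule described above can be checked by hand and compared with what geom_boxplot() computes. The sketch below uses made-up data, not the data set from the video, and the column name value and the label "sample" are arbitrary. It assumes the coef argument of the box plot stat is the 1.5 multiplier, which is the documented default in ggplot2.

```r
library(ggplot2)

# Made-up sample with one extreme value (illustrative data only).
x <- c(2, 3, 3, 4, 5, 5, 6, 7, 40)

# Fences: 1.5 times the IQR below Q1 and above Q3.
q   <- quantile(x, c(0.25, 0.75), names = FALSE)  # Q1 = 3, Q3 = 6
iqr <- q[2] - q[1]                                # 3
lower_fence <- q[1] - 1.5 * iqr                   # -1.5
upper_fence <- q[2] + 1.5 * iqr                   # 10.5

# Values beyond the fences are drawn as dots; the whiskers stop at the
# most extreme observations that are still inside the fences.
outliers    <- x[x < lower_fence | x > upper_fence]  # 40
whisker_min <- min(x[x >= lower_fence])              # 2
whisker_max <- max(x[x <= upper_fence])              # 7

# The same quantities as computed by ggplot2; coef = 1.5 is the default
# and is spelled out only to show where the threshold can be changed.
p <- ggplot(data.frame(value = x), aes(x = "sample", y = value)) +
  geom_boxplot(coef = 1.5)
built <- layer_data(p, 1)  # columns include ymin, lower, middle, upper, ymax, outliers
```

Increasing the largest value in x moves the dot further away but leaves the upper whisker at 7, which matches the behaviour described in the transcript.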
