Ggplot: Don't Know How to Automatically Pick Scale for Object of Type List, Defaulting to Continuous
Data Visualization with ggplot
A great resource for data visualization in R is the R Graph Gallery. The examples and information below are a small sample of R data visualization basics.
ggplot
ggplot2 (referred to as ggplot) is a powerful graphics package that can be used to make very impressive data visualizations (see contributions to #TidyTuesday on Twitter, for example). The following examples use the Learning R Survey data, which has been partially processed (Chapters 2 and 3), the palmerpenguins data set, and several datasets included with R to show the basic principles of using ggplot. Then, we will put these basics together to make several beautiful visualizations.
Grammar of Graphics
The "gg" of ggplot
refers to the "grammar of graphics". For ggplot
, this means a visualization must have specific elements to make a complete graphic, just as an utterance or written line must have specific elements to make a grammatically correct sentence.
A simple plot contains the following elements:
Data
There are several ways to refer to a data object in ggplot. You can call data within ggplot (e.g. ggplot(rsurvey)) or you can call it outside of ggplot within a dplyr chain (e.g. rsurvey %>% ggplot()). The advantage of the second approach is that you can manipulate data and pass it directly into ggplot without saving it as a data object.
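As a minimal sketch of the dplyr-chain approach, the code below filters and mutates a data frame and pipes the result straight into ggplot(). The rsurvey data is not reproduced here, so the built-in mtcars dataset is used as a stand-in; the column choices are assumptions made only for illustration.

```r
library(ggplot2)
library(dplyr)

# mtcars is a built-in dataset, used here as a stand-in for rsurvey
p <- mtcars %>%
  filter(cyl != 6) %>%            # keep 4- and 8-cylinder cars only
  mutate(cyl = factor(cyl)) %>%   # treat cylinder count as a category
  ggplot() +                      # the piped data becomes the plot data
  geom_point(aes(x = wt, y = mpg, color = cyl))

p
```

No intermediate object is saved: the filtered data exists only inside the chain.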
You can also call data for individual shapes. This would allow you to use different data objects to form your graphic, or to use the same data object but filter it to display different information. The following graphic demonstrates this.
Please note this graphic is for demonstration purposes only and does not represent a useful data visualization.
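A small sketch of per-layer data, assuming the built-in mtcars dataset as a stand-in for the original data: ggplot() is called with no data, and each geom_point() layer receives its own data= argument. The second layer uses the same object, filtered to a subset.

```r
library(ggplot2)
library(dplyr)

p <- ggplot() +
  # Layer 1: all cars, drawn in grey
  geom_point(data = mtcars, aes(x = wt, y = mpg), color = "grey60") +
  # Layer 2: the same data object, filtered to the heaviest cars
  geom_point(data = mtcars %>% filter(wt > 5),
             aes(x = wt, y = mpg), color = "red", size = 3)

p
```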
A note about
+
Across the Tidyverse, the %>% "pipe operator" is used to chain commands into simple code. It stands for "and then". However, in ggplot, despite it being part of the tidyverse, different elements are connected with the plus sign +.
Aesthetics
Aesthetics are the way you connect data to the elements inside the graphic. Aesthetics tell ggplot
what should be on the x-axis, what should be on the y-axis, and what the colors should be.
Different geometries (shapes) may have different aesthetics, but x, y, and color/fill are the most common.
- color= is used for:
  - geom_point() - dots, circles, scatterplots
  - geom_line() - line charts
- fill= is used for:
  - geom_col() / geom_bar() - column/bar charts
  - geom_area() - area charts
Using color= or fill= to refer to a categorical variable (called a "discrete" variable in ggplot) allows you to separate the shape by that category. Here is an example with and without specifying a color:
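A sketch of the comparison, assuming the built-in iris dataset in place of the original data. The two plots differ only in whether color= is mapped to a categorical variable inside aes().

```r
library(ggplot2)

# Without color: every point is drawn the same way
p_plain <- ggplot(iris) +
  geom_point(aes(x = Sepal.Length, y = Petal.Length))

# With color mapped to a discrete variable: one color per species
p_color <- ggplot(iris) +
  geom_point(aes(x = Sepal.Length, y = Petal.Length, color = Species))

p_plain
p_color
```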
Geometries
Geometries are the different shapes one can make using ggplot. They all start with geom_ and can be stacked together by simply using +. The first geom is always the first layer and any additional layers are stacked on top of it. (See Barbell Charts for an example.)
Bar Charts
You can make bar charts with either geom_bar()
or geom_col()
.
geom_bar
geom_bar requires:
- an x-value; this is useful if you are just getting a count of the data
geom_bar may also have:
- a y-value and a stat= value if you want to specify how the y-value data should be shown
  - stat="identity" - uses the y values as they are; rows that share an x value are stacked, so the bar height is the sum of y
  - stat="summary" - gives a mean of the values of y (when no summary function is supplied, it defaults to mean_se())
Compare:
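The comparison can be sketched as follows, assuming the built-in mtcars dataset as a stand-in. The first plot maps only x and counts rows; the second maps x and y and uses stat="summary". Passing fun = "mean" names the summary function explicitly, which avoids the "No summary function supplied" message.

```r
library(ggplot2)

# Count: only x is mapped, so each bar is the number of rows per group
p_count <- ggplot(mtcars) +
  geom_bar(aes(x = factor(cyl)))

# Summary: x and y are mapped, and each bar is the mean of y per group
p_mean <- ggplot(mtcars) +
  geom_bar(aes(x = factor(cyl), y = mpg), stat = "summary", fun = "mean")

p_count
p_mean
```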
geom_col
geom_col
requires x and y values. It does not use stat=
.
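A minimal geom_col() sketch, assuming mtcars as a stand-in dataset: the y values are computed first with dplyr, and geom_col() then draws them as they are. The name cyl_means is introduced only for this illustration.

```r
library(ggplot2)
library(dplyr)

# Pre-compute one value per group; geom_col() plots these values directly
cyl_means <- mtcars %>%
  group_by(cyl) %>%
  summarize(mean_mpg = mean(mpg))

p_col <- ggplot(cyl_means) +
  geom_col(aes(x = factor(cyl), y = mean_mpg))

p_col
```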
Horizontal Bar Chart
To make a horizontal bar chart, add coord_flip()
:
## Warning: Removed 2 rows containing non-finite values ## (stat_summary).
## No summary function supplied, defaulting to `mean_se()`
Note: coord_flip() can be placed anywhere after ggplot().
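A short sketch of a horizontal bar chart, again assuming mtcars as a stand-in dataset. Here coord_flip() is added last, but it could equally come before the geom.

```r
library(ggplot2)

p_flip <- ggplot(mtcars) +
  geom_bar(aes(x = factor(cyl))) +
  coord_flip()   # swaps the x and y axes, so the bars run horizontally

p_flip
```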
Stacked Bar Chart
To make a stacked bar chart, include fill=
## No summary function supplied, defaulting to `mean_se()`
100% Stacked Bar Chart
To make a 100% stacked bar chart, include fill= inside aes() and position="fill" after aes().
Side-by-Side Bar Chart
To make a side-by-side bar chart, include fill= inside aes() and position="dodge" after aes().
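The three bar chart variants can be sketched side by side, assuming mtcars as a stand-in dataset. All three share the same fill= mapping and differ only in the position= argument given to geom_bar().

```r
library(ggplot2)

# Shared base: bars per cylinder count, filled by number of gears
base <- ggplot(mtcars, aes(x = factor(cyl), fill = factor(gear)))

p_stack <- base + geom_bar()                     # stacked (the default position)
p_fill  <- base + geom_bar(position = "fill")    # 100% stacked
p_dodge <- base + geom_bar(position = "dodge")   # side by side

p_stack
p_fill
p_dodge
```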
Histograms
Histograms can be made with geom_histogram
. They only require an x-value. You can decide the bin width by adding binwidth=
after aes()
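A minimal histogram sketch, assuming the built-in faithful dataset as a stand-in. binwidth= is given outside aes() because it is a fixed setting rather than a mapping to the data.

```r
library(ggplot2)

p_hist <- ggplot(faithful) +
  geom_histogram(aes(x = waiting), binwidth = 5)   # each bin spans 5 minutes

p_hist
```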
Boxplots
A boxplot uses geom_boxplot()
. It requires x-values. Y-values are optional but useful if you want to compare multiple boxplots.
Note: Use coord_flip()
to make it easier to read.
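A sketch of both cases, assuming the built-in iris dataset as a stand-in and ggplot2 3.3.0 or later, where geom_boxplot() works out the orientation from the mapped variables. With only x mapped there is a single boxplot; adding a categorical y gives one boxplot per category.

```r
library(ggplot2)

# Only x: one boxplot for the whole variable
p_one <- ggplot(iris) +
  geom_boxplot(aes(x = Sepal.Length))

# x and a categorical y: one boxplot per species, for comparison
p_many <- ggplot(iris) +
  geom_boxplot(aes(x = Sepal.Length, y = Species))

p_one
p_many
```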
Scatterplots
We can use geom_point()
for scatterplots. This requires x and y values, both continuous.
Point Size
Point size can be based on a single number or the data itself.
You can control overall point size by adding size=
after aes()
:
You can make the values in the data also determine point size by using size=
inside aes()
:
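Both ways of setting point size, sketched with mtcars as a stand-in dataset. The only difference is whether size= sits outside or inside aes().

```r
library(ggplot2)

# size outside aes(): every point gets the same fixed size
p_fixed <- ggplot(mtcars) +
  geom_point(aes(x = wt, y = mpg), size = 4)

# size inside aes(): point size is mapped to a column in the data
p_mapped <- ggplot(mtcars) +
  geom_point(aes(x = wt, y = mpg, size = hp))

p_fixed
p_mapped
```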
Point Color
You can differentiate the dots by adding color=
Shapes
Shapes are controlled like color and size. Inside aes()
means that shapes are mapped to data. Outside aes()
means there is one shape.
A note on alpha=
alpha is called after aes() to set the transparency of overlapping shapes. An alpha of 0 is complete transparency while an alpha of 1 is no transparency. An alpha of .5, set above, is 50% transparency.
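A short sketch combining the two ideas, assuming iris as a stand-in dataset: shape= is mapped to the data inside aes(), while alpha= and size= are fixed values outside aes().

```r
library(ggplot2)

p_shape <- ggplot(iris) +
  geom_point(aes(x = Sepal.Length, y = Petal.Length, shape = Species),
             alpha = .5,   # 50% transparency for every point
             size = 3)

p_shape
```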
Scatterplots for Categorical Variables
If you want to make a scatterplot for categorical variables, you will simply get a line of dots for each category unless you use geom_jitter(), which adds a small amount of random variation to the position of each point.
Compare:
You can also use position_jitter(width = NULL, height = NULL, seed = NA)
inside of geom_point()
to achieve a similar effect.
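The three options can be sketched as follows, assuming the mpg dataset that ships with ggplot2 as a stand-in. The width, height, and seed values are arbitrary choices for illustration; setting seed= makes the jitter reproducible.

```r
library(ggplot2)

# Plain points: observations pile up on the category positions
p_points <- ggplot(mpg) +
  geom_point(aes(x = drv, y = class))

# geom_jitter(): random noise spreads the points out
p_jitter <- ggplot(mpg) +
  geom_jitter(aes(x = drv, y = class), width = .2, height = .2)

# geom_point() with position_jitter(): similar effect, with a fixed seed
p_seeded <- ggplot(mpg) +
  geom_point(aes(x = drv, y = class),
             position = position_jitter(width = .2, height = .2, seed = 1))

p_points
p_jitter
p_seeded
```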
Barbell Charts
Barbell charts plot two related variables with a dot and show the distance between them with a line.
You can combine geom_point() with geom_linerange() to make a simple barbell chart. geom_linerange() should be called first, as it must go below the dots layer for its line ends to be hidden by the dots. First, we will summarize the penguin data and then compare.
The following code builds the graphic by combining different data layers and different geometry layers.
ggplot() +
  coord_flip() +
  # Line layer first: one row per species, with male and female means as columns
  geom_linerange(
    data = penguins %>%
      group_by(species, sex) %>%
      summarize(body_mass = mean(body_mass_g, na.rm = TRUE)) %>%
      drop_na(sex) %>%
      pivot_wider(names_from = sex, values_from = body_mass),
    aes(x = species, ymin = male, ymax = female)
  ) +
  # Dot layer second, so the dots cover the line ends
  geom_point(
    data = penguins %>%
      group_by(species, sex) %>%
      summarize(body_mass = mean(body_mass_g, na.rm = TRUE)) %>%
      drop_na(sex),
    aes(x = species, y = body_mass, color = sex),
    size = 5
  )
## `summarise()` regrouping output by 'species' (override with `.groups` argument) ## `summarise()` regrouping output by 'species' (override with `.groups` argument)
Line Charts
You can use geom_line() for line charts to display values over time. When the x variable is categorical, geom_line() requires an additional group= aesthetic. If there should be only 1 line because there is only 1 time variable, then use group=1. If you want to split the lines based on another variable, use group=variable_name.
For the below example, we will use the AirPassengers data that comes with R and transform it into a dataframe following an example from StackOverflow.
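The original conversion code is not shown, so the following is one possible sketch rather than the StackOverflow example itself. It builds a data frame named airpassengers with year, month, and AirPassengers columns, which are the names used later in this chapter. Storing the values with as.numeric() instead of keeping the ts class should avoid the "Don't know how to automatically pick scale for object of type ts" message.

```r
library(ggplot2)

# AirPassengers is a monthly time series (class "ts") covering 1949-1960
airpassengers <- data.frame(
  year = rep(1949:1960, each = 12),
  month = factor(month.abb[as.integer(cycle(AirPassengers))], levels = month.abb),
  AirPassengers = as.numeric(AirPassengers)   # plain numbers, not a ts object
)

# One line per month, split with group= (and colored for readability)
p_months <- ggplot(airpassengers) +
  geom_line(aes(x = year, y = AirPassengers, group = month, color = month))

p_months
```

Building month as a factor with levels = month.abb also keeps the months in calendar order.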
A line graph displaying a single line for year
## `summarise()` ungrouping output (override with `.groups` argument)
A line graph displaying 1 line per month
## Don't know how to automatically pick scale for object of type ts. Defaulting to continuous. ## Don't know how to automatically pick scale for object of type ts. Defaulting to continuous.
We can add labels to the ends of the lines using geom_label() (see Labels), but the lines are very close together, so we will use geom_label_repel() from the ggrepel package instead. This gives the labels space and connects them with their lines.
## Don't know how to automatically pick scale for object of type ts. Defaulting to continuous. ## Don't know how to automatically pick scale for object of type ts. Defaulting to continuous.
Colors
For colors related to values in a data set, see Aesthetics.
You can change the color of all the chart elements of a geometry using fill= outside of aes(). Here, color is not mapped to the data, thus it is inside geom_col but not in aes(). You can use R color names (e.g. "blue", "black", "grey80"), hex values (e.g. "#cccccc" or "#a85001"), or RGB values (e.g. rgb(0, 155, 255, maxColorValue = 255)).
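The three ways of naming a fixed color can be sketched like this, assuming mtcars as a stand-in dataset. Note that rgb() expects values between 0 and 1 by default, so maxColorValue = 255 is needed for 0-255 values.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(x = factor(cyl)))

p_named <- base + geom_bar(fill = "grey80")    # R color name
p_hex   <- base + geom_bar(fill = "#a85001")   # hex value
p_rgb   <- base + geom_bar(fill = rgb(0, 155, 255, maxColorValue = 255))  # RGB value

p_named
p_hex
p_rgb
```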
## `summarise()` ungrouping output (override with `.groups` argument)
See here for a list of R color names.
You can also use other color palettes by installing the viridis or hrbrthemes packages.
Labels
You can add labels with geom_label
or geom_text
. geom_text
is just text and geom_label
is text inside a rounded white box (this, of course, can be changed). Compare:
Note: Because there is no y value, these graphics use y=..count.. to get the total number and stat="count" to say you will use the count in the aesthetic.
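A sketch of the comparison, assuming mtcars as a stand-in dataset. It uses after_stat(count), which ggplot2 3.3.0 and later offer as the replacement for the older ..count.. notation; the vjust value is an arbitrary choice to lift the labels above the bars.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar()

# Plain text labels showing the count above each bar
p_text <- base +
  geom_text(aes(y = after_stat(count), label = after_stat(count)),
            stat = "count", vjust = -0.5)

# The same labels drawn inside a box
p_label <- base +
  geom_label(aes(y = after_stat(count), label = after_stat(count)),
             stat = "count")

p_text
p_label
```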
Multiple Plots
Faceting
You can break a graphic into multiple plots (or facets) using facet_wrap(~variable)
. Here is an example:
Note: The months are not in order. To put them in order, you would first need to use factor()
inside a mutate()
command.
You can control the number of rows/columns using nrow= or ncol=:
airpassengers %>%
  # Order the months as a factor so the facets follow the calendar
  mutate(month = factor(month, levels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
                                          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"))) %>%
  ggplot() +
  geom_line(aes(x = year, y = AirPassengers, group = month)) +
  facet_wrap(~month, ncol = 3) +
  labs(title = "Months ordered and placed in 3 columns")
patchwork
You can also use the patchwork
package to connect different plots using +
, /
, and |
.
First, save your plots as data objects:
Using +
- Side-by-Side
Using /
- Stacked
Using |
- Nested
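The operators can be sketched as follows, assuming the patchwork package is installed and using mtcars as a stand-in dataset; the plot names p1, p2, and p3 are introduced only for this illustration. In patchwork, | places plots beside each other and / stacks them, so a nested layout comes from combining the two with parentheses.

```r
library(ggplot2)
library(patchwork)

# Save the plots as objects first
p1 <- ggplot(mtcars) + geom_point(aes(x = wt, y = mpg))
p2 <- ggplot(mtcars) + geom_bar(aes(x = factor(cyl)))
p3 <- ggplot(mtcars) + geom_histogram(aes(x = hp), binwidth = 25)

side_by_side <- p1 + p2          # two plots next to each other
stacked      <- p1 / p2          # one plot on top of the other
nested       <- p1 | (p2 / p3)   # p1 on the left, p2 over p3 on the right

side_by_side
stacked
nested
```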
Themes
Themes control the overall look and feel of ggplot. A complete, ready-made theme is called using its own function, theme_name(). If you are modifying individual theme elements, you will use theme().
Pre-Installed Themes
Here are some examples:
ggthemes
and other theme packages
ggthemes
and hrbrthemes
are two popular theme packages. Here are some examples:
Controlling theme elements
If you simply want to remove lines, change the legend position, etc., you can use theme(). Here are two quick examples.
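The original examples are not shown, so here is a brief sketch under the assumption that iris stands in for the original data. The first plot applies a complete built-in theme; the second adjusts two individual elements with theme().

```r
library(ggplot2)

p <- ggplot(iris) +
  geom_point(aes(x = Sepal.Length, y = Petal.Length, color = Species))

# A complete theme that ships with ggplot2
p_minimal <- p + theme_minimal()

# Individual elements: move the legend and remove the minor grid lines
p_custom <- p +
  theme(legend.position = "bottom",
        panel.grid.minor = element_blank())

p_minimal
p_custom
```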
There are a lot of ways you can customize a theme. See https://ggplot2.tidyverse.org/reference/theme.html.
Source: https://bookdown.org/aschmi11/RESMHandbook/data-visualization-with-ggplot.html