Data Analyst’s Recipe | How to create a scatter plot in R

Nilimesh Halder, PhD
Published in Analyst’s corner
4 min read · Mar 1, 2023


A step-by-step tutorial on data visualisation in R using ggplot2

A scatter plot is a simple and effective way to visualize the relationship between two continuous variables. In this tutorial, we will walk through the steps to create and customize a scatter plot in R using the ggplot2 package.

1. Loading the data

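First, we need to load the data that we want to use for our scatter plot. For this tutorial, we will be using the iris dataset, which is included in the datasets package that ships with R. In a standard R session the dataset is already available, so calling data(iris) is optional, but it makes the dependency explicit.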

# Load the iris dataset
data(iris)
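
Before plotting, it can help to check what the data frame actually contains. The following is a minimal sketch using only base R functions; the variable name numeric_cols is just an illustrative choice.

```r
# iris ships with R, so no extra packages are needed
data(iris)

# Show column names and types: four numeric measurements plus the Species factor
str(iris)

# Preview the first few rows
head(iris)

# Check that the two columns we plan to plot are numeric
numeric_cols <- sapply(iris[, c("Sepal.Length", "Sepal.Width")], is.numeric)
print(numeric_cols)
```

If either column were not numeric, a scatter plot of the two would not be the right chart type.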

2. Creating a scatter plot using ggplot2

Next, we will create a scatter plot using the ggplot2 package. We will use the ggplot() function to create the basic plot object and then add layers to customize the plot.

# Load the ggplot2 package
library(ggplot2)

# Create a basic scatter plot
ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point()

In the above code, we first loaded the ggplot2 package using the library() function. Then, we created a basic scatter plot using the ggplot() function and specified the iris dataset as the data source. We used the aes() function to specify the variables to be plotted on the x and y axes. Finally, we added a layer to the plot using the geom_point() function to create the scatter plot itself.
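
To make the "plot object plus layers" idea concrete, the same plot can be built in two steps. This is only a sketch; the names p and p_points are illustrative and do not come from ggplot2 itself.

```r
library(ggplot2)

# Base plot object: data and aesthetic mappings only, no geometry yet
p <- ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width))

# Adding geom_point() returns a new plot object that contains one layer
p_points <- p + geom_point()

# Printing the object is what actually draws the plot
print(p_points)
```

Storing the plot in a variable like this makes it easy to reuse the same base object with different layers or labels.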

3. Customizing the scatter plot

Now that we have created a basic scatter plot, we can customize it to make it more visually appealing and informative. Here are a few examples:

Changing the color of the points based on a third variable

# Create a scatter plot with points colored by Species
ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point()

In the above code, we added a new argument to the aes() function to specify that the points should be colored by the Species variable. This creates a scatter plot where each species is represented by a different color.
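
It is worth contrasting this with giving every point the same fixed color. The sketch below assumes a color name of our own choosing ("steelblue") and uses illustrative variable names; the key difference is whether color appears inside or outside aes().

```r
library(ggplot2)

# Setting: a constant color outside aes() applies to every point and adds no legend
p_fixed <- ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(color = "steelblue")

# Mapping: color inside aes() varies with the data and produces a legend
p_mapped <- ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point()

print(p_fixed)
print(p_mapped)
```

As a rule of thumb, put a variable inside aes() when the appearance should depend on the data, and pass a constant to geom_point() when it should not.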

Adding a title and axis labels

# Create a scatter plot with a title and axis labels
ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point() +
  labs(title = "Sepal Length vs. Sepal Width", x = "Sepal Length", y = "Sepal Width")

In the above code, we added the labs() function to the plot to specify the title and axis labels. Strictly speaking, labs() does not add a new layer of data; it only changes the text shown on the plot.

Changing the point shape

# Create a scatter plot with different point shapes based on Species
ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width, shape = Species)) +
  geom_point() +
  labs(title = "Sepal Length vs. Sepal Width", x = "Sepal Length", y = "Sepal Width")

In the above code, we added a new argument to the aes() function to specify that the point shapes should be based on the Species variable. This creates a scatter plot where each species is represented by a different point shape.
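
Color and shape can also be combined, which may help when a plot is printed in grayscale. The following is a sketch rather than a recommended default; the point size of 2 and the name p_both are arbitrary choices for illustration.

```r
library(ggplot2)

# Map both color and shape to Species; because both aesthetics use the
# same variable, ggplot2 shows them in a single combined legend
p_both <- ggplot(data = iris,
                 aes(x = Sepal.Length, y = Sepal.Width,
                     color = Species, shape = Species)) +
  geom_point(size = 2) +
  labs(title = "Sepal Length vs. Sepal Width", x = "Sepal Length", y = "Sepal Width")

print(p_both)
```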

Another example

This time we will use the mtcars dataset, which is also included in the datasets package in R.

# Load the mtcars dataset
data(mtcars)

# Create a scatter plot of mpg vs. wt
ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(title = "MPG vs. Weight", x = "Weight (1000 lbs)", y = "Miles Per Gallon")

In the above code, we created a scatter plot of mpg (miles per gallon) vs. wt (weight, recorded in units of 1000 lbs) using the mtcars dataset. We added a title and axis labels using the labs() function.
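
The downward trend visible in this plot can be backed up with a number. As a small optional sketch, base R's cor() function computes the Pearson correlation between the two variables; the variable name r is our own.

```r
# Load the mtcars dataset (ships with R)
data(mtcars)

# Pearson correlation between weight and fuel efficiency
r <- cor(mtcars$wt, mtcars$mpg)

# A negative value means heavier cars tend to have lower mpg
print(r)
```

A correlation coefficient only summarizes the linear part of the relationship, so it complements the scatter plot rather than replacing it.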

I hope this helps you create scatter plots in R!
