A Gentler Introduction to Exploratory Data Analysis

Data Scientist Dude
4 min read · Feb 16, 2023
Everyone loves the Iris dataset because it is clean and easy to use.

If you have little experience with the programming language R and the RStudio IDE, here is an easy practice template for taking a first look at your data. These are some of the functions you can use in a rudimentary exploratory data analysis (EDA), shown here with the built-in iris dataset.

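library(datasets) # attached by default in R; loading it explicitly does no harm
names(iris)       # column (variable) names
dim(iris)         # number of rows and columns
str(iris)         # data type of each column, with a few sample values
summary(iris)     # summary statistics for each column
head(iris)        # first six rows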

The results should look like this:

This tells you the names of the variables, how many rows and columns there are, the data types, and key summary statistics. It even gives you a sense of what individual entries look like.
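If you want to confirm those facts one at a time, a minimal sketch like the following works on the built-in iris data. The extra checks (`sapply()`, `table()`, `colSums(is.na())`) are my additions, not part of the original template.

```r
# Quick checks of what the functions above report for the built-in iris data
dim(iris)              # 150 rows and 5 columns
names(iris)            # Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species
sapply(iris, class)    # four numeric columns and one factor (Species)
table(iris$Species)    # 50 rows each for setosa, versicolor and virginica
colSums(is.na(iris))   # 0 for every column, so there are no missing values
```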
# it is a good practice to always look at your data in tabular form
View(iris)
Pro tip: glimpse() is a function from the pillar package that is re-exported by the dplyr package. Some people prefer it to View() because it prints a compact overview of every column right in the console.
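Here is a small sketch of that tip. It assumes you have installed dplyr with install.packages("dplyr"); the fallback to str() is my own addition so the snippet still runs without it.

```r
# glimpse() prints one line per column: its name, type and first few values.
# Assumption: dplyr is installed. If it is not, fall back to base R's str().
if (requireNamespace("dplyr", quietly = TRUE)) {
  dplyr::glimpse(iris)
} else {
  str(iris)
}
```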

Before you can try this on your own dataset, save it as a .csv file on your computer. I often save a new dataset right on my desktop so that it is easy to find.
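To practice the round trip, you could write iris out as a .csv file and read it back in. This is only a sketch: the file name iris_practice.csv is hypothetical, and it uses a temporary folder so it runs anywhere. For a file on your desktop you would swap in something like file.path("~", "Desktop", "my_data.csv"), and that path is also just an assumption about your setup.

```r
# Hypothetical file name, stored in R's temporary folder for this session
csv_path <- file.path(tempdir(), "iris_practice.csv")

# Save the data frame as a .csv file (row.names = FALSE skips the row-number column)
write.csv(iris, csv_path, row.names = FALSE)

# Read it back in; stringsAsFactors = TRUE turns the Species text column into a factor
my_data <- read.csv(csv_path, stringsAsFactors = TRUE)

# The same first-look functions now work on your own data
dim(my_data)
str(my_data)
head(my_data)
```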

Next, check your working directory. Your working directory is the folder where R looks for files when you give it a file name without a full path.
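A minimal sketch of checking the working directory with base R follows. The desktop path in the commented-out setwd() line is hypothetical, so adjust it to wherever you saved your file.

```r
# Where is R currently looking for files?
current_dir <- getwd()
print(current_dir)

# Which .csv files can R see in that folder?
list.files(pattern = "\\.csv$")

# To point R at a different folder, uncomment and edit this line.
# The path is a hypothetical example, not a location every computer has.
# setwd(file.path("~", "Desktop"))
```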
