A dataset, or data set, is simply a collection of data.
The simplest and most common format for datasets you’ll find online is a spreadsheet or CSV format — a single file organized as a table of rows and columns. But some datasets will be stored in other formats, and they don’t have to be just one file. Sometimes a dataset may be a zip file or folder containing multiple data tables with related data.
Different datasets are created in different ways. Some data sets will machine-generated data. Some will be data that’s been collected via surveys. Some may be data that’s recorded from human observations. Some may be data that’s been scraped from websites or pulled via APIs.
Whenever you’re working with a dataset, it’s important to consider: how was this dataset created? Where does the data come from? Don’t jump right into the analysis; take the time to first understand the data you are working with.