speedsR data package
A Comprehensive R Package Providing Sports-Specific Datasets for the AIS SPEEDS Project.
Overview
SPEEDS - Sport and Exercise Science Excellence Through Data Science - is a project funded by the Australian Institute of Sport that aims to elevate data science literacy among Australian sports professionals. One of the key missions of the SPEEDS initiative is to curate a repository of de-identified, sport-specific datasets. These datasets are crucial for developing data science educational resources tailored to Australian sport and exercise science professionals.
The speedsR R data package, designed specifically for the SPEEDS project, offers an extensive collection of sports-specific datasets. It provides easy access to these datasets for analysis and research in a sports science context. speedsR can be downloaded from the AIS SPEEDS repository and loaded as a library in any R-based Integrated Development Environment (IDE).
Synthetic Data
The speedsR package includes a combination of open-source and synthetic datasets. For certain datasets collected from sport and exercise science professionals, confidentiality constraints make it unsuitable to include the original data directly in the speedsR package. In these cases, we provide synthetic equivalents generated using the Synthetic Data Vault (SDV) and synthcity libraries in Python, as well as the synthpop package in R. These synthetic datasets are designed to reflect the structure and statistical properties of the original data while preserving the privacy and confidentiality of the individuals involved.
Please note that these synthetic datasets are intended primarily for teaching purposes and may not be suitable for formal statistical analyses.
Installing speedsR from GitHub
speedsR is available for installation directly from GitHub, using the remotes package. If you have not already done so, please install remotes:
install.packages("remotes", repos = "https://cloud.r-project.org/")
Once installed, load remotes into your R session:
library(remotes)
To install packages from GitHub, ensure Git is also set up on your machine. If it is not installed, download and install Git. This will allow remotes to clone and install packages from GitHub directly.
The easiest way to check whether Git is installed on your computer is to use the terminal (on macOS or Linux) or the command prompt (on Windows).
macOS and Linux:
- Open the Terminal.
- Type git --version and press Return.
- If you see a version number, Git is installed. If you see a message like “command not found”, Git is not installed.
Windows:
- Open the Command Prompt.
- Type git --version and press Enter.
- If you see a version number, Git is installed. If you see a message like “'git' is not recognized as an internal or external command”, Git is not installed.
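The same check can also be made from within an R session. The following is a minimal sketch using only base R: Sys.which() returns an empty string when an executable is not found on the PATH. The helper name command_available is illustrative and is not part of speedsR or remotes.

```r
# Hypothetical helper (not part of speedsR or remotes):
# returns TRUE if an executable named `cmd` is found on the PATH.
command_available <- function(cmd = "git") {
  nzchar(Sys.which(cmd))[[1]]
}

if (command_available("git")) {
  # Print the same version string the terminal check would show
  print(system2("git", "--version", stdout = TRUE))
} else {
  message("Git was not found on the PATH; install it before continuing.")
}
```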
Now, the speedsR package can be installed using the following command:
remotes::install_github("ais-speeds/speedsR")
Once installed, load speedsR as you would any R package:
library(speedsR)
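For scripts that may run on more than one machine, the installation steps above can be made conditional so that nothing is reinstalled unnecessarily. This is only a sketch: needs_install and ensure_speedsR are hypothetical helper names introduced here, while requireNamespace(), install.packages(), and remotes::install_github() are the standard calls already used in this README.

```r
# Hypothetical helper: TRUE if `pkg` is not yet installed in this R library.
needs_install <- function(pkg) {
  !requireNamespace(pkg, quietly = TRUE)
}

# Hypothetical wrapper around the installation steps described above.
# Nothing is downloaded until the function is called.
ensure_speedsR <- function() {
  if (needs_install("remotes")) {
    install.packages("remotes", repos = "https://cloud.r-project.org/")
  }
  if (needs_install("speedsR")) {
    remotes::install_github("ais-speeds/speedsR")
  }
  # Report (invisibly) whether speedsR is now available
  invisible(requireNamespace("speedsR", quietly = TRUE))
}
```

Calling ensure_speedsR() once at the top of a script, followed by library(speedsR), should then behave the same whether or not the package was already present.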
List of Available Datasets
To explore the datasets included in the speedsR package, you can use several functions, each offering a different level of detail:
- To view the list of datasets: use data(package = "speedsR"). This command displays a simple list of all datasets in the speedsR package.
- For detailed documentation: use help(package = "speedsR"). This provides more comprehensive information, including documentation for each dataset.
- For specific information: use ?speedsR to access details about the speedsR package itself, and ?dataset_name (replacing dataset_name with your dataset of interest) for information on a specific dataset.
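The dataset list can also be captured as a regular data frame rather than only displayed. The sketch below relies on the fact that data(package = ...) returns an object whose results component is a character matrix with Item and Title columns. The helper name list_datasets is hypothetical, and the example runs on R's built-in datasets package so that it works before speedsR is installed.

```r
# Hypothetical helper: return the dataset names and titles of an installed
# package as a data frame, using the object that data() returns.
list_datasets <- function(pkg) {
  index <- utils::data(package = pkg)$results
  as.data.frame(index[, c("Item", "Title"), drop = FALSE],
                stringsAsFactors = FALSE)
}

# Demonstrated on the built-in "datasets" package so the sketch runs
# anywhere; replace with "speedsR" once that package is installed.
head(list_datasets("datasets"))
```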
Data Format in speedsR
Datasets in speedsR are provided as tibbles. A tibble is a modern take on the data frame and is part of the tidyverse in R. It behaves much like a data frame, so functions that accept data frames generally accept tibbles too, and it is particularly convenient for data analysis and manipulation in R.
Here’s how you can work with data from speedsR:
library(speedsR)
# Load and store a dataset with a descriptive variable name.
# Here, we use the HbmassSynth dataset as an example.
descriptive_variable_name <- HbmassSynth
# View the first few rows of the dataset
head(descriptive_variable_name)
# Get a basic descriptive statistics summary of the dataset
summary(descriptive_variable_name)
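Before manipulating an unfamiliar dataset, it can help to see each column's type and how many values are missing. The sketch below uses only base R and works on tibbles as well as ordinary data frames. The helper name describe_columns is hypothetical, and the built-in airquality data stands in for a speedsR dataset because no column names of HbmassSynth are assumed here.

```r
# Hypothetical helper: one row per column, giving its name, class, and
# number of missing values. Works on tibbles and base data frames alike.
describe_columns <- function(df) {
  stopifnot(is.data.frame(df))
  data.frame(
    column    = names(df),
    class     = vapply(df, function(x) class(x)[1], character(1)),
    n_missing = vapply(df, function(x) sum(is.na(x)), integer(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Demonstrated on the built-in airquality data so the sketch runs without
# speedsR; with speedsR loaded, call describe_columns(HbmassSynth) instead.
describe_columns(airquality)
```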