7 Module 1.4: Reading and Writing Data

So far, we’ve created data inside R. But usually, your breeding data exists in external files, like Excel spreadsheets or CSV files. We need to get this data into R and save our results out of R.

7.1 Common Data File Formats

CSV (Comma Separated Values - .csv): Plain text file where columns are separated by commas. Very common, easily readable by many programs (including R and Excel). Often the best format for sharing data.
TSV (Tab Separated Values - .tsv): Similar to CSV, but uses tabs to separate columns.
Excel Files (.xls, .xlsx): Native Microsoft Excel format. Can contain multiple sheets, formatting, formulas. Requires specific R packages to read/write.
Text Files(.txt): Additionally, data can be saved as a simple text file. This file type can support comma or tab separated values. You would simply need to specify your separator when reading the file.

7.2 Paths, Working Directory, and RStudio Projects (Best Practice!)

R needs to know where to find your files.

Working Directory: The default folder location R looks in. You can see it with getwd() and set it with setwd("path/to/folder"), but setting it manually is usually bad practice because it makes your code non-portable.
Absolute Path: The full path from the root of your computer (e.g., "C:/Users/YourName/Documents/BreedingData/trial1.csv"). Avoid this! It breaks if you move folders or share your code.
Relative Path & RStudio Projects (RECOMMENDED):
1. Organize your work using an RStudio Project. Create one via File -> New Project -> Existing Directory... and select your main course folder (course_project_baku).
2. When you open the .Rproj file, RStudio automatically sets the working directory to that project folder.
3. Keep your data files inside the project folder, ideally in subdirectories like data/raw (original data) or data/example (cleaned data for examples).
4. Refer to files using relative paths starting from the project root, like "data/example/phenotypes.csv". This makes your analysis reproducible and easy to share!

7.3 Reading Data into R

We’ll use functions from the readr (for CSV/TSV) and readxl (for Excel) packages. Make sure they are installed (see Module 1.1).

# Load the necessary libraries
library(readr)
library(readxl)
library(dplyr) # for glimpse

# --- Reading a CSV file ---
# Assumes you have a file 'sample_phenotypes.csv' in the 'data/example' folder
# relative to your project root.
pheno_file_path <- "data/sample_phenotypes.csv"

# Check if file exists before trying to read (good habit)
if (file.exists(pheno_file_path)) {
  # Use read_csv from the readr package (generally preferred)
  phenotype_data <- read_csv(pheno_file_path)

  print("CSV data loaded successfully:")
  head(phenotype_data)   # Look at the first 6 rows
  glimpse(phenotype_data) # See column names and data types

} else {
  print(paste("Error: Phenotype file not found at", pheno_file_path))
  phenotype_data <- NULL # Set to NULL if file not found
}

# Note: Base R has read.csv() - it works but readr::read_csv() is often faster
# and handles data types more consistently (e.g., doesn't default strings to factors).

# --- Reading an Excel file ---
# Assumes you have 'sample_trial.xlsx' in 'data/example'
excel_file_path <- "data/sample_trial.xlsx" # You'll need to create this file

if (file.exists(excel_file_path)) {
  # See what sheets are in the workbook
  excel_sheets(excel_file_path)

  # Read data from a specific sheet (e.g., "YieldData")
  # yield_data_excel <- read_excel(excel_file_path, sheet = "YieldData")

  # Or read by sheet number (first sheet is 1)
  # yield_data_excel <- read_excel(excel_file_path, sheet = 1)

  # print("Excel data loaded:")
  # glimpse(yield_data_excel)

} else {
  print(paste("Warning: Example Excel file not found at", excel_file_path))
}

Always inspect your data after loading! Use head(), str(), glimpse(), summary(). Did R read the column names correctly? Are the data types what you expected (numeric, character, etc.)?

7.4 Writing Data out of R

After cleaning data or performing analysis, you’ll want to save results.

# Load the necessary libraries
library(readr)
library(readxl)
library(dplyr) # for glimpse

# --- Reading a CSV file ---
# Assumes you have a file 'sample_phenotypes.csv' in the 'data/example' folder
# relative to your project root.
pheno_file_path <- "data/sample_phenotypes.csv"

# Check if file exists before trying to read (good habit)
if (file.exists(pheno_file_path)) {
  # Use read_csv from the readr package (generally preferred)
  phenotype_data <- read_csv(pheno_file_path)

  print("CSV data loaded successfully:")
  head(phenotype_data)   # Look at the first 6 rows
  glimpse(phenotype_data) # See column names and data types

} else {
  print(paste("Error: Phenotype file not found at", pheno_file_path))
  phenotype_data <- NULL # Set to NULL if file not found
}

# Note: Base R has read.csv() - it works but readr::read_csv() is often faster
# and handles data types more consistently (e.g., doesn't default strings to factors).

# --- Reading an Excel file ---
# Assumes you have 'sample_trial.xlsx' in 'data/example'
excel_file_path <- "data/sample_trial.xlsx" # You'll need to create this file

if (file.exists(excel_file_path)) {
  # See what sheets are in the workbook
  excel_sheets(excel_file_path)

  # Read data from a specific sheet (e.g., "YieldData")
  # yield_data_excel <- read_excel(excel_file_path, sheet = "YieldData")

  # Or read by sheet number (first sheet is 1)
  # yield_data_excel <- read_excel(excel_file_path, sheet = 1)

  # print("Excel data loaded:")
  # glimpse(yield_data_excel)

} else {
  print(paste("Warning: Example Excel file not found at", excel_file_path))
}

Exercise: If you have a simple Excel file with some breeding data (e.g., Plot ID, Variety, Yield), try reading it into R using read_excel(). Inspect the loaded data frame using glimpse().