R Programming for Data Science

Course Number:
094025
Course Length:
5 days
Course Description Overview:

In our data-driven world, organizations need the right tools to extract valuable insights from their data. The R programming language is one of the tools at the forefront of data science. Its robust set of packages and statistical functions makes it a powerful choice for analyzing and manipulating data, performing statistical tests, and creating predictive models. R is also notable for its strong data visualization tools, which let you create high-quality, highly customizable graphs and plots.


This course will teach you the fundamentals of programming in R to get you started. It will also teach you how to use R to perform common data science tasks and achieve data-driven results for the business.


Course Objectives:

In this course, you will use R to perform common data science tasks.


You will:

  • Set up an R development environment and execute simple code.
  • Perform operations on atomic data types in R, including characters, numbers, and logicals.
  • Perform operations on data structures in R, including vectors, lists, and data frames.
  • Write conditional statements and loops.
  • Structure code for reuse with functions and packages.
  • Manage data by loading and saving datasets, manipulating data frames, and more.
  • Analyze data through exploratory analysis, statistical analysis, and more.
  • Create and format data visualizations using base R and ggplot2.
  • Create simple statistical models from data.

Target Student:

This course is designed for students who want to learn the R programming language, particularly students who want to leverage R for data analysis and data science tasks in their organization. The course is also designed for students with an interest in applying statistics to real-world problems.


A typical student in this course should have several years of experience with computing technology, along with proficiency in at least one other programming language.

Prerequisites:

To ensure your success in this course, you should be comfortable with basic computer programming concepts, including but not limited to: syntax, data types, conditional statements, loops, and functions. You can obtain this level of skill and knowledge by taking the United States Career Campus Introduction to Programming with Python® course.


You should also have at least a high-level understanding of fundamental data science concepts, including but not limited to: data engineering, data analysis, data storage, data visualization, and statistics. You can obtain this level of knowledge by taking the CertNexus DSBIZ™ (Exam DSZ-110): Data Science for Business Professionals course.

Course-specific Technical Requirements (Software):

Each computer requires the following software:

  • Microsoft® Windows® 10 64-bit.
  • R version 4.3.2 (R-4.3.2-win.exe).

    R is distributed with the course data files under version 2 of the GNU General Public License (GPL).

  • RStudio® Desktop version 2023.12.0-369 (RStudio-2023.12.0-369).

    RStudio is distributed with the course data files under version 3 of the GNU Affero General Public License (AGPL).

  • If necessary, software for viewing the course slides. (Instructor machine only.)


Dataset


This course uses a modified version of a third-party dataset to demonstrate data science concepts. The dataset was retrieved from: https://www.kaggle.com/aungpyaeap/supermarket-sales.

Course-specific Technical Requirements (Hardware):

For this course, you will need one computer for each student and one for the instructor. Each computer must meet the following minimum hardware configuration:

  • 1 gigahertz (GHz) 64-bit (x64) processor.
  • 8 gigabytes (GB) of Random Access Memory (RAM).
  • 32 GB available storage space.
  • Monitor capable of a screen resolution of at least 1,024 × 768 pixels, at least a 256-color display, and a video adapter with at least 4 MB of memory.
  • Bootable DVD-ROM or USB drive.
  • Keyboard and mouse or a compatible pointing device.
  • Fast Ethernet (100 Mb/s) adapter or faster and cabling to connect to the classroom network.
  • IP addresses that do not conflict with other portions of your network.
  • Internet access (contact your local network administrator).
  • (Instructor computer only) A display system to project the instructor's computer screen.
Course Content:

Lesson 1: Setting Up R and Executing Simple Code

Topic A: Set Up the R Development Environment

Topic B: Write R Statements


Lesson 2: Processing Atomic Data Types

Topic A: Process Characters

Topic B: Process Numbers

Topic C: Process Logicals


Lesson 3: Processing Data Structures

Topic A: Process Vectors

Topic B: Process Factors

Topic C: Process Data Frames

Topic D: Subset Data Structures


Lesson 4: Writing Conditional Statements and Loops

Topic A: Write Conditional Statements

Topic B: Write Loops


Lesson 5: Structuring Code for Reuse

Topic A: Define and Call Functions

Topic B: Apply Loop Functions

Topic C: Manage R Packages


Lesson 6: Managing Data in R

Topic A: Load Data

Topic B: Save Data

Topic C: Manipulate Data Frames Using Base R

Topic D: Manipulate Data Frames Using dplyr

Topic E: Handle Dates and Times


Lesson 7: Analyzing Data in R

Topic A: Examine Data

Topic B: Explore the Underlying Distribution of Data

Topic C: Identify Missing Values


Lesson 8: Visualizing Data in R

Topic A: Plot Data Using Base R Functions

Topic B: Plot Data Using ggplot2

Topic C: Format Plots in ggplot2

Topic D: Create Combination Plots


Lesson 9: Modeling Data in R

Topic A: Create Statistical Models in R

Topic B: Create Machine Learning Models in R


Appendix A: Handling Issues in Code


Appendix B: R Resources
