R Programming for Data Science
Course Description
Overview
In our data-driven world, organizations need the right tools to extract valuable insights from their data. The R programming language is one of the tools at the forefront of data science. Its robust set of packages and statistical functions makes it a powerful choice for manipulating and analyzing data, performing statistical tests, and building predictive models. R is also known for its strong data visualization tools, which let you create high-quality, highly customizable graphs and plots.
This course will teach you the fundamentals of programming in R, as well as how to use R to perform common data science tasks and achieve data-driven results for your organization.
In this course, you will:
- Set up an R development environment and execute simple code.
- Perform operations on atomic data types in R, including characters, numbers, and logicals.
- Perform operations on data structures in R, including vectors, lists, and data frames.
- Write conditional statements and loops.
- Structure code for reuse with functions and packages.
- Manage data by loading and saving datasets, manipulating data frames, and more.
- Analyze data through exploratory analysis, statistical analysis, and more.
- Create and format data visualizations using base R and ggplot2.
- Create simple statistical models from data.
This course is designed for students who want to learn the R programming language, particularly those who want to use R for data analysis and data science tasks in their organization. It is also intended for students who are interested in applying statistics to real-world problems.
A typical student in this course should have several years of experience with computing technology, along with proficiency in at least one other programming language.
To ensure your success in this course, you should be comfortable with basic computer programming concepts, including but not limited to syntax, data types, conditional statements, loops, and functions. You can obtain this level of skill and knowledge by taking the United States Career Campus Introduction to Programming with Python® course.
You should also have at least a high-level understanding of fundamental data science concepts, including but not limited to data engineering, data analysis, data storage, data visualization, and statistics. You can obtain this level of knowledge by taking the CertNexus DSBIZ™ (Exam DSZ-110): Data Science for Business Professionals course.
Each computer requires the following software:
- Microsoft® Windows® 10 64-bit.
- R version 4.3.2 (R-4.3.2-win.exe).
  R is distributed with the course data files under version 2 of the GNU General Public License (GPL).
- RStudio® Desktop version 2023.12.0-369 (RStudio-2023.12.0-369).
  RStudio is distributed with the course data files under version 3 of the GNU Affero General Public License (AGPL).
- If necessary, software for viewing the course slides. (Instructor machine only.)
Dataset
This course uses a modified version of a third-party dataset to demonstrate data science concepts. The dataset was retrieved from: https://www.kaggle.com/aungpyaeap/supermarket-sales.
For this course, you will need one computer for each student and one for the instructor. Each computer must meet the following minimum hardware configuration:
- 1 gigahertz (GHz) 64-bit (x64) processor.
- 8 gigabytes (GB) of Random Access Memory (RAM).
- 32 GB available storage space.
- Monitor capable of a screen resolution of at least 1,024 × 768 pixels and at least 256 colors, plus a video adapter with at least 4 MB of memory.
- Bootable DVD-ROM or USB drive.
- Keyboard and mouse or a compatible pointing device.
- Fast Ethernet (100 Mb/s) or faster adapter, and cabling to connect to the classroom network.
- IP addresses that do not conflict with other portions of your network.
- Internet access (contact your local network administrator).
- (Instructor computer only) A display system to project the instructor's computer screen.
Lesson 1: Setting Up R and Executing Simple Code
Topic A: Set Up the R Development Environment
Topic B: Write R Statements
Lesson 2: Processing Atomic Data Types
Topic A: Process Characters
Topic B: Process Numbers
Topic C: Process Logicals
Lesson 3: Processing Data Structures
Topic A: Process Vectors
Topic B: Process Factors
Topic C: Process Data Frames
Topic D: Subset Data Structures
Lesson 4: Writing Conditional Statements and Loops
Topic A: Write Conditional Statements
Topic B: Write Loops
Lesson 5: Structuring Code for Reuse
Topic A: Define and Call Functions
Topic B: Apply Loop Functions
Topic C: Manage R Packages
Lesson 6: Managing Data in R
Topic A: Load Data
Topic B: Save Data
Topic C: Manipulate Data Frames Using Base R
Topic D: Manipulate Data Frames Using dplyr
Topic E: Handle Dates and Times
Lesson 7: Analyzing Data in R
Topic A: Examine Data
Topic B: Explore the Underlying Distribution of Data
Topic C: Identify Missing Values
Lesson 8: Visualizing Data in R
Topic A: Plot Data Using Base R Functions
Topic B: Plot Data Using ggplot2
Topic C: Format Plots in ggplot2
Topic D: Create Combination Plots
Lesson 9: Modeling Data in R
Topic A: Create Statistical Models in R
Topic B: Create Machine Learning Models in R