Using Data Science Tools in Python®

Course Number:
094001
Course Length:
2 days
Course Description Overview:
More and more organizations are turning to data science to help guide business decisions. Regardless of industry, the ability to extract knowledge from data is crucial for a modern business to stay competitive. One of the tools at the forefront of data science is the Python® programming language. Python's robust libraries give data scientists the ability to load, analyze, shape, clean, and visualize data in easy-to-use yet powerful ways. This course teaches you the skills you need to use these key libraries to extract useful insights from data and, as a result, provide value to the business.
Course Objectives:

In this course, you will use various Python tools to load, analyze, manipulate, and visualize business data.


You will:

  • Set up a Python data science environment.
  • Manage and analyze data with NumPy arrays.
  • Manipulate and modify data with NumPy arrays.
  • Manage and analyze data with pandas DataFrames.
  • Manipulate, modify, and visualize data with pandas DataFrames.
  • Visualize data with Matplotlib and Seaborn.
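
As a rough illustration of the kind of work these objectives describe, the following minimal sketch builds a small pandas DataFrame, derives a new column, and analyzes the result with pandas and NumPy. The column names and values are hypothetical and made up for this example; they are not taken from the course dataset or the course materials.

```python
import numpy as np
import pandas as pd

# Hypothetical sales records; the values are made up for illustration
# and are not taken from the course dataset.
sales = pd.DataFrame(
    {
        "branch": ["A", "B", "A", "C", "B", "C"],
        "unit_price": [10.0, 20.0, 15.0, 8.0, 12.0, 30.0],
        "quantity": [2, 1, 4, 5, 3, 1],
    }
)

# Manipulate: derive a new column from two existing columns.
sales["total"] = sales["unit_price"] * sales["quantity"]

# Analyze with pandas: total revenue per branch.
revenue_by_branch = sales.groupby("branch")["total"].sum()

# Analyze with NumPy: work on the underlying array of order totals.
totals = sales["total"].to_numpy()
mean_total = np.mean(totals)

# Boolean indexing keeps only the orders above the mean total.
large_orders = totals[totals > mean_total]

print(revenue_by_branch)
print(mean_total, large_orders)
```

If Matplotlib is installed, a call such as `revenue_by_branch.plot(kind="bar")` would be one way to turn the per-branch totals into a simple chart.
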
Target Student:

This course is designed for students who wish to expand their ability to extract knowledge from business data. The target student for this course understands the principles and benefits of data science and has used basic data-driven tools like Microsoft® Excel® and Structured Query Language (SQL) queries, but wants to take the next steps into more advanced applications of data science.


The target student may be a programmer or data analyst who wants to solve business problems using programming libraries that go beyond the limitations of prepackaged GUI tools or database queries. These libraries give the data scientist finer control over the analysis, manipulation, and presentation of data.


A typical student in this course should have several years of experience with computing technology and be proficient in programming.

Prerequisites:

To ensure your success in this course, you should have at least a high-level understanding of fundamental data science concepts, including but not limited to: data engineering, data analysis, data storage, data visualization, and statistics. You can obtain this level of knowledge by taking the CertNexus DSBIZ™ (Exam DSZ-110): Data Science for Business Professionals course.


You should also be proficient in programming with Python. You can obtain this level of skills and knowledge by taking the following United States Career Campus courses:

  • Python® Programming: Introduction
  • Python® Programming: Advanced
Course-specific Technical Requirements Software:

Each computer requires the following software:

  • Microsoft® Windows® 10 64-bit.
  • Oracle® VM VirtualBox version 6.0.10 (VirtualBox-6.0.10-132072-Win.exe).

    VirtualBox is distributed with the course data files under version 2 of the GNU General Public License (GPL).

  • Anaconda® for Python 3 version 2020.02.

    Anaconda is distributed with the course data files under a Berkeley Software Distribution (BSD) license.

  • If necessary, software for viewing the course slides. (Instructor machine only.)

Note: While it is possible to run VirtualBox on other operating systems, this course was written and tested using Windows 10. If your classroom computers use a different operating system, it is highly recommended that you install and test VirtualBox and the course VM on those computers before delivering a class, to make sure you can work through the course activities successfully.


Note: The Linux operating system is already installed on the VM that will be loaded in VirtualBox. Specifically, this VM runs the Debian 10 ("Buster") distribution.


Note: The system on the VM is configured to log the user in automatically. If you or your students are prompted at any time to log in, the account is named student and the password is Pa22w0rd.


Dataset:


This course uses a modified version of a third-party dataset to demonstrate data science concepts. The dataset was retrieved from: https://www.kaggle.com/aungpyaeap/supermarket-sales.


Course-specific Technical Requirements Hardware:

For this course, you will need one computer for each student and one for the instructor. Each computer needs at least the following hardware configuration:

  • 2 gigahertz (GHz) 64-bit (x64) processor that supports the VT-x or AMD-V virtualization instruction set and Second Level Address Translation (SLAT).
  • 8 gigabytes (GB) of Random Access Memory (RAM).
  • 32 GB available storage space.
  • Monitor capable of a screen resolution of at least 1,024 × 768 pixels, at least a 256-color display, and a video adapter with at least 4 MB of memory.
  • Bootable DVD-ROM or USB drive.
  • Keyboard and mouse or a compatible pointing device.
  • Fast Ethernet (100 Mb/s) adapter or faster and cabling to connect to the classroom network.
  • IP addresses that do not conflict with other portions of your network.
  • Internet access (contact your local network administrator).
  • (Instructor computer only) A display system to project the instructor's computer screen.
Course Content:

Lesson 1: Setting Up a Python Data Science Environment

Topic A: Select Python Data Science Tools

Topic B: Install Python Using Anaconda

Topic C: Set Up an Environment Using Jupyter Notebook


Lesson 2: Managing and Analyzing Data with NumPy

Topic A: Create NumPy Arrays

Topic B: Load and Save NumPy Data

Topic C: Analyze Data in NumPy Arrays


Lesson 3: Transforming Data with NumPy

Topic A: Manipulate Data in NumPy Arrays

Topic B: Modify Data in NumPy Arrays


Lesson 4: Managing and Analyzing Data with pandas

Topic A: Create Series and DataFrames

Topic B: Load and Save pandas Data

Topic C: Analyze Data in DataFrames

Topic D: Slice and Filter Data in DataFrames


Lesson 5: Transforming and Visualizing Data with pandas

Topic A: Manipulate Data in DataFrames

Topic B: Modify Data in DataFrames

Topic C: Plot DataFrame Data


Lesson 6: Visualizing Data with Matplotlib and Seaborn

Topic A: Create and Save Simple Line Plots

Topic B: Create Subplots

Topic C: Create Common Types of Plots

Topic D: Format Plots

Topic E: Streamline Plotting with Seaborn


Appendix A: Scraping Web Data Using Beautiful Soup
