Using Data Science Tools in Python®
Course Description
In this course, you will use various Python tools to load, analyze, manipulate, and visualize business data.
You will:
- Set up a Python data science environment.
- Manage and analyze data with NumPy arrays.
- Manipulate and modify data with NumPy arrays.
- Manage and analyze data with pandas DataFrames.
- Manipulate, modify, and visualize data with pandas DataFrames.
- Visualize data with Matplotlib and Seaborn.
This course is designed for students who wish to expand their ability to extract knowledge from business data. The target student for this course understands the principles and benefits of data science and has used basic data-driven tools like Microsoft® Excel® and Structured Query Language (SQL) queries, but wants to take the next steps into more advanced applications of data science.
The target student may be a programmer or data analyst looking to solve business problems with powerful programming libraries that go beyond the limitations of prepackaged GUI tools or database queries. These libraries give the data scientist more fine-tuned control over the analysis, manipulation, and presentation of data.
A typical student in this course should have several years of experience with computing technology and be proficient in programming.
To ensure your success in this course, you should have at least a high-level understanding of fundamental data science concepts, including but not limited to: data engineering, data analysis, data storage, data visualization, and statistics. You can obtain this level of knowledge by taking the CertNexus DSBIZ™ (Exam DSZ-110): Data Science for Business Professionals course.
You should also be proficient in programming with Python. You can obtain this level of skill and knowledge by taking the following United States Career Campus courses:
- Python® Programming: Introduction
- Python® Programming: Advanced
Each computer requires the following software:
- Microsoft® Windows® 10 64-bit.
- Oracle® VM VirtualBox version 6.0.10 (VirtualBox-6.0.10-132072-Win.exe).
VirtualBox is distributed with the course data files under version 2 of the GNU General Public License (GPL).
- Anaconda® for Python 3 version 2020.02.
Anaconda is distributed with the course data files under a Berkeley Software Distribution (BSD) license.
- If necessary, software for viewing the course slides. (Instructor machine only.)
Note: While it is possible to run VirtualBox on other operating systems, this course was written and tested using Windows 10. If your classroom computers will use a different operating system, install and test VirtualBox and the course VM on those computers before delivering a class to make sure you can work through every course activity successfully.
Note: The Linux operating system is already installed on the VM that will be loaded in VirtualBox. Specifically, this VM runs the Debian 10 ("Buster") distribution.
Note: The system on the VM is configured to log the user in automatically. If you or your students are prompted to log in at any time, the account name is "student" and the password is "Pa22w0rd".
Dataset:
This course uses a modified version of a third-party dataset to demonstrate data science concepts. The dataset was retrieved from: https://www.kaggle.com/aungpyaeap/supermarket-sales.
For this course, you will need one computer for each student and one for the instructor. Each computer needs the following minimum hardware configuration:
- 2 gigahertz (GHz) 64-bit (x64) processor that supports the VT-x or AMD-V virtualization instruction set and Second Level Address Translation (SLAT).
- 8 gigabytes (GB) of Random Access Memory (RAM).
- 32 GB available storage space.
- Monitor capable of a screen resolution of at least 1,024 × 768 pixels, at least a 256-color display, and a video adapter with at least 4 MB of memory.
- Bootable DVD-ROM or USB drive.
- Keyboard and mouse or a compatible pointing device.
- Fast Ethernet (100 Mb/s) adapter or faster and cabling to connect to the classroom network.
- IP addresses that do not conflict with other portions of your network.
- Internet access (contact your local network administrator).
- (Instructor computer only) A display system to project the instructor's computer screen.
Lesson 1: Setting Up a Python Data Science Environment
Topic A: Select Python Data Science Tools
Topic B: Install Python Using Anaconda
Topic C: Set Up an Environment Using Jupyter Notebook
Lesson 2: Managing and Analyzing Data with NumPy
Topic A: Create NumPy Arrays
Topic B: Load and Save NumPy Data
Topic C: Analyze Data in NumPy Arrays
Lesson 3: Transforming Data with NumPy
Topic A: Manipulate Data in NumPy Arrays
Topic B: Modify Data in NumPy Arrays
Lesson 4: Managing and Analyzing Data with pandas
Topic A: Create Series and DataFrames
Topic B: Load and Save pandas Data
Topic C: Analyze Data in DataFrames
Topic D: Slice and Filter Data in DataFrames
Lesson 5: Transforming and Visualizing Data with pandas
Topic A: Manipulate Data in DataFrames
Topic B: Modify Data in DataFrames
Topic C: Plot DataFrame Data
Lesson 6: Visualizing Data with Matplotlib and Seaborn
Topic A: Create and Save Simple Line Plots
Topic B: Create Subplots
Topic C: Create Common Types of Plots
Topic D: Format Plots
Topic E: Streamline Plotting with Seaborn