Project

Project Instructions

This project involves analyzing and visualizing COVID-19 data collected by Johns Hopkins University using Python libraries such as pandas, matplotlib, and geopandas. It will assess your ability to load and manipulate real-world datasets using pandas, perform filtering and aggregation operations, dynamically present information based on user input, and create informative visualizations. Please use the provided Jupyter Notebook, which includes the necessary datasets and libraries for the project.

Your task is to develop a program that prompts the user for a U.S. state and produces a report with relevant COVID-19 statistics and visualizations for that state. The program must be dynamic and capable of handling input for any state correctly.

General Requirements

  • Develop your program based on the Program Summary and Sample Output section below.
  • Output must match the sample format in terms of structure, organization, spacing, indentation, text casing, and the type of information displayed.
  • Avoid hardcoding specific values; reference variables appropriately within print statements.
  • This is a research-driven project. Expect to spend time reviewing documentation and debugging.
  • You are responsible for handling formatting details not explicitly covered in class, such as:
    •  Formatting large numbers with commas (e.g., 1,000 instead of 1000)
    •  Formatting y-axis tick values in plots to avoid scientific notation
  • Your code must include at least eight brief comments to explain key parts of the script and help the reviewer understand what your code is doing.
  • Use clear, descriptive variable names to make your code easy to understand. Abbreviations are acceptable, but single-letter variable names are not.
  • You are not allowed to use:
    •  User-defined functions
    •  List comprehensions
    •  Lambda functions
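
The comma formatting mentioned in the requirements above can be handled with Python's format specification. The following is a minimal sketch, not a required approach; the variable name and value are hypothetical and chosen only for illustration.

```python
# Hypothetical value, used only to illustrate the format specification.
total_cases = 1234567

# The "," option in the format spec inserts thousands separators.
formatted_cases = f"{total_cases:,}"
print(f"Total reported cases: {formatted_cases}")
```

Because the value is referenced through a variable inside the f-string, nothing is hardcoded in the print statement.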

Additional requirements are detailed below.

Coding Guidelines

  • You may reuse syntax that has been covered in the course and modify as needed.
  • However, for new syntax—such as formatting values, customizing axis ticks, and working with dates—you will need to review documentation and apply what you learn.
  • You are encouraged to work on the project using multiple code cells while developing the program. After completing each part, consolidate your program code into a single code cell and test it to ensure it works as intended (see Submission Instructions below).

Submission Instructions

Your final Jupyter Notebook must contain two code cells:

  1. First code cell (provided): Includes the necessary library imports and datasets.
  2. Second code cell: Contains all of your program code.

This second cell must include the program output based on the following inputs:

  • For the name prompt: enter your first name
  • For the state prompt: enter any U.S. state of your choice
  • For the data visualization prompt: enter 1

To be clear, your notebook should consist of only two code cells, where the second cell contains your full program and displays statistical output related to the chosen state, and Option 1 for the data visualization.

Your notebook will be downloaded, and all cells will be run to test for functionality and accuracy.

Before submitting to Canvas:

  1. Save and close your Jupyter Notebook.
  2. Reopen the Jupyter Notebook file and run all cells to verify that your code executes properly from start to finish.
  3. Produce a PDF version of your completed Jupyter Notebook.
  4. Rename both files (.ipynb and .pdf) using your GMU username and upload both files.

Academic Integrity

This project must be completed independently and in full accordance with the university’s Honor Code. You may refer to course materials and documentation, but you may not receive help from anyone when writing or debugging your code.

By submitting your work, you affirm that the code is your own and that you have not collaborated with anyone in any form.

All submissions will be analyzed for code similarity. Projects found to have significant overlap may be considered violations of academic integrity and will be reported.

Program Summary and Sample Output

Below is a sample of how your program’s output should appear when executed in a Jupyter Notebook code cell. Gray-highlighted portions indicate values that should be dynamically generated from reference variables, not hardcoded.

In this example, the user enters REVIEWER as their name and TEXAS as the state of interest. These inputs will appear throughout the output where relevant. The state input is converted with the title method so that its casing matches the data.
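
As a brief illustration of the title method described above, the sketch below converts sample inputs to title case. The variable names are hypothetical, and the literal strings stand in for what the user would type at the prompt.

```python
# Stand-in for the text a user would type at the state prompt.
raw_state_input = "TEXAS"

# str.title() capitalizes the first letter of each word and lowercases the rest.
state_name = raw_state_input.title()
print(state_name)

# Multi-word names are handled word by word.
multi_word_state = "new york".title()
print(multi_word_state)
```

Note that str.title() capitalizes every word, so whether the result matches the dataset depends on how the data spells each state name.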

The first section displayed is the Timeline, showing the date associated with Day 0 of COVID-19 in the selected state. For Texas, this date is March 5, 2020, and it must be displayed in that exact format.
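
One way to produce the "March 5, 2020" format is sketched below using the standard datetime module. This is an illustration under the assumption that the Day 0 date is available as a date object; the variable names are hypothetical.

```python
from datetime import date

# The Day 0 date for Texas given in the sample output.
day_zero = date(2020, 3, 5)

# strftime's %d zero-pads the day, which does not match the required format.
padded_text = day_zero.strftime("%B %d, %Y")

# Inserting the day number directly avoids the leading zero.
day_zero_text = f"{day_zero:%B} {day_zero.day}, {day_zero.year}"
print(day_zero_text)
```

The same pattern should also work with a pandas Timestamp, which exposes the same day and year attributes.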

Following the timeline, the program summarizes COVID-19 case and death data for 2020 and 2021, along with overall totals through the latest available data/time period (note: sample output does not display actual values).
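
The yearly summary described above combines filtering and aggregation. The sketch below shows the general pattern on a tiny synthetic table; the column names, the values, and the choice to sum and average a daily new-cases column are all assumptions made for illustration, not the structure of the provided datasets.

```python
import pandas as pd

# Synthetic daily data (not real COVID-19 figures); column names are hypothetical.
state_daily = pd.DataFrame({
    "date": pd.to_datetime(["2020-12-30", "2020-12-31", "2021-01-01", "2021-01-02"]),
    "new_cases": [10, 20, 30, 50],
})

# Filter the rows for one year using the .dt accessor, then aggregate.
rows_2020 = state_daily[state_daily["date"].dt.year == 2020]
total_cases_2020 = rows_2020["new_cases"].sum()
avg_daily_cases_2020 = rows_2020["new_cases"].mean()

rows_2021 = state_daily[state_daily["date"].dt.year == 2021]
total_cases_2021 = rows_2021["new_cases"].sum()
avg_daily_cases_2021 = rows_2021["new_cases"].mean()

print(f"2020 total: {total_cases_2020:,}, average: {avg_daily_cases_2020:,.2f}")
print(f"2021 total: {total_cases_2021:,}, average: {avg_daily_cases_2021:,.2f}")
```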

After this, the user is prompted to choose a data visualization by entering 1 or 2. Based on the user’s selection:

  • Option 1 will display four subplots showing trends in total and daily reported cases and deaths. The program output sample below provides the types of charts to be displayed and the specific data each chart should represent.
  • Option 2 will display an interactive choropleth map of reported cases and deaths by county (as of December 31, 2021), using the explore method from GeoPandas. The choropleth map pop-ups must display only the following information/labels and their respective values:
    •  County Name
    •  State Name
    •  Total Cases
    •  Total Deaths

The program ends after displaying the chosen visualization. It can then be re-run for the same or a different state.

Note: You do not need to build input validation for this program. Assume the user enters correct values (e.g., no typos, extra spaces, or invalid inputs).

SAMPLE OF PROGRAM OUTPUT:

Hello. Please enter your name:  REVIEWER

Which state’s COVID-19 information would you like to see?

Enter the state:  TEXAS

COVID-19 in Texas: Key Statistics

Timeline:

Day 0 of COVID-19 in Texas: March 5, 2020

 

Texas Data by Year:

2020 (from March 5):

– Total reported cases: _ _ _ _

– Average daily new cases: _ _ _ _

– Total reported deaths: _ _ _ _

– Average daily deaths: _ _ _ _

 

2021:

– Total reported cases: _ _ _ _

– Average daily new cases: _ _ _ _

– Total reported deaths: _ _ _ _

– Average daily deaths: _ _ _ _

 

Overall Totals in Texas (as of December 31, 2021):

– Total cases: _ _ _ _

– Total deaths: _ _ _ _

 

 

REVIEWER, please select a data visualization option for Texas

  1. View four subplots showing COVID-19 trends in Texas (2020-2021):

* Total reported cases

* Daily new cases

* Total reported deaths

* Daily new deaths

  2. View a choropleth map showing total reported cases and deaths by county in Texas as of December 31, 2021.

 

Enter your choice (1 or 2):

Note: Below is a sample output when the user selects option 1 for the data visualization (actual chart data is not shown). Notice that the input values appear in the figure title. The size of the figure is set to 12 by 9.
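
The figure layout described in the note above can be sketched as follows. This is an illustrative example with synthetic numbers: the data values, the variable names, and the wording of the figure title are assumptions. It also shows one possible way to keep y-axis ticks out of scientific notation, using matplotlib's StrMethodFormatter.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs as a script; omit in a notebook
import matplotlib.pyplot as plt
from matplotlib.ticker import StrMethodFormatter

# Synthetic values (not real data); names are hypothetical.
state_name = "Texas"
day_numbers = [0, 1, 2, 3]
total_cases = [1_000_000, 1_500_000, 2_250_000, 3_000_000]
daily_cases = [0, 500_000, 750_000, 750_000]
total_deaths = [10_000, 12_000, 15_000, 19_000]
daily_deaths = [0, 2_000, 3_000, 4_000]

# A 2 x 2 grid of subplots in a 12 by 9 figure.
fig, axes = plt.subplots(2, 2, figsize=(12, 9))
fig.suptitle(f"COVID-19 Trends in {state_name} (2020-2021)")  # hypothetical title wording

axes[0, 0].plot(day_numbers, total_cases)
axes[0, 0].set_title("Total reported cases")
axes[0, 1].plot(day_numbers, daily_cases)
axes[0, 1].set_title("Daily new cases")
axes[1, 0].plot(day_numbers, total_deaths)
axes[1, 0].set_title("Total reported deaths")
axes[1, 1].plot(day_numbers, daily_deaths)
axes[1, 1].set_title("Daily new deaths")

# Show plain, comma-separated tick labels instead of scientific notation.
for axis in axes.flat:
    axis.yaxis.set_major_formatter(StrMethodFormatter("{x:,.0f}"))

fig.tight_layout()
```

An alternative that keeps plain numbers without commas is `axis.ticklabel_format(style="plain", axis="y")`.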

Charts

 

Note: The following image represents a potential output of a choropleth map, assuming the code is executed again and the user selects “option 2” for data visualization. Note that this uses sample data, so the colors displayed do not represent actual results.

Requirement: Use the explore method to create the map. Set cmap to “YlOrRd” and scheme to “equalinterval”.

 

Requirement: The pop-up displayed on the choropleth map must only contain the following details: County Name, State Name, Total Cases, and Total Deaths.

Project Updates

April 24:

  1. The Texas Day 0 date was adjusted to March 5, 2020, and the sample output was updated to reflect this change.
  2. Round all decimal values to two decimal places.
  3. Numbers should be displayed as integers if they don’t contain decimals.
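
One possible reading of the April 24 rounding rules is sketched below: round each value to two decimal places, and print it as an integer when nothing remains after the decimal point. The sample values and variable names are hypothetical.

```python
# Hypothetical values: one with decimals, one that is a whole number.
sample_values = [1523.456, 2000.0]
formatted_values = []

for value in sample_values:
    rounded_value = round(value, 2)
    # A whole number is shown without a decimal part.
    if rounded_value == int(rounded_value):
        formatted_values.append(f"{int(rounded_value):,}")
    else:
        formatted_values.append(f"{rounded_value:,.2f}")

print(formatted_values)
```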

April 25:

  1. Some counties may not appear when displaying the interactive choropleth map of counties for a given state. This is due to missing GIS data or inconsistent FIPS numbers during the merge of the two datasets. For example, not all geometry shapes for Virginia counties will display on the map. This is expected and no further action is needed.
  2. No need to create separate variables for each state or county; use a single variable to store user input and subset the DataFrame accordingly. Check out the example demo video for guidance.
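
The single-variable approach from the April 25 update can be sketched as follows. The DataFrame, its column names, and its values are synthetic stand-ins rather than the provided datasets, and the literal string replaces the user's typed input.

```python
import pandas as pd

# Synthetic county-level table (values are not real data); column names are hypothetical.
covid_counties = pd.DataFrame({
    "state": ["Texas", "Texas", "Virginia"],
    "county": ["Harris", "Travis", "Fairfax"],
    "total_cases": [100, 50, 70],
})

# One variable holds the user's choice; the literal stands in for input().
state_name = "TEXAS".title()

# A boolean mask keeps only the rows for the chosen state.
state_rows = covid_counties[covid_counties["state"] == state_name]
state_total_cases = state_rows["total_cases"].sum()
print(f"Total cases in {state_name}: {state_total_cases:,}")
```

Changing the input changes every downstream result, so the same code works for any state without state-specific variables.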