LoadDatasets Class

class LoadDatasets.LoadDataSets(to_load, nation)

Bases: object

This class loads datasets from the data CSV files, cleans them, and makes them accessible through callable functions.

The datasets are offered in two formats:

  1. As a list

  2. In a dataframe

Data is converted into a list to make it accessible to non-data scientists who haven’t worked with dataframes before. It is also offered in dataframes for data scientists, to allow for easier analysis and incorporation into computer models.

For usage examples, see the example scripts in the root folder of the GitHub repo.

get_age_cat_string_list()

Returns the age strings for each age category, used to label graphs.

Returns:

List of strings used for age-profiled charts. These are the string descriptions of each age group used in chart legends.

get_age_groups(age_group)

Returns the age group for a given age_group index. Indices run from 0 to 18.

Args:
age_group

Integer value from 0 to 18; each step up covers the next 5-year age band.

Returns:

Label of the age group that corresponds to the index you passed as age_group. For instance:

get_age_groups(0)

will return 00_04, and

get_age_groups(12)

will return 60_64.
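The index-to-label mapping in the examples above can be sketched as follows. This is a minimal illustration rather than the class's own implementation: the helper name age_group_label is hypothetical, and it only covers the regular 5-year bands (indices 0 to 17), because the label used for the final index 18 is not shown here.

```python
def age_group_label(index: int) -> str:
    """Build a label such as '00_04' for a 5-year age band (hypothetical helper)."""
    if not 0 <= index <= 17:
        # Index 18 is left out on purpose: its label is not shown in the docs.
        raise ValueError("index must be between 0 and 17")
    lower = index * 5
    return f"{lower:02d}_{lower + 4:02d}"


print(age_group_label(0))   # 00_04
print(age_group_label(12))  # 60_64
```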

get_age_groups_literal()

Returns a list of age groups used in the aged data dataframe.

Returns:

List of labels that represent each age group within the dataframe.

get_aged_case_data(age_group)

Returns a list of case values ready to plot for the given age group. Age groups run from 0 to 18.

Args:
age_group

Integer value from 0 to 18; each step up covers the next 5-year age band.

Note

To calculate the age group divide the target age by 5 and round down.

Returns:

List of cases by specimen date for the given age group.
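The rule in the note above can be written as a small helper. This is a sketch under one assumption that is not stated in this reference: ages beyond the last regular 5-year band are placed in the top index, 18. The name age_to_group_index is hypothetical.

```python
MAX_AGE_GROUP = 18  # highest age_group index accepted by the getters


def age_to_group_index(age: int) -> int:
    """Divide the age by 5 and round down, capped at the top index (assumed)."""
    if age < 0:
        raise ValueError("age must be non-negative")
    return min(age // 5, MAX_AGE_GROUP)


print(age_to_group_index(3))   # 0
print(age_to_group_index(62))  # 12
```

The result can then be passed as age_group to any of the age-profiled getters.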

get_aged_data_cases(age_group)

Returns age-profiled case data for a given age group as a dataframe.

Args:
age_group

Integer value from 0 to 18; each step up covers the next 5-year age band.

Note

To calculate the age group divide the target age by 5 and round down.

Returns:

Dataframe representing the csv file dataAge.csv for cases.

get_aged_data_deaths(age_group)

Returns age-profiled death data for a given age group as a dataframe.

Args:
age_group

Integer value from 0 to 18; each step up covers the next 5-year age band.

Note

To calculate the age group divide the target age by 5 and round down.

Returns:

Dataframe representing the csv file dataAge.csv for age deaths.

get_aged_data_frames()

Returns full dataframes for aged cases and aged deaths.

Returns:

Dataframe representing the csv file dataAge.csv for cases and age deaths.

get_aged_death_data(age_group)

Returns a list of death values ready to plot for the given age group. Age groups run from 0 to 18.

Args:
age_group

Integer value from 0 to 18; each step up covers the next 5-year age band.

Note

To calculate the age group divide the target age by 5 and round down.

Returns:

List of deaths by death date for the given age group.

get_aged_gov_date_series()

Returns a list of dates to be used for the x-axis on charts when using age-separated data.

Returns:

List of dates to be used with age profiled graphs.

Note

Use this when creating time series graphs that use age profiled data.
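As a rough illustration of how the date list lines up with an age-profiled series, the sketch below pairs each date with a value before plotting. The two lists are made-up stand-ins for what get_aged_gov_date_series() and get_aged_case_data(age_group) return. It assumes both lists have the same length and order, which this reference does not state explicitly.

```python
# Stand-in data; in practice these would come from the LoadDataSets getters.
dates = ["2021-01-01", "2021-01-02", "2021-01-03"]
cases = [120, 135, 128]

if len(dates) != len(cases):
    raise ValueError("date list and data list must be the same length")

# Each (date, value) pair is one point on the time series.
points = list(zip(dates, cases))
for date, value in points:
    print(date, value)
```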

get_case_data_by_age(dayOfYear, age_group)

Returns the number of cases for a given day and a given age group. Age groups go from 0 to 18, and days go from 0 up to the number of days in the dataset.

Args:
dayOfYear

Integer value from 0 up to the number of days you have in your dataset.

age_group

Integer value from 0 to 18; each step up covers the next 5-year age band.

Note

To calculate the age group divide the target age by 5 and round down.

Returns:

Integer value of the number of cases on a given day for a given age group.

get_cum_second_dose()

Returns the cumulative number of second doses administered.

Returns:

List of second vaccine doses administered.

get_death_data_by_age(dayOfYear, age_group)

Returns the number of deaths for a given day and a given age group. Age groups go from 0 to 18, and days go from 0 up to the number of days in the dataset.

Args:
dayOfYear

Integer value from 0 up to the number of days you have in your dataset.

age_group

Integer value from 0 to 18; each step up covers the next 5-year age band.

Note

To calculate the age group divide the target age by 5 and round down.

Returns:

Integer value of the number of deaths on a given day for a given age group.

get_death_data_by_age_all()

Returns a multidimensional list with all death data by date.

Returns:

List[dayOfYear][age_group][deaths], a list of all deaths split into age groups.
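A nested list like this can be read by day and then by age group. The sketch below uses invented numbers and assumes the simplest reading of the notation above, where data[dayOfYear][age_group] holds a death count; the real structure may carry an extra level.

```python
# Stand-in data: 3 days, and only 3 age groups for brevity (the real data has 19).
deaths_by_day = [
    [0, 1, 4],
    [1, 0, 6],
    [0, 2, 5],
]

day_of_year, age_group = 1, 2
print(deaths_by_day[day_of_year][age_group])  # 6

# Transpose to get one time series per age group, then total each series.
totals_per_group = [sum(series) for series in zip(*deaths_by_day)]
print(totals_per_group)  # [1, 3, 15]
```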

get_deaths_by_report_date()

Returns all deaths by reported date.

Returns:

List of daily deaths by reported date.

get_full_data_frame()

Returns the full dataframe from the non-age profiled csv file.

Returns:

Dataframe representing the CSV file data + nation.csv, e.g. dataEngland.csv. This data is not age profiled.

get_gov_date_Series()

Returns a list of dates to be used for the x-axis on charts.

Returns:

List of dates to be used with non-age profiled graphs.

Note

Use this when creating time series graphs that use non-age profiled data.

get_hospital_cases()

Returns a list detailing the total number of people in hospital with COVID.

Returns:

List of hospital cases for all age groups. Hospital cases are people in hospital with a positive COVID test; these are not necessarily people being treated for COVID. This data is not released.

get_line_colour_list()

Returns the line colours used for age-profiled graphs, so that all graphs look the same.

Returns:

List of colours used for the age-profiled graphs. If you want different colours, change the line_colour list.

get_new_LFD_cases()

Returns all positive LFD tests.

Returns:

List of cases found by LFD tests.

get_new_LFD_tests()

Returns all LFD tests conducted.

Returns:

List of how many LFD tests have been conducted.

get_new_PCR_tests()

Returns all PCR tests conducted.

Returns:

List of how many PCR tests have been conducted.

get_new_admssions()

Returns a list with all hospital admissions.

Returns:

List of hospital admissions; these are people going into hospital per day. The hospital data accessed through the ReadHospitalData class splits these admissions into admissions and diagnoses.

get_new_cases()

Returns a list detailing all new COVID cases found by PCR and LFD.

Returns:

List of new COVID cases found by PCR and LFD by specimen date.

get_new_cases_by_report_date()

Returns new cases by report date.

Returns:

List of new cases by reported date for both PCR and LFD.

get_new_deaths()

Returns a list with daily COVID deaths; these are deaths by death date and could be out of date for up to a week.

Returns:

List of deaths from people with COVID. These are deaths by death date and not reported date.
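Because the figures by death date may lag, one option when charting is to leave the most recent days out. The sketch below drops the final seven entries of a series; the seven-day window follows the "up to a week" remark above, and both the data and the helper name trim_recent are made up for illustration.

```python
def trim_recent(values: list, days: int = 7) -> list:
    """Return a copy of values without the last `days` entries."""
    if days <= 0:
        return list(values)
    return values[:-days]


daily_deaths = [5, 7, 6, 8, 9, 4, 3, 2, 1, 0]  # stand-in data
print(trim_recent(daily_deaths))  # [5, 7, 6]
```

If the date list is trimmed by the same number of days, the two lists stay aligned for plotting.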

get_new_pillar_one_tests_by_publish_date()

Returns all pillar 1 tests.

Returns:

List of how many pillar one tests have been conducted per day.

get_pillar_two_tests()

Returns all pillar 2 tests that have been conducted.

Returns:

List of how many pillar 2 tests have been conducted per day.

get_population_number_list()

Returns the population list giving population figures for each age group.

Returns:

List of population figures from the ONS for each age group, i.e. population[0] will be for 0_4 year olds, population[3] will be for 15_19 year olds, etc.
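One use for the population list is to scale counts by the size of each age group. The sketch below computes cases per 100,000 people. The figures are invented, only three age groups are shown, and cases_per_100k is a hypothetical helper rather than part of the class.

```python
def cases_per_100k(cases: int, population: int) -> float:
    """Scale a raw count to a rate per 100,000 people."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population * 100_000


# Stand-in values: population[0] covers 0 to 4 year olds, and so on.
population = [3_200_000, 3_500_000, 3_400_000]
cases_today = [160, 350, 680]

rates = [cases_per_100k(c, p) for c, p in zip(cases_today, population)]
print([round(r, 1) for r in rates])  # [5.0, 10.0, 20.0]
```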

get_positive_LFD_confirmed_by_PCR()

Returns all positive LFD confirmed by PCR.

Returns:

List of positive LFD tests that have been confirmed by PCR.

get_positive_PCR_tests()

Returns all positive PCR tests.

Returns:

List of positive PCR tests per day.

get_year_dates()

Returns the dates in a year for use with the year comparisons. Requires the CSV file dates.csv.

Returns:

List of days in the year; these are used for yearly comparisons.

unpack_data(field)

Unpacks the aged data into a dataframe.

Args:
field

String Value, this is the column name in the dataframe that you want to access. Values for this can be either newCasesBySpecimenDateAgeDemographics or newDeaths28DaysByDeathDateAgeDemographics.

Returns:

Aged data in a dataframe that can be accessed using dataframe functions, for either cases or deaths using the above column names as the field.