API Load

These are the import flags for the Python3 script. Necessary libraries should be loaded via pip3 and ran in the virtual env:

import requests
import pandas as pd
from sqlalchemy import create_engine, types as sql_types
from datetime import *
from urllib.parse import urlencode, urljoin
import sys

This function is used to construct the API call. To load a different data set, change the BASEURL variable and the app_identifier (and any other compulsory vars):

def build_url():
        BASEURL = "https://hapi.humdata.org/api/v2/"
        ENDPOINT = sys.argv[2]
 
        params = {
                "app_identifier": "RUNVRm9vZFNlYzphdG5ldmluQG91ci5lY3UuZWR1LmF1",
                "start_date": sys.argv[1],
                "end_date": date.today(),
                "output_format":"json",
                "offset":"0",
                "location_code":"MOZ"
                }
        full_path = BASEURL + ENDPOINT + "?" + urlencode(params)
        return full_path

This function will map a JSON data type to an SQL equivalent for table creation:

def map_dtype(dtype):
    if pd.api.types.is_integer_dtype(dtype):
        return sql_types.INTEGER()
    elif pd.api.types.is_float_dtype(dtype):
        return sql_types.FLOAT()
    elif pd.api.types.is_datetime64_any_dtype(dtype):
        return sql_types.DATETIME()
    elif datetime_valid(dtype):
        return sql_types.DATETIME()
    else:
        return sql_types.VARCHAR(length=255)

Use this to scan for an ISO datetime variable (this is an obscure data type and needs it's own func):

def datetime_valid(dt_str):
        print(dt_str)
        try:
                datetime.fromisoformat(dt_str)
        except:
                return False
        return True

Input validation is then done (and kicks off the main part of our function):

if(len(sys.argv) < 4):
        print("Not enough arguments provided")
        print("python3 load.py <start_date> <endpoint> <table_name>")
        print("python3 load.py 2020-01-01 food-security-nutrition-poverty/poverty-rate IDX_PovRate")
        sys.exit("")

Build endpoint using our function above, then fire the web request, save the data as JSON:

response = requests.get(build_url())
response.raise_for_status()
raw_data = response.json()

Normalize and flatten data using pandas library:

if isinstance(raw_data, dict):
    records = raw_data.get("data", raw_data)
else:
    records = raw_data
 
 
df = pd.json_normalize(records)

Connect to the database, create the table, print output, finish:

engine = create_engine(f"mysql+mysqldb://{username}:{password}@{host}/{database}")
 
# Step 5: Create table and insert data
df.to_sql(name=table_name, con=engine, if_exists='replace', index=False, dtype=dtype_mapping)
 
print("Table created and data inserted successfully!")

Specific database connection details are omitted.

This python script can be called recursively to load multiple different endpoints, country, or date time. An example of loading IDX endpoints:

#!/bin/bash                                                                                                                                                                                                                                                                                                                                             python3 load.py 2020-01-01 affected-people/refugees-persons-of-concern IDX_Refugees                                                                                         python3 load.py 2020-01-01 affected-people/humanitarian-needs IDX_Humanitarian                                                                                              python3 load.py 2020-01-01 affected-people/returnees IDX_returnees                                                                                                          echo "Loaded 'Affected people'"                                                                                                                                                                                                                                                                                                                         python3 load.py 2020-01-01 coordination-context/operational-presence IDX_OperationalPresence                                                                                python3 load.py 2020-01-01 coordination-context/funding IDX_Funding                                                                                                         python3 load.py 2020-01-01 coordination-context/conflict-events IDX_ConflictEvents                                                                                          python3 load.py 2020-01-01 coordination-context/national-risk IDX_NationalRisk                                                                                              echo "Loaded 'Coordination Context'"                                                                                                                                                                                                                                                                                                                    python3 load.py 2020-01-01 food-security-nutrition-poverty/food-security IDX_FoodSecurity                                                                                   python3 load.py 2020-01-01 food-security-nutrition-poverty/food-prices-market-monitor IDX_FoodPrices                                                                        python3 load.py 2020-01-01 food-security-nutrition-poverty/poverty-rate IDX_PovertyRate                                                                                     echo "Loaded 'Food Security Nutrition Poverty'"                                                                                                                                                                                                                                                                                                         python3 load.py 2020-01-01 geography-infrastructure/baseline-population IDX_BaselinePopulation                                                                              echo "Loaded 'Geography Infrastructure'"                                                                                                                                                                                                                                                                                                                python3 load.py 2020-01-01 climate/rainfall IDX_Rainfall                                                                                                                    echo "Loaded 'Climate Rainfall data'"