API Load
These are the imports for the Python3 script. The necessary libraries should be installed via pip3 and the script run inside the virtual environment:
import requests
import pandas as pd
from sqlalchemy import create_engine, types as sql_types
from datetime import datetime, date
from urllib.parse import urlencode
import sys
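The dependencies above can be installed into a fresh virtual environment along these lines (package names are the usual PyPI ones; `mysqlclient` is assumed here as the package providing the `mysqldb` driver used in the connection string later):

```shell
# Create and activate a virtual environment, then install the script's dependencies
python3 -m venv venv
source venv/bin/activate
pip3 install requests pandas sqlalchemy mysqlclient
```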
This function constructs the API call URL. To load a different data set, change the BASEURL variable and the app_identifier parameter (and any other compulsory parameters):
def build_url():
    BASEURL = "https://hapi.humdata.org/api/v2/"
    ENDPOINT = sys.argv[2]
    params = {
        "app_identifier": "RUNVRm9vZFNlYzphdG5ldmluQG91ci5lY3UuZWR1LmF1",
        "start_date": sys.argv[1],
        "end_date": date.today(),
        "output_format": "json",
        "offset": "0",
        "location_code": "MOZ"
    }
    full_path = BASEURL + ENDPOINT + "?" + urlencode(params)
    return full_path
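To see what the function produces without involving sys.argv, the same URL construction can be run with hypothetical stand-in values (the endpoint and dates below are example inputs, not fixed parts of the script):

```python
from urllib.parse import urlencode

# Hypothetical values standing in for sys.argv[2], sys.argv[1] and date.today()
BASEURL = "https://hapi.humdata.org/api/v2/"
ENDPOINT = "food-security-nutrition-poverty/poverty-rate"
params = {
    "start_date": "2020-01-01",
    "end_date": "2024-06-30",
    "output_format": "json",
    "offset": "0",
    "location_code": "MOZ",
}
url = BASEURL + ENDPOINT + "?" + urlencode(params)
print(url)
```

urlencode takes care of percent-encoding, so values such as dates and slashes in parameter values are safe to pass through the dictionary.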
This function maps a pandas data type (as inferred from the JSON) to an SQL equivalent for table creation:
def map_dtype(dtype):
    if pd.api.types.is_integer_dtype(dtype):
        return sql_types.INTEGER()
    elif pd.api.types.is_float_dtype(dtype):
        return sql_types.FLOAT()
    elif pd.api.types.is_datetime64_any_dtype(dtype):
        return sql_types.DATETIME()
    elif datetime_valid(dtype):
        return sql_types.DATETIME()
    else:
        return sql_types.VARCHAR(length=255)
Use this to scan for an ISO datetime value (this is an obscure data type and needs its own function):
def datetime_valid(dt_str):
    # Non-string input raises TypeError, a malformed string raises ValueError
    try:
        datetime.fromisoformat(dt_str)
    except (TypeError, ValueError):
        return False
    return True
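A few example calls show the intended behaviour (this is a trimmed copy of the function above, with the debug print removed and the bare except narrowed):

```python
from datetime import datetime

def datetime_valid(dt_str):
    try:
        datetime.fromisoformat(dt_str)
    except (TypeError, ValueError):
        return False
    return True

print(datetime_valid("2020-01-01"))   # a valid ISO date string
print(datetime_valid("not-a-date"))   # a malformed string
print(datetime_valid(None))           # a non-string value
```

Note that the exact set of ISO 8601 formats accepted by fromisoformat widened in Python 3.11, so results for strings with a trailing "Z" depend on the interpreter version.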
Input validation is then performed (and kicks off the main part of the script):
if len(sys.argv) < 4:
    print("Not enough arguments provided")
    print("python3 load.py <start_date> <endpoint> <table_name>")
    print("python3 load.py 2020-01-01 food-security-nutrition-poverty/poverty-rate IDX_PovRate")
    sys.exit(1)
Build the URL using the function above, then fire the web request and save the response as JSON:
response = requests.get(build_url())
response.raise_for_status()
raw_data = response.json()
Normalize and flatten the data using the pandas library:
if isinstance(raw_data, dict):
    records = raw_data.get("data", raw_data)
else:
    records = raw_data
df = pd.json_normalize(records)
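A minimal illustration of this normalization step, using a made-up payload shaped like a HAPI response (the real field names will differ by endpoint):

```python
import pandas as pd

# Hypothetical API payload: records live under the "data" key
raw_data = {"data": [
    {"location_code": "MOZ", "value": 12.5},
    {"location_code": "MOZ", "value": 13.1},
]}

if isinstance(raw_data, dict):
    records = raw_data.get("data", raw_data)
else:
    records = raw_data

df = pd.json_normalize(records)
print(df)
```

json_normalize also flattens nested objects into dotted column names (e.g. a nested `{"meta": {"source": ...}}` becomes a `meta.source` column), which keeps the later table creation straightforward.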
Connect to the database, create the table, print output, finish:
# Build the column-to-SQL-type mapping and read the table name from the arguments
table_name = sys.argv[3]
dtype_mapping = {col: map_dtype(df[col].dtype) for col in df.columns}

engine = create_engine(f"mysql+mysqldb://{username}:{password}@{host}/{database}")
# Create table and insert data
df.to_sql(name=table_name, con=engine, if_exists='replace', index=False, dtype=dtype_mapping)
print("Table created and data inserted successfully!")
Specific database connection details are omitted.
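Since the MySQL credentials are omitted, the final step can be exercised locally by pointing SQLAlchemy at an in-memory SQLite database instead (a sketch only; the table name and sample data below are hypothetical, and the production script uses the mysql+mysqldb URL):

```python
import pandas as pd
from sqlalchemy import create_engine, types as sql_types, text

# Hypothetical frame and type mapping standing in for the API-derived ones
df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})
dtype_mapping = {"id": sql_types.INTEGER(), "name": sql_types.VARCHAR(length=255)}

engine = create_engine("sqlite:///:memory:")
df.to_sql(name="IDX_Test", con=engine, if_exists="replace",
          index=False, dtype=dtype_mapping)

with engine.connect() as conn:
    count = conn.execute(text("SELECT COUNT(*) FROM IDX_Test")).scalar()
print(count)
```

Using `if_exists='replace'` means each run drops and recreates the table, which suits a full-refresh load; switch to `'append'` if incremental loads are ever needed.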
