API Load
These are the imports for the Python3 script. The necessary libraries should be installed via pip3 and the script run inside the virtual environment:
import requests
import pandas as pd
from sqlalchemy import create_engine, types as sql_types
from datetime import datetime, date
from urllib.parse import urlencode
import sys
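The dependencies above can be installed into a fresh virtual environment along these lines (package names are the usual PyPI ones; `mysqlclient` is assumed here as the package providing the `mysqldb` driver used in the connection string later):

```shell
# Create and activate a virtual environment, then install the script's dependencies
python3 -m venv venv
source venv/bin/activate
pip3 install requests pandas sqlalchemy mysqlclient
```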
This function constructs the API call URL. To load a different data set, change the BASEURL variable and the app_identifier parameter (and any other compulsory parameters):
def build_url():
    BASEURL = "https://hapi.humdata.org/api/v2/"
    ENDPOINT = sys.argv[2]
    params = {
        "app_identifier": "RUNVRm9vZFNlYzphdG5ldmluQG91ci5lY3UuZWR1LmF1",
        "start_date": sys.argv[1],
        "end_date": date.today(),
        "output_format": "json",
        "offset": "0",
        "location_code": "MOZ"
    }
    full_path = BASEURL + ENDPOINT + "?" + urlencode(params)
    return full_path
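To see what the function produces without involving sys.argv, the same URL construction can be run with hypothetical stand-in values (the endpoint and dates below are example inputs, not fixed parts of the script):

```python
from urllib.parse import urlencode

# Hypothetical values standing in for sys.argv[2], sys.argv[1] and date.today()
BASEURL = "https://hapi.humdata.org/api/v2/"
ENDPOINT = "food-security-nutrition-poverty/poverty-rate"
params = {
    "start_date": "2020-01-01",
    "end_date": "2024-06-30",
    "output_format": "json",
    "offset": "0",
    "location_code": "MOZ",
}
url = BASEURL + ENDPOINT + "?" + urlencode(params)
print(url)
```

urlencode takes care of percent-encoding, so values such as dates and slashes in parameter values are safe to pass through the dictionary.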
This function maps a pandas data type (as inferred from the JSON) to an SQL equivalent for table creation:
def map_dtype(dtype):
    if pd.api.types.is_integer_dtype(dtype):
        return sql_types.INTEGER()
    elif pd.api.types.is_float_dtype(dtype):
        return sql_types.FLOAT()
    elif pd.api.types.is_datetime64_any_dtype(dtype):
        return sql_types.DATETIME()
    elif datetime_valid(dtype):
        return sql_types.DATETIME()
    else:
        return sql_types.VARCHAR(length=255)
Use this to scan for an ISO datetime value (this is an obscure data type and needs its own function):
def datetime_valid(dt_str):
    # Non-string input raises TypeError, a malformed string raises ValueError
    try:
        datetime.fromisoformat(dt_str)
    except (TypeError, ValueError):
        return False
    return True
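A few example calls show the intended behaviour (this is a trimmed copy of the function above, with the debug print removed and the bare except narrowed):

```python
from datetime import datetime

def datetime_valid(dt_str):
    try:
        datetime.fromisoformat(dt_str)
    except (TypeError, ValueError):
        return False
    return True

print(datetime_valid("2020-01-01"))   # a valid ISO date string
print(datetime_valid("not-a-date"))   # a malformed string
print(datetime_valid(None))           # a non-string value
```

Note that the exact set of ISO 8601 formats accepted by fromisoformat widened in Python 3.11, so results for strings with a trailing "Z" depend on the interpreter version.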
Input validation is then performed (and kicks off the main part of the script):
if len(sys.argv) < 4:
    print("Not enough arguments provided")
    print("python3 load.py <start_date> <endpoint> <table_name>")
    print("python3 load.py 2020-01-01 food-security-nutrition-poverty/poverty-rate IDX_PovRate")
    sys.exit(1)
Build the URL using the function above, then fire the web request and save the response as JSON:
response = requests.get(build_url())
response.raise_for_status()
raw_data = response.json()
Normalize and flatten the data using the pandas library:
if isinstance(raw_data, dict):
    records = raw_data.get("data", raw_data)
else:
    records = raw_data
df = pd.json_normalize(records)
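A minimal illustration of this normalization step, using a made-up payload shaped like a HAPI response (the real field names will differ by endpoint):

```python
import pandas as pd

# Hypothetical API payload: records live under the "data" key
raw_data = {"data": [
    {"location_code": "MOZ", "value": 12.5},
    {"location_code": "MOZ", "value": 13.1},
]}

if isinstance(raw_data, dict):
    records = raw_data.get("data", raw_data)
else:
    records = raw_data

df = pd.json_normalize(records)
print(df)
```

json_normalize also flattens nested objects into dotted column names (e.g. a nested `{"meta": {"source": ...}}` becomes a `meta.source` column), which keeps the later table creation straightforward.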
Connect to the database, create the table, print output, finish:
# Build the column-to-SQL-type mapping and read the table name from the arguments
table_name = sys.argv[3]
dtype_mapping = {col: map_dtype(df[col].dtype) for col in df.columns}

engine = create_engine(f"mysql+mysqldb://{username}:{password}@{host}/{database}")
# Create table and insert data
df.to_sql(name=table_name, con=engine, if_exists='replace', index=False, dtype=dtype_mapping)
print("Table created and data inserted successfully!")
Specific database connection details are omitted.
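Since the MySQL credentials are omitted, the final step can be exercised locally by pointing SQLAlchemy at an in-memory SQLite database instead (a sketch only; the table name and sample data below are hypothetical, and the production script uses the mysql+mysqldb URL):

```python
import pandas as pd
from sqlalchemy import create_engine, types as sql_types, text

# Hypothetical frame and type mapping standing in for the API-derived ones
df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})
dtype_mapping = {"id": sql_types.INTEGER(), "name": sql_types.VARCHAR(length=255)}

engine = create_engine("sqlite:///:memory:")
df.to_sql(name="IDX_Test", con=engine, if_exists="replace",
          index=False, dtype=dtype_mapping)

with engine.connect() as conn:
    count = conn.execute(text("SELECT COUNT(*) FROM IDX_Test")).scalar()
print(count)
```

Using `if_exists='replace'` means each run drops and recreates the table, which suits a full-refresh load; switch to `'append'` if incremental loads are ever needed.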
