On the Horizon

Jun 25, 2018 00:00 · 534 words · 3 minute read

Introduction

Time series analysis is about driving forward while looking backwards. In addition to extrapolating past data to create forecasts, time series models may include factors of influence such as holidays. When a model uses events such as holidays, however, the features recorded for the past must also be supplied for the future dates, or the model has nothing with which to "adjust" its predictions. This post shows how I usually create a set of features, including holidays, for the horizon: the future time frame for prediction.

Creating the horizon

Python’s pandas library can handle time series “out-of-the-box”. It includes a data type for dates and pairs well with libraries created for time series analysis.

import pandas as pd

# calendar libraries
from datetime import datetime
from dateutil.relativedelta import TU, FR  # weekday helpers used in the holiday rules
from pandas.tseries.holiday import (
    Holiday, AbstractHolidayCalendar, nearest_workday,
    USMartinLutherKingJr, USPresidentsDay, GoodFriday, USMemorialDay,
    USLaborDay, USThanksgivingDay,
)

# markdown outputs - for blog
# import the function itself, since it is called below as tabulate(...)
from tabulate import tabulate

Set the time frame

The horizon is set for two years from 1 January 2016 to 31 December 2017.

index = pd.date_range(
    datetime(2016, 1, 1), # start
    datetime(2017, 12, 31), # end
    freq='D')

# reuse the index built above instead of repeating the date_range call
df = pd.DataFrame({'Index': index})

print(tabulate(df.head(), headers=['Index', 'Date'], tablefmt='pipe'))

The next step is to add discrete features such as the day of the week and whether the day is a weekend or a weekday. Include these features selectively, based on your contextual understanding of what you are trying to predict. In competitions like Corporación Favorita Grocery Sales Forecasting, distinguishing weekdays from weekends helped the models capture differences in consumer shopping behavior.

These discrete features can be label-encoded or one-hot-encoded before they are fed into your favorite machine learning models.

# make discrete features on index
# dt.day_name() replaces dt.weekday_name, which was removed in pandas 1.0
df['Day of Week'] = df['Index'].dt.day_name()
df.loc[df['Day of Week'].isin(['Saturday', 'Sunday']), 'Day Type'] = 'Weekend'
df.loc[~df['Day of Week'].isin(['Saturday', 'Sunday']), 'Day Type'] = 'Weekday'

print(tabulate(df.astype(str).head(), headers=df.columns, tablefmt='pipe'))
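
|    | Index      | Day of Week   | Day Type   |
|---:|:-----------|:--------------|:-----------|
|  0 | 2016-01-01 | Friday        | Weekday    |
|  1 | 2016-01-02 | Saturday      | Weekend    |
|  2 | 2016-01-03 | Sunday        | Weekend    |
|  3 | 2016-01-04 | Monday        | Weekday    |
|  4 | 2016-01-05 | Tuesday       | Weekday    |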
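
As a rough sketch of the encoding step mentioned above, here is one way the two discrete features could be label-encoded and one-hot-encoded with pandas alone. The one-week `demo` frame and the `Day Type Code` and `dow_` names are my own stand-ins for illustration, not part of the horizon built in this post.

```python
import pandas as pd

# Hypothetical one-week stand-in for the horizon DataFrame
demo = pd.DataFrame({'Index': pd.date_range('2016-01-01', periods=7, freq='D')})
demo['Day of Week'] = demo['Index'].dt.day_name()
demo['Day Type'] = 'Weekday'
demo.loc[demo['Index'].dt.dayofweek >= 5, 'Day Type'] = 'Weekend'

# Label encoding: each category becomes an integer code
# (categories are sorted, so 'Weekday' -> 0 and 'Weekend' -> 1)
demo['Day Type Code'] = demo['Day Type'].astype('category').cat.codes

# One-hot encoding: one indicator column per day of the week
one_hot = pd.get_dummies(demo['Day of Week'], prefix='dow')
encoded = pd.concat([demo, one_hot], axis=1)

print(encoded.head())
```

Label encoding keeps the frame narrow but implies an ordering between categories, so one-hot encoding is often the safer choice for linear models. Tree-based models usually cope with either.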

Setting holiday rules

Some holidays fall on the same date every year, while others follow a rule such as "the third Monday of the month". Pandas, thankfully, can handle both simple and complex rules. Below is an example of holiday rules for the state of Hawaii, including both state and federal holidays.

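# define holiday rules
def election_observance(dt):
    # the election holiday is observed in even-numbered years only
    if dt.year % 2 == 1:
        return None
    # dt is 2 November, so this gives the first Tuesday on or after that date
    return dt + pd.DateOffset(weekday=TU(1))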
    
class HolidayCalendar(AbstractHolidayCalendar):
    rules = [
        Holiday("New Year's Day", month=1, day=1),
        USMartinLutherKingJr,
        USPresidentsDay,
        Holiday('Prince Kuhio Day', month=3, day=26),
        GoodFriday,
        USMemorialDay,
        Holiday('King Kamehameha Day', month=6, day=11),
        Holiday('Independence Day', month=7, day=4, observance=nearest_workday),
        Holiday('Admission Day', month=8, day=1, offset=pd.DateOffset(weekday=FR(3))),
        USLaborDay,
        Holiday('Election Day', month=11, day=2, observance=election_observance),
        Holiday('Veterans Day', month=11, day=11),
        USThanksgivingDay,    
        Holiday('Christmas', month=12, day=25, observance=nearest_workday)
    ]
    
cal = HolidayCalendar()
holidays = cal.holidays(start=index.min(), end=index.max())

# make holiday feature
df['Holiday'] = df['Index'].isin(holidays)

print(tabulate(df.astype(str).head(), headers=df.columns, tablefmt='pipe'))

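|    | Index      | Day of Week   | Day Type   | Holiday   |
|---:|:-----------|:--------------|:-----------|:----------|
|  0 | 2016-01-01 | Friday        | Weekday    | True      |
|  1 | 2016-01-02 | Saturday      | Weekend    | False     |
|  2 | 2016-01-03 | Sunday        | Weekend    | False     |
|  3 | 2016-01-04 | Monday        | Weekday    | False     |
|  4 | 2016-01-05 | Tuesday       | Weekday    | False     |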
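
A boolean flag hides which holiday matched and whether an observance rule shifted the date, so it can be worth checking the rules before trusting them. The sketch below uses a deliberately tiny, hypothetical `DemoCalendar` with just two of the rules from above, and passes `return_name=True` to `holidays()` to get the holiday names back alongside the dates.

```python
import pandas as pd
from dateutil.relativedelta import FR
from pandas.tseries.holiday import Holiday, AbstractHolidayCalendar, nearest_workday


class DemoCalendar(AbstractHolidayCalendar):
    # Hypothetical two-rule calendar, for illustration only
    rules = [
        # rule-based date: third Friday of August
        Holiday('Admission Day', month=8, day=1, offset=pd.DateOffset(weekday=FR(3))),
        # fixed date, shifted to the nearest workday when it lands on a weekend
        Holiday('Christmas', month=12, day=25, observance=nearest_workday),
    ]


# return_name=True gives a Series of holiday names indexed by observed date
named = DemoCalendar().holidays(start='2016-01-01', end='2017-12-31', return_name=True)

print(named)
```

Because 25 December 2016 falls on a Sunday, `nearest_workday` moves the observed date to Monday, 26 December 2016. If you would rather flag the calendar date itself, drop the `observance` argument.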

Conclusion

And that's all there is to it. Now you know how to create a horizon that carries additional, and possibly important, features such as day type and holidays.