Alternate titles: "Holiday Internet Usage" or... "Post-Holiday Season Internet Blues"
Why, only 10 days into my new billing cycle, would I be receiving an alert that 90% of my 125 GB monthly internet plan has been used up, with 20 days still to go? I know that my average household usage over the last eight months has been 70-80 GB. I also know that my daughter worked in the city over the summer but came home for the December holiday break (she arrived December 21).
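To put that alert in perspective, here is a quick back-of-the-envelope sketch in Python. It uses only the figures quoted above (125 GB plan, 90% used, 10 days elapsed, 30-day cycle); the function name `projected_usage` is just something made up for this illustration, and it assumes usage continues at the same average daily rate.

```python
def projected_usage(plan_gb, fraction_used, days_elapsed, cycle_days=30):
    """Return (GB used so far, average GB per day, projected GB for the cycle)."""
    used_gb = plan_gb * fraction_used
    daily_rate = used_gb / days_elapsed
    # Assumption: the remaining days continue at the same average rate.
    return used_gb, daily_rate, daily_rate * cycle_days


used, rate, projected = projected_usage(plan_gb=125, fraction_used=0.90, days_elapsed=10)
print(f"Used so far: {used:.1f} GB")
print(f"Average rate: {rate:.2f} GB/day")
print(f"Projected for the cycle: {projected:.1f} GB")
```

At that pace the household would finish the cycle at several times the usual 70-80 GB, which is why the alert got my attention.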
After accessing my ISP account and downloading the usage data for the last two months, I see an opportunity for some "basic statistics" that I can then share with my daughter as we approach the subject of... Wanton Internet Streaming Abuse (aka "WISA").
Definitely not normal (but one might expect that). A median daily usage of 2.8 gigabytes (GB) and a lot of dispersion. So, with 30 days times 2.8 GB per day, one expects total monthly consumption to be in the neighborhood of 84 GB, which is a touch above, but still broadly in step with, the historic average monthly usage of 70-80 GB. Nothing revealing so far. Let's generate a run chart to illustrate the usage over time:
Hmmm... Something doesn't smell right here. Usage spikes up after December 21st and stays elevated over the next fourteen days. When did our daughter come home for the holidays? Oh, yes! December 21st! (Note: For simplicity, we won't get into the p-values associated with the run chart... I don't think my daughter would appreciate that point in the discussion that will ensue as a result of this "study"). Just for fun, we'll generate a control chart on a "normally" transformed version of the data.
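For readers who want to try a control chart at home, here is a minimal sketch of one common variant, the individuals (I) chart, where the limits sit at the mean plus or minus 2.66 times the average moving range. Several things here are assumptions rather than what was actually used for this post: the natural log stands in for the "normal" transformation, the limits are computed from a baseline period only, and the daily values are made-up placeholders, not my ISP data.

```python
import math


def individuals_limits(values):
    """Return (lcl, center, ucl) for an individuals (I) control chart."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    center = sum(values) / len(values)
    # 2.66 is the standard I-chart constant (3 / d2, with d2 = 1.128 for n = 2).
    return center - 2.66 * mr_bar, center, center + 2.66 * mr_bar


# Placeholder daily usage in GB: seven "quiet" days, then one heavy day.
daily_gb = [1.0, 0.8, 1.3, 0.9, 1.1, 1.2, 0.7, 9.5]
logged = [math.log(x) for x in daily_gb]  # assumed transformation

# Compute limits from the quiet baseline, then check every point against them.
lcl, center, ucl = individuals_limits(logged[:7])
flagged = [i for i, x in enumerate(logged) if x < lcl or x > ucl]
print("Out-of-control points at positions:", flagged)
```

A point outside the limits is the usual signal of special cause; run rules (several points in a row on one side of the center line, for example) are left out to keep the sketch short.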
The control chart supports the assumption that something has "changed in the process". There is evidence of a special cause that has affected the process. Observation: Daughter comes home and, interestingly, process changes. Correlation? Causation? We might as well play this out to the end. Since she arrived home on December 21st, we will add a categorical identifier ("Home - Yes", "Home - No") so we can subgroup the data for further analysis.
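Adding that identifier is a one-liner in most tools. As a rough Python sketch: the year in `ARRIVAL` is a placeholder (only the month and day matter for this story), `home_label` is a name invented for illustration, and every day on or after the arrival date is assumed to count as "home" for the rest of the data window.

```python
from datetime import date

ARRIVAL = date(2023, 12, 21)  # placeholder year; only "December 21" comes from the story


def home_label(day, arrival=ARRIVAL):
    """Tag a day as 'Home - Yes' on or after the arrival date, else 'Home - No'."""
    return "Home - Yes" if day >= arrival else "Home - No"


for day in (date(2023, 12, 20), date(2023, 12, 21), date(2024, 1, 2)):
    print(day.isoformat(), home_label(day))
```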
The preceding comparative histogram and descriptive statistics help to illustrate the difference between "daughter home" versus "daughter not home". One thing that jumps out is the difference in the sample medians: 1 GB per day when daughter is not home and 9.3 GB when daughter is home. Another observation is the spread... far more variation in daily usage when daughter is home. Now, for one of my favourite tools... the box plot.
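The same subgroup comparison can be sketched with Python's standard `statistics` module. The daily values below are invented placeholders, chosen only so that the two medians land on the 1 GB and 9.3 GB figures quoted above; they are not my actual usage data, and the interquartile range (IQR) is just one convenient way to put a number on "spread".

```python
import statistics

# Placeholder daily usage in GB for each subgroup (not the real ISP data).
usage_by_group = {
    "Home - No": [0.5, 0.8, 1.0, 1.2, 2.0],
    "Home - Yes": [3.0, 6.5, 9.3, 12.0, 18.4],
}

summary = {}
for label, values in usage_by_group.items():
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    summary[label] = {"median": statistics.median(values), "iqr": q3 - q1}

for label, stats in summary.items():
    print(f"{label}: median = {stats['median']:.1f} GB/day, IQR = {stats['iqr']:.2f} GB")
```

The median and IQR are used here rather than the mean and standard deviation because the daily usage is far from normally distributed.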
The differences really jump out in the preceding chart. Notice how the differences in both variation (dispersion) and central tendency (median) stand out in this simple yet effective chart. Well, we have the