
Introduction

This site was created to investigate the education system in British Columbia, Canada. Although test scores can be quite informative, I believe they can be misleading: the difficulty of tests can fluctuate dramatically, which causes student grades to fluctuate as well. As such, our metric of choice was grade-to-grade transition rates. These rates do not reflect the quality of education very accurately, but they are a good indicator of how much support the British Columbian government is providing to its youth. Our questions are as follows:

How has the British Columbian Government’s educational support (as indicated by grade-to-grade transition rates) in the past 30 years differed across different kinds of schools, as well as for different groups of British Columbian students (e.g., Indigenous students or students with special needs)? Are there any other specific attributes that are strongly related to a lower or higher quality of educational support?

Note that this is quite a complex question, so a lot of data is required to answer it. This website therefore uses various open datasets from British Columbia. Our main dataset of interest is the 1992-2021 data on grade-to-grade transition rates published by British Columbia Education Analytics. This dataset contains information at the provincial, district, and school levels. Additional information was gathered from supplementary datasets. A dataset on class sizes from 2006 to 2021 provides information at all three levels, but here we mainly use it at the school level. For the district level, we used a dataset on each district’s office and a dataset on British Columbian teachers in each district.

If you wish to see the GitHub repository for this site, you can find it here.

If you wish to see a PDF version of this report, you can find a copy here. The report was created after this website, so it shares much of the same content. However, there are some slight differences: for instance, the PDF report features some maps that are not found here, while the site features interactive figures that are not found in the report.


Methods

Data Prep

All files that were used are in the ‘.csv’ format. Our main data came in the form of 3 files (one for each of 3 different decades) containing information on BC grade-to-grade transitions at the provincial, district, and school levels. It is worth noting that few of the files were tidy: many of them had some rows that recorded data on “all students” whilst other rows recorded data on “Indigenous students”, so the rows cannot all be treated as independent observations. As such, we had to be careful when working with them. We merged and filtered the data from all of the files into three data tables: one for the provincial level, one for the district level, and one for the school level. Rather than imputing, we removed entries with missing data, as I felt that imputation would make the figures misleading.
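The tidying step described above can be sketched roughly as follows. This is a minimal Python illustration using only the standard library, not the code behind this site; the column names (`sub_population`, `transition_rate`, and so on) and the sample rows are hypothetical.

```python
# Hypothetical rows: each source file mixes sub-populations,
# so one row is not one independent observation.
rows = [
    {"year": 2019, "grade": 8, "sub_population": "All Students", "transition_rate": "97.1"},
    {"year": 2019, "grade": 8, "sub_population": "Indigenous", "transition_rate": "93.4"},
    {"year": 2019, "grade": 9, "sub_population": "All Students", "transition_rate": ""},
    {"year": 2020, "grade": 8, "sub_population": "All Students", "transition_rate": "97.5"},
]


def drop_missing(records, field):
    """Keep only records whose `field` is present and non-empty (no imputation)."""
    return [r for r in records if r.get(field) not in (None, "")]


def select_population(records, population):
    """Keep a single sub-population so that each remaining row is one observation."""
    return [r for r in records if r["sub_population"] == population]


complete = drop_missing(rows, "transition_rate")
all_students = select_population(complete, "All Students")
for r in all_students:
    print(r["year"], r["grade"], float(r["transition_rate"]))
```

Filtering to one sub-population before any summary avoids counting the same students twice, since the “All Students” rows already include the students in the sub-population rows.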

Procedure

Before doing anything else, we should first do some exploratory data analysis. Based on the structure of the data, I felt it would be most appropriate to look at the situation at all three levels. We begin by making interactive figures to understand the situation at the provincial and district levels. We will also perform basic model fitting at the school level. By mimicking the structure of our data, we can make the most out of it! Our data exploration is primarily for the purpose of answering our first question: How has the British Columbian Government’s educational support (as indicated by grade-to-grade transition rates) in the past 30 years differed across different kinds of schools, as well as for different groups of British Columbian students (e.g., Indigenous students or students with special needs)?

After doing some basic exploratory data analysis, we will fit some machine learning models (namely regression trees, bagging, random forests, gradient boosting, and extreme gradient boosting) at the school level, as there are not enough features or observations at the district and provincial levels to fit these models in a meaningful way. The primary focus of this section is to answer our second question: Are there any other specific attributes that are strongly related to a lower or higher quality of educational support?


Results

Let us begin with some exploratory data analysis.

Exploratory Data Analysis

You can see the plots and analysis that were made for the provincial, district and school levels by clicking on the corresponding tabs.

Provincial Level

Below is an interactive plot that shows the grade-to-grade transition rates over time. Using the dropdown menus located at the lower right corner of the grid, we can change our population of interest. The axis is fixed for ease of comparison, but you can always zoom in. We can observe various things by changing the settings of the plot. Below are just a few of the things I found.

  • In the Province-Total and All Students setting, we can see that overall, as time went by, the percentage of students that successfully transitioned to the next grade increased. This is most prominent for students in grades 8-11. It is also worth noting that the percentage of students in grades 10 and 11 who successfully transition to the next grade is significantly lower than the percentage of students in the lower grades who do.
  • Keeping the bottom dropdown as All Students but switching the top dropdown between BC Public School and BC Independent School, we can see that the rates of grade transitions for students of BC public schools are lower than those of BC independent schools, but the increase in the rate of successful grade transitions is significantly higher in public schools. Furthermore, private schools (i.e., independent schools) seem to have a much higher rate of transition, especially for grade levels 8 and above. This does make sense, as private schools are often more expensive, so the students attending them tend to come from more affluent families.
  • Keeping the top dropdown as Province-Total but altering the bottom dropdown between Indigenous and Non Indigenous, we can see that the rate of successful grade transitions is significantly lower in Indigenous students compared to their non-Indigenous counterparts. We can also see that the increase in the rates of grade transitions over the years is higher for Indigenous students compared to non-Indigenous students.
  • Keeping the top dropdown as Province-Total but altering the bottom dropdown between Non Special Needs and Special Needs, we can see that the rate of successful grade transitions is significantly lower in special needs students compared to students without special needs.


Figure 1: An interactive graph depicting the percentage of students in BC of each grade level that successfully transition to the next grade between the years 1993 and 2020.


District Level

Below are two interactive plots. Figure 2 is an interactive heatmap that displays all the data we have regarding the transition rates for each group, district, and year. Below is a short explanation of the abbreviations used.

  • If there is no bracketed abbreviation in the figure, then you are looking at “All Students”
  • (NSN) is an abbreviation for “Non Special Needs”
  • (SN) is an abbreviation for “Special Needs”
  • (NI) is an abbreviation for “Non Indigenous”
  • (I) is an abbreviation for “Indigenous”

Figure 3 consists of an interactive boxplot that summarizes the data for all the districts for each year. The axis is fixed for ease of comparison, but you can always zoom in (or hover your mouse to see more details). There are many interesting things you can find with the interactive plots. Below are just a few findings from our figures:

  • The percentage of students who transitioned to the next grade for Grade 11 fluctuates very dramatically. Playing with the interactive boxplot, one can see that the variation in the rate of transitions from grade 11 to 12 from 2015 to 2020 has been significantly reduced (when compared with the past years). The median of the rates has also increased over time. As such, these plots suggest that there is more equality (due to the decrease in variation) in terms of education/opportunities across the districts, and the education/opportunities offered is of a higher quality (as indicated by the higher median).
  • Playing with the settings of Figure 3, we can see that for most grades (particularly the higher grades, like grade 11) the median percentage of Indigenous students per district who transitioned is significantly lower than that of non-Indigenous students in 1993. This difference is much smaller in 2020, but the median percentage is still lower. The same can be said about the variation: the interquartile range (IQR) for Indigenous 11th graders across districts was much larger than that of their non-Indigenous counterparts in 1993. In 2020, this difference has indeed shrunk, but the IQR for Indigenous children is still roughly twice that of their non-Indigenous counterparts. As such, we can certainly say that there is an improvement in the quality of education/opportunities for both Indigenous and non-Indigenous children, but more work needs to be done.
  • The relationship between students with special needs and those without (again, particularly for the higher grades) is very similar to what was seen between Indigenous and non-Indigenous children: the variation in both rates decreased over time, but the variation in the rates for students with special needs was typically higher than that for students without special needs. The difference now is minimal, though (at least when compared to the past). Likewise, the median rates for both groups increased over time (although there was a period in which the special needs rates were decreasing), and students without special needs have had the higher median rate for a while now. That said, the difference in the median rates in 2020 was very small.
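The median and IQR comparisons above are what each box in Figure 3 summarizes. As a rough sketch of how those two statistics could be computed for one grade and one year, assuming made-up district rates (the numbers below are illustrative and are not taken from the dataset):

```python
from statistics import median, quantiles

# Hypothetical district-level transition rates (%) for one grade and one year.
rates = {
    "Indigenous": [70.0, 78.0, 82.0, 85.0, 88.0, 90.0, 93.0],
    "Non Indigenous": [88.0, 90.0, 91.0, 92.0, 93.0, 94.0, 95.0],
}


def iqr(values):
    """Interquartile range: third quartile minus first quartile."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1


# One (median, IQR) pair per group, mirroring one box per group in a boxplot.
summary = {group: (median(v), iqr(v)) for group, v in rates.items()}
for group, (med, spread) in summary.items():
    print(f"{group}: median={med:.1f}, IQR={spread:.1f}")
```

A lower median with a wider IQR, as in the hypothetical Indigenous group here, corresponds to a box that sits lower and is taller. Note that plotting libraries may use a slightly different quartile convention than `statistics.quantiles`.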


Figure 2: Heatmap displaying the trend in overall transition rates for students of each district over the years.

Figure 3: An interactive boxplot depicting the percentage of high school BC students from each district that successfully transitioned to the next grade between the years 1993 and 2020.

School Level

Exploratory Data Analysis for Question 1

I ended up fitting three linear models that predicted grade-to-grade transition rates (i.e., the grade-to-grade transition rate of a specific school, grade level, and population). The first model further split the groups into Indigenous and non-Indigenous students, while the second model split the groups into students with special needs and students without. The third model simply dealt with all students (i.e., we did not split into subgroups). The predictors I used were the year, the grade level of interest, and whether the school was public or private. I also used the sub-population of interest as a predictor in my first two models.
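One way to write down the first of these models, assuming an ordinary additive linear specification in which grade, school type, and sub-population enter as indicator variables (the symbols below are illustrative and are not taken from the fitted output):

\[
\text{rate}_i = \beta_0 + \beta_1\,\text{year}_i + \sum_{g} \gamma_g\,\mathbf{1}[\text{grade}_i = g] + \delta\,\mathbf{1}[\text{public}_i] + \theta\,\mathbf{1}[\text{Indigenous}_i] + \varepsilon_i
\]

Here \(i\) indexes one combination of school, grade, year, and sub-population, the sum runs over every grade except a reference grade, and \(\theta\) is read as the difference in mean transition rate between the two sub-populations with the other predictors held fixed. Under the same assumptions, the second model would replace the Indigenous indicator with a special-needs indicator, and the third would drop the \(\theta\) term.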

Note that the point of creating these models was not future prediction; we will do that in the machine learning section below instead. The models created here were purely for interpretability. As such, although my first, second, and third models had \(R^2\) values of 0.2164, 0.1542 and 0.1551, respectively, which are very low, I was not bothered.

All of the predictors had extremely small p-values (less than 1e-16), which suggests that there is very likely a difference in grade-to-grade transition rates between students of different grades, between Indigenous and non-Indigenous students, between students with and without special needs, and between students of public and private/independent schools. The coefficients of our models tell us more about these differences. The first model suggests that Indigenous students had a mean rate of successful transitions 2.502 percentage points lower than their non-Indigenous counterparts, while the second model suggests that students with special needs had a mean rate 1.126 percentage points lower than students without special needs. Looking at the grade coefficients across all of our models, we can also see that students in high school (BC high school starts at grade 8) are significantly less likely to transition to the next grade than their elementary school counterparts. This is probably due to the riskier behaviors of certain high school students, which may result in them getting into all sorts of trouble. Lastly, our models suggest that the transition rate for public schools is lower than that of private schools, but this difference is fairly negligible. This provides a nice answer to our first question at the school level.

Exploratory Data Analysis for Question 2

As we will be training a machine learning model on a dataset at the school level, it is worth doing some exploratory data analysis on that dataset. Note that because the machine learning dataset is the result of merging several datasets, and some of those datasets cover different ranges of years, it only covers the years from 2006 to 2016.

Below is an interactive plot showing how the number of full time educators in a district and the average class size for a grade group in a school may have an effect on the grade transition rates. You can change the year and the grade of interest through the dropdown menu. One observation we can make is that the lower grade transition rates are often located on the bottom (i.e., relatively low number of students for a grade group) and for relatively high grades, like grade 11.

Figure 4: An interactive plot that depicts how the number of full-time educators in a district and the average class size for a grade group in a school may be related to the grade transition rates. Note that there are 3 grade groups here: grades 1-3, 4-7, and 8-11.

Machine Learning

Prior to feeding our data in, we removed province-specific data; for instance, we removed school names and district names. This reduces overfitting, so our model can be somewhat generalized to schools outside British Columbia. Our machine learning dataset has data from 2006 to 2016, so my train-test split involved taking the data from the final 3 years (2014, 2015, 2016) and placing it into the test dataset. Everything else was placed into the training dataset. By doing this, we are in essence predicting the “future” based on our current information. I chose to do this for 3 years because it gives roughly a 70-30 train-test split, which is a typical train-test split in machine learning. Another important thing to note is that all schools examined here are public schools, as we could not access the relevant information for independent schools.
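A year-based split of this kind could look roughly like the sketch below. The record layout and the `school_id` field are hypothetical stand-ins; only the choice of 2014-2016 as the held-out years comes from the text.

```python
# Hypothetical school-level records; only the year matters for the split.
records = [{"year": y, "school_id": s} for y in range(2006, 2017) for s in range(3)]

TEST_YEARS = {2014, 2015, 2016}  # the final three years are held out


def temporal_split(rows, test_years):
    """Put rows from `test_years` in the test set and all other rows in train."""
    train = [r for r in rows if r["year"] not in test_years]
    test = [r for r in rows if r["year"] in test_years]
    return train, test


train, test = temporal_split(records, TEST_YEARS)
print(len(train), len(test), round(len(test) / len(records), 2))
```

Unlike a random split, holding out whole years means no test-year information is available during training, which is what makes the evaluation resemble predicting the future.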

Or at least, that was the original plan. When constructing our bagged and random forest models, we realized that there were computational constraints, and we were not able to produce the models using the full train dataset (which had 107885 observations). As such, I ended up sampling 2500 observations from our train and test data sets, and used these samples for model creation.
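Subsampling of this sort can be sketched as below. The seed value and the helper name `subsample` are assumptions made for reproducibility of the illustration; the text does not say how the 2500 observations were actually drawn.

```python
import random


def subsample(rows, k, seed=0):
    """Draw k rows without replacement; return a copy of all rows if there are at most k."""
    if len(rows) <= k:
        return list(rows)
    rng = random.Random(seed)  # fixed seed (an assumption) so the draw is repeatable
    return rng.sample(rows, k)


train_full = list(range(107885))  # stand-in for the 107885 training observations
train_small = subsample(train_full, 2500)
print(len(train_small))
```

Fixing the seed keeps the sampled rows identical between runs, so differences in test error between models are not caused by a different sample.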

Obviously, this does make a difference in terms of results. We were able to make a decision tree with the full datasets (but not a random forest), and the mean square error on the full test data set was 11.7754406. On the other hand, for our decision tree based on the two samples, our test mean square error was 18.2714136. This is a very significant difference in performance, so we should take our results with a grain of salt while also knowing we can do quite a bit better when we have more computational power.
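For reference, the test mean square error quoted above is the average squared gap between observed and predicted transition rates. A minimal sketch with made-up numbers (not model output):

```python
import math


def mean_squared_error(actual, predicted):
    """Average of squared differences between observed and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical observed and predicted transition rates (%).
actual = [95.0, 88.0, 99.0, 76.0]
predicted = [93.0, 90.0, 98.0, 80.0]

mse = mean_squared_error(actual, predicted)
rmse = math.sqrt(mse)  # same units as the rates themselves
print(mse, rmse)
```

Taking the square root puts the error back on the scale of the rates: assuming the rates are measured in percent, a test MSE of 18.27 corresponds to a typical error of roughly 4.3 percentage points, and 11.78 to roughly 3.4.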

We ended up making 5 models: a regression tree, a bagged model, a random forest, a gradient boosting model, and an extreme-gradient boosting model. Their attributes are described in the table below.

Table 1: Test MSE of each of our models
| Model | Most Important Predictor | Second Most Important Predictor | Test MSE |
|---|---|---|---|
| Regression Tree | Grade Level | Number of Students of the Grade in the School | 18.27141 |
| Bagging | Grade Level | Number of Students of the Grade in the School | 15.58609 |
| Random Forest | Grade Level | Number of Classes in School | 15.12862 |
| Gradient Boosting | Grade Level | Number of Students of the Grade in the School | 18.85486 |
| Extreme Gradient Boosting | Number of Classes in School | Number of Students of the Grade in the School | 15.59384 |

Note that for our extreme gradient boosting model, the grade predictor was split into 11 indicator predictors, one for each grade from 1 to 11, so the importance of any single grade indicator was not as high in that model. As such, one could argue that the grade level of students is a very important, if not the most important, factor to account for when considering grade-to-grade transitions for students of a certain grade from a certain school in a certain year. This makes sense to me, as our earlier data exploration clearly showed that high school students (grades 8-11) were less likely to have a successful grade transition. The second most important predictor for most of our models is the number of students of our grade of interest in our school of interest. This also makes sense, as the number of students in a grade is a proxy for the general size of a school. As large public schools tend to be in big cities whilst small public schools tend to be in relatively rural locations, it is plausible that a smaller number of students for a given grade could indicate a lower likelihood of transition.
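To illustrate why splitting grade into indicators dilutes its apparent importance, the sketch below one-hot encodes a grade and then sums the importance of all grade indicators. The importance scores and column names are invented for illustration, and summing over indicator columns is just one common heuristic, not necessarily how the importances in Table 1 were obtained.

```python
GRADES = list(range(1, 12))  # grades 1 to 11


def one_hot_grade(grade):
    """Return 11 indicator values, one per grade, with a single 1."""
    return [1 if grade == g else 0 for g in GRADES]


def grouped_importance(importances, prefix):
    """Sum the importance of every column whose name starts with `prefix`."""
    return sum(v for name, v in importances.items() if name.startswith(prefix))


# Hypothetical importance scores, for illustration only (not model output).
importances = {f"grade_{g}": 0.04 for g in GRADES}
importances["n_classes_in_school"] = 0.30
importances["n_students_in_grade"] = 0.26

print(one_hot_grade(8))
print(round(grouped_importance(importances, "grade_"), 2))
```

In this made-up example no single grade indicator outranks the other two predictors, yet the indicators taken together do, which is the situation described for the extreme gradient boosting model.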


Conclusions and Summary

In conclusion, through exploratory data analysis, we could see that the rate of students successfully transitioning from grade to grade has increased over the years. That said, there is still a gap in transition rates between students from public schools and students from private schools, between students with special needs and students without, and between Indigenous and non-Indigenous students. These gaps are decreasing, though, which is a good sign for the future. We also found that students in high school are significantly less likely to transition than students in lower grades, possibly due to the riskier behavior of some teenagers. From our machine learning models, by looking at the most important predictors, we saw that the grade of a student and the size of a school are related to the grade transition rate. The reason grade matters was explained above; as for the size of the school, it can be seen as a proxy for whether or not a school is rural.

In the future, we could also look at data on various exam results and assessment statistics provided by the British Columbian government and relate them to our findings in this study.