Demo Video

Embedded MP4 demo video using the HTML5 <video> tag. For example, this screen recording Prof. Cody Dunne made of Mike Bostock's flexible transitions in D3 slide:

Visualization explanation

Final visualization screenshots (PNG images), design justifications, UI walk-through.

Our two main final visualizations are a scatterplot depicting the relationship between ridership and reliability and a map of the MBTA lines whose data we included. The user can brush the scatterplot to highlight individual points or specific planar regions, which is useful in a collaborative or presentational context. The user can also get details on demand by hovering the mouse over an individual point, which shows the line, the ridership and reliability, and the month of that data point. This aids the user in exploring the data. The user can also filter by line using the select lines option at the top right, and select data points by year and month using the calendar. These filters allow the user to examine individual lines more effectively and to examine the data from a temporal perspective.
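
As a rough illustration of the brushing step, the sketch below filters points against a rectangular selection. It assumes the selection has the form [[x0, y0], [x1, y1]] in pixel coordinates, which is how a two-dimensional d3.brush reports it. The helper name pointsInBrush, the px/py fields, and the sample points are hypothetical and are not taken from our code.

```javascript
// Hypothetical helper (not from the project code): return the points whose
// pixel position lies inside a rectangular brush selection.
// `selection` is [[x0, y0], [x1, y1]], or null when the brush is cleared.
function pointsInBrush(points, selection) {
  if (!selection) return []; // a cleared brush selects nothing
  const [[x0, y0], [x1, y1]] = selection;
  return points.filter(
    (p) => p.px >= x0 && p.px <= x1 && p.py >= y0 && p.py <= y1
  );
}

// Made-up sample points, already projected to pixel coordinates.
const samplePoints = [
  { line: "Red Line", px: 50, py: 40 },
  { line: "Blue Line", px: 120, py: 90 },
  { line: "Orange Line", px: 200, py: 30 },
];

// Brush a rectangle covering the first two points only.
const brushed = pointsInBrush(samplePoints, [[40, 20], [130, 100]]);
const brushedLines = brushed.map((p) => p.line);
```

In a linked view, the returned points could then be given a highlight class while the rest are dimmed.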

The map of the lines aids the user in processing the information from a spatial perspective. Hovering the cursor over a given line yields a tooltip with the average ridership and reliability for that line, providing a source of aggregate information. The same line filter that filters points on the scatterplot also highlights the corresponding line on the map, which further helps the user place the scatterplot data in its geographical context.
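
The per-line averages shown in the map tooltip could be computed along the following lines. This is a minimal sketch, assuming each data point carries line, ridership, and reliability fields. The function name averagesByLine and the sample values are hypothetical.

```javascript
// Hypothetical helper: average ridership and reliability per line.
function averagesByLine(rows) {
  // Accumulate running totals and counts for each line.
  const totals = new Map();
  for (const row of rows) {
    const t = totals.get(row.line) || { ridership: 0, reliability: 0, count: 0 };
    t.ridership += row.ridership;
    t.reliability += row.reliability;
    t.count += 1;
    totals.set(row.line, t);
  }
  // Turn the totals into averages, keyed by line name.
  const averages = {};
  for (const [line, t] of totals) {
    averages[line] = {
      avgRidership: t.ridership / t.count,
      avgReliability: t.reliability / t.count,
    };
  }
  return averages;
}

// Made-up values for illustration only.
const lineAverages = averagesByLine([
  { line: "Red Line", ridership: 200000, reliability: 0.75 },
  { line: "Red Line", ridership: 220000, reliability: 0.875 },
  { line: "Blue Line", ridership: 60000, reliability: 0.5 },
]);
```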

Data Analysis

Summary of data, data types, and data preprocessing.

Our data source consists of a spreadsheet, of which the following fields are utilized:

year_month: reflects the year and month of the data entry
route_or_line: reflects the line of the data entry
average_monthly_ridership: reflects the average daily ridership for the line over the given month of that year
expected_time: reflects the expected wait time
actual_time: reflects the actual wait time (both expected and actual time are sums over the entire month)

The two variables whose relationship we are examining are average monthly ridership and reliability. We followed the MBTA's lead by considering reliability to be the quotient of the expected time and the actual time. This means that a perfectly reliable (100%) data entry would have an identical expected time and actual time (early trains don't appear to be particularly common in the MBTA system).
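
The reliability calculation described above can be sketched as follows. The function name reliability is hypothetical, and returning null for a missing or zero actual time is an assumption made here to avoid dividing by zero; it is not something stated in the data source.

```javascript
// Reliability as the quotient of expected time and actual time.
// Returns null when actualTime is missing or not positive (an assumption
// made for this sketch so that we never divide by zero).
function reliability(expectedTime, actualTime) {
  if (!(actualTime > 0)) return null;
  return expectedTime / actualTime;
}

const perfect = reliability(100, 100); // identical times give 1, i.e. 100%
const delayed = reliability(90, 100);  // 0.9, i.e. 90%
const missing = reliability(5, 0);     // null
```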

Our data source for the expected/actual time was MBTA Bus, Commuter Rail, and Rapid Transit Reliability.

From this table, we extracted the date and route_category columns, along with 'otp_numerator' (expected time) and 'otp_denominator' (actual time).

Our data source for ridership was MBTA Monthly Ridership by Mode.

This contained a date, line, and average monthly ridership. The relevant columns of these two tables were then joined in Excel based on the date and route/line. The date was converted from a 'complete date' to a string containing only the year and month in order to better work with the data points in JavaScript.
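
We did this join in Excel, but an equivalent step could be sketched in JavaScript as below. The sketch assumes ISO-style dates such as "2019-03-01", a "YYYY-MM" output string, and a line field on the ridership rows; none of these formats is confirmed by the data sources. The function names toYearMonth and joinByMonthAndLine are hypothetical.

```javascript
// Reduce a complete date string to its year and month, e.g. "2019-03-01" to "2019-03".
// The ISO-style input format is an assumption.
function toYearMonth(dateString) {
  const match = /^(\d{4})-(\d{2})/.exec(dateString);
  return match ? `${match[1]}-${match[2]}` : null;
}

// Sum expected and actual time per (month, line), then attach the matching
// ridership value. Rows without a match on both sides are dropped (inner join).
function joinByMonthAndLine(reliabilityRows, ridershipRows) {
  const sums = new Map();
  for (const row of reliabilityRows) {
    const key = `${toYearMonth(row.date)}|${row.route_category}`;
    const s = sums.get(key) || { expected_time: 0, actual_time: 0 };
    s.expected_time += row.otp_numerator;
    s.actual_time += row.otp_denominator;
    sums.set(key, s);
  }
  const joined = [];
  for (const row of ridershipRows) {
    const year_month = toYearMonth(row.date);
    const s = sums.get(`${year_month}|${row.line}`);
    if (!s) continue; // no reliability data for this month and line
    joined.push({
      year_month,
      route_or_line: row.line,
      average_monthly_ridership: row.average_monthly_ridership,
      expected_time: s.expected_time,
      actual_time: s.actual_time,
    });
  }
  return joined;
}

// Made-up rows for illustration only.
const joinedRows = joinByMonthAndLine(
  [
    { date: "2019-03-01", route_category: "Red Line", otp_numerator: 80, otp_denominator: 100 },
    { date: "2019-03-02", route_category: "Red Line", otp_numerator: 90, otp_denominator: 100 },
    { date: "2019-04-01", route_category: "Blue Line", otp_numerator: 50, otp_denominator: 60 },
  ],
  [{ date: "2019-03-01", line: "Red Line", average_monthly_ridership: 1000 }]
);
```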

Task Analysis

Summary of task table.

Our task definition for our project initially consisted of more concrete tasks that would naturally develop into the solution of our overarching goal. These tasks essentially aimed to determine the extent to which there was a relationship between reliability and ridership.

Two of our tasks involved identifying times during which there were higher and lower rates of reliability and ridership. Although this was initially defined in terms of time, over the course of the project, this evolved to include a spatial element as well, examining different lines in addition to different time periods. Identifying which spatial and temporal groupings had elevated levels of reliability and ridership, we reasoned, would allow us to investigate whether these two phenomena were related. This formed the basis for our third task, into which the first two tasks dovetail.

Once we developed mechanisms to analyze reliability and ridership across different time periods and spatial groups, the next task was to determine the extent of overlap between high-reliability data points and high-ridership data points. Such an overlap would indicate a correlation between high reliability and high ridership, with significant implications.
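
One possible way to quantify this overlap is sketched below. It is only an illustration, not the method we used: it takes two cutoffs chosen by the analyst and returns the share of points that are high on both measures among the points that are high on at least one. The function name highOverlap, the thresholds, and the sample points are all hypothetical.

```javascript
// Share of points that are high in BOTH ridership and reliability, among the
// points that are high in at least one of them. Returns 0 when no point is
// high in either measure.
function highOverlap(points, ridershipThreshold, reliabilityThreshold) {
  let both = 0;
  let either = 0;
  for (const p of points) {
    const highRidership = p.ridership >= ridershipThreshold;
    const highReliability = p.reliability >= reliabilityThreshold;
    if (highRidership && highReliability) both += 1;
    if (highRidership || highReliability) either += 1;
  }
  return either === 0 ? 0 : both / either;
}

// Made-up points: one high in both, one high in ridership only,
// one high in reliability only, and one high in neither.
const overlap = highOverlap(
  [
    { ridership: 100, reliability: 0.9 },
    { ridership: 100, reliability: 0.5 },
    { ridership: 10, reliability: 0.95 },
    { ridership: 10, reliability: 0.4 },
  ],
  50,
  0.8
);
```

A value near 1 would mean that high-ridership and high-reliability points largely coincide, while a value near 0 would mean they rarely do.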

Our final and most abstract task was to determine whether there was a relationship between train reliability and train ridership based on the visualizations we created.

Here is our original task table after conducting user interviews:

Design Process

Sketches and design choices to justify final visualization.

We elected to use a scatterplot to convey the relationship between ridership and reliability because it has been established that position on a common scale is the most effective way to convey information about ordered attributes (such as the continuous variables of ridership and reliability). The scatterplot also allows us to use color to distinguish the categorical attribute of rail line, and color is one of the most effective channels for categorical attributes.

The MBTA rail line map is a visual aid that promotes exploration and discovery. Through filtering, navigation, brushing, and linking, users can see the connections between the lines, how each line interacts with the others, where the lines sit in relation to Massachusetts and the Greater Boston area, and how that placement is reflected in the data. We especially wanted users to be able to interact with the map themselves by navigating through the railway system.

We included an interactive map of the rail lines in the MBTA system to provide details on demand for individual lines, such as the average monthly ridership or average reliability of each line. These details may shed light on the trends between ridership and reliability shown in the scatterplot. The filter menus further support this exploration.

Conclusion

In this project, we successfully collected data about reliability and ridership for different MBTA lines across different months and constructed a website using HTML, CSS, and JavaScript that uses brushing and linking to represent the relationship we found. We created and successfully linked together a calendar, a scatterplot, and a map of the MBTA rail system.

In the future, it could be useful to find data for individual stations and update our visualizations to reflect the relationship between ridership and reliability at the station level, rather than for lines in aggregate. The higher level of granularity could potentially reveal different relationships, and would at minimum result in more data points and a new perspective. A smaller-scale change would be to include the option to show or hide the trend lines on the scatterplot, giving our users the option of a less cluttered visualization. Another potential change would be to add aggregate information on demand while brushing. This would entail a pop-up with the average ridership and reliability for a given swath of highlighted points on the scatterplot.

Ultimately, we believe that our group succeeded in creating a visually stimulating and informative website. We believe that our level of interactivity is engaging and that our visualizations allow users to inform themselves about the relationship between time period, line, ridership, and reliability.

Acknowledgments

Canva
Figma
Building a calendar with D3 Tutorial
Credits to d3noob for clarifying d3 concepts and complexities
The brushing and linking was heavily dependent on the work done by Gabriel Oscar in HW4
Gabriel Oscar used ChatGPT to assist with errors for certain portions of the code

Project Team 4: Ridership vs Reliability, COSI 116A F24
