Documenting the Properties of Implemented Travel Forecasting Models

KenCervenka
Documenting the Properties of Implemented Travel Forecasting Models

The wide range of responses received to my original TMIP Forum post (from last Thursday) with the title of "Showing how Travel Forecasting Models Work" suggests a need to modify the original request.  Preparation of various tutorials to help students (or anyone) understand how "theory" has been translated into actual computations is clearly of value, but I am concerned that this may not go far enough toward helping the end-users of a specific model understand how that particular implemented model works, where, as Andrew Rohne astutely observed in his Friday post, the data available to support a specific implementation could be somewhat limited.

While there are dozens of reasons why there will probably never be (and never should be) a "standard model" structure that will be deemed suitable for all possible travel forecasting and decision-making interests, is there value in having a somewhat-standard format for documenting how specific implemented models make their calculations, one that would at least make it easier to compare one implemented model to another?

Consider an applications-ready mode choice model that is part of a trip-based or tour-based regional model: each row (or maybe each column) of a spreadsheet might be for each specific socio-economic group (e.g., a breakdown by the "auto sufficiency" of the person's household, by household income range, based on the age group of the person traveling, etc.) and travel purpose that is separately represented in the mode choice model.  The columns (or maybe the rows) contain all of the information needed to calculate the zone-to-zone shares for each separately represented mode of travel.  So the spreadsheet shows all of the coefficients and constants that are applicable to a specific zone pair and time period, including the "extra complication" associated with some zone pairs having calibrated constants that are different from those for other zone pairs (as may happen with travel made to/from specific geographic areas, such as the CBDs of big central cities).  A motivated end-user could then "play around" with the spreadsheet to get a handle on the impacts that a change in one or more mode choice model input variables will have on the calculated mode share.  If two or more implemented models use the same spreadsheet-based format, this also opens up opportunities to conduct a reasonably quick "compare and contrast" analysis.

While there have been previous efforts to show how mode choice models differ with regard to the estimated or asserted level-of-service (LOS) coefficients, there seem to be few efforts that include a publicly-available investigation into what it really "means" to come across situations where the model's computations are driven by very large constants or other model adjustments, rather than changes in LOS.  Or situations where there is a huge range in calibrated constants for the different traveler markets, for which no plausible behavioral explanation can be found.
But regardless of what one might do next to assess a mode choice model's likely usefulness for predicting a change in person travel by mode in response to a change in the model's inputs (in advance of being in a somewhat-rare position to conduct a real-world predicted-versus-actual investigation), it seems there could be value to the end-users of a developed model in being able to (quickly) see for themselves, via a simple spreadsheet format, how that specific implemented model works, including the warts associated with what might be referred to as an over-calibrated model.  I heard from one consultant who has indeed prepared such implementation-specific spreadsheets (for both auto ownership and mode choice models), but it was done to assist with the model development process (a quick way to check that the modeling software is calculating what was intended), rather than for public sharing as "model documentation."
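
To make concrete what such a spreadsheet would encode, here is a minimal sketch of the underlying calculation expressed in Python rather than spreadsheet formulas: a multinomial logit share computation for one traveler market and one zone pair. All modes, coefficients, constants, and level-of-service values below are illustrative assumptions, not taken from any actual implemented model.

```python
import math

# Hypothetical utility coefficients and mode-specific constants for one
# traveler market (say, zero-car households, work trips). Illustrative
# values only -- an implemented model would have a row of these per
# market segment and purpose.
COEFFS = {"ivtt": -0.025, "cost": -0.005}            # per minute, per cent
CONSTANTS = {"auto": 0.0, "transit": -0.8, "walk": -1.5}

def mode_shares(los):
    """Multinomial logit mode shares for one zone pair.

    los maps each mode to its level-of-service attributes, e.g.
    {"auto": {"ivtt": 20, "cost": 300}, ...}.
    """
    utils = {
        mode: CONSTANTS[mode] + sum(COEFFS[k] * v for k, v in attrs.items())
        for mode, attrs in los.items()
    }
    denom = sum(math.exp(u) for u in utils.values())
    return {mode: math.exp(u) / denom for mode, u in utils.items()}

los = {
    "auto":    {"ivtt": 20, "cost": 300},
    "transit": {"ivtt": 35, "cost": 150},
    "walk":    {"ivtt": 60, "cost": 0},
}
shares = mode_shares(los)
# The shares sum to 1.0; lowering the transit "ivtt" raises the transit
# share, which is exactly the kind of "play around" test described above.
```

With the same calculation laid out in a spreadsheet, changing one LOS cell or one calibrated constant immediately shows how much of the resulting share is driven by LOS versus by the constants.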

A similar "put it on a spreadsheet for illustrative purposes" approach could be used for other components of a full regional model implementation, but that comes with complications for trip/tour distribution and traffic assignment models, where what gets calculated for one zone pair is influenced by the calculations for other zone pairs. The UTOWN-type approaches might be a good way for students to understand the underlying theory, but maybe there is an even simpler format to show, as with the simple spreadsheet-based mode choice examples, the impacts that the purpose-specific calibrated constants in destination choice models are having on the flow calculations.  For traffic assignment, a spreadsheet-based simplification might be to avoid showing the details of an equilibration process and focus instead on what the model is doing with regard to calculating the uncongested (free-flow) travel time for different types of links and the extra travel time due to volume-based delays throughout an average weekday.
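
For the traffic assignment piece, the volume-based delay idea can be illustrated with the classic BPR volume-delay function. This is a sketch using the textbook default parameters; implemented models typically use link-type-specific alpha and beta values rather than these defaults.

```python
def bpr_travel_time(free_flow_time, volume, capacity, alpha=0.15, beta=4.0):
    """Congested link travel time via the BPR volume-delay curve:
    t = t0 * (1 + alpha * (v/c)^beta).

    alpha and beta default to the classic BPR values; real implementations
    usually vary them by facility type.
    """
    return free_flow_time * (1.0 + alpha * (volume / capacity) ** beta)

t0 = 10.0  # free-flow travel time in minutes for a hypothetical link
# A link at capacity (v/c = 1) takes 15% longer than free-flow with the
# classic parameters; above capacity the delay grows steeply.
print(bpr_travel_time(t0, 1000, 1000))   # 11.5
print(bpr_travel_time(t0, 1500, 1000))   # ~17.6
```

A spreadsheet row per link type, with its free-flow speed and its alpha/beta values, would let an end-user see exactly how much delay the model adds at any given volume-to-capacity ratio without needing to see the equilibration machinery.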

I feel compelled to repeat that I am not concerned with those who are pointing out previous development of tutorials to explain modeling theory; I am instead concerned that actual implemented models (trip- and tour-based) are not being described and presented in a manner that makes it easy for anyone but the original model developer to understand how that particular model really works, or how that particular model compares to models that have been developed in other regions with a similar forecasting need.

Thank you!

Pilo Willumsen

Ken has asked a valid question and we have provided only disappointing
answers. There are several reasons why Ken's question is a good one. There
is a long-running trend of increasing the complexity of our models on the
grounds that they would provide more realistic behavioural responses to
changes in policies and networks. Reality is very complex, so it seems
natural that we need complex models to match it. This is a valid objective
of research, as it would help us understand how we make decisions. It is
not a valid forecasting objective.
The short answer to Ken's question seems to be that we modellers do not use
this type of simple tool to explain to users how the model works. Perhaps
this was not critical when the models we used were simple aggregate ones. I
know that we do use these simple tools to test whether the parameters we
have estimated (or asserted) for a more complex model result in a model that
makes sense. But we certainly do not include this, or a better simple model,
to communicate results or validate the model. My point is that this
would be a very good idea!
Indeed, the more disaggregate our models, the more difficult it is to make
sure the results make sense and are not an artefact of congested assignment.
The problem is not so much with any individual part of the model; each
presumably makes sense. The problem is what happens when we run a series of
deeply nested choice models connected by logsums and with coefficients, some
estimated, some borrowed, some asserted. The simpler models suggested by Ken
would be a very desirable tool to ensure all parts of the model work well;
their simpler specification should probably be part of any guidelines on
model development and validation.

--
Luis Willumsen
Director
Willumsen Advisory Services
Kineo Mobility Analytics
London & Madrid
T: +44 7979 53 88 45
www.Kineo-analytics.com

From: KenCervenka
Date: Monday, 30 October 2017 at 17:58
To: TMIP
Subject: [TMIP] Documenting the Properties of Implemented Travel Forecasting Models


KenCervenka

I agree with the observations Luis (Pilo) Willumsen made in his October 31 post on the "model documentation" topic. I have had outside-the-Forum conversations with other forecasting consultants, and will offer some of the highlights (highly paraphrased by me), to see if these inspire any additional observations:

 1) Primarily for the somewhat-complex models with substantial interactions between model components (though this can be done for any model), model documentation could include the findings from a set of carefully-constructed sensitivity tests that enable model-based arc elasticities to be calculated, which can then be compared to elasticities obtained from other sources.  These tests would cover the model's sensitivity in predicting changes in person travel between transportation alternatives and in response to changes in zonal demographics, and might follow a standard testing format that enables comparisons across different implemented models.

 2) While finding a way to make it easier (via working spreadsheets) for a non-model-developer to understand how specific components of an implemented model work is good, there is also value in documentation that helps the reader, including future model updaters, understand what led to the specific model implementation decisions: for example, not just identifying which modes are represented in mode choice, but explaining what led to the specific choices represented and the data available to the model implementation exercise, as well as why other commonly-represented choices, such as kiss-and-ride, were perhaps not included due to lack of data; plus which coefficients were asserted, and the source for the assertions.

 3) Although validation-related guidance reports are already available (e.g., an FHWA-sponsored report), there is value in having a somewhat-standard format for not only summarizing the findings but also describing likely reasons for any anomalies encountered.
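
The arc-elasticity comparison described in item 1 above can be sketched as follows; the fare change and trip counts are purely hypothetical numbers chosen for illustration:

```python
import math

def arc_elasticity(x0, x1, y0, y1):
    """Arc elasticity via the log-difference (midpoint) form:
    e = ln(y1/y0) / ln(x1/x0)."""
    return math.log(y1 / y0) / math.log(x1 / x0)

# Hypothetical sensitivity test: raise the average transit fare from
# $2.00 to $2.50 (a 25% increase) and observe the model's predicted
# transit trips fall from 100,000 to 92,000.
e = arc_elasticity(2.00, 2.50, 100_000, 92_000)
print(round(e, 2))   # -0.37
```

Running the same standardized test (same percentage change, same output measure) against two implemented models yields directly comparable elasticities, which is the "standard testing format" idea in item 1.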