This function returns a data.frame in which each row provides one or several goodness-of-fit measures between a simulated and an observed Origin-Destination (OD) matrix.
Usage
gof(
sim,
obs,
measures = "all",
distance = NULL,
bin_size = 2,
use_proba = FALSE,
check_names = FALSE
)
Arguments
- sim
an object of class TDLM (output of run_law_model(), run_law() or run_model()). A matrix or a list of matrices can also be used (see Note).
- obs
a square matrix representing the observed mobility flows.
- measures
a vector of string(s) indicating which goodness-of-fit measure(s) to compute (see Details). If "all" is specified, all measures are calculated.
- distance
a square matrix representing the distance between locations. Only required for the distance-based measures.
- bin_size
a numeric value giving the size of the bins used to discretize the distance distribution when computing CPC_d (2 "km" by default).
- use_proba
a boolean indicating whether the proba matrix should be used instead of the simulated OD matrix to compute the measure(s). Only valid for the output of run_law_model() with argument write_proba = TRUE (see Note).
- check_names
a boolean indicating whether location IDs are used as matrix rownames and colnames and should be checked (see Note).
Value
A data.frame providing one or several goodness-of-fit measure(s) between the simulated OD matrix (or matrices) and the observed OD matrix. Each row corresponds to one simulated matrix, in the order of the elements of the list (or list of lists); element names are used if provided.
Details
Let \(n\) be the number of locations, \(T_{ij}\) the observed flow between location \(i\) and location \(j\) (argument obs), \(\tilde{T}_{ij}\) a simulated flow between location \(i\) and location \(j\) (a matrix from argument sim), \(N=\sum_{i,j=1}^n T_{ij}\) the sum of observed flows and \(\tilde{N}=\sum_{i,j=1}^n \tilde{T}_{ij}\) the sum of simulated flows.
Several goodness-of-fit measures are available through measures = c("CPC", "NRMSE", "KL", "CPL", "CPC_d", "KS"):
- the Common Part of Commuters (CPC) (Gargiulo et al. 2012; Lenormand et al. 2012; Lenormand et al. 2016),
\(\displaystyle CPC(T,\tilde{T}) = \frac{2\cdot\sum_{i,j=1}^n \min(T_{ij},\tilde{T}_{ij})}{N + \tilde{N}}\)
- the Normalized Root Mean Square Error (NRMSE),
\(\displaystyle NRMSE(T,\tilde{T}) = \sqrt{\frac{\sum_{i,j=1}^n (T_{ij}-\tilde{T}_{ij})^2}{N}}\)
- the Kullback–Leibler divergence (KL) (Kullback and Leibler 1951),
\(\displaystyle KL(T,\tilde{T}) = \sum_{i,j=1}^n \frac{T_{ij}}{N}\log\left(\frac{T_{ij}}{N}\frac{\tilde{N}}{\tilde{T}_{ij}}\right)\)
- the Common Part of Links (CPL) (Lenormand et al. 2016),
\(\displaystyle CPL(T,\tilde{T}) = \frac{2\cdot\sum_{i,j=1}^n 1_{T_{ij}>0} \cdot 1_{\tilde{T}_{ij}>0}}{\sum_{i,j=1}^n 1_{T_{ij}>0} + \sum_{i,j=1}^n 1_{\tilde{T}_{ij}>0}}\)
- the Common Part of Commuters based on distance (Lenormand et al. 2016), noted CPC_d. Let \(N_k\) (respectively \(\tilde{N}_k\)) be the sum of observed (respectively simulated) flows whose distance lies in the bin [bin_size*k - bin_size, bin_size*k[. Then
\(\displaystyle CPC_d(T,\tilde{T}) = \frac{2\cdot\sum_{k=1}^{\infty} \min(N_{k},\tilde{N}_{k})}{N+\tilde{N}}\)
- the Kolmogorov-Smirnov statistic and p-value (Massey 1951), noted KS. It compares the observed and simulated distributions of flow distances and is computed with the ks_test function from the Ecume package.
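The formulas above can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the helper names (cpc, nrmse, kl, cpl, cpc_d, ks_stat) are hypothetical, zero observed flows are assumed to contribute nothing to KL (the usual 0·log 0 = 0 convention), and ks_stat returns only the largest gap between the flow-weighted distance distributions, without the p-value that Ecume's ks_test provides.

```r
# Hypothetical base-R helpers (not part of TDLM) mirroring the Details formulas.
# T_obs: observed OD matrix; T_sim: simulated OD matrix with the same locations
# in the same order; distance: matching square distance matrix.

cpc <- function(T_obs, T_sim) {
  2 * sum(pmin(T_obs, T_sim)) / (sum(T_obs) + sum(T_sim))
}

nrmse <- function(T_obs, T_sim) {
  sqrt(sum((T_obs - T_sim)^2) / sum(T_obs))
}

kl <- function(T_obs, T_sim) {
  p <- T_obs / sum(T_obs)
  q <- T_sim / sum(T_sim)
  keep <- p > 0  # assumption: terms with T_ij = 0 contribute 0
  sum(p[keep] * log(p[keep] / q[keep]))
}

cpl <- function(T_obs, T_sim) {
  2 * sum(T_obs > 0 & T_sim > 0) / (sum(T_obs > 0) + sum(T_sim > 0))
}

cpc_d <- function(T_obs, T_sim, distance, bin_size = 2) {
  # bin k covers [bin_size*k - bin_size, bin_size*k[
  k <- as.vector(floor(distance / bin_size) + 1)
  N_k <- tapply(as.vector(T_obs), k, sum)      # observed flows per bin
  N_sim_k <- tapply(as.vector(T_sim), k, sum)  # simulated flows per bin
  2 * sum(pmin(N_k, N_sim_k)) / (sum(T_obs) + sum(T_sim))
}

ks_stat <- function(T_obs, T_sim, distance) {
  # KS statistic only: largest gap between the flow-weighted CDFs of distance,
  # evaluated at every distinct distance (where the step CDFs can jump)
  d <- sort(unique(as.vector(distance)))
  F_obs <- sapply(d, function(x) sum(T_obs[distance <= x])) / sum(T_obs)
  F_sim <- sapply(d, function(x) sum(T_sim[distance <= x])) / sum(T_sim)
  max(abs(F_obs - F_sim))
}

# Toy example with 3 locations
T_obs <- matrix(c(0, 5, 3,
                  2, 0, 4,
                  1, 6, 0), nrow = 3, byrow = TRUE)
T_sim <- matrix(c(0, 4, 4,
                  3, 0, 3,
                  2, 5, 0), nrow = 3, byrow = TRUE)
D <- matrix(c(0, 1, 3,
              1, 0, 5,
              3, 5, 0), nrow = 3, byrow = TRUE)

cpc(T_obs, T_sim)          # 36/42, about 0.857
cpl(T_obs, T_sim)          # 1 (same non-zero links)
cpc_d(T_obs, T_sim, D, 2)  # 38/42, about 0.905
ks_stat(T_obs, T_sim, D)   # 2/21, about 0.095
```

In this toy case CPC_d is higher than CPC because flows that are misallocated between origin-destination pairs but fall in the same distance bin still count as common in CPC_d.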
Note
By default, if sim is an output of run_law_model(), the measure(s) are computed only for the simulated OD matrices and not for the proba matrix (included in the output when write_proba = TRUE). The argument use_proba can be used to compute the measure(s) based on the proba matrix instead of the simulated OD matrix. In this case, the argument obs should also be a proba matrix.
All inputs should be based on the same number of locations, sorted in the same order. It is recommended to use the location IDs as matrix rownames and colnames and to set check_names = TRUE to verify that everything is in order before running this function (check_names = FALSE by default). Note that the function check_format_names() can be used to check the validity of all inputs before running the package's main functions.
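The kind of mismatch that check_names = TRUE is meant to catch can be illustrated with a small base-R sketch. The helper same_location_ids() is hypothetical and is not check_format_names(); it only verifies that every matrix carries identical row and column names, in the same order.

```r
# Hypothetical helper (not part of TDLM): TRUE if every matrix has the same
# location IDs, in the same order, as both rownames and colnames.
same_location_ids <- function(...) {
  mats <- list(...)
  ids <- rownames(mats[[1]])
  if (is.null(ids)) return(FALSE)  # no IDs to compare
  all(vapply(mats, function(m) {
    identical(rownames(m), ids) && identical(colnames(m), ids)
  }, logical(1)))
}

ids <- c("A", "B", "C")
obs <- matrix(1:9, nrow = 3, dimnames = list(ids, ids))
sim <- obs[c("B", "A", "C"), c("B", "A", "C")]  # same IDs, different order

same_location_ids(obs, obs)  # TRUE
same_location_ids(obs, sim)  # FALSE
```

Both matrices above describe the same locations, yet comparing them cell by cell without reordering would pair the wrong flows, which is why the order of the IDs matters and not only the set of IDs.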
References
Lenormand M, Bassolas A, Ramasco JJ (2016). “Systematic comparison of trip distribution laws and models.” Journal of Transport Geography, 51, 158-169.
Gargiulo F, Lenormand M, Huet S, Baqueiro Espinosa O (2012). “Commuting network model: getting to the essentials.” Journal of Artificial Societies and Social Simulation, 15(2), 13.
Lenormand M, Huet S, Gargiulo F, Deffuant G (2012). “A Universal Model of Commuting Networks.” PLoS ONE, 7, e45985.
Kullback S, Leibler RA (1951). “On Information and Sufficiency.” The Annals of Mathematical Statistics, 22(1), 79-86.
Massey FJ (1951). “The Kolmogorov-Smirnov test for goodness of fit.” Journal of the American Statistical Association, 46(253), 68-78.
Author
Maxime Lenormand (maxime.lenormand@inrae.fr)
Examples
data(mass)
data(distance)
data(od)
mi <- as.numeric(mass[, 1])
mj <- mi
Oi <- as.numeric(mass[, 2])
Dj <- as.numeric(mass[, 3])
res <- run_law_model(
law = "GravExp", mass_origin = mi, mass_destination = mj,
distance = distance, opportunity = NULL, param = 0.01,
model = "DCM", nb_trips = NULL, out_trips = Oi, in_trips = Dj,
average = FALSE, nbrep = 1, maxiter = 50, mindiff = 0.01,
write_proba = FALSE,
check_names = FALSE
)
gof(
sim = res, obs = od, measures = "CPC", distance = NULL, bin_size = 2,
use_proba = FALSE,
check_names = FALSE
)
#> Simulation CPC
#> 1 replication_1 0.4574413