Abstract
Validation assessments should respond to a diverse set of stakeholders, each with different questions and needs relevant to validation. A single metric designed to address each validation need can inadvertently lead to convoluted, misleading interpretations, especially as individual stakeholders interpret the details of the assessment in terms of their own (often unevaluated) intended use. We propose a validation assessment workflow composed of four necessary and distinct components: (1) model accuracy, (2) model acceptability, (3) validation evaluation, and (4) validation recommendations. We discuss the necessity and purpose of each component in the validation workflow and demonstrate the intent of each component with an example from high-energy-density physics.
1 Introduction
We present a validation assessment workflow to respond to many validation questions and needs of stakeholders, such as
“What accuracy can I expect?”
“Is the model sufficient for my requirements?”
“How good is good enough?”
“What confidence can I expect under these conditions?”
“In what conditions should I be concerned with using this model?”
“How can I improve the model or experiment?”
“What predictive uncertainty can I expect?”
A diverse set of users, such as decision-makers, consumers of simulations, customers, code developers, and analysts, may have an interest in the complete validation assessment.
This workflow provides a basic path to produce a validation assessment commensurate with the definition of “validation” as given by ASME V&V 10 [18]: (1) the accuracy of a model as compared to reality, (2) the degree of confidence in a comparison of model to reality, and (3) the acceptability of a model given the needs and uses of the model.
In addition, provision has been made in the workflow for steps that can be used for defining validation requirements, engaging with stakeholders, working with model and code developers, and integrating with decision makers.
The workflow is meant to be “practical” in several senses. First, it provides a basis that can be used for each of several specific intended uses. In our experience, the expense of creating computer simulations is very seldom incurred unless the code is meant to answer several questions, such as those listed above. Second, the workflow outlines the key components needed for a complete validation assessment, especially when an evaluation of predictive capability is required. Third, the workflow provides an adaptable framework. No single workflow addresses the validation needs of all the communities in which VVUQ is needed. Links to the appropriate components can be made as required, each step could be expanded into its own workflow with explicit details on how that step is to be accomplished, and/or the steps can be modified to specify how decision-making is conducted for a specific organization or application. Fourth, the four key components we propose make the workflow a practical way to fully develop the evidence base for a multitude of intended uses.
The workflow presented here is thus meant to provide a simple, diagrammatically easy-to-follow flow that can be modified to meet the specific needs of a given situation. The four key components we suggest are the essential steps needed to complete a validation assessment and to provide a body of evidence to stakeholders for decision-making.
1.1 Model Accuracy.
Determination of a model's accuracy for a set of model prediction quantities of interest (QoI) as compared to experiment, including quantification of the degree of uncertainty in the model accuracy.
“What accuracy can I expect?”
“What certainty can I expect from my predictions?”
“Where does the model apply given experimental evidence?”
1.2 Model Acceptability.
Evidence of acceptance or rejection of a model given validation evidence (i.e., model accuracy and uncertainty in the model accuracy) and model requirements based on the intended uses of the model.
“Is the model sufficient for my requirements?”
“How good is good enough?”
1.3 Validation Evaluation.
Quantitative evaluation of the model, experiment, and validation process based on the intended use and predictive applications of the model.
“In what conditions should I be concerned with using this model?”
“What predictive uncertainty can I expect?”
1.4 Validation Recommendations.
Recommendations intended to guide future validation strategies, resource allocation, model-use, and prioritization of validation investments.
“How can I improve the model or experiment?”
The validation assessment workflow is summarized in Fig. 1 (modified from Oberkampf and Roy [1]). We use the term validation assessment in two ways in this paper: (1) to denote the process, i.e., the entire workflow shown in Fig. 1, and (2) to denote the product, i.e., the outcome of the workflow.
In this paper, we outline a practical workflow for performing a meaningful, comprehensive validation assessment and encouraging validation best practices. The workflow does not address specific validation methods (e.g., extending validation away from validation set points, acceptability metrics, and sparse data uncertainty quantification). However, if a stakeholder's need or use warrants specific evaluation methods, then those methods can be incorporated into the key components of the workflow.
Not every validation assessment will fit the proposed workflow perfectly, and we find that modification is sometimes merited; nevertheless, the proposed workflow fits much of the work we conduct. Examples where modification is merited are discussed in the Appendix.
Validation frameworks in the literature highlight aspects of the proposed workflow, e.g., Roy and Oberkampf [2] propose a framework quantifying model accuracy and uncertainty in the model accuracy; Athe and Dinh [3] use goal structuring notation to determine predictive maturity (validation recommendations) and aid in prioritization of physics components (required before implementation of the workflow); and Bayarri et al. [4] develop a Bayesian framework exercising many of the subitems and highlight thoughts complementary to the full workflow.
The purpose of this paper is to describe the overall workflow shown in Fig. 1. In-depth details will be the subject of the following papers: a rigorous demonstration quantifying model accuracy and uncertainty in the model accuracy [5]; additional methods of validation evaluation using analytic derivations [6]; and a set of hypothesis tests to inform decision-making in model acceptability and validation evaluation [7]. Succeeding papers will include the evaluation of a simulation code's utility for addressing specific questions, risk assessment using the workflow, and validation recommendations concerning both intended use and anticipating intended use through assessment of the potential for extrapolation.
This paper is outlined as follows. A brief discussion of model accuracy, model acceptability, and validation evaluation in the V&V literature is provided in Sec. 2. This is followed in Sec. 3 by a discussion of the purpose of each component, illustrated by examples from a high-energy-density physics validation assessment. Background for these examples can be found in Wilson and Koskelo [5] and Michel et al. [8,9].
2 The Validation Workflow in the Literature
Validation and uncertainty quantification methods satisfying individual workflow components, such as model accuracy and model acceptability, can be found in the literature. A diversity of approaches and methods has evolved according to the needs and focus of each discipline, e.g., operations research (OR), systems modeling, computational physics (CP) [10–13], regulations [3], quality assurance, and social sciences.
Early V&V research (as early as the 1960s) was limited to systems modeling and OR. In general, models in these fields are empirical, statistical, or mathematical. A comprehensive review of OR V&V is provided by Sargent and Balci [14]. This community developed rigorous, statistical methods meant to quantify model accuracy [15,16] and acceptability [17].
During the late 1990s and early 2000s, CP researchers began to focus on V&V. CP can be generalized as models developed from physical laws and solved as numerically discretized partial differential equations (PDEs) (see Oberkampf and Roy [1]).
The CP V&V literature is rich with metrics and methodologies to accurately and unambiguously quantify model accuracy (see the ASME V&V 10 and 20 Standards [18,19] and Oberkampf and Roy [1]). While several works acknowledge the necessity of assessing model acceptability [1,10,18–22], few provide explicit methods for statistically quantifying model acceptability. Notable exceptions include both Bayesian methods (e.g., Mullins et al. [23]) and frequentist methods (e.g., Hyunh et al. [24]).
Even with over 50 years of foundational research from the OR community, CP and OR V&V remain somewhat disjoint. The proposed validation assessment workflow is not meant to replace previous validation research across many disciplines. Instead, it encourages validation best practices by structuring a validation assessment that addresses the practical needs of validation stakeholders using a diverse set of application-appropriate methodologies.
3 Validation Assessment Workflow
The proposed validation assessment workflow consists of four distinct and necessary components to provide evidence to stakeholders:
model accuracy,
model acceptability,
validation evaluation, and
validation recommendations.
In Secs. 3.1–3.4, we outline each component. We use a laser-driven imploding shell validation example to demonstrate the necessity and differences of each component in the validation assessment workflow. For a detailed background of the example and quantification of model accuracy, the reader is referred to Wilson and Koskelo [5] and Michel et al. [8,9].
3.1 Model Accuracy.
Determination of a model's accuracy for a set of model prediction QoI as compared to experiment, including quantification of the degree of uncertainty in the model accuracy.
Model accuracy and the uncertainty in the model accuracy form the validation basis for the workflow. Many methods exist for quantifying model accuracy [1,10,18,19]. In this paper, we follow the ASME V&V standards [18,19]. We use definitions from the ASME V&V 10 standard [18] and follow the ASME V&V 20-2009 standard [19] to quantify model accuracy and validation uncertainty.
From V&V 20-2009 (see Fig. 2 for reference), the comparison error eϕ is the difference between simulation QoI ϕS and experimental QoI ϕD (eϕ = ϕS − ϕD). For any validation, eϕ is probabilistic and belongs to a pdf of possible comparison error responses, i.e., pdf(eϕ), distributed about the mean comparison error ⟨eϕ⟩. The pdf of eϕ is the convolution of the experimental and simulation QoI probability distribution functions (i.e., pdf(ϕS) and pdf(ϕD)). The shape and position of each pdf are sensitive to the QoI variability, random errors, and systematic errors, such as numerical discretization error, model input variability, and experimental errors.
The mean comparison error ⟨eϕ⟩ can be thought of as the accuracy of the model to predict reality. A point other than the expectation value ⟨eϕ⟩ could be used for “model accuracy” if appropriate for the application of interest.
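To make pdf(eϕ) concrete, the following minimal sketch draws samples of eϕ = ϕS − ϕD under the assumption that pdf(ϕS) and pdf(ϕD) are independent Gaussians. The means and standard deviations are hypothetical, not values from the example. The sample mean estimates ⟨eϕ⟩ (the model accuracy), and the spread should approach the root-sum-square of the two standard deviations.

```python
import random
import statistics

def sample_comparison_error(mu_s, sigma_s, mu_d, sigma_d, n=100_000, seed=1):
    """Draw samples of e_phi = phi_S - phi_D, assuming (hypothetically) that
    pdf(phi_S) and pdf(phi_D) are independent Gaussians."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    return [rng.gauss(mu_s, sigma_s) - rng.gauss(mu_d, sigma_d) for _ in range(n)]

# Hypothetical simulation and experimental QoI statistics (not data from the paper).
errors = sample_comparison_error(mu_s=10.0, sigma_s=0.3, mu_d=9.5, sigma_d=0.4)
mean_e = statistics.fmean(errors)  # estimate of <e_phi>; expected near 10.0 - 9.5 = 0.5
std_e = statistics.stdev(errors)   # expected near sqrt(0.3**2 + 0.4**2) = 0.5
print(f"<e_phi> ~ {mean_e:.3f}, std(e_phi) ~ {std_e:.3f}")
```

For non-Gaussian or correlated inputs, the same sampling approach still yields an empirical pdf(eϕ), but the simple root-sum-square spread no longer applies.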
Where appropriate, model accuracy should be evaluated using a statistically significant number of replicates. The authors acknowledge that doing so may be too restrictive due to time, budget, or computational resources. Where the latter is the case, model accuracy should include a means of assessing the effect on uncertainty due to sparse data and any assumptions used in the validation using sparse data. This could include expert opinion on level of confidence in the underlying mathematical models of the simulation or other means discussed in the literature for treating sparse data in a more rigorous way. This workflow does not disqualify the use of probabilistic or sparse uncertainty methods.
This formulation follows from the convolution of multiple uncorrelated, independent Gaussian distributions, e.g., pdf(ϕS) and pdf(ϕD) (see Lemons [25]).
The uncertainty estimate in Eq. (2) assumes uncorrelated, independent uncertainty sources, which is often not the case in numerical modeling. For example, calculated input uncertainties also contain numerical errors. When numerical errors are adequately reduced, however, their effect on the input uncertainties is less dominant than that of the input parameters themselves. As is, Eq. (2) gives a conservative estimate of the validation uncertainty. V&V 20-2009 [19] further discusses methods of accounting for correlated uncertainty sources.
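For uncorrelated, independent sources, V&V 20-2009 combines the numerical (uN,ϕ), input (uI,ϕ), and experimental (uD,ϕ) standard uncertainties in quadrature. The sketch below assumes that this is the form of Eq. (2); the numbers are hypothetical and chosen only so the arithmetic is easy to check.

```python
import math

def validation_uncertainty(u_num, u_input, u_d):
    """Combine numerical (u_N), input (u_I), and experimental (u_D) standard
    uncertainties in quadrature. Valid only if the sources are uncorrelated
    and independent; correlated sources need the V&V 20-2009 extensions."""
    return math.sqrt(u_num**2 + u_input**2 + u_d**2)

# Hypothetical standard uncertainties for one QoI at one time.
u_val = validation_uncertainty(u_num=0.0, u_input=3.0, u_d=4.0)
print(u_val)  # 5.0
```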
3.1.1 Model Accuracy in the Validation Example.
For our example, the simulation and experimental QoI (scattered laser energy ϕps, ablation front trajectory ϕr, and ablation front velocity ϕv) are shown in the top plot of Fig. 3. For this particular case, the capsule is driven by a triple picket pulse at early times (t < 1 ns), followed by the primary drive (t > 1 ns). The uncertainties are shown as shaded regions and were calculated rigorously (see Wilson and Koskelo [5]).
The model accuracy is quantified using the comparison error. The corresponding comparison error, bounded by the validation uncertainty at 1σ, 2σ, and 3σ confidence, is shown in the bottom plot of Fig. 3. Qualitatively, the simulation exhibits good agreement with the scattered laser energy during the triple picket pulse (t < 1 ns) and for most of the ablation front trajectory. The uncertainty is relatively small for the scattered laser light and larger for the ablation front trajectory and velocity. Together, the comparison error and validation uncertainty quantify the model accuracy, which responds to the questions
“What accuracy can I expect?”
“What certainty can I expect from my predictions?”
“Where does the model apply given experimental evidence?”
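The 1σ, 2σ, and 3σ bands in the bottom plot of Fig. 3 can be read point by point. A minimal sketch with hypothetical numbers (not the Fig. 3 data): for each time point, report the smallest k in {1, 2, 3} for which the band eϕ ± k·uval contains zero, where uval is the validation uncertainty. A result of None flags a comparison error outside even the 3σ band.

```python
def sigma_level(e_phi, u_val, levels=(1, 2, 3)):
    """Return the smallest k in `levels` for which the interval
    e_phi +/- k*u_val contains zero, or None if none does."""
    for k in levels:
        if abs(e_phi) <= k * u_val:
            return k
    return None

# Hypothetical comparison errors and validation uncertainties at four times.
e = [0.05, -0.30, 0.90, 2.00]
u = [0.10, 0.20, 0.40, 0.50]
print([sigma_level(ei, ui) for ei, ui in zip(e, u)])  # [1, 2, 3, None]
```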
However, assessing model accuracy and uncertainty in the model accuracy is usually insufficient. For instance, model accuracy cannot respond to the questions
“Is the model sufficient for my requirements?”
“How good is good enough?”
“In what conditions should I be concerned with using this model?”
“How can I improve the model or experiment?”
These questions generally arise when a stakeholder wants to use the model for prediction. The strength of the workflow is that it is structured to understand and respond to such stakeholder-specific needs.
3.2 Model Acceptability.
Evidence of acceptance or rejection of a model given validation evidence (i.e., model accuracy and uncertainty in the model accuracy) and model requirements based on the intended uses of the model.
Model acceptability is determined by the adequacy of model predictions based on model-use requirements. In general, model acceptability should provide evidence of adequacy to a decision-maker. However, under certain conditions or requirements, the process of model acceptability may also include a decision (see Appendix). In this case, the analyst is also the decision-maker or decisions are made by both analyst and decision-maker together.
Ideally, model acceptability should be expressed statistically; however, when a validation assessment is limited (e.g., cost and data availability), probabilistic methods may not be feasible. In this case, the risk and rigor of alternate methods applied to the questions at hand should be assessed.
Requirements are a mathematical representation of stakeholder needs. They should be relevant to intended uses of the model (e.g., application, risk, and previous model accuracy) and determined by validation analysts in collaboration with code-users, stakeholders, developers, and customers. The difficulty in defining relevant and meaningful requirements has limited development of model acceptability methods as noted by Dowding et al. [26].
We find that working with the stakeholder through the questions articulated above is the best means of mitigating this issue. Arriving at requirements effectively in this way places a responsibility on the validation analyst to understand the scientific and/or technical underpinnings of the simulation model, because the analyst must interpret stakeholder needs and translate them into useful validation assessment requirements.
Requirements for determining model acceptability can be as simple as evaluating model improvement from previous models for given QoI or as complex as confidently predicting a given set of QoI within a set of model-use tolerances. Regardless of the requirements for an acceptable model (e.g., improvements from previous models), general statistical procedures apply. For now, we proceed assuming symmetrically distributed model-use tolerances (i.e., Δϕ).
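With a symmetric tolerance band ±Δϕ, one simple way to express acceptability as a probability is sketched below. It assumes that the model error is Gaussian with mean eϕ and standard deviation equal to the validation uncertainty. This is an illustrative stand-in chosen here, not the hypothesis tests of Wilson et al. [7].

```python
from statistics import NormalDist

def acceptability_confidence(e_phi, u_val, tol):
    """Probability that the model error lies within +/- tol, ASSUMING the model
    error is Gaussian with mean e_phi and standard deviation u_val.
    Illustrative stand-in only; not the hypothesis tests of Wilson et al. [7]."""
    err = NormalDist(mu=e_phi, sigma=u_val)
    return err.cdf(tol) - err.cdf(-tol)

# Hypothetical values: unbiased comparison, tolerance at 1.96 standard uncertainties.
print(round(acceptability_confidence(e_phi=0.0, u_val=1.0, tol=1.96), 3))  # 0.95
```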
Hypothesis testing lends itself well to quantifying model acceptability; however, we do not suggest model acceptability is limited to hypothesis testing. Using hypothesis testing, model acceptability quantifies the level of confidence that a model will adequately predict physical phenomena within the model-use tolerances (null hypothesis H0). The alternate hypothesis H1 implies model predictions are not within model-use tolerances. Some model acceptability hypotheses are outlined below. Note the similarities to the acceptability hypotheses proposed by Balci and Sargent [27] and Rebba and Mahadevan [28].
Typically, one would prefer to accept a model with “high” confidence. To use statistical terminology, we define two forms of model acceptability:
Type I Validation
Given the validation evidence, the model is rejected (unacceptable for our needs). Rejection of the model incurs the risk of a quantifiable Type I error.
Type II Validation
Given the validation evidence, the model is accepted (acceptable for our needs). Acceptance of the model incurs the risk of a quantifiable Type II error.
A Type II validation accepts the model and the model is released.
For example, if a decision-maker would like to accept the model with 90% confidence, the decision-maker can accept any region where the model acceptability lies above 90%. However, there remains up to a 10% risk that a Type II error was made (i.e., that the model is actually unacceptable). Thus, the validation is a Type II validation.
Model acceptability also informs the subsequent process (i.e., validation evaluation), regardless of whether the validation is Type I or Type II.
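The 90% decision rule can be sketched as a threshold on an acceptability confidence. The confidence here assumes a Gaussian model error with mean eϕ and standard deviation equal to the validation uncertainty. The function name, tolerance, and data are hypothetical, and 1 − confidence is read as the residual Type II risk, following the example above.

```python
from statistics import NormalDist

def classify_validation(e_phi, u_val, tol, required_confidence=0.90):
    """Return ("Type II", confidence) if the model is accepted at the required
    confidence, else ("Type I", confidence). The confidence ASSUMES the model
    error is Gaussian with mean e_phi and standard deviation u_val."""
    err = NormalDist(mu=e_phi, sigma=u_val)
    confidence = err.cdf(tol) - err.cdf(-tol)
    if confidence >= required_confidence:
        # Accepted: 1 - confidence is the residual risk of a Type II error.
        return "Type II", confidence
    # Rejected: the model fails the required confidence.
    return "Type I", confidence

# Hypothetical early-time and late-time comparisons with the same tolerance.
for e_phi, u_val in [(0.05, 0.10), (0.60, 0.20)]:
    label, conf = classify_validation(e_phi, u_val, tol=0.5)
    print(label, round(conf, 3))
```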
3.2.1 Model Acceptability in the Validation Example.
For the demonstration, we consider the questions
“Is the model sufficient for my requirements?”
“How good is good enough?”
We define requirements or model-use tolerances for each QoI, i.e., the scattered light energy ϕps, ablation front trajectory ϕr, and ablation front velocity ϕv. Example model-use tolerances were determined through conversation with developers and designers:
Δϕps = ±0.025 max(ϕpi)
≈ 0.5 TW and 0.25 TW for the high- and low-energy cases, respectively,
Δϕr = ±0.05 ϕD,r
≈ 5% of the experimental ablation front trajectory,
Δϕv = ±0.05 ϕD,v
≈ 5% of the experimental ablation front velocity.
Due to the wide range of model applications, these model-use tolerances will not be appropriate for all intended uses.
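The tolerance definitions above are simple arithmetic. In the sketch below, ϕpi is treated only as an input whose peak sets the scattered-light tolerance, and the peak values of 20 TW and 10 TW are back-calculated from the quoted ≈0.5 TW and ≈0.25 TW (0.5/0.025 and 0.25/0.025). They are not reported data. The trajectory samples are hypothetical, and taking |ϕD| is an assumption to keep the tolerance positive.

```python
def power_tolerance(phi_pi_max, fraction=0.025):
    """Delta phi_ps = fraction * max(phi_pi), applied at every time point."""
    return fraction * phi_pi_max

def relative_tolerance(phi_d, fraction=0.05):
    """Delta phi = fraction * |phi_D| at each time point (used for phi_r and phi_v).
    The absolute value is an assumption so the tolerance stays positive."""
    return [fraction * abs(x) for x in phi_d]

# Peak values back-calculated from the quoted tolerances (hypothetical inputs).
print(f"{power_tolerance(20.0):.2f} TW, {power_tolerance(10.0):.2f} TW")
# Hypothetical experimental ablation front trajectory samples (micrometers).
print(relative_tolerance([400.0, 350.0, 280.0]))
```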
In Fig. 4, the comparison error and validation uncertainty are normalized by the predefined model-use tolerances (shown as dashed lines). With this additional information, qualitative regions of acceptability are apparent. For example, the model predictions are outside the model-use tolerances for the scattered laser energy during the primary drive (t > 1 ns), for the ablation front trajectory at very late times (t > 2.5 ns), and for nearly the entire ablation front velocity history. On the other hand, the model adequately predicts the scattered laser energy and ablation front velocity at early times.
Qualitative assessments of model acceptability are inadequate; statistical metrics are encouraged for more quantitative assessments. We suggest a set of hypothesis tests presented in a forthcoming paper [7]. In summary, we quantify our confidence that the model predictions are within the model-use tolerances (model acceptability). C(H0) defines the minimum acceptable confidence level for H0, the hypothesis that the model is within the model-use tolerances. This minimum acceptable confidence level, C(H0), and its counterpart, the maximum acceptable significance level S(H0), are shown in Fig. 4. Figure 4 suggests a high probability (>95%) of a Type II validation for the scattered laser energy during the triple picket pulse (t < 1 ns); however, a low probability of a Type II validation is observed for all QoI during the primary drive (t > 1 ns).
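The normalization used in Fig. 4 divides both the comparison error and the validation uncertainty by the model-use tolerance, so the tolerance band becomes ±1. A minimal sketch with hypothetical values; the in-band flag mirrors the qualitative reading and deliberately ignores the uncertainty.

```python
def normalized_comparison(e_phi, u_val, tol):
    """Normalize the comparison error and validation uncertainty by the
    model-use tolerance; |e_phi / tol| <= 1 means the mean comparison error
    lies inside the tolerance band (a qualitative check that ignores u_val)."""
    e_norm = e_phi / tol
    u_norm = u_val / tol
    return e_norm, u_norm, abs(e_norm) <= 1.0

# Hypothetical points: one inside and one outside the tolerance band.
print(normalized_comparison(0.2, 0.1, 0.5))   # (0.4, 0.2, True)
print(normalized_comparison(-0.8, 0.1, 0.5))
```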
Although we have assessed the acceptability of the model, several additional questions must be considered. For example, during the primary drive (1 ns < t < 2.5 ns), does the model accuracy suggest a systematic error, such as that caused by model form errors, diagnostic bias, etc.? Also, does the uncertainty need to be reduced before we can arrive at a statistically significant model acceptability?
3.3 Validation Evaluation.
Quantitative evaluation of the model, experiment, and validation process based on the intended use and predictive applications of the model.
During the validation assessment, evaluations of the model, experiment, or validation process should be pursued. Evaluations should inform developers and code-users concerning model use. These include model capabilities for a range of conditions of interest, evaluation of model insufficiency, evaluation of bounds that arise from the current experiment accuracy and precision, and the adequacy of the metrics used in the validation process itself. Evaluations diagnose strengths and failings, identify regimes and conditions in which model usage is appropriate, and quantify the adequacy with respect to intended use and applications. This step can also be used to investigate when a model is insufficient and to identify how it is insufficient, e.g., through incomplete capture of relevant physics or inadequate resolution in the numerical implementation.
The relative importance of the above evaluations may differ depending on the model acceptability findings; however, a meaningful validation assessment will include validation evaluations for both Type I and Type II validations. For example, users may be more interested in the implications of model usage for a Type II validation, whereas a Type I validation may encourage code developers and experimentalists to improve models or experiments.
Referring back to the workflow in Fig. 1, we show the Model Acceptability as a process. For some limited validation assessments, this step might include a decision in which one would determine whether the model was acceptable or not, then proceed to the validation evaluation and validation recommendations along the pertinent part of the workflow (see Appendix). However, for more general and/or complicated validation assessments, one may want to conduct evaluations around when the model is acceptable as well as when it fails. This is the strength of the workflow as we hope the example used in this paper illustrates.
Many evaluation techniques exist and should be used, including validation uncertainty quantification [28], experimental uncertainty quantification [29,30], sensitivity analysis [23,31,32], design of experiments, mathematical verification [6], hypothesis testing, model disparity identification [33], and Bayesian inference [34].
3.3.1 Validation Evaluation in the Validation Example.
For the validation example, we address the following validation evaluation questions:
“In what conditions should I be concerned with using this model?”
“Does the model accuracy suggest a systematic error, such as that caused by model form errors, diagnostic bias, etc.?”
“Does the uncertainty need to be reduced before we can arrive at a statistically significant model acceptability?”
Sensitivities to specific physics models, including partial ionization and nonlocal thermal conduction, were investigated using finite difference approximations in Wilson and Koskelo [5]. Sensitivities to model dimensionality (e.g., one-dimensional versus two-dimensional) were also evaluated. Negligible differences were observed.
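A finite-difference sensitivity perturbs one model parameter and differences the resulting QoI. The section does not say which stencil was used; the sketch below assumes a central difference and applies it to a hypothetical toy QoI standing in for a full simulation. It also returns the dimensionless sensitivity (θ/Q)·dQ/dθ, which makes parameters with different units comparable.

```python
def central_difference_sensitivity(qoi, theta, rel_step=0.01):
    """Approximate dQ/dtheta with a central finite difference and also return
    the dimensionless (relative) sensitivity (theta / Q) * dQ/dtheta."""
    h = rel_step * abs(theta) if theta != 0 else rel_step
    dq = (qoi(theta + h) - qoi(theta - h)) / (2.0 * h)
    return dq, theta / qoi(theta) * dq

def toy_qoi(f):
    """Hypothetical stand-in for a simulation QoI as a function of one model parameter."""
    return 3.0 * f**2 + 2.0

dq, rel = central_difference_sensitivity(toy_qoi, 0.3)
print(f"dQ/dtheta ~ {dq:.4f}, relative sensitivity ~ {rel:.4f}")
```

In practice, each evaluation of `qoi` is a full simulation, so the step size must be large enough that the QoI change exceeds the numerical uncertainty.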
The model acceptability component has indicated a Type I validation during the primary drive (t > 1 ns). This has encouraged additional validation evaluation to identify the cause of the Type I validation. We encourage performing validation evaluation whether the model passes an acceptance test or not. Using the workflow in this manner is particularly useful for identifying when a model is predictive and when it is not.
One aspect of validation evaluation is to detect validation ambiguity and validation systematic errors. Validation ambiguity arises when the validation uncertainty is much larger than the model-use tolerances, resulting in ambiguous acceptability conclusions. On the other hand, validation systematic errors indicate a statistically significant systematic discrepancy in the validation comparison. We propose two metrics for identifying validation ambiguity and validation systematic errors. In this paper, these metrics are shown as the minimum acceptable confidence level of the validation ambiguity hypothesis C(J0) and minimum acceptable confidence level of the validation systematic error hypothesis C(K0). For more details, see Wilson et al. [7].
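The two ideas can be illustrated with simplified indicators. These are not the C(J0) and C(K0) hypothesis tests of Wilson et al. [7]. Ambiguity is indicated by the ratio of validation uncertainty to model-use tolerance. A systematic error is indicated by a two-sided confidence that the mean comparison error is nonzero, which assumes eϕ is Gaussian with standard deviation equal to the validation uncertainty. All inputs are hypothetical.

```python
from statistics import NormalDist

def ambiguity_ratio(u_val, tol):
    """Validation uncertainty relative to the model-use tolerance; values well
    above 1 suggest the comparison cannot discriminate acceptability."""
    return u_val / tol

def systematic_error_confidence(e_phi, u_val):
    """Two-sided confidence that the mean comparison error is nonzero, ASSUMING
    e_phi is Gaussian with standard deviation u_val (equals 1 - p-value)."""
    z = abs(e_phi) / u_val
    return 2.0 * NormalDist().cdf(z) - 1.0

# Hypothetical late-time velocity comparison (arbitrary units).
print(ambiguity_ratio(u_val=2.0, tol=0.5))                           # 4.0
print(round(systematic_error_confidence(e_phi=0.98, u_val=0.5), 3))  # 0.95
```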
In Fig. 5, minimum acceptable confidence level of the validation ambiguity hypothesis C(J0) and minimum acceptable confidence level of the validation systematic error hypothesis C(K0) show the probability of a statistically significant ambiguity and statistically significant systematic error. High probability of a statistically significant systematic error exists for all QoI. Further analysis suggests that this is due to a model form error (e.g., missing physics) in the laser package, namely, cross-beam energy transfer (CBET)—see Wilson and Koskelo [5].
Figure 5 also suggests a high probability of ambiguity for the ablation front velocity QoI. Decomposition of the contributing ablation front velocity uncertainties (i.e., experimental uncertainty uD,ϕ, input uncertainty uI,ϕ, and numerical uncertainty uN,ϕ) indicates large experimental and input uncertainties at early times and dominant numerical uncertainties at late times (see Fig. 6). If the ablation front velocity QoI is important to the intended uses of the model, resources should be allocated to reduce all uncertainties by collaborating with experimentalists, code developers, and users.
Numerical uncertainties were rigorously quantified in an earlier discretization sensitivity study (see Figs. 5 and 7 in Wilson and Koskelo [5]). In summary, numerical uncertainties become significant at the first laser picket (t ≈ 0.2 ns) and at the initial rise of the primary drive (t ≈ 1.2 ns). Except for the first laser picket and the initial drive profile, the adaptive mesh refinement (AMR) resolves the solution sufficiently to keep numerical uncertainties much smaller than the experimental uncertainties. We emphasize that statements regarding small or negligible numerical uncertainties should be preceded by a rigorous assessment of the numerical uncertainty, such as that in Wilson and Koskelo [5].
Another possible source of input and numerical uncertainty is variability in the postprocessing of the ablation front velocity [5].
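Assuming the sources combine in quadrature, a decomposition like the one in Fig. 6 can be expressed as each source's share of the validation variance. The sketch below identifies the dominant contributor. The early- and late-time values are hypothetical and were chosen only to mimic the pattern described in the text.

```python
def uncertainty_budget(u_d, u_i, u_n):
    """Fraction of the validation variance contributed by each source, assuming
    uncorrelated sources combined in quadrature; returns (shares, dominant)."""
    variances = {"experimental": u_d**2, "input": u_i**2, "numerical": u_n**2}
    total = sum(variances.values())
    shares = {name: var / total for name, var in variances.items()}
    dominant = max(shares, key=shares.get)
    return shares, dominant

# Hypothetical velocity uncertainties at an early and a late time (arbitrary units).
for label, (u_d, u_i, u_n) in {"early": (0.4, 0.3, 0.1), "late": (0.1, 0.1, 0.5)}.items():
    shares, dominant = uncertainty_budget(u_d, u_i, u_n)
    print(label, dominant, {k: round(v, 2) for k, v in shares.items()})
```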
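The specific method used in Wilson and Koskelo [5] is not described in this section. One widely used approach, described in V&V 20-2009, is Roache's grid convergence index (GCI), computed from three systematically refined grids with a safety factor of 1.25. The sketch below is a minimal version for a constant refinement ratio. It uses a manufactured example, f(h) = 1 + h², rather than results from the paper.

```python
import math

def grid_convergence_index(f1, f2, f3, r, safety_factor=1.25):
    """Roache-style GCI for the fine-grid solution f1 from three systematically
    refined grids (f1 finest, f3 coarsest) with constant refinement ratio r.
    Returns (observed order p, GCI in the units of f)."""
    if f2 == f1:
        raise ValueError("fine and medium solutions identical: order undefined")
    ratio = (f3 - f2) / (f2 - f1)
    if ratio <= 0:
        raise ValueError("oscillatory or non-convergent: observed order undefined")
    p = math.log(ratio) / math.log(r)
    gci = safety_factor * abs(f2 - f1) / (r**p - 1.0)
    return p, gci

# Manufactured example: f(h) = 1 + h**2 on grids h = 1, 2, 4 (exact value 1).
p, gci = grid_convergence_index(f1=2.0, f2=5.0, f3=17.0, r=2.0)
print(f"observed order ~ {p:.2f}, GCI ~ {gci:.3f}")
```

Here the observed order recovers the formal order of 2, and the GCI of 1.25 bounds the true fine-grid error of 1.0. An observed order far from the formal order suggests that the grids are not yet in the asymptotic range.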
Given the inability of the model to predict QoI during the primary drive and the high probability of a statistically significant systematic error, two methods for compensating for the CBET systematic error are evaluated: (1) the laser scaling method [35] and (2) the flux limiting method. Both are discussed in detail in Wilson and Koskelo [5] and validation metrics are shown in Fig. 7. With the exception of the ablation front velocity, a significant improvement in model accuracy is observed. However, both methods are calibrated to match the data and the improved accuracy comes at the cost of predictability.
3.4 Validation Recommendations.
Recommendations intended to guide future validation strategies, resource allocation, model-use, and prioritization of validation investments.
Validation recommendations are where the feedback from the validation assessment is communicated to the stakeholders. Here, evidence from the validation evaluation and assessment is combined into a set of recommendations meant to guide future validation strategies, resource allocation, prioritization of validation investments, and model use (see Fig. 1), e.g.,
new experiments designed to constrain physical phenomena in unconstrained regimes,
new experiments to identify and/or exercise covariance between model terms or between underlying computational submodels, i.e., compensating errors,
diagnostic development or experimental replicates meant to reduce experimental uncertainties,
model development to improve model fidelity by addressing model form errors, and
numerical discretization requirements to sufficiently resolve particular physics.
As shown in Fig. 1, this workflow assumes that the process of designing experiments for model creation and calibration has already occurred. Although the design process is not covered here, should the validation evaluation reveal that the model is insufficient, validation recommendations should encourage and inform the design of new experiments for both calibration and validation. Many papers provide guidelines for best practices regarding the design of experiments for validation [1,3,36].
3.4.1 Validation Recommendations in the Validation Example.
Based on the evidence from the validation assessment components (i.e., model accuracy, model acceptability, and validation evaluation), we make several recommendations:
The laser package model (as-is) is not acceptable based on the current validation experiments. Regions of acceptability include low laser energies. If alternate calibration methods are implemented, such as laser energy scaling or flux limiter scaling, satisfactory results are predicted at all laser energies. However, predictability suffers in interpolative or extrapolative regimes because of the reliance on non-physics-based calibration methods.
There is high confidence in a systematic error at high laser energies. The likely cause of systematic laser scattering deficiencies is a model form error or missing physics (i.e., cross-beam energy transfer). The laser package should be updated to include CBET predictive capabilities, and the validation assessment should be repeated. Additionally, future efforts should invest in additional CBET-sensitive experiments and diversified diagnostics to confirm we are addressing the correct physics.
Uncertainties in the ablation front velocity are restrictively large and indicate validation ambiguity. Further analysis indicates that the largest uncertainties are experimental. Additional resources should be allocated to identify and reduce experimental uncertainties for future experiments and analysis.
Given the current validation experiments, the hydrodynamics and laser package models are insensitive to partial ionization methods, opacity multigroup definitions, and nonlocal thermal conduction when using a flux limiter of f = 0.3.
The numerical model is in the asymptotic regime (not shown) for the given adaptive mesh refinement grid spacing δx (i.e., δx = 32 μm, δxAMR = 0.25 μm). This is true for the laser, hydrodynamics, and combined mesh. For more details, see Wilson and Koskelo [5].
4 Summary
Validation assessments should respond to many questions from a diverse set of users, including code and model developers, experimentalists, managers, customers, and code users. Unfortunately, the majority of validation literature has focused on metrics quantifying the accuracy and confidence of a set of model predictions. Using a single metric to address each validation need can inadvertently lead to convoluted, misleading interpretations.
We propose a validation assessment workflow composed of four distinct, but necessary, components:
model accuracy,
model acceptability,
validation evaluation, and
validation recommendations.
Incorporating each of these four components will lead to a more complete validation assessment capable of responding clearly and unambiguously to the needs of stakeholders and the intended uses of the model.
This paper is the first of a series of papers detailing the implementation of the validation assessment workflow. In this paper, we have outlined the distinguishing features of each component in the workflow, discussed the necessity of each, and suggested available methods for quantifying metrics in each component.
Each component in the workflow is demonstrated using a validation example from high-energy-density physics (see Wilson and coworkers [5,7]). Direct validation recommendations addressing stakeholders' questions were arrived at using the workflow as proposed in this paper. These recommendations rest on a validation basis consisting of rigorous assessments of model accuracy, model acceptability, and validation evaluation.
We emphasize that these recommendations could not have been reached by stopping at model accuracy, which is, in general, too restrictive. Even if an assessment did stop at model accuracy, an analyst would still need to write up the findings and evidence as a validation evaluation and validation recommendations. In our opinion, this practical workflow does not prevent a validation assessment from responding to all stakeholders' questions and needs, and it captures the key components of model acceptability, validation evaluation, and validation recommendations suggested by the V&V standards [18,19].
Acknowledgment
We acknowledge contributions to this work from Joanne Budzien through lengthy discussions and insight into the workflow. This work was supported by the U.S. Department of Energy through the Los Alamos National Laboratory. Los Alamos National Laboratory is operated by Triad National Security, LLC, for the National Nuclear Security Administration of the U.S. Department of Energy (Contract no. 89233218CNA000001).
Funding Data
U.S. Department of Energy (Funder ID: 10.13039/100000015).
Appendix: Examples of Modifying the Workflow for Local Conditions
In this appendix, we provide two examples of modifying the basic workflow to suit different work environments. We hope these examples demonstrate how one can adapt the workflow discussed in the main body of this paper.
Example 1
This example assumes that several conditions exist:
Acceptance criteria have been obtained that are commensurate with how the decision maker will make a decision.
The protocol for decision making requires a decision at the model acceptability stage.
The case for determining model acceptability is sufficient for the decision maker to make a decision.
The decision maker makes the decision and communicates it back to the validation analysts.
Here, the step labeled “model acceptability” in Fig. 1 becomes a decision step (see Fig. 8). For ease of viewing, we change this step from a box to a parallelogram to signify that a decision will be made. All of the work for evaluating the level of acceptability is still done in this step, but it is now explicitly conducted against a predefined acceptance criterion, together with an evaluation of the simulation model's acceptability (as above). If the work environment does not allow for a fuller validation assessment, the process could stop here. However, the delivered documentation should, at a minimum, describe this reduced level of validation.
If the model is rejected but the decision maker requires a more certain validation assessment, one would follow the rejected path and feed back to the modelers, the experimental community, the code implementation community, or whichever combination of these is the source of the largest uncertainty.
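The decision step in Example 1 can be sketched as a comparison against a predefined acceptance criterion, followed on rejection by routing feedback to the source of the largest uncertainty. In the sketch below, the criterion (accept only if |E| + k·u_val fits within a tolerance agreed with the decision maker), the coverage factor k = 2, and the mapping of uncertainties to communities are all hypothetical choices for illustration; an organization would substitute its own agreed criterion.

```python
from enum import Enum


class Decision(Enum):
    ACCEPT = "accept"
    REJECT = "reject"


def acceptability_decision(comparison_error, u_val, tolerance, k=2.0):
    """Accept only if the expanded bound |E| + k*u_val lies within the predefined tolerance.

    comparison_error: E = S - D; u_val: validation uncertainty; k: assumed coverage factor.
    """
    bound = abs(comparison_error) + k * u_val
    return Decision.ACCEPT if bound <= tolerance else Decision.REJECT


def feedback_target(uncertainty_by_community):
    """On rejection, pick the community whose uncertainty contribution is largest."""
    return max(uncertainty_by_community, key=uncertainty_by_community.get)


# Hypothetical numbers: E = 5, u_val = 3, tolerance agreed in advance = 10.
decision = acceptability_decision(5.0, 3.0, tolerance=10.0)
if decision is Decision.REJECT:
    target = feedback_target(
        {"modelers": 1.0, "experiment community": 8.0, "code implementation": 2.0}
    )
    print(decision.value, "->", target)  # reject -> experiment community
```

Folding the uncertainty into the bound makes the decision conservative: a model whose nominal error is inside the tolerance can still be rejected when the evidence is too uncertain, which is exactly the case where the rejected path feeds back to reduce the largest uncertainty.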
Example 2
In this example, a decision can be made at any step of the validation workflow (see Fig. 9). The validation analyst communicates with the stakeholder throughout the validation assessment, updating the stakeholder at each step of the workflow. How the simulations are to be evaluated is agreed upon in advance so that the validation evidence base produced meets the stakeholder's needs. The link to the workflow is now shown explicitly (link to entry point 3). The to-be-determined (TBD) point depends on the decision: it could be reentering the workflow at a different point, terminating the validation assessment, or looping back to one, two, or both. Whichever path is appropriate would be captured by amending the modified workflow below.
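The control flow of Example 2 can be sketched as a walk through the four workflow components in which the stakeholder is consulted after each step and may continue, terminate, or send the assessment back to any earlier or later component. The step names follow the four components of this paper; the decision vocabulary, the `run_workflow` function, the iteration guard, and the example stakeholder are hypothetical, and the sketch does not model the specific entry points of Fig. 9.

```python
from typing import Callable, List

STEPS = ["model accuracy", "model acceptability",
         "validation evaluation", "validation recommendations"]


def run_workflow(decide: Callable[[str], str], max_visits: int = 20) -> List[str]:
    """Visit workflow components in order, consulting the stakeholder after each one.

    decide(step) returns "continue", "terminate", or the name of a step to reenter.
    Returns the sequence of steps actually visited.
    """
    visited: List[str] = []
    i = 0
    while i < len(STEPS):
        if len(visited) >= max_visits:
            raise RuntimeError("too many iterations; revisit the agreed decision protocol")
        step = STEPS[i]
        visited.append(step)
        outcome = decide(step)
        if outcome == "continue":
            i += 1
        elif outcome == "terminate":
            break
        elif outcome in STEPS:
            i = STEPS.index(outcome)  # reenter the workflow at another point
        else:
            raise ValueError(f"unknown decision {outcome!r} at step {step!r}")
    return visited


def make_stakeholder():
    """Hypothetical stakeholder: the first time acceptability is reached, loop back once."""
    seen = set()

    def decide(step: str) -> str:
        if step == "model acceptability" and step not in seen:
            seen.add(step)
            return "model accuracy"
        return "continue"

    return decide


print(run_workflow(make_stakeholder()))
```

Recording the visited path gives the documentation trail the appendix calls for: it shows where the assessment looped back or stopped, and therefore what level of validation was actually delivered.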