Conventions for describing reproducible and reusable simulation experiments with SED-ML

Overview

Simulators should support SED-ML L1V3 or later. To accommodate a wide range of modeling frameworks and simulation algorithms, BioSimulators and BioSimulations embrace the additional conventions for SED-ML described below, as well as the conventions for executing SED-ML documents described here.

Model and data descriptor source paths

SED-ML can refer to model and data descriptor files in multiple ways, including via paths to local files, URLs, URI fragments to other models defined in the same SED-ML document, and identifiers for an Identifiers.org namespace such as BioModels. When referencing files via local paths, SED-ML documents should use paths relative to the location of the SED-ML document.

To ensure that COMBINE/OMEX archives are self-contained, we encourage SED-ML documents in COMBINE/OMEX archives to reference files via relative paths within archives or other models within the same SED-ML document.
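As a rough illustration of the path convention above, the sketch below resolves a local source against the directory of the SED-ML document and leaves other kinds of sources untouched. The function name `resolve_source` is hypothetical, and the use of forward-slash (POSIX-style) paths is an assumption that suits paths inside archives.

```python
import posixpath


def resolve_source(sedml_path, source):
    """Resolve a model or data source named by a SED-ML document.

    Hypothetical helper: local paths are interpreted relative to the
    directory that contains the SED-ML document.
    """
    # URLs, URNs, and fragments that point at other models in the same
    # document are not file paths, so they are returned unchanged.
    if source.startswith("#") or "://" in source or source.lower().startswith("urn:"):
        return source
    base_dir = posixpath.dirname(sedml_path)
    return posixpath.normpath(posixpath.join(base_dir, source))


resolved = resolve_source("experiments/simulation.sedml", "../models/model.xml")
```

Here `resolved` names a file inside the same archive, which keeps the archive self-contained.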

Concrete XPath targets for changes to XML-encoded models

SED-ML enables investigators to use XPaths to specify changes to models that are encoded in XML files. This encompasses models described using CellML, SBML, and other languages. SED-ML documents should use valid XPaths that resolve to XML elements. For example, /sbml:sbml/sbml:model/sbml:listOfSpecies/sbml:species[@id='A']/@initialConcentration could be used to indicate a change to the initial condition of the species with id A.

In addition, the namespace prefixes used in XPaths should be defined within the SED-ML document as illustrated below.

<sedML xmlns:sbml="http://www.sbml.org/sbml/level3/version1/core">
  <listOfDataGenerators>
    <dataGenerator>
      <listOfVariables>
        <variable target="/sbml:sbml/sbml:model/sbml:listOfSpecies/sbml:species[@id='A']/@initialConcentration" />
      </listOfVariables>
    </dataGenerator>
  </listOfDataGenerators>
</sedML>
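The sketch below shows one way a complete XPath target and its namespace prefixes could be applied to a model with Python's standard library. It is only an illustration: the helper `set_attribute` and the small model are hypothetical, `xml.etree.ElementTree` supports a limited XPath subset, and the code therefore splits off the attribute step itself and assumes that predicates contain no "/" characters and that the attribute has no namespace prefix.

```python
import xml.etree.ElementTree as ET

# Prefix-to-namespace map, mirroring the xmlns:sbml declaration in the SED-ML document.
NAMESPACES = {"sbml": "http://www.sbml.org/sbml/level3/version1/core"}

# Hypothetical model used only for this illustration.
MODEL_XML = """<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1">
  <model id="example">
    <listOfSpecies>
      <species id="A" compartment="c" initialConcentration="1.0"/>
      <species id="B" compartment="c" initialConcentration="2.0"/>
    </listOfSpecies>
  </model>
</sbml>"""


def set_attribute(root, target, new_value, namespaces):
    """Set the attribute addressed by a complete XPath target; return the match count."""
    element_path, separator, attribute = target.rpartition("/@")
    if not separator:
        raise ValueError("target does not end in an attribute step: " + target)

    # ElementTree evaluates paths relative to the root element, so the first
    # step is checked against the root and the remaining steps are searched.
    first, _, rest = element_path.lstrip("/").partition("/")
    prefix, _, local_name = first.partition(":")
    if root.tag != "{%s}%s" % (namespaces[prefix], local_name):
        raise ValueError("target does not match the root element: " + target)

    matches = root.findall("./" + rest, namespaces) if rest else [root]
    if not matches:
        raise ValueError("target resolves to no elements: " + target)
    for element in matches:
        element.set(attribute, new_value)
    return len(matches)


root = ET.fromstring(MODEL_XML)
target = "/sbml:sbml/sbml:model/sbml:listOfSpecies/sbml:species[@id='A']/@initialConcentration"
changed = set_attribute(root, target, "5.0", NAMESPACES)
```

Because the target names the attribute explicitly, the helper does not need to guess which attribute of species A is meant.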

Note

The SED-ML L1V3 and earlier specifications suggest that incomplete XPaths such as /sbml:sbml/sbml:model/sbml:listOfSpecies/sbml:species[@id='A'] should be used to indicate changes to model elements. We discourage this convention of partial XPaths because these XPaths do not point to the attribute that is intended to be changed. We encourage investigators to use complete XPaths.

Note

SED-ML L1V4 and later documents can use target and symbol together to reference implicit attributes of model elements, such as fluxes of reactions of flux-balance models.

Namespaces for NewXML elements of changes to XML-encoded models

SED-ML documents can use sedml:newXML elements of sedml:addXML and sedml:changeXML elements to specify objects that should be added to models or replaced in models. SED-ML documents should define the namespace(s) of the content of these NewXML elements. For example, a parameter that should be added to an SBML model could be described as <sbml:parameter xmlns:sbml="http://www.sbml.org/sbml/level3/version1/core" id="NewParameter" value="10.0" />.

Note

The SED-ML specifications suggest that namespaces don't need to be defined for the content of NewXML elements. We discourage this convention because XML files which embrace this convention are not consistent with SED-ML's XML schema. We encourage investigators to explicitly define the namespaces involved in the content of NewXML elements.
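One practical consequence can be illustrated with Python's standard XML parser. This is a sketch, not a description of how any particular simulation tool handles NewXML content: it parses a parameter fragment on its own, once with its namespace declared and once without, and it assumes the SBML Level 3 Version 1 core namespace.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for this illustration (SBML Level 3 Version 1 core).
SBML_NS = "http://www.sbml.org/sbml/level3/version1/core"

declared = '<sbml:parameter xmlns:sbml="%s" id="NewParameter" value="10.0" />' % SBML_NS
undeclared = '<sbml:parameter id="NewParameter" value="10.0" />'

# With the namespace declared, the fragment parses on its own and the element
# carries a namespace-qualified tag that can be matched against the model.
parameter = ET.fromstring(declared)

try:
    ET.fromstring(undeclared)
    undeclared_parses = True
except ET.ParseError:
    # The prefix "sbml" is not bound to any namespace within the fragment.
    undeclared_parses = False
```

In a full SED-ML document the prefix might instead be declared on an enclosing element; declaring it on the new content itself keeps the content unambiguous wherever it is read.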

Data types for model attribute changes and algorithm parameters

SED-ML specifies that the new values of model attribute changes (sedml:changeAttribute/@sedml:newValue) and values of algorithm parameters (sedml:algorithmParameter/@sedml:value) must be encoded into strings. To ensure that SED-ML files are portable across simulation tools, we define several data types for model attribute changes and algorithm parameters and outline how each data type should be encoded into strings. The data type of each algorithm parameter should be defined in the specification of each simulation tool.

boolean: Represents Boolean values. Should be encoded into strings as true/false or 0/1.
integer: Represents integers. Should be encoded in decimal notation (e.g., 1234).
float: Represents floating point numbers. Should be encoded in decimal (e.g., 1234.567) or scientific (e.g., 1.234567e3) notation.
string: Represents strings. Requires no additional encoding.
kisaoId: Represents a KiSAO term. Should be encoded using the id of the term (e.g., KISAO_0000029).
list: Represents a list of scalar values. Should be encoded using JSON (e.g., ["a", "b", "c"] or [1, 2, 3]). For example, the value of the deterministic reactions partition (KISAO_0000534) of the Pahle hybrid discrete/continuous Fehlberg method (KISAO_0000563) should be a list of the ids of the reactions which should be simulated by the Fehlberg sub-method. Its value should be encoded into SED-ML as <algorithmParameter kisaoID="KISAO:0000534" value='["ReactionId-1", "ReactionId-2", ...]' />.
object: Represents key-value pairs. Should be encoded using JSON (e.g., {"a": 1, "b": 2} or {"a": "x", "b": "y"}).
any: Represents any other data type. Should be encoded using JSON (e.g., [{"a": 1, "b": 2}]).

Enumerations for the values of an algorithm parameter can be defined in the specification of a simulator using the recommendedRange attribute. This can be combined with any of the above data types.
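The encodings above could be decoded along the following lines. This is a minimal sketch, not the implementation of any particular tool: the function name `decode_value` is hypothetical, the data type names are taken from the list above, and the KiSAO id pattern (KISAO_ followed by seven digits) is an assumption based on the ids shown in this document.

```python
import json
import re


def decode_value(text, data_type):
    """Decode the string form of a value according to its declared data type."""
    if data_type == "boolean":
        if text in ("true", "1"):
            return True
        if text in ("false", "0"):
            return False
        raise ValueError("invalid boolean: %r" % text)
    if data_type == "integer":
        return int(text)
    if data_type == "float":
        # float() accepts both decimal and scientific notation.
        return float(text)
    if data_type == "string":
        return text
    if data_type == "kisaoId":
        # Assumed pattern, inferred from ids such as KISAO_0000029.
        if not re.fullmatch(r"KISAO_\d{7}", text):
            raise ValueError("invalid KiSAO id: %r" % text)
        return text
    if data_type in ("list", "object", "any"):
        value = json.loads(text)
        if data_type == "list" and not isinstance(value, list):
            raise ValueError("expected a JSON array: %r" % text)
        if data_type == "object" and not isinstance(value, dict):
            raise ValueError("expected a JSON object: %r" % text)
        return value
    raise ValueError("unknown data type: %r" % data_type)


reactions = decode_value('["ReactionId-1", "ReactionId-2"]', "list")
```

Note that JSON requires double-quoted strings, so in XML the attribute value is most easily delimited with single quotes.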

Limit use of repeated tasks to the execution of independent simulation runs

In addition to capturing multiple independent simulation runs, sedml:repeatedTask/@resetModel="False" provides limited abilities to describe sets of dependent simulation runs, where each run begins from the end state of the previous run. This provides investigators limited abilities to describe meta simulation algorithms.

Simulation tools are encouraged to support a simpler subset of the features of sedml:repeatedTask that is sufficient to describe multiple independent simulation runs.

sedml:repeatedTask: Simulation tools should support resetModel="True" as described in the SED-ML specifications; the model specifications and initial conditions should be reset. Simulator state such as the states of random number generators should not be reset. When resetModel="False", simulation tools should support limited preservation of the state of simulations between iterations. Simulation tools should accumulate changes to the specifications of the model(s) involved in the task. Simulation tools should not copy the final simulation state from the previous iteration to the initial state of the next iteration.
Sub-tasks (sedml:subTask): Successive subtasks should be executed independently, including when they involve the same model. The final state of the previous sub-task should not be used to set up the initial state for the next sub-task.
Shape of model variables for the results of repeated tasks: Repeated tasks should produce multi-dimensional results. The first dimension should represent the iterations of the main range of the repeated task. The second dimension should represent the sub-tasks of the repeated task. The results of sub-tasks should be ordered in the same order the sub-tasks were executed (in order of their order attributes). The result of each sub-task should be reshaped to the largest shape of its sibling sub-tasks by padding smaller results with NaN. Each nesting of repeated tasks should contribute two additional dimensions for their ranges and sub-tasks. The final dimensions should be the dimensions of the atomic tasks of the repeated task (e.g., time for tasks of uniform time courses).
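The shape convention for a single, non-nested repeated task can be sketched as follows. This is an illustration under simplifying assumptions: each sub-task yields a one-dimensional result (for example, one value per time point), plain Python lists stand in for arrays, and the helper name `stack_sub_task_results` is hypothetical.

```python
import math


def stack_sub_task_results(iterations):
    """Pad 1-D sub-task results with NaN; return results[iteration][sub_task][step]."""
    longest = max(len(result) for sub_tasks in iterations for result in sub_tasks)
    return [
        [list(result) + [math.nan] * (longest - len(result)) for result in sub_tasks]
        for sub_tasks in iterations
    ]


# Two iterations of a repeated task with two sub-tasks, listed in execution
# order. The first sub-task records three time points, the second records two.
raw = [
    [[0.0, 0.1, 0.2], [1.0, 1.1]],
    [[0.0, 0.3, 0.6], [1.0, 1.3]],
]
padded = stack_sub_task_results(raw)
shape = (len(padded), len(padded[0]), len(padded[0][0]))
```

The dimensions of `padded` are iterations, then sub-tasks, then the dimension of the atomic task; the shorter sub-task result is filled with NaN so that the result is rectangular.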

Canonical order of execution of tasks

For reproducibility, simulation tools should execute tasks in the order in which they are defined in SED-ML files.

Furthermore, because the order of execution can affect the results of simulations, in general, each task should be executed, including tasks which do not contribute to any output. This is particularly important for simulation tools that implement Monte Carlo algorithms. One exception is tasks whose results are invariant to their order of execution, such as most deterministic simulations. Such tasks can be executed in any order or in parallel.

Limit use of symbols to variables of data generators

SED-ML uses symbols to reference implicit properties of simulations that are not explicitly defined in the specification of the model for the simulation. The most frequently used symbol for SBML-encoded models is urn:sedml:symbol:time for the variable time. Such symbols only have defined values for simulations of models and not for models themselves.

Consequently, symbols should only be used in contexts where simulations are defined. Specifically, symbols should only be used in conjunction with variables of sedml:dataGenerator to record predicted values of symbols. Symbols should not be used in conjunction with the variables of sedml:computeChange, sedml:setValue, or sedml:functionalRange. Symbols should also not be used with sedml:setValue to set the values of symbols.

Variable targets for model objects that generate multiple predictions

Some algorithms, such as flux balance analysis (FBA, KISAO_0000437) and flux variability analysis (FVA, KISAO_0000526), generate multiple predictions for each model object. For example, flux variability analysis predicts minimum and maximum fluxes for each reaction. Targets (sedml:variable/@sedml:target) for such predictions should indicate the id of the desired prediction. To ensure portability of SED-ML files between simulation tools, we define the following ids. Please use GitHub issues to suggest additional ids for additional predictions of other algorithms.

FBA (KISAO_0000437), parsimonious FBA (KISAO_0000528), geometric FBA (KISAO_0000527):
- Objective: fbc:objective/@fbc:value
- Reaction flux: sbml:reaction/@fbc:flux
- Reaction reduced cost: sbml:reaction/@fbc:reducedCost
- Species shadow price: sbml:species/@fbc:shadowPrice
FVA (KISAO_0000526):
- Minimum reaction flux: sbml:reaction/@fbc:minFlux
- Maximum reaction flux: sbml:reaction/@fbc:maxFlux
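A tool could check a variable target against the ids listed above roughly as follows. This sketch rests on assumptions: that a full target ends with one of the listed element/attribute pairs once predicates are removed, and that the names `PREDICTION_IDS`, `prediction_id`, and `is_defined_target`, as well as the reaction id R1, are hypothetical.

```python
import re

FBA_IDS = {
    "fbc:objective/@fbc:value",
    "sbml:reaction/@fbc:flux",
    "sbml:reaction/@fbc:reducedCost",
    "sbml:species/@fbc:shadowPrice",
}
FVA_IDS = {
    "sbml:reaction/@fbc:minFlux",
    "sbml:reaction/@fbc:maxFlux",
}

# Ids defined for each algorithm, keyed by KiSAO term.
PREDICTION_IDS = {
    "KISAO_0000437": FBA_IDS,  # FBA
    "KISAO_0000528": FBA_IDS,  # parsimonious FBA
    "KISAO_0000527": FBA_IDS,  # geometric FBA
    "KISAO_0000526": FVA_IDS,  # FVA
}


def prediction_id(target):
    """Return the trailing 'element/@attribute' of a target, ignoring predicates."""
    without_predicates = re.sub(r"\[[^\]]*\]", "", target)
    return "/".join(without_predicates.split("/")[-2:])


def is_defined_target(kisao_id, target):
    """Check whether a target requests a prediction defined for the algorithm."""
    return prediction_id(target) in PREDICTION_IDS.get(kisao_id, set())


target = "/sbml:sbml/sbml:model/sbml:listOfReactions/sbml:reaction[@id='R1']/@fbc:minFlux"
supported_by_fva = is_defined_target("KISAO_0000526", target)
supported_by_fba = is_defined_target("KISAO_0000437", target)
```

Under these assumptions the minimum-flux target is accepted for FVA but not for FBA, which predicts a single flux per reaction.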

Unique data set labels

To facilitate automated interpretation of simulation results, the data sets within a report should have unique labels (sedml:dataSet/@sedml:label). Note, the same label can be used across multiple reports.
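A uniqueness check of this kind is simple to automate. The sketch below assumes that reports have already been read into a mapping from report id to the labels of its data sets; the function name and the example ids are hypothetical.

```python
from collections import Counter


def duplicate_labels(labels):
    """Return the labels that occur more than once within one report."""
    counts = Counter(labels)
    return sorted(label for label, count in counts.items() if count > 1)


# Labels may repeat across reports, but not within a report.
reports = {
    "report_1": ["time", "A", "B"],
    "report_2": ["time", "A", "A"],
}
problems = {
    report_id: duplicate_labels(labels)
    for report_id, labels in reports.items()
    if duplicate_labels(labels)
}
```

Only the second report is flagged, because the label shared between the two reports does not violate the convention.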

Guides for using SED-ML and the COMBINE/OMEX archive format with specific model languages

Simulation tools should recognize the URNs and IRIs below to identify model languages described in SED-ML files and COMBINE/OMEX archives. The links in the "Info" column below contain more information about how simulation tools should interpret SED-ML in combination with specific model languages.

Language	EDAM id	SED-ML URN	COMBINE/OMEX archive specification URI	MIME type	Extensions
BNGL	3972	urn:sedml:language:bngl	http://purl.org/NET/mediatypes/text/bngl+plain	text/bngl+plain	.bngl
CellML	3240	urn:sedml:language:cellml	http://identifiers.org/combine.specifications/cellml	application/cellml+xml	.xml, .cellml
(NeuroML)/LEMS	9004	urn:sedml:language:lems	http://purl.org/NET/mediatypes/application/lems+xml	application/lems+xml	.xml
SBML	2585	urn:sedml:language:sbml	http://identifiers.org/combine.specifications/sbml	application/sbml+xml	.xml, .sbml
Smoldyn	9001	urn:sedml:language:smoldyn	http://purl.org/NET/mediatypes/text/smoldyn+plain	text/smoldyn+plain	.txt

Example SED-ML files and COMBINE/OMEX archives for all of the languages listed above are available here.

Recommended resources for implementing the execution of simulation experiments

Below are helpful tools for implementing the execution of simulation experiments described with SED-ML:

BioSimulators utils is a Python library which provides functions for implementing command-line interfaces to the above specifications, as well as functions for interpreting COMBINE/OMEX archives and SED-ML files, generating tables and plots of simulation results, and logging the execution of COMBINE/OMEX archives. BioSimulators utils provides high-level access to some of the lower-level libraries listed below.
libSED-ML is a library for serializing and deserializing SED-ML documents to and from XML files. libSED-ML provides bindings for several languages.
jLibSED-ML is a Java library for serializing and deserializing SED-ML documents to and from XML files. The library also provides methods for resolving models, working with XPath targets for model elements, applying model changes, orchestrating the execution of tasks, calculating the values of data generators, and logging the execution of simulations. Note, jLibSED-ML supports SED-ML <= L1V2 and diverges from some of the conventions described here.

Last update: 2023-04-12
