Conventions for describing reproducible and reusable simulation experiments with SED-ML
Overview
Simulators should support SED-ML L1V3 or later. To accommodate a wide range of modeling frameworks and simulation algorithms, BioSimulators and BioSimulations embrace the additional conventions for SED-ML described below, as well as the conventions for executing SED-ML documents described here.
Model and data descriptor source paths
SED-ML can refer to model and data descriptor files in multiple ways, including via paths to local files, URLs, URI fragments to other models defined in the same SED-ML document, and identifiers for an Identifiers.org namespace such as BioModels. When referencing files via local paths, SED-ML documents should use paths relative to the location of the SED-ML document.
To ensure that COMBINE/OMEX archives are self-contained, we encourage SED-ML documents in COMBINE/OMEX archives to reference files via relative paths within the archives, or to reference other models within the same SED-ML document.
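The following minimal Python sketch illustrates the relative-path convention. It resolves a local source path against the location of the SED-ML document inside an archive and rejects paths that would escape the archive. The function name resolve_local_source is hypothetical. The sketch assumes that archive locations use POSIX-style "/" separators. URLs, URNs, and URI fragments are outside its scope.

```python
import posixpath


def resolve_local_source(sedml_location: str, source: str) -> str:
    """Resolve a model/data source path relative to a SED-ML document.

    Both arguments are archive-relative, POSIX-style paths (hypothetical
    helper; URLs, URNs, and URI fragments are not handled here).
    """
    if posixpath.isabs(source):
        raise ValueError("expected a path relative to the SED-ML document")
    # Interpret the source relative to the directory containing the SED-ML file.
    base_dir = posixpath.dirname(sedml_location)
    resolved = posixpath.normpath(posixpath.join(base_dir, source))
    # A result that starts with ".." points outside the archive.
    if resolved == ".." or resolved.startswith("../"):
        raise ValueError("source path escapes the COMBINE/OMEX archive")
    return resolved


print(resolve_local_source("experiments/simulation.sedml", "../models/model.xml"))  # models/model.xml
```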
Concrete XPath targets for changes to XML-encoded models
SED-ML enables investigators to use XPaths to specify changes to models that are encoded in XML files. This encompasses models described using CellML, SBML, and other languages. SED-ML documents should use valid XPaths that resolve to the XML elements or attributes that are intended to be changed. For example, /sbml:sbml/sbml:model/sbml:listOfSpecies/sbml:species[@id='A']/@initialConcentration could be used to indicate a change to the initial condition of the species with id A.
In addition, the namespace prefixes used in XPaths should be defined within the SED-ML document as illustrated below.
<sedML xmlns:sbml="http://www.sbml.org/sbml/level3/version1/core">
  <listOfDataGenerators>
    <dataGenerator>
      <listOfVariables>
        <variable target="/sbml:sbml/sbml:model/sbml:listOfSpecies/sbml:species[@id='A']/@initialConcentration" />
      </listOfVariables>
    </dataGenerator>
  </listOfDataGenerators>
</sedML>
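A complete XPath makes the intended change unambiguous, because its final attribute step names the attribute to modify. The minimal sketch below assumes a hypothetical helper, apply_change_attribute, that splits a target into an element path and an attribute. It locates the element with the limited XPath subset that Python's xml.etree.ElementTree supports, then sets the attribute. The prefix map mirrors the xmlns:sbml declaration of the SED-ML document. The sketch is not a general XPath engine.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map, mirroring the xmlns:sbml declaration in the SED-ML document.
NAMESPACES = {"sbml": "http://www.sbml.org/sbml/level3/version1/core"}


def expand_qname(qname: str, namespaces: dict) -> str:
    """Turn 'prefix:local' into ElementTree's '{uri}local' form."""
    if ":" not in qname:
        return qname
    prefix, local = qname.split(":", 1)
    return "{%s}%s" % (namespaces[prefix], local)


def apply_change_attribute(model_xml: str, target: str, new_value: str,
                           namespaces: dict) -> str:
    """Set the attribute addressed by a complete XPath such as .../@initialConcentration."""
    element_path, sep, attribute = target.rpartition("/@")
    if not sep:
        # Partial XPaths (ending at an element) do not say which attribute to change.
        raise ValueError("target must end with an attribute step, e.g. /@initialConcentration")
    root = ET.fromstring(model_xml)
    # ElementTree paths are relative to the root, so check and strip the root step.
    root_step, _, relative_path = element_path.lstrip("/").partition("/")
    if expand_qname(root_step, namespaces) != root.tag:
        raise ValueError(f"root element does not match {root_step}")
    matches = root.findall(relative_path, namespaces)
    if len(matches) != 1:
        raise ValueError(f"target must match exactly one element, matched {len(matches)}")
    matches[0].set(expand_qname(attribute, namespaces), new_value)
    return ET.tostring(root, encoding="unicode")
```

Requiring exactly one match is a design choice in this sketch. It makes an ambiguous or dangling target fail loudly instead of silently changing nothing or changing several elements.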
Note
The SED-ML L1V3 and earlier specifications suggest that incomplete XPaths such as /sbml:sbml/sbml:model/sbml:listOfSpecies/sbml:species[@id='A'] should be used to indicate changes to model elements. We discourage partial XPaths because they do not point to the attribute that is intended to be changed. We encourage investigators to use complete XPaths.
Note
SED-ML L1V4 and later documents can use target and symbol together to reference implicit attributes of model elements, such as fluxes of reactions of flux-balance models.
Namespaces for NewXML elements of changes to XML-encoded models
SED-ML documents can use the sedml:newXML elements of sedml:addXML and sedml:changeXML elements to specify objects that should be added to or replaced in models. SED-ML documents should define the namespace(s) of the content of these NewXML elements. For example, a parameter that should be added to an SBML model could be described as <sbml:parameter xmlns:sbml="http://www.sbml.org/sbml/level3/version1/core" id="NewParameter" value="10.0" />.
Note
The SED-ML specifications suggest that namespaces don't need to be defined for the content of NewXML elements. We discourage this convention because XML files that follow it are not consistent with SED-ML's XML schema. We encourage investigators to explicitly define the namespaces involved in the content of NewXML elements.
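One practical consequence can be seen when a tool extracts NewXML content and parses it on its own, for example before inserting it into a model. A namespace-aware parser rejects prefixes that are used without being declared. The minimal sketch below uses Python's xml.etree.ElementTree to show this. The helper name parse_new_xml is hypothetical.

```python
import xml.etree.ElementTree as ET

SBML_L3V1_CORE = "http://www.sbml.org/sbml/level3/version1/core"


def parse_new_xml(fragment: str) -> ET.Element:
    """Parse NewXML content with a namespace-aware parser.

    Raises ValueError when a prefix (e.g., sbml:) is used without being declared.
    """
    try:
        return ET.fromstring(fragment)
    except ET.ParseError as error:
        raise ValueError(f"NewXML content is not namespace-well-formed: {error}") from error


declared = parse_new_xml(
    f'<sbml:parameter xmlns:sbml="{SBML_L3V1_CORE}" id="NewParameter" value="10.0" />'
)
print(declared.tag)  # {http://www.sbml.org/sbml/level3/version1/core}parameter
```

Parsing the same fragment without the xmlns:sbml declaration raises a ValueError with an "unbound prefix" message. A tool therefore cannot tell which language the new element belongs to.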
Data types for model attribute changes and algorithm parameters
SED-ML specifies that the new values of model attribute changes (sedml:changeAttribute/@sedml:newValue) and the values of algorithm parameters (sedml:algorithmParameter/@sedml:value) must be encoded into strings. To ensure that SED-ML files are portable across simulation tools, we define several data types for model attribute changes and algorithm parameters, and we outline how each data type should be encoded into strings. The data type of each algorithm parameter should be defined in the specification of each simulation tool.
- boolean: Represents Boolean values. Should be encoded into strings as true/false or 0/1.
- integer: Represents integers. Should be encoded in decimal notation (e.g., 1234).
- float: Represents floating-point numbers. Should be encoded in decimal (e.g., 1234.567) or scientific (e.g., 1.234567e3) notation.
- string: Represents strings. Requires no additional encoding.
- kisaoId: Represents a KiSAO term. Should be encoded using the id of the term (e.g., KISAO_0000029).
- list: Represents a list of scalar values. Should be encoded using JSON (e.g., ["a", "b", "c"] or [1, 2, 3]). For example, the value of the deterministic reactions partition (KISAO_0000534) of the Pahle hybrid discrete/continuous Fehlberg method (KISAO_0000563) should be a list of the ids of the reactions that should be simulated by the Fehlberg sub-method. Its value should be encoded into SED-ML as <algorithmParameter kisaoID="KISAO:0000534" value='["ReactionId-1", "ReactionId-2", ...]' />.
- object: Represents key-value pairs. Should be encoded using JSON (e.g., {"a": 1, "b": 2} or {"a": "x", "b": "y"}).
- any: Represents any other data type. Should be encoded using JSON (e.g., [{"a": 1, "b": 2}]).
Enumerations of the allowed values of an algorithm parameter can be defined in the specification of a simulator using the recommendedRange attribute. This can be combined with any of the above data types.
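The minimal Python sketch below encodes and decodes values under the data types listed above. The function names encode_value and decode_value are hypothetical. The KiSAO id check assumes ids of the form KISAO_ followed by seven digits, as in the example above. Boolean decoding is strict and accepts only true/false/0/1.

```python
import json
import re
from typing import Any

JSON_TYPES = {"list", "object", "any"}


def encode_value(value: Any, data_type: str) -> str:
    """Encode a Python value as a SED-ML attribute string for the given data type."""
    if data_type == "boolean":
        return "true" if value else "false"
    if data_type == "integer":
        return str(int(value))
    if data_type == "float":
        return repr(float(value))  # decimal or scientific notation, e.g. '1234.567' or '1e+20'
    if data_type in ("string", "kisaoId"):
        return str(value)
    if data_type in JSON_TYPES:
        return json.dumps(value)
    raise ValueError(f"unknown data type: {data_type}")


def decode_value(text: str, data_type: str) -> Any:
    """Decode a SED-ML attribute string according to the given data type."""
    if data_type == "boolean":
        if text in ("true", "1"):
            return True
        if text in ("false", "0"):
            return False
        raise ValueError(f"invalid boolean: {text!r}")
    if data_type == "integer":
        return int(text)
    if data_type == "float":
        return float(text)  # accepts both '1234.567' and '1.234567e3'
    if data_type == "string":
        return text
    if data_type == "kisaoId":
        # Assumption: ids look like KISAO_ followed by seven digits.
        if re.fullmatch(r"KISAO_\d{7}", text) is None:
            raise ValueError(f"invalid KiSAO id: {text!r}")
        return text
    if data_type in JSON_TYPES:
        return json.loads(text)  # raises a ValueError subclass for invalid JSON
    raise ValueError(f"unknown data type: {data_type}")
```

Because list, object, and any values are parsed as JSON, strings and keys must use double quotes. A value such as ['a', 'b'] is rejected.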
Limit use of repeated tasks to the execution of independent simulation runs
In addition to capturing multiple independent simulation runs, sedml:repeatedTask can use resetModel="false" to describe sets of dependent simulation runs, in which each run begins from the end state of the previous run. This gives investigators limited abilities to describe meta-simulation algorithms.
Simulation tools are encouraged to support a simpler subset of the features of sedml:repeatedTask that is sufficient to describe multiple independent simulation runs.
- sedml:repeatedTask: Simulation tools should support resetModel="true" as described in the SED-ML specifications; the model specifications and initial conditions should be reset. Simulator state, such as the states of random number generators, should not be reset. When resetModel="false", simulation tools should support limited preservation of the state of simulations between iterations. Simulation tools should accumulate changes to the specifications of the model(s) involved in the task. Simulation tools should not copy the final simulation state from the previous iteration to the initial state of the next iteration.
- Sub-tasks (sedml:subTask): Successive sub-tasks should be executed independently, including when they involve the same model. The final state of the previous sub-task should not be used to set up the initial state of the next sub-task.
- Shape of model variables for the results of repeated tasks: Repeated tasks should produce multi-dimensional results. The first dimension should represent the iterations of the main range of the repeated task. The second dimension should represent the sub-tasks of the repeated task. The results of sub-tasks should be ordered in the order in which the sub-tasks were executed (i.e., by their order attributes). The result of each sub-task should be padded with NaN to the largest shape of its sibling sub-tasks. Each nesting of repeated tasks should contribute two additional dimensions, one for its range and one for its sub-tasks. The final dimensions should be the dimensions of the atomic tasks of the repeated task (e.g., time for tasks of uniform time courses).
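The minimal Python sketch below illustrates two of the conventions above. In run_iterations, only changes to the model specification carry over between iterations, and only when resetModel is false. Every run starts from the initial state implied by the specification. In assemble_results, sub-task results are ordered by their order attribute and padded with NaN into an [iteration][sub-task][time point] array. All names are hypothetical, and the sketch makes several simplifying assumptions. A model specification is a flat dict of attribute values. Each atomic result is a 1-D list. Repeated tasks are not nested. Padding uses the longest result across all iterations so that the nested list stays rectangular.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

ModelSpec = Dict[str, float]


def run_iterations(base_spec: ModelSpec,
                   changes_per_iteration: Sequence[ModelSpec],
                   reset_model: bool,
                   simulate: Callable[[ModelSpec], List[float]]) -> List[List[float]]:
    """Illustrate resetModel: only model *changes* carry over between iterations."""
    results = []
    spec = dict(base_spec)
    for changes in changes_per_iteration:
        if reset_model:
            spec = dict(base_spec)  # resetModel="true": discard earlier changes
        spec.update(changes)  # resetModel="false": changes accumulate
        # Each run starts from the initial state implied by `spec`,
        # never from the final state of the previous iteration.
        results.append(simulate(dict(spec)))
    return results


def assemble_results(iterations: Sequence[Sequence[Tuple[int, Sequence[float]]]]
                     ) -> List[List[List[float]]]:
    """Build an [iteration][sub-task][time point] array from (order, values) pairs."""
    # Order each iteration's sub-tasks by their `order` attribute.
    ordered = [[list(values) for _, values in sorted(sub_tasks, key=lambda pair: pair[0])]
               for sub_tasks in iterations]
    # Pad shorter sub-task results with NaN so the nested list is rectangular.
    width = max((len(values) for sub_tasks in ordered for values in sub_tasks), default=0)
    return [[values + [math.nan] * (width - len(values)) for values in sub_tasks]
            for sub_tasks in ordered]
```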
Canonical order of execution of tasks
For reproducibility, simulation tools should execute tasks in the order in which they are defined in SED-ML files.
Furthermore, because the order of execution can affect the results of simulations, each task should generally be executed, including tasks that do not contribute to any output. This is particularly important for simulation tools that implement Monte Carlo algorithms: when tasks share a random number generator, skipping one task changes the random numbers that subsequent tasks receive. One exception is tasks whose results are invariant to their order of execution, such as most deterministic simulations. Such tasks can be executed in any order or in parallel.
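The minimal Python sketch below illustrates why skipping a task matters. It assumes that all tasks draw from one seeded random number generator and stands in for each stochastic simulation with a single draw. The function run_tasks and the task ids are hypothetical.

```python
import random
from typing import Dict, FrozenSet, Sequence


def run_tasks(task_ids: Sequence[str], seed: int,
              skipped: FrozenSet[str] = frozenset()) -> Dict[str, float]:
    """Execute tasks in document order; each draw stands in for a stochastic simulation."""
    rng = random.Random(seed)  # one generator shared by all tasks
    results = {}
    for task_id in task_ids:
        if task_id in skipped:
            continue  # a skipped task consumes no random numbers
        results[task_id] = rng.random()
    return results


all_tasks = run_tasks(["task_a", "task_b"], seed=0)
without_a = run_tasks(["task_a", "task_b"], seed=0, skipped=frozenset({"task_a"}))
# task_b now receives the random number that task_a would have consumed.
print(all_tasks["task_b"] == without_a["task_b"])  # False
```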
Limit use of symbols to variables of data generators
SED-ML uses symbols to reference implicit properties of simulations that are not explicitly defined in the specification of the model for the simulation. The most frequently used symbol for SBML-encoded models is urn:sedml:symbol:time for the variable time. Such symbols only have defined values for simulations of models and not for models themselves.
Consequently, symbols should only be used in contexts where simulations are defined. Specifically, symbols should only be used in conjunction with variables of sedml:dataGenerator to record predicted values of symbols. Symbols should not be used in conjunction with the variables of sedml:computeChange, sedml:setValue, or sedml:functionalRange. Symbols should also not be used with sedml:setValue to set the values of symbols.
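The minimal Python sketch below checks these rules. It walks a SED-ML document and reports a symbol wherever it appears outside the variables of a data generator, as well as on sedml:setValue itself. The function find_misused_symbols is hypothetical and is not a full validator. It matches elements by local name so that it works across SED-ML namespace versions, which is a simplification.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

# Elements that own lists of variables.
OWNERS = {"dataGenerator", "computeChange", "setValue", "functionalRange"}


def local_name(tag: str) -> str:
    """Strip the '{namespace}' part so the check works across SED-ML versions."""
    return tag.rsplit("}", 1)[-1]


def find_misused_symbols(sedml_xml: str) -> List[str]:
    """Report symbols used outside variables of data generators."""
    problems: List[str] = []

    def visit(element: ET.Element, owner: Optional[str]) -> None:
        name = local_name(element.tag)
        if name in OWNERS:
            owner = name
        if name == "setValue" and element.get("symbol") is not None:
            problems.append(f"setValue sets symbol {element.get('symbol')}")
        if name == "variable" and element.get("symbol") is not None and owner != "dataGenerator":
            problems.append(
                f"variable {element.get('id')} of {owner} uses symbol {element.get('symbol')}")
        for child in element:
            visit(child, owner)

    visit(ET.fromstring(sedml_xml), None)
    return problems
```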
Variable targets for model objects that generate multiple predictions
Some algorithms, such as flux balance analysis (FBA, KISAO_0000437) and flux variability analysis (FVA, KISAO_0000526), generate multiple predictions for each model object. For example, flux variability analysis predicts minimum and maximum fluxes for each reaction. Targets (sedml:variable/@sedml:target) for such predictions should indicate the id of the desired prediction. To ensure the portability of SED-ML files between simulation tools, we define the following ids. Please use GitHub issues to suggest additional ids for the predictions of other algorithms.
- FBA (KISAO_0000437), parsimonious FBA (KISAO_0000528), geometric FBA (KISAO_0000527):
  - Objective: fbc:objective/@fbc:value
  - Reaction flux: sbml:reaction/@fbc:flux
  - Reaction reduced cost: sbml:reaction/@fbc:reducedCost
  - Species shadow price: sbml:species/@fbc:shadowPrice
- FVA (KISAO_0000526):
  - Minimum reaction flux: sbml:reaction/@fbc:minFlux
  - Maximum reaction flux: sbml:reaction/@fbc:maxFlux
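The minimal Python sketch below shows how a simulation tool might map a variable target onto one of the reaction prediction ids above. It assumes that targets follow the complete-XPath convention, for example /sbml:sbml/sbml:model/sbml:listOfReactions/sbml:reaction[@id='R1']/@fbc:maxFlux. It also assumes that predictions are stored in a hypothetical dict keyed by reaction id and then by prediction id. The regular expression and function name are illustrative only.

```python
import re
from typing import Dict

# Hypothetical pattern for complete targets of reaction predictions, e.g.
# /sbml:sbml/sbml:model/sbml:listOfReactions/sbml:reaction[@id='R1']/@fbc:maxFlux
REACTION_TARGET = re.compile(
    r"/sbml:reaction\[@id='(?P<reaction_id>[^']+)'\]"
    r"/@(?P<prediction>fbc:(?:flux|reducedCost|minFlux|maxFlux))$"
)


def resolve_reaction_prediction(target: str,
                                predictions: Dict[str, Dict[str, float]]) -> float:
    """Look up the predicted value addressed by a variable target."""
    match = REACTION_TARGET.search(target)
    if match is None:
        raise ValueError(f"unsupported target: {target}")
    return predictions[match.group("reaction_id")][match.group("prediction")]


# Hypothetical FVA output: reaction id -> prediction id -> value.
fva = {"R1": {"fbc:minFlux": -1.0, "fbc:maxFlux": 2.5}}
target = "/sbml:sbml/sbml:model/sbml:listOfReactions/sbml:reaction[@id='R1']/@fbc:maxFlux"
print(resolve_reaction_prediction(target, fva))  # 2.5
```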
Unique data set labels
To facilitate automated interpretation of simulation results, the data sets within a report should have unique labels (sedml:dataSet/@sedml:label). Note that the same label can be used across multiple reports.
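The minimal Python sketch below checks this convention. Labels are counted separately for each report, so reusing a label in a different report is not flagged. The function duplicate_labels_by_report is hypothetical and matches elements by local name, which is a simplification.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict, List


def local_name(tag: str) -> str:
    """Strip the '{namespace}' part of an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def duplicate_labels_by_report(sedml_xml: str) -> Dict[str, List[str]]:
    """Return {report id: repeated labels} for reports whose data set labels are not unique."""
    duplicates: Dict[str, List[str]] = {}
    for report in ET.fromstring(sedml_xml).iter():
        if local_name(report.tag) != "report":
            continue
        labels = [data_set.get("label") for data_set in report.iter()
                  if local_name(data_set.tag) == "dataSet" and data_set.get("label") is not None]
        # Labels only need to be unique within a report, so count per report.
        repeated = sorted(label for label, count in Counter(labels).items() if count > 1)
        if repeated:
            duplicates[report.get("id")] = repeated
    return duplicates
```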
Guides for using SED-ML and the COMBINE/OMEX archive format with specific model languages
Simulation tools should recognize the URNs and IRIs below to identify model languages described in SED-ML files and COMBINE/OMEX archives. The links in the "Info" column below contain more information about how simulation tools should interpret SED-ML in combination with specific model languages.
Language | EDAM id | SED-ML URN | COMBINE/OMEX archive specification URI | MIME type | Extensions | Info |
---|---|---|---|---|---|---|
BNGL | 3972 | urn:sedml:language:bngl | http://purl.org/NET/mediatypes/text/bngl+plain | text/bngl+plain | .bngl | |
CellML | 3240 | urn:sedml:language:cellml | http://identifiers.org/combine.specifications/cellml | application/cellml+xml | .xml, .cellml | |
(NeuroML)/LEMS | 9004 | urn:sedml:language:lems | http://purl.org/NET/mediatypes/application/lems+xml | application/lems+xml | .xml | |
SBML | 2585 | urn:sedml:language:sbml | http://identifiers.org/combine.specifications/sbml | application/sbml+xml | .xml, .sbml | |
Smoldyn | 9001 | urn:sedml:language:smoldyn | http://purl.org/NET/mediatypes/text/smoldyn+plain | text/smoldyn+plain | .txt | |
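The minimal Python sketch below recognizes the SED-ML URNs in the table above. It assumes that versioned variants append a dot-separated suffix to the base URN, as in urn:sedml:language:sbml.level-3.version-1. The mapping and function name are hypothetical.

```python
from typing import Optional

# Base SED-ML language URNs from the table above.
SEDML_LANGUAGE_URNS = {
    "urn:sedml:language:bngl": "BNGL",
    "urn:sedml:language:cellml": "CellML",
    "urn:sedml:language:lems": "(NeuroML)/LEMS",
    "urn:sedml:language:sbml": "SBML",
    "urn:sedml:language:smoldyn": "Smoldyn",
}


def identify_language(urn: str) -> Optional[str]:
    """Return the language name for a base or versioned URN, else None."""
    for base, name in SEDML_LANGUAGE_URNS.items():
        # Require an exact match or a '.'-separated suffix so 'sbmlx' is not SBML.
        if urn == base or urn.startswith(base + "."):
            return name
    return None
```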
Example SED-ML files and COMBINE/OMEX archives for all of the languages listed above are available here.
Recommended resources for implementing the execution of simulation experiments
Below are helpful tools for implementing the execution of simulation experiments described with SED-ML:
- BioSimulators utils is a Python library that provides functions for implementing command-line interfaces to the above specifications, as well as functions for interpreting COMBINE/OMEX archives and SED-ML files, generating tables and plots of simulation results, and logging the execution of COMBINE/OMEX archives. BioSimulators utils provides high-level access to some of the lower-level libraries listed below.
- libSED-ML is a library for serializing and deserializing SED-ML documents to and from XML files. libSED-ML provides bindings for several languages.
- jlibSED-ML is a Java library for serializing and deserializing SED-ML documents to and from XML files. The library also provides methods for resolving models, working with XPath targets for model elements, applying model changes, orchestrating the execution of tasks, calculating the values of data generators, and logging the execution of simulations. Note that jlibSED-ML supports SED-ML L1V2 and earlier and diverges from some of the conventions described here.