According to the BMBF call, the goal is the establishment of digital workflows in the sense of a decentralized data or simulation concept by active agents within the software environment of the innovation platform. The sustainability of the software solutions and the uniform access to data and tools are decisive. The platform will therefore provide two workflow environments called "pyiron" and "SimStack" right from the start. Beneficiaries are expected to design their solutions in such a way that they can be made executable within one of these workflow environments.
A workflow is a chain of well-documented process steps to create or handle data for a specific problem in order to deliver a particular set of outputs. Within PMD we provide the environment for a digitalization of these workflows (i.e., making each individual, often still manual step of this chain accessible, interpretable, and storable by the machine). Advantages of using this workflow environment include:
- Providing engineers, data scientists, etc. with a user-friendly interface to a large variety of tools
- Enabling non-experts to use standardized computational procedures that are based on complex combinations of individual software tools
- Capture of complex individual computational workflows for documentation and distribution (e.g. for a paper, IP application, collaboration, etc.)
- Automated deposition of final results as well as of all relevant intermediate steps (e.g. in database systems, repositories, …)
- Integration and easy access to HPC resources
- Connection to community-wide semantics and knowledge graphs (ontologies) due to the description of input/output of individual tools within a workflow chain
Within the PMD we distinguish between four levels (A, B, C, D) of workflow implementation. Within a project these levels will be exploited step-by-step, with various levels of implementation effort, workflow control and user support. It is, however, possible to combine different levels of implementation for the different steps of a single workflow.
The user provides a script job for an individual task of a workflow, with well-defined input and output parameters for the individual steps. Input parameters can be passed into the script from the results of other computations, and the outputs can be processed in a subsequent computation step. The parameters and the script are stored and documented to ensure reproducibility of the workflow step and to avoid recomputing previously computed results. The file formats used by the script for input and output do not need to be identical to the file format used for storage, for example by pyiron.
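The idea of storing the parameters together with the result, so that an identical step is not computed twice, can be illustrated with a minimal sketch. It uses only the Python standard library and is not the pyiron or SimStack implementation; the helper `run_step`, the JSON record layout, and the two example steps are assumptions made for illustration.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def run_step(func, params, store):
    """Run one workflow step, reusing a stored result if the inputs match."""
    # Hypothetical helper: the record is keyed on the step name and its inputs.
    payload = json.dumps({"step": func.__name__, "input": params}, sort_keys=True)
    key = hashlib.sha256(payload.encode()).hexdigest()[:16]
    record = Path(store) / f"{key}.json"
    if record.exists():
        # Same step and same inputs as before: load instead of recomputing.
        return json.loads(record.read_text())["output"]
    output = func(**params)
    # Store input and output together so the step is documented.
    record.write_text(
        json.dumps({"step": func.__name__, "input": params, "output": output})
    )
    return output


calls = []  # records how often the first step is actually executed


def strain_energy(stiffness, strain):
    calls.append((stiffness, strain))
    return {"energy": 0.5 * stiffness * strain ** 2}


def scale(energy, factor):
    return {"scaled": energy * factor}


store = tempfile.mkdtemp()
first = run_step(strain_energy, {"stiffness": 200.0, "strain": 0.01}, store)
second = run_step(strain_energy, {"stiffness": 200.0, "strain": 0.01}, store)
# The output of one step is passed as input to the next step.
final = run_step(scale, {"energy": first["energy"], "factor": 2.0}, store)
```

The second call with identical parameters is served from the stored record, and the storage format (JSON here) is independent of whatever file format the script itself would read or write.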
A predefined job type for a simulation tool can be created and integrated into the workflow system, i.e. either pyiron or SimStack. In pyiron, the job type is a class that defines and handles the import/export and storage of input/output, as well as the serialization of the job attributes for communication with HPC. In SimStack, this is accomplished by the WaNo. In this way, well-defined problems with a subset of parameters (compared to the full functionality of the tools) can be executed as a step of a workflow. The advantage of this approach is that users who are not familiar with a specific software tool do not have to learn attributes that are not essential for the present workflow, because the essential ones are provided by a simplified, readable, standard pyiron interface or a structured XML format (with a well-documented and easy-to-learn terminology).
At the same time, job types can easily be extended to new job types that enable additional or redefined functionality. In this way, advanced users can employ the environment to develop workflows with the same flexibility as provided by scripts. The benefit of using the environment for this purpose is the integration of various analysis, visualization and simulation tools, which can be used for each intermediate step of the workflow, as well as the automated transfer to and execution on remote compute resources.
The combination and exchange of different simulation tools in metajobs, the handling of interactive jobs and the loop over a large set of cases are straightforward at this implementation level.
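As a rough illustration of this level, the following sketch shows a job type that exposes only a small parameter subset, an extended job type derived from it, and a loop over several cases. It is plain Python and does not use the pyiron or SimStack APIs; the class names, the text-based input/output format, and the `execute` stand-in for the external tool are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ElasticJob:
    """Hypothetical predefined job type exposing only two parameters."""

    name: str
    strain: float = 0.01
    stiffness: float = 200.0
    output: dict = field(default_factory=dict)

    def write_input(self):
        # Translate the simplified parameters into the tool's own input text.
        return f"strain={self.strain}\nstiffness={self.stiffness}\n"

    def execute(self, input_text):
        # Stand-in for calling the external simulation tool.
        values = dict(line.split("=") for line in input_text.splitlines())
        return f"stress={float(values['stiffness']) * float(values['strain'])}"

    def collect_output(self, raw):
        # Parse the tool's raw output into the stored job output.
        key, value = raw.split("=")
        self.output[key] = float(value)

    def run(self):
        self.collect_output(self.execute(self.write_input()))
        return self


class ElasticEnergyJob(ElasticJob):
    """Extended job type that adds one derived output quantity."""

    def collect_output(self, raw):
        super().collect_output(raw)
        self.output["energy"] = 0.5 * self.output["stress"] * self.strain


# Loop over a set of cases using the extended job type.
strains = [0.0, 0.01, 0.02]
jobs = [
    ElasticEnergyJob(name=f"case_{i}", strain=s).run()
    for i, s in enumerate(strains)
]
stresses = [job.output["stress"] for job in jobs]
```

A user of `ElasticEnergyJob` only sets `strain` and `stiffness`; how the input is written and the output parsed stays inside the job type, which is the part a real pyiron job class or SimStack WaNo would take over.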
Once a workflow is established, less involved users in particular do not want to deal with command lines and their execution. To this end, a graphical user interface can be provided that generates the output based on an often limited, predefined set of input parameters.
The usage of generic input and output parameters for predefined classes allows a description of (part of) a workflow in a notation that is generic, i.e., independent of the specific software tool. Generic parameters are also key to enabling interoperability between software tools. Such a standard exists in pyiron for atomistic simulations (ASE), but needs to be implemented by domain experts for other communities. For example, the VMAP standard can be used for FEM simulations.
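The role of generic parameters can be sketched as follows: the workflow is written only against a tool-independent input and output description, and each tool translates it into its own internal representation. The schema (`GenericInput`, `GenericOutput`) and the two stand-in tools are assumptions for illustration; they are not the ASE or VMAP data models.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class GenericInput:
    """Tool-independent description of a load case (hypothetical schema)."""

    youngs_modulus_gpa: float
    strain: float


@dataclass(frozen=True)
class GenericOutput:
    """Tool-independent result (hypothetical schema)."""

    stress_mpa: float


class Tool(Protocol):
    def calculate(self, generic: GenericInput) -> GenericOutput: ...


class ToolA:
    """Stand-in tool that works internally in Pa."""

    def calculate(self, generic):
        stress_pa = generic.youngs_modulus_gpa * 1e9 * generic.strain
        return GenericOutput(stress_mpa=stress_pa / 1e6)


class ToolB:
    """Stand-in tool that works internally in MPa."""

    def calculate(self, generic):
        modulus_mpa = generic.youngs_modulus_gpa * 1e3
        return GenericOutput(stress_mpa=modulus_mpa * generic.strain)


def workflow(tool: Tool, generic: GenericInput) -> float:
    # The workflow only knows the generic notation, not the tool behind it.
    return tool.calculate(generic).stress_mpa


case = GenericInput(youngs_modulus_gpa=210.0, strain=0.001)
result_a = workflow(ToolA(), case)
result_b = workflow(ToolB(), case)
```

Because both tools accept and return the same generic objects, either one can be passed to `workflow` without changing the workflow itself.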
The interplay of workflows and ontologies within the PMD has different facets:
- A workflow can be used to read data from an ontology-based data store as input, modify them according to the process chain and feed the output back into the ontology store.
- The information gained from a workflow developed within the environment (i.e. dependencies that were not known before) can be automatically exported into a materials knowledge graph.
- If the functionality of a tool (including the input and the output) is described in terms of an ontology, it can be integrated into a workflow environment without a need for tool specific parsers.
- If a workflow (including the input and the output of each simulation module) is described generically (e.g. in terms of a standardized ontology), a tool-independent formulation is achieved. Thus, individual tools within a complex tool chain can be easily replaced.
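One way to picture the last two points is a toy example in which each tool declares the ontology terms it consumes and produces, and the chain is assembled by matching those terms rather than by tool-specific parsers. Everything here is hypothetical: the `ex:` terms are placeholders, not terms of an actual PMD ontology, and `ToolDescription` and `execute` are illustrative helpers rather than part of pyiron or SimStack.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToolDescription:
    name: str
    consumes: tuple  # ontology terms required as input
    produces: str    # ontology term of the output
    run: Callable


def execute(tools, store, target):
    """Apply tools whose inputs are available until `target` is in the store."""
    store = dict(store)  # work on a copy of the data store
    progress = True
    while target not in store and progress:
        progress = False
        for tool in tools:
            ready = all(term in store for term in tool.consumes)
            if ready and tool.produces not in store:
                args = (store[term] for term in tool.consumes)
                store[tool.produces] = tool.run(*args)
                progress = True
    if target not in store:
        raise LookupError(f"no tool chain produces {target}")
    return store


# The list order does not matter; the chain follows from the declared terms.
tools = [
    ToolDescription(
        "energy_tool",
        ("ex:Stress", "ex:Strain"),
        "ex:EnergyDensity",
        lambda stress, strain: 0.5 * stress * strain,
    ),
    ToolDescription(
        "stress_solver",
        ("ex:YoungsModulus", "ex:Strain"),
        "ex:Stress",
        lambda modulus, strain: modulus * strain,
    ),
]
data = {"ex:YoungsModulus": 200.0, "ex:Strain": 0.01}
result = execute(tools, data, "ex:EnergyDensity")
```

Replacing `stress_solver` with another tool that declares the same consumed and produced terms leaves the rest of the chain untouched, which is the sense in which individual tools become exchangeable.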
You can also follow our activities on PMD GitHub
PMD Workflow Store online
Learn something about pyiron and SimStack! Check out the PMD YouTube channel.