Common understanding and "advertisement" of benefits of workflows

I would like to spark a discussion on the benefits of workflows and on what the Platform and the Workflow group aspire to provide to the „end users“. In the Workflows jour fixe on 26 November we briefly discussed whether it is reasonable to implement a rather specific simulation as a workflow. The problem raised was that it is not reasonable for an individual researcher to integrate one single, very specific simulation configuration into a workflow framework.

The ideal solution would be to identify a more general set of configurations that can be implemented as a workflow / black box with some modifiable parameters. However, the problem goes further: in many circumstances it is not possible to deduce this general set of configurations and the variable parameters while actively researching a topic. You simply don’t know at the beginning that parameters x and y, and only those, are what you will change across all of your upcoming simulations. The meeting minutes, for example, state that „The benefit of the workflow environment is questioned for FEniCS workflows“. Please correct me if I misunderstood this specific problem during the jour fixe.

I don’t think these issues are specific to the project that raised this concern, or to FEniCS or some class of tools that we would like to incorporate into workflows. Rather, I think this is one specific case of more general questions: What is a workflow, or rather, what do we (collectively) understand as a workflow? Why should anyone use any kind of workflow tool - what are the benefits? What do we (the platform, the Workflows working group, any one of the projects) specifically want to provide to the end user? Who even is the end user? Inside the projects, the group of end users and the group of people who develop and provide the knowledge and implementations of workflows (or at least whom we want to do that) overlap or are the same. So the key question is what benefit they gain from not just running their simulations / experiments / calculations but also implementing their „workflow“ in some framework.

In my experience it has been difficult to reach a common understanding of what a workflow really means with colleagues who have little affinity for IT, let alone to get people to see what they could contribute to „the workflows“ and how they could benefit from them. But I think it’s especially important to get buy-in from the colleagues who use their tools on a daily basis, so that they commit to the idea that integrating their tools into workflows is helpful for others and for themselves.

4 likes

Thanks for posting, Simon! Perhaps @hickel @muh-hassani @celso.rego @schaarj would like to comment? A lot of points have been raised. On our webpage we also tried to clarify what we mean when we talk about workflows, but this does not seem to be sufficient.

2 likes

Hello @sbekemeier,
thanks for initiating this discussion. These are important questions, and discussing them might benefit everyone in the platform. I have tried to comment on your questions below, but I should mention first that my answers are mostly relevant to pyiron.

  • At the very start, a distinction should be made between a developer and an end user. A developer integrates a tool, whereas the end user only uses the integrated tool via the unified language of pyiron. This is exactly where the value of integrating a tool into pyiron shows: the end user does not need to be aware of the underlying code and only needs to know pyiron syntax to use it. At the same time, power users are not hindered, as they can create their own pieces of workflows by inheriting from the pyiron wrapper classes.

  • Additionally, I do not see a workflow as a rigid black box that only accepts x and y. I consider a workflow to be divided into several steps, each of which requires its own computational tool and accepts a set of parameters in its original form. For each of these tools, we (the developers) design an API (the wrapper or the job class) that retains the most commonly needed inputs while leaving room for flexibility. The end user uses the API to feed in x, y, z, etc., but whenever x, y, z are not sufficient, he/she (or the developer upon request) can simply modify the API as needed, or inherit from it in a new class. The changes are normally minor.
    The flexibility of an API is well illustrated by an example of a FEniCS workflow, where in the third cell the user introduced his own function to create a customized domain. I do not see this as a failure of the API to provide this function for the end user; rather, I regard it as an advantage of our API that the user can feed his own custom function into it. In other cases, the user might need a very specific third-party tool to create the domain or mesh, and the API should be flexible enough to accept such inputs. This is my point of view; maybe @junger has another opinion on this.
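    To make the wrapper idea a bit more concrete, here is a minimal sketch in plain Python. It is not pyiron's actual API: `ToyJob`, `ShiftedJob`, `unit_interval` and `custom_domain` are hypothetical names chosen for illustration. It only shows the two routes mentioned above, passing a user-defined function to the wrapper and inheriting from the wrapper to add an input.

```python
from typing import Callable, List


def unit_interval(n: int) -> List[float]:
    """Default domain: n + 1 equally spaced points on [0, 1]."""
    return [i / n for i in range(n + 1)]


class ToyJob:
    """Hypothetical wrapper exposing the most commonly needed inputs."""

    def __init__(
        self,
        resolution: int = 4,
        coefficient: float = 1.0,
        domain_factory: Callable[[int], List[float]] = unit_interval,
    ) -> None:
        self.resolution = resolution
        self.coefficient = coefficient
        # Any callable that builds a domain can be supplied by the user.
        self.domain_factory = domain_factory

    def run(self) -> List[float]:
        """Stand-in for calling the wrapped tool: scale every domain point."""
        points = self.domain_factory(self.resolution)
        return [self.coefficient * x for x in points]


class ShiftedJob(ToyJob):
    """Power-user extension: adds an input the base wrapper does not offer."""

    def __init__(self, shift: float = 0.0, **kwargs) -> None:
        super().__init__(**kwargs)
        self.shift = shift

    def run(self) -> List[float]:
        return [value + self.shift for value in super().run()]


def custom_domain(n: int) -> List[float]:
    """User-supplied domain with points clustered near 0."""
    return [(i / n) ** 2 for i in range(n + 1)]


default_result = ToyJob(resolution=2).run()
custom_result = ToyJob(resolution=2, domain_factory=custom_domain).run()
shifted_result = ShiftedJob(shift=1.0, resolution=2).run()
```

    In this sketch the end user only touches the constructor arguments, while the custom function and the subclass are the two escape hatches for cases where those arguments are not enough.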

1 like

Writing a wrapper for a tool is an iterative process, and having an open-source tool like pyiron allows developers and super users to modify the wrappers to fit their own needs, while the normal end user does not need to worry much about the code behind their workflow.
On top of Python, pyiron provides its own language, which gives users a degree of freedom to adapt their pieces of workflow to their needs.
Additionally, a greater benefit of pyiron is interoperability. At least on the atomistic scale, we can claim that pyiron provides a generic unified interface between various simulation tools such as LAMMPS, VASP, SPHINX-dft, GPAW, phonopy, … On the continuum scale, we are trying to take advantage of the preCICE library for coupling different tools and providing such a generic interface.

  • Finally, I have to remind you that there are multiple levels of integration of a workflow into pyiron:
    • You can integrate a tool as a workflow/job class into pyiron, as discussed above.
    • Or you can use pyiron to run your script. There is certainly a downside to this, but one can still use other advantages of pyiron, such as sharing projects and their data, easy remote submission of jobs, high-throughput simulations and data mining.

I hope my comments answer some of your questions. @hickel, @schaarj and @celso.rego can also add their views or correct me.

1 like

I fully agree with the answer provided by Muhammad. Let me add one more aspect. For me, a workflow system is also important for ensuring the reproducibility of certain (published) data. Together with a plot in a publication, you would like to deliver the way this plot was produced from some raw data. This means that even „one single, very specific simulation configuration“ can have value in a workflow framework. The advantage of pyiron is that the next user can easily adapt this solution to the needs of a different application.
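As a rough illustration of what „delivering the way a plot has been produced“ could mean in practice, here is a minimal sketch using only the Python standard library. It does not show how pyiron stores provenance; the function names, field names and example values are assumptions made up for this example. The idea is simply to keep the tool name, the parameters and a checksum of the raw data together with the figure.

```python
import hashlib
import json


def provenance_record(raw_data: bytes, parameters: dict, tool: str) -> dict:
    """Collect what is needed to re-create a result from its raw data."""
    return {
        "tool": tool,
        "parameters": parameters,
        # The checksum lets a later user verify they start from the same data.
        "raw_data_sha256": hashlib.sha256(raw_data).hexdigest(),
    }


def serialize(record: dict) -> str:
    """Stable text form that can be stored next to the published figure."""
    return json.dumps(record, sort_keys=True, indent=2)


# Hypothetical raw data and parameters, for illustration only.
raw = b"volume,energy\n0.9,0.01\n1.0,0.00\n1.1,0.01\n"
record = provenance_record(raw, {"mesh_size": 0.1, "solver": "cg"}, tool="example-tool")
same_again = provenance_record(raw, {"solver": "cg", "mesh_size": 0.1}, tool="example-tool")
changed = provenance_record(raw + b"1.2,0.04\n", {"mesh_size": 0.1, "solver": "cg"}, tool="example-tool")
```

Two records built from the same data and parameters serialize identically, while any change to the raw data shows up in the checksum.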

2 likes

Maybe a little off-topic here, but the preCICE library seems quite interesting. What is the current status of its implementation in pyiron? Who is working on that?

1 like

It is off-topic, but to address it here as well: the interface between pyiron and the preCICE library is currently being developed, and some examples can be found here.

1 like

As a daily user of workflows in pyiron, I will try to answer your questions from my perspective. These answers certainly have their limits, so please don’t hesitate to add to them or comment.

What is a workflow?

My understanding is that a workflow is a work process that handles your daily research activities. For example, if you want to obtain some physical properties of a material from ab initio calculations, such as magnetic or thermodynamic properties, you first need to decide on the computing code, e.g. VASP or WIEN2K. Within this code, you need to set up the structure, provide the necessary parameters in certain formats, set up run folders, run the calculations, check the job status, extract data, process the data into the required physical properties, and plot the data in publication quality. You may need to repeat this procedure for different materials. In this case, a well-designed workflow can save a lot of manual effort and time. It can avoid possible manual mistakes and make it easy to trace back and check the work process. I myself work on method development. My experience is that once the basic framework is implemented, it really makes further method development and extension easier and faster.
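The sequence of steps described above can be sketched as a chain of small functions that is then repeated for several materials. This is only a toy in plain Python: the function names are hypothetical, the „calculation“ is a made-up quadratic energy curve rather than a call to VASP or WIEN2K, and job submission, status checks and plotting are left out.

```python
from typing import Dict, List


def set_up_structure(material: str) -> Dict[str, object]:
    """Stand-in for building a structure and writing input files."""
    return {"material": material, "volumes": [0.9, 1.0, 1.1]}


def run_calculation(inputs: Dict[str, object]) -> List[float]:
    """Stand-in for the simulation code: a toy quadratic energy curve."""
    return [(volume - 1.0) ** 2 for volume in inputs["volumes"]]


def extract_property(inputs: Dict[str, object], energies: List[float]) -> float:
    """Post-processing: pick the volume with the lowest energy."""
    lowest = min(range(len(energies)), key=energies.__getitem__)
    return inputs["volumes"][lowest]


def workflow(material: str) -> Dict[str, object]:
    """One complete pass: set up, run, extract."""
    inputs = set_up_structure(material)
    energies = run_calculation(inputs)
    return {
        "material": material,
        "equilibrium_volume": extract_property(inputs, energies),
    }


# Repeating the same procedure for different materials is a one-line loop.
results = [workflow(material) for material in ["Al", "Cu", "Ni"]]
```

Once the steps are written down like this, repeating them for another material or swapping one step for an improved method means touching a single function.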

Why should anyone use any kind of workflow tool - what are the benefits? What do we (the platform, the Workflows working group, any one of the projects) specifically want to provide to the end user?

Of course, one can write a few scripts to carry out the work process in a language one is familiar with, e.g. Fortran, C, Python or even bash. But the problem is how to make the work process or knowledge “sustainable”, i.e., how anyone else who is interested in the same work can easily use or repeat the work process without investing the same effort to develop the same thing again. In this case, using a common language, i.e., a unified workflow tool, in the materials science community would be a solution. As I understand it, this is the main goal of this platform. In return, a well-developed workflow can help our scientific work gain more visibility.

Who even is the end user?

The end user can be any materials scientist who is interested in the same or a similar topic. For example, we have developed an automated workflow for calculating melting temperatures using empirical potentials. This tool was originally applied only to elemental crystals with fcc, bcc and hcp structures. After the work was published, we received emails from various materials scientists. In response to their requests, we extended the tool to Si with its more complex diamond structure.

2 likes

Here is the link to the preCICE library in case someone is looking for more information.

1 like

Thanks @muh-hassani @hickel @lzhu for your input to this discussion! I think this is an important discussion because I was not looking for answers for myself but for the colleagues who might not yet share this „common“ understanding of workflows and their benefits. My answers would probably have been very similar to yours, and I agree with most of it.

However, I can’t fully agree with drawing a distinction between developers and end users. Sure, in some cases it is valid and an important distinction to consider when developing the platform and workflow frameworks. Nevertheless, I think in other cases there is no such clear distinction between devs and end users. I would assume that’s especially the case in the early phase of the platform and frameworks, i.e. the one we are in right now.
Especially in this early phase I consider it important to get the end users / domain experts on board in development as well, and to get more people to be both domain „experts“ and, at least a bit, developers at the same time, because: First, domain knowledge is needed to integrate anything into e.g. pyiron. „Pure“ developers probably can’t integrate a new simulation code into pyiron, because they don’t have enough background. Second, early on the end users are first of all the researchers in the current projects. They probably need to produce results and may care more about getting things done, that is, their research, than about producing workflow tools. So, they are probably end users first of all. Third, there are probably not enough „developers“ to accompany every researcher and turn the researcher’s work into a workflow in e.g. pyiron. That is why I think it could be important to get more of the current end users / domain experts / researchers at least a bit onto the development side.
But that requires persuading them that working with „our“ workflow tools is not just extra overhead but actually provides benefits directly to them and is an investment for themselves. Otherwise people tend to get their current work done quickly, and that probably happens with the tools they already know and need, not with a new language (e.g. Python), concept (e.g. PMD’s workflows) or software (e.g. SimStack, pyiron) they have to learn first.

I fully agree with @lzhu's answer to the question of why anyone should use workflow tools! But in my experience, reality shows that a lot of people do not care about making their work reproducible and easy to understand for those who come later; they just make it good enough for themselves, as a means to their goal. Making good, reproducible workflows for others is seldom a goal on personal agendas. (I am sure that’s not true in this group, otherwise we wouldn’t have this discussion, but this is not about us anyway.) But I think if we can provide an understandable, tangible, easy-to-see personal benefit to those people, that would be a big gain for our goal of reproducible and interoperable workflows, because they would join in on creating them.
How we achieve that is the discussion I wanted to foster with the last and most important question of my original post.

1 like

I agree. We probably don’t have „end users“ but we will often be a „user“ of a particular workflow. Typically, we modify the workflow for our needs, in other words, we become „developers“. Once I realize I’ve benefited from someone else’s workflow, I might be willing to share my own development as well. So we add “reusability” to “reproducibility”. And that might even be the bigger incentive.

2 likes

For KupferDigital I am using Nextflow for RDF transformation workflows.
Nextflow is a very general workflow engine. It is file-based and manages the flow of files between processes, where each process is a shell command. The processes can be containerized and run in diverse cloud environments.

That means every tool or workflow that is CLI-based and can be run in a Linux container can be integrated quite easily.
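
To illustrate the file-based idea for readers who have not used such an engine, here is a minimal sketch in plain Python. It is not Nextflow syntax and says nothing about how Nextflow is implemented; `run_process` and `run_pipeline` are hypothetical helpers, and the sketch assumes a POSIX shell with `sort` and `tr` available. Each process is a shell command that reads one file and writes the next.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import List


def run_process(command: str, input_file: Path, output_file: Path) -> Path:
    """Run one shell command: input_file on stdin, stdout into output_file."""
    with input_file.open("rb") as source, output_file.open("wb") as target:
        subprocess.run(command, shell=True, stdin=source, stdout=target, check=True)
    return output_file


def run_pipeline(commands: List[str], first_input: Path, workdir: Path) -> Path:
    """Chain the processes so that each output file feeds the next command."""
    current = first_input
    for index, command in enumerate(commands):
        current = run_process(command, current, workdir / f"step_{index}.out")
    return current


with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    source_file = workdir / "input.txt"
    source_file.write_text("b\na\nc\n")
    # Two processes: sort the lines, then convert them to upper case.
    final_file = run_pipeline(["sort", "tr a-z A-Z"], source_file, workdir)
    pipeline_output = final_file.read_text()
```

Because the only contract between steps is „file in, file out“, any command-line tool can take the place of `sort` or `tr` here, which is the property that makes CLI-based tools easy to integrate.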

I have built ontoflow with it.

Another good example is nf-core. nf-core builds on Nextflow and provides reproducible pipelines which can be collaboratively maintained.

1 like

We do not yet have anybody from KupferDigital joining our weekly exchange meetings (topic: workflows). Would you be interested? @dziwis

When I have finished the prototype and its documentation I could present the approach there. How can I then join the meeting?

1 like

You can also join the meetings beforehand. Just send me an email and put your project coordinator @Meisenbart in cc. We do not have a representative from KupferDigital there yet.

After the last meeting of the workflow group, in which we tried to arrive at a standard workflow exchange format, I have one question for the pyiron developers/users. From my perspective, a workflow is a connection of different processes that have inputs and outputs, forming a DAG (directed acyclic graph). So the only thing a process in a workflow depends on is the input into that process (in SimStack a process would be a WaNo). As far as I understood the last discussion, this would not be the case for pyiron, where a module is essentially a class that stores internal data, such that executing a method of that class changes member variables. The result of calling a method (which in the above setting would be a process) depends on the inputs to the method as well as on the internal variables. As a consequence, such a method call might not be compatible with a process or WaNo.
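
To make the distinction concrete, here is a minimal sketch in plain Python. It is neither SimStack nor pyiron code: `run_dag`, `StatefulModule` and `volume_process` are hypothetical names, and the numbers are arbitrary. The first part runs processes in DAG order, where each process sees only the outputs of its predecessors. The second part shows a class whose method result also depends on a member variable, and one possible way to wrap it so that this state becomes an explicit input.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Tuple

Process = Callable[[Dict[str, float]], float]


def run_dag(processes: Dict[str, Tuple[Process, List[str]]]) -> Dict[str, float]:
    """Execute processes in dependency order; each sees only its declared inputs."""
    graph = {name: deps for name, (_, deps) in processes.items()}
    results: Dict[str, float] = {}
    for name in TopologicalSorter(graph).static_order():
        function, deps = processes[name]
        results[name] = function({dep: results[dep] for dep in deps})
    return results


processes = {
    "lattice": (lambda inputs: 2.0, []),
    "volume": (lambda inputs: inputs["lattice"] ** 3, ["lattice"]),
    "density": (lambda inputs: 16.0 / inputs["volume"], ["volume"]),
}
dag_results = run_dag(processes)


class StatefulModule:
    """Contrast case: the method result also depends on hidden member state."""

    def __init__(self) -> None:
        self.scale = 1.0

    def set_scale(self, scale: float) -> None:
        self.scale = scale

    def volume(self, lattice: float) -> float:
        return self.scale * lattice ** 3


def volume_process(inputs: Dict[str, float]) -> float:
    """Wrapper that turns the hidden state into an explicit process input."""
    module = StatefulModule()
    module.set_scale(inputs["scale"])
    return module.volume(inputs["lattice"])
```

In this sketch the same call `volume(2.0)` gives different results before and after `set_scale`, which is exactly what a DAG node cannot express. The wrapper restores the process picture by listing the former member variable among the inputs; whether that is practical for real pyiron job classes is the open question.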

1 like