To put it in other words to make sure I understand: is the thinking
that the workflow would execute in a linear fashion (assuming that is
what the Redshift pipeline does) to avoid having to keep track of the
state of the job?

On Mon, Nov 20, 2023 at 7:02 AM Bertil Chapuis <bchap...@gmail.com> wrote:

> Hello everyone,
>
> It would be great to improve the workflow engine in Baremaps (package:
> org.apache.baremaps.workflow). In the current version, a workflow is a
> directed acyclic graph (DAG) of steps. Each step can have one or more tasks
> executed sequentially or in parallel. The inputs and outputs of the tasks
> are set manually. Some of the outputs (e.g., a table created in a database)
> are not described. Furthermore, some resources (e.g., DataSources) are
> shared across the workflow with a context object, but one task must be
> aware of what another task did to benefit from shared resources. This
> approach is loosely based on GitHub Actions.
>
> A nice improvement would be to remove the notion of step, to
> systematically describe the inputs and outputs of the tasks, and to
> introduce a format in the configuration file to describe the shared
> resources accessed via the context object. This would probably make the
> configuration file of the workflow more difficult to read, but at least
> everything would be declared in it. The DAG could be inferred from the
> inputs and outputs of the tasks. This new approach would probably be closer
> to what AWS Data Pipeline does.
>
>
> https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-copydata-redshift-define-pipeline-cli.html
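>
> As a rough sketch of the inference part (all identifiers below are
> hypothetical, not an actual Baremaps or Data Pipeline API): if every
> task declares the resources it reads and writes, an edge can be drawn
> from each producer to each of its consumers, and the DAG falls out of
> the declarations.
>
> import java.util.*;
>
> public class InferredDag {
>
>   record TaskSpec(String id, Set<String> inputs, Set<String> outputs) {}
>
>   // An edge producer -> consumer exists whenever a task consumes a
>   // resource that another task produces.
>   static Map<String, Set<String>> inferEdges(List<TaskSpec> tasks) {
>     Map<String, String> producerOf = new HashMap<>();
>     for (TaskSpec task : tasks) {
>       for (String output : task.outputs()) {
>         producerOf.put(output, task.id());
>       }
>     }
>     Map<String, Set<String>> edges = new HashMap<>();
>     for (TaskSpec task : tasks) {
>       for (String input : task.inputs()) {
>         String producer = producerOf.get(input);
>         if (producer != null && !producer.equals(task.id())) {
>           edges.computeIfAbsent(producer, k -> new HashSet<>())
>               .add(task.id());
>         }
>       }
>     }
>     return edges;
>   }
>
>   public static void main(String[] args) {
>     List<TaskSpec> tasks = List.of(
>         new TaskSpec("download", Set.of(), Set.of("file:osm.pbf")),
>         new TaskSpec("import", Set.of("file:osm.pbf"), Set.of("table:nodes")),
>         new TaskSpec("export", Set.of("table:nodes"), Set.of()));
>     // Prints {download=[import], import=[export]} (order may vary).
>     System.out.println(inferEdges(tasks));
>   }
> }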
>
> I’d love to gather both technical and non-technical feedback regarding
> this question. If you have any experience, whether good, mixed, or bad,
> with the current approach, please do not hesitate to share it.
> Additionally, if you have experience with other workflow technologies, it
> would be valuable to hear about those as well.
>
> Thanks a lot for your help,
>
> Bertil
>
>
