Introduction to data pipelines
Data pipelines consist of consecutive steps (more precisely, directed acyclic graphs, or DAGs) that each serve a specific purpose. For instance, an energy forecasting pipeline could consist of a data loading step, a data preprocessing step, a prediction step, and a data saving step. There are several reasons why it makes sense to split your code into a pipeline of several steps:

- The code becomes more structured and readable
- Pipeline visualization gives a better overview of the code
- It becomes easier to localize errors and debug
- The code becomes more modular and reusable
Defining data pipelines
Rebase Pipelines enables users to define executable pipelines using only a couple of decorators. A pipeline combines several steps, each representing an individual task, into a workflow. Here is an example of a simple pipeline that consists of two steps chained together:
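A minimal sketch follows; the `@step` and `@pipeline` decorators come from Rebase Pipelines, but the import path, function names, and chaining style are illustrative assumptions:

```python
# A minimal sketch of a two-step pipeline. The @step and @pipeline
# decorators are from Rebase Pipelines; the import path, function
# names, and chaining style below are illustrative assumptions.
from rebase import step, pipeline  # assumed import path

@step
def load_data() -> list[float]:
    # First step: load raw values (hard-coded here for brevity).
    return [1.2, 3.4, 5.6]

@step
def compute_mean(values: list[float]) -> float:
    # Second step: consume the output of the first step.
    return sum(values) / len(values)

@pipeline
def mean_pipeline() -> float:
    # Chain the two steps: load_data's output feeds compute_mean.
    return compute_mean(load_data())

if __name__ == "__main__":
    print(mean_pipeline())
```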
The `@step` and `@pipeline` decorators are used to turn a regular Python function into a step and a pipeline, respectively.

Running data pipelines locally
A pipeline can be executed locally by running the pipeline script from the command line:
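Assuming the sketch above is saved in a file such as my_pipeline.py (a hypothetical filename), a local run is a plain Python invocation:

```bash
# Run the pipeline locally with the regular Python interpreter.
# my_pipeline.py is a hypothetical filename for the script above.
python my_pipeline.py
```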
Running data pipelines remotely

To run a pipeline remotely, you simply replace `python` with `rb run` in the command line:
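Using the same hypothetical filename as above:

```bash
# Run the same pipeline remotely via the rb CLI.
rb run my_pipeline.py
```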