About Me

  • Data Analyst @ Autodesk
  • You can find me at RaulingAverage.dev
  • Enjoy coffee, learning, and running near the beach

Notes:

  • This presentation does not reflect any workings or material at Autodesk
  • I am not a core contributor to the Prefect product, just a user

  • Social Distance, Wear Masks, Wash Hands, and consider allyship for those that need it now more than ever. #BLM

The Prefect Overview

What is Prefect?

Prefect is an alternative workflow management system designed for modern infrastructure.

It is quite similar to Airflow, with some key differences covered in the sections that follow.

Prefect Site Overview

Functional API

  • Tasks can be called on one another like functions to build a DAG Pythonically

Note: The imperative, Airflow-style API coexists with the new functional API

# Presentation Pseudo-Code
import aircraftlib as aclib
from prefect import task, Flow


@task
def extract_reference_data():
    print("fetching reference data...")
    return aclib.fetch_reference_data()


@task
def extract_live_data():
    # Get the live aircraft vector data around Dulles airport
    dulles_airport_position = aclib.Position(lat=38.9519444444, long=-77.4480555556)
...
...
...
with Flow("etl") as flow:
    reference_data = extract_reference_data()
    live_data = extract_live_data()

    transformed_live_data = transform(live_data, reference_data)

    load_reference_data(reference_data)
    load_live_data(transformed_live_data)

# Run Workflow
flow.run()

# Register Workflow to Dashboard
flow.register()

Source
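
To make the "tasks called like functions build a DAG" idea concrete, here is a minimal stdlib-only sketch. It is a toy, not Prefect's implementation or API: ToyFlow, toy_task, and Result are hypothetical names invented for this illustration. Calling a decorated function inside the "with" block does not run it; it only records which task feeds which.

```python
# Toy illustration only: NOT Prefect's implementation or API.
_active_flows = []  # stack of flows currently being built


class ToyFlow:
    def __init__(self, name):
        self.name = name
        self.tasks = []     # task names in the order they were called
        self.edges = set()  # (upstream_task_name, downstream_task_name)

    def __enter__(self):
        _active_flows.append(self)
        return self

    def __exit__(self, *exc):
        _active_flows.pop()
        return False


class Result:
    """Placeholder handed back when a task is called inside a flow."""

    def __init__(self, task_name):
        self.task_name = task_name


def toy_task(fn):
    def wrapper(*args):
        flow = _active_flows[-1]
        flow.tasks.append(fn.__name__)
        # Any argument that came from another task becomes a DAG edge.
        for arg in args:
            if isinstance(arg, Result):
                flow.edges.add((arg.task_name, fn.__name__))
        return Result(fn.__name__)

    return wrapper


@toy_task
def extract():
    ...


@toy_task
def transform(data):
    ...


@toy_task
def load(data):
    ...


with ToyFlow("etl") as flow:
    load(transform(extract()))

print(sorted(flow.edges))
```

The point of the design is that dependencies fall out of ordinary function-call syntax, so no separate "set upstream" bookkeeping is needed.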

Intuitive UI

Prefect UI Overview

  • Create DAG (Directed Acyclic Graph) Data Pipelines
  • Focus more on Coding
  • Versioning
    • Versioning occurs automatically when you deploy a flow with the same name to the same project
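
As a rough mental model of the versioning bullet, the sketch below keeps a counter per (project, flow name) pair and bumps it on each registration. This is a toy registry, not Prefect's backend; the register function and the project and flow names are hypothetical.

```python
# Toy registry, NOT Prefect's backend: versions are keyed by (project, flow name).
from collections import defaultdict

_versions = defaultdict(int)


def register(project, flow_name):
    """Record a registration and return the resulting version number."""
    _versions[(project, flow_name)] += 1
    return _versions[(project, flow_name)]


first = register("demo-project", "etl")         # new flow name
second = register("demo-project", "etl")        # same name, same project
other = register("demo-project", "reporting")   # different name starts fresh
print(first, second, other)
```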

Scheduling

  • Not tied to event_time
  • Run on irregular or no schedules.
  • Run multiple simultaneous runs of your workflow (concurrency)
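
A small stdlib-only sketch of what "no schedule, several simultaneous runs" can look like in principle. It does not use Prefect; run_workflow is a hypothetical stand-in for one complete workflow run, triggered on demand rather than by a clock.

```python
from concurrent.futures import ThreadPoolExecutor


def run_workflow(run_id):
    # Stand-in for one complete workflow run (hypothetical).
    return f"run {run_id} finished"


# Launch three runs of the same workflow at once, with no schedule involved.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(run_workflow, range(3)))

print(results)
```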

Task Scheduling

  • Task scheduling is almost instant because Prefect utilizes Dask. This contrasts with the roughly 10-second wait time in another workflow management tool

  • Sequential task execution without explicit management

  • Tasks can directly exchange data
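
The last two bullets can be sketched with the standard library alone: declare which task depends on which, let a topological sort decide the order, and pass each task's return value straight into its downstream task. This is an illustration of the idea, not how Prefect or Dask schedule work internally; the task functions and the tasks mapping are made up for the example (graphlib requires Python 3.9+).

```python
from graphlib import TopologicalSorter  # Python 3.9+


def extract():
    return [1, 2, 3]


def transform(rows):
    return [r * 10 for r in rows]


def load(rows):
    return sum(rows)


# Each task name maps to (function, list of upstream task names).
tasks = {
    "extract": (extract, []),
    "transform": (transform, ["extract"]),
    "load": (load, ["transform"]),
}

# TopologicalSorter expects node -> set of predecessors.
graph = {name: set(upstream) for name, (_, upstream) in tasks.items()}

results = {}
for name in TopologicalSorter(graph).static_order():
    fn, upstream = tasks[name]
    # Upstream results are handed directly to the downstream task.
    results[name] = fn(*(results[u] for u in upstream))

print(results["load"])
```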

DAG Scheduler & Tasks

Data Exchange

  • Data Pipelines
  • Jeremiah Lowin, creator of both Airflow's XCom component and Prefect, wanted in good faith to implement distinct solutions outside the Airflow ecosystem. Source
  • Modern solution for "data" pipeline management.

Other benefits

  • Error Handling
  • Innovation: GraphQL, Dask & More
  • Data Serialization
  • Parameterization
  • Goes beyond the workflow-as-code challenge by treating data, scheduling, and other parts of the workflow management process as first-class concerns.
  • And More!
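
To give a feel for two of these benefits, error handling and parameterization, here is a stdlib-only sketch. It is not Prefect's API: with_retries, flaky_fetch, and run_flow are hypothetical names, and the failure is simulated with a counter so the example is deterministic.

```python
import functools


def with_retries(max_retries):
    """Re-run the wrapped function up to max_retries extra times on failure."""

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == max_retries:
                        raise  # out of retries: surface the error

        return wrapper

    return decorator


attempts = {"count": 0}


@with_retries(max_retries=2)
def flaky_fetch(source):
    # Simulated flakiness: fail on the first two calls, succeed on the third.
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary failure")
    return f"data from {source}"


def run_flow(source="reference"):
    # "source" plays the role of a run-time parameter.
    return flaky_fetch(source)


result = run_flow(source="live")
print(result, attempts["count"])
```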

There is a lot more to cover. However, one can find out more through the following resources:

Thank you,

SF Python