
Airflow - override()

 

"""
Example DAG demonstrating a workflow with nested branching. The join tasks are created with
``none_failed_min_one_success`` trigger rule such that they are skipped whenever their corresponding
branching tasks are skipped.
"""from __future__ import annotationsimport pendulumfrom airflow.providers.standard.operators.empty import EmptyOperator
from airflow.sdk import DAG, TriggerRule, taskwith DAG(dag_id="example_nested_branch_dag",start_date=pendulum.datetime(2021, 1, 1, tz="UTC"),catchup=False,schedule="@daily",tags=["example"],
) as dag:@task.branch()def branch(task_id_to_return: str) -> str:return task_id_to_returnbranch_1 = branch.override(task_id="branch_1")(task_id_to_return="true_1")join_1 = EmptyOperator(task_id="join_1", trigger_rule=TriggerRule.NONE_FAILED_MIN_ONE_SUCCESS)true_1 = EmptyOperator(task_id="true_1")false_1 = EmptyOperator(task_id="false_1")branch_2 = branch.override(task_id="branch_2")(task_id_to_return="true_2")join_2 = EmptyOperator(task_id="join_2", trigger_rule=TriggerRule.NONE_FAILED_MIN_ONE_SUCCESS)true_2 = EmptyOperator(task_id="true_2")false_2 = EmptyOperator(task_id="false_2")false_3 = EmptyOperator(task_id="false_3")branch_1 >> true_1 >> join_1branch_1 >> false_1 >> branch_2 >> [true_2, false_2] >> join_2 >> false_3 >> join_1

 


 

1️⃣ What override() does

  • branch is a TaskFlow task (Python function decorated with @task.branch()).

  • .override() is a method on TaskFlow tasks that lets you override some parameters of the task without changing the original function.

  • Common things you can override:

    • task_id (unique identifier of the task in the DAG)

    • retries

    • retry_delay

    • execution_timeout

    • doc_md, etc.

Essentially, it creates a new task object with the same Python callable but updated configuration.
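
To make the "same callable, updated configuration" idea concrete, here is a minimal pure-Python sketch. It is a toy model, not Airflow's implementation: the class ToyTaskDecorator and its kwargs attribute are hypothetical names invented for illustration, and it only mimics the copy-and-merge behavior described above.

```python
from __future__ import annotations

from typing import Any, Callable


class ToyTaskDecorator:
    """Hypothetical stand-in for a TaskFlow-decorated function (not Airflow code)."""

    def __init__(self, function: Callable[..., Any], **kwargs: Any) -> None:
        self.function = function
        # The default task_id is the function name; explicit kwargs win.
        self.kwargs = {"task_id": function.__name__, **kwargs}

    def override(self, **kwargs: Any) -> ToyTaskDecorator:
        # Return a NEW object: same callable, merged configuration.
        # The object override() was called on is left untouched.
        return ToyTaskDecorator(self.function, **{**self.kwargs, **kwargs})


def branch(task_id_to_return: str) -> str:
    return task_id_to_return


base = ToyTaskDecorator(branch, retries=0)
branch_1 = base.override(task_id="branch_1", retries=3)

print(base.kwargs)      # the original configuration is unchanged
print(branch_1.kwargs)  # the copy carries the overridden values
```

The point of the sketch is that override() never mutates the original: base can be reused as many times as needed, each time with a different configuration.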


2️⃣ Why it's used here

branch.override(task_id='branch_1')
  • The original function is branch(task_id_to_return: str).

  • We want to instantiate it as a DAG task with a specific task_id called 'branch_1'.

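  • Without override(), the first call would use the function name as its task_id (here, 'branch'). Task IDs must be unique within a DAG, so further calls of the same function get auto-generated IDs such as 'branch__1', which say little about each task's role. Setting the task_id explicitly keeps the names readable and predictable.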
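
As a rough illustration of why explicit IDs matter, the sketch below de-duplicates task IDs by appending a numeric suffix, similar in spirit to the task_id, task_id__1, task_id__2 naming that Airflow's TaskFlow layer produces when a decorated function is called more than once. The helper unique_task_id is a hypothetical name written for this example, not Airflow's API, and the real logic may differ in detail.

```python
from __future__ import annotations


def unique_task_id(task_id: str, existing: set[str]) -> str:
    """Return task_id if it is free, else task_id__N with the smallest free N >= 1."""
    if task_id not in existing:
        return task_id
    n = 1
    while f"{task_id}__{n}" in existing:
        n += 1
    return f"{task_id}__{n}"


used: set[str] = set()
auto_ids: list[str] = []

# Three calls of the same function without an explicit task_id.
for _ in range(3):
    new_id = unique_task_id("branch", used)
    used.add(new_id)
    auto_ids.append(new_id)

print(auto_ids)  # ['branch', 'branch__1', 'branch__2']
```

Names like 'branch__1' work, but 'branch_1' and 'branch_2' chosen through override() are easier to recognize in the graph view and in logs.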


3️⃣ Putting it together

branch_1 = branch.override(task_id='branch_1')(task_id_to_return='true_1')

Step by step:

  1. branch.override(task_id='branch_1') → creates a TaskFlow task object with task_id='branch_1'.

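  2. (task_id_to_return='true_1') → calls the task with the argument 'true_1'. The call does not run the function; it registers the task in the DAG and returns an XComArg, a reference to the task's future output. TaskInstances are only created later, when a DAG run is scheduled.

So branch_1 is a reference to a DAG task with a custom task_id, not the result of a plain Python function call.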
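
The two steps can be mimicked with a small stand-alone sketch. Everything here is an assumption made for illustration: ToyDag and ToyTask are hypothetical classes, and a plain string stands in for the reference object that Airflow returns. What it does show faithfully is the order of events: calling the task only registers it, and the function body runs later.

```python
from __future__ import annotations

from typing import Any, Callable


class ToyDag:
    """Hypothetical registry of tasks, keyed by task_id (not Airflow code)."""

    def __init__(self) -> None:
        self.tasks: dict[str, dict[str, Any]] = {}


class ToyTask:
    """Hypothetical stand-in for a decorated task bound to a ToyDag."""

    def __init__(self, dag: ToyDag, function: Callable[..., Any], **config: Any) -> None:
        self.dag = dag
        self.function = function
        self.config = {"task_id": function.__name__, **config}

    def override(self, **config: Any) -> ToyTask:
        # Step 1: copy the task with updated configuration.
        return ToyTask(self.dag, self.function, **{**self.config, **config})

    def __call__(self, **op_kwargs: Any) -> str:
        # Step 2: register the task and return a reference to it.
        # The wrapped function is NOT executed here.
        task_id = self.config["task_id"]
        if task_id in self.dag.tasks:
            raise ValueError(f"duplicate task_id: {task_id}")
        self.dag.tasks[task_id] = {"callable": self.function, "op_kwargs": op_kwargs}
        return task_id


calls: list[str] = []


def branch(task_id_to_return: str) -> str:
    calls.append(task_id_to_return)  # record that the body actually ran
    return task_id_to_return


dag = ToyDag()
ref = ToyTask(dag, branch).override(task_id="branch_1")(task_id_to_return="true_1")
calls_at_definition = list(calls)  # still empty: nothing has run yet

# Later, a "worker" looks the task up and runs it with the stored arguments.
entry = dag.tasks[ref]
result = entry["callable"](**entry["op_kwargs"])

print(ref, calls_at_definition, result)
```

In the toy, ref is just the string 'branch_1'. In Airflow the returned object is richer, but the idea is the same: it identifies a task in the DAG and can be wired to other tasks with >>.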


Analogy

Think of override() as:

“I like this function, but for this DAG task, I want it to have a different name or different execution settings.”

  • Original function: branch

  • DAG task: branch_1 (with overridden task_id)