
Transform Airflow

Introduction

Airflow is an open-source tool for programmatically authoring, scheduling, and monitoring workflows. Transform provides an Airflow operator that lets you schedule Transform materializations, so that the metrics and dimensions you have materialized stay up to date and are quick to access.

Install:

pip install transform-airflow

Available Operators:

For more detail on each parameter, see the API reference doc.

  • MaterializeOperator
    • materialization_name: str
    • start_time: Optional[str]
    • end_time: Optional[str]
    • model_key_id: Optional[int]=None
    • output_table: Optional[str]=None
    • force: bool=False
    • creds: Optional[Dict[str, str]]=None

Authentication

There are two ways to connect and authenticate your Transform operator to your MQL server:

  1. Pass creds to the operator (see the Creating DAG section below for an example)
  2. Set the TRANSFORM_API_KEY and MQL_QUERY_URL environment variables
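
Both methods rely on the same two values, so it can help to check that they are present before a DAG runs. The following is a minimal sketch, not part of transform-airflow: the helper name creds_from_env is hypothetical, and it assumes that the keys of the creds dictionary match the environment variable names listed above.

```python
import os
from typing import Dict, Mapping, Optional

# The two settings named in the Authentication section.
REQUIRED_VARS = ("TRANSFORM_API_KEY", "MQL_QUERY_URL")


def creds_from_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Hypothetical helper: build a creds dict from environment variables.

    Raises RuntimeError if either variable is unset or empty, so the
    problem surfaces early instead of at task run time.
    """
    source = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not source.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: source[name] for name in REQUIRED_VARS}


# Example with an explicit mapping instead of the real environment.
example = creds_from_env(
    {"TRANSFORM_API_KEY": "<api_key>", "MQL_QUERY_URL": "<mql_server_url>"}
)
print(sorted(example))  # ['MQL_QUERY_URL', 'TRANSFORM_API_KEY']
```

If you use method 1, the returned dictionary could be passed as creds. If you use method 2, the same check can serve as a sanity test that the variables are set.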

Creating DAG:

from airflow import DAG
from transform_airflow.operators import MaterializeOperator

# Init DAG
my_dag = DAG("my_dag_name")

# Associate task with DAG
op = MaterializeOperator(
    task_id="materialize_test",  # example task id; use any unique name
    dag=my_dag,
    materialization_name="test",
    start_time="2021-01-01",
    end_time="2021-01-10",
    # Replace the placeholders with your own values
    creds={
        "TRANSFORM_API_KEY": "<api_key>",
        "MQL_QUERY_URL": "<mql_server_url>",
    },
)

# Perform any dependency structuring on tasks
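
The example above hard-codes start_time and end_time. A scheduled DAG would more likely want a rolling window. The sketch below shows one way to compute it in plain Python. The helper rolling_window is hypothetical. It assumes the operator accepts dates in the YYYY-MM-DD form used in the example and that end_time is inclusive; neither is confirmed here, so check the API reference doc.

```python
from datetime import date, timedelta
from typing import Dict, Optional


def rolling_window(days: int, today: Optional[date] = None) -> Dict[str, str]:
    """Hypothetical helper: return start_time/end_time strings covering
    the last `days` days, ending yesterday (assumed inclusive)."""
    if days < 1:
        raise ValueError("days must be at least 1")
    today = date.today() if today is None else today
    end = today - timedelta(days=1)
    start = end - timedelta(days=days - 1)
    # isoformat() yields YYYY-MM-DD, matching the format in the DAG example.
    return {"start_time": start.isoformat(), "end_time": end.isoformat()}


# A fixed "today" reproduces the dates from the DAG example.
window = rolling_window(10, today=date(2021, 1, 11))
print(window)  # {'start_time': '2021-01-01', 'end_time': '2021-01-10'}
```

The result could be unpacked into the operator call, for example MaterializeOperator(..., **window). Note that code at the top level of a DAG file runs when Airflow parses the file, so the window is computed at parse time rather than at task run time.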

Troubleshooting

git not available in the Airflow environment

Our main library, transform, requires git to be installed in the environment where it runs. If git cannot be installed in your Airflow environment, go to the transform release history on PyPI and use the latest .dev version of transform instead.