Apache Airflow is an open-source project for scheduling and managing workflows, written in Python.
Kaxil Naik, director of Airflow engineering at Astronomer and one of the core committers of Airflow, told SD Times: “It is used to automate your daily jobs or daily tasks, and tasks can be as simple as running a Python script or it can be as complicated as bringing in all the data from 500 different data warehouses and manipulating it.”
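At the simple end of that spectrum, a daily Airflow job is only a few lines of Python. The sketch below is illustrative rather than drawn from the article; the DAG ID, schedule, and task bodies are placeholders:

```python
from datetime import datetime

from airflow.decorators import dag, task


# A minimal sketch of a daily job in Airflow using the TaskFlow API.
# The DAG ID, schedule, and task logic are illustrative placeholders.
@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def daily_report():
    @task
    def extract() -> list[int]:
        # Stand-in for pulling data from a source system.
        return [1, 2, 3]

    @task
    def summarize(rows: list[int]) -> None:
        # Stand-in for the "manipulating it" step Naik describes.
        print(f"processed {len(rows)} rows")

    summarize(extract())


daily_report()
```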
It was created at Airbnb in 2014 and will celebrate its 10th anniversary later this year. It entered the Apache Software Foundation's Incubator in March 2016 and became a top-level project in 2019.
Airflow was initially designed for ETL use cases, but it has evolved over the years, adding features that make it useful across all aspects of data engineering.
“It has continued to be the leader in this space, because we have maintained a good balance between innovation and stability. Because of this almost 10 years of Airflow in the same space, we have added so many features that allow Airflow to be very reliable and stable,” he said.
The most recent release, 2.9, came out earlier this week and added new features such as the ability to combine dataset-based and time-based schedules, custom names for Dynamic Task Mapping, and the ability to group task logs.
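As a sketch of how the first of those features fits together, the following combines a cron timetable with a dataset trigger using 2.9's DatasetOrTimeSchedule; the DAG ID, dataset URI, and cron expression are placeholders:

```python
from airflow import DAG
from airflow.datasets import Dataset
from airflow.operators.empty import EmptyOperator
from airflow.timetables.datasets import DatasetOrTimeSchedule
from airflow.timetables.trigger import CronTriggerTimetable

# A DAG that runs either on a cron schedule or whenever the upstream
# dataset is updated, whichever happens first. The dataset URI and
# cron expression below are placeholders.
with DAG(
    dag_id="combined_schedule_example",
    catchup=False,
    schedule=DatasetOrTimeSchedule(
        timetable=CronTriggerTimetable("0 6 * * *", timezone="UTC"),
        datasets=[Dataset("s3://example-bucket/orders.parquet")],
    ),
):
    EmptyOperator(task_id="placeholder_task")
```

The custom names for Dynamic Task Mapping, for their part, are set through a new map_index_template argument on mapped tasks.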
The project can be found on GitHub at https://github.com/apache/airflow.