Apache Pinot – SD Times Open Source Project of the Week


Apache Pinot is an open-source analytics platform that utilizes an OLAP database to provide low-latency insights into large amounts of data.

OLAP stands for Online Analytical Processing and is a method in which data from multiple sources can be used together, allowing companies to group data from websites, applications, internal systems, and more together for analysis.

“For example, a retailer stores data about all the products it sells, such as color, size, cost, and location. The retailer also collects customer purchase data, such as the name of the items ordered and total sales value, in a different system. OLAP combines the datasets to answer questions such as which color products are more popular or how product placement impacts sales,” AWS wrote in a post explaining OLAP.

Key features of Apache Pinot include low-latency queries, the ability to handle hundreds of thousands of concurrent queries per second, batch and streaming ingestion, versatile joins, rich indexing options, and more.

It was first created at LinkedIn in 2013 because the company wanted to provide its users interactive analytics, but with the amount of data LinkedIn had already amassed at that time, it was struggling to find something that could scale at the level it needed.

“Pinot was born as an answer to our problems, a web-scale real-time analytics engine designed and built at LinkedIn. Pinot enables us to slice, dice and scan through massively large quantities of data in real-time across a wide variety of products,” said Praveen Neppalli Naga, engineering manager at LinkedIn at the time, wrote in a blog post when the project was first announced.

It powers 25 of LinkedIn’s user-facing features such as Who Viewed My Profile, Company Follow Analytics, Jobs Analytics, and more, as well as over 30 of the company’s internal tools, such as its A/B testing platform.

In 2018, Apache Pinot joined the Apache Software Foundation as an incubator project and became a top-level project in 2021.

Since its creation it has been adopted by a number of major companies, including Robinhood, Slack, Stripe, Target, Uber, and Walmart.

The most recent release is 1.1, which came out in March, adding features such as vector index support and multi-stage query engine improvements.

Looking forward, some of the things the project maintainers are working on in 2024 include making V2 on-by-default, enabling column null storing by default, full PostgreSQL compliance, pagination, and continuing ease-of-use updates such as improved documentation, more user friendly error messages, and more.



Source link