Time for a brand-new topic today here at The Developers’ Bakery: Data Science! We’re really excited to have on stage Marco Gorelli, a core contributor to both pandas and polars, two of the most popular data science libraries in the Python ecosystem.
In this episode, we’ll talk about how pandas became so popular in the data science space. Then we’ll move on to polars, a new data science library written in Rust, and how its performance compares to that of pandas.
Finally, we’ll touch on a very interesting and unique topic: the Dataframe Consortium, a multi-company effort to standardize the dataframe API across the Python ecosystem.
Enjoy the show 👨‍🍳
Show Notes
- 00.14 Intro
- 01.00 Episode Start
- 01.30 Marco’s Introduction
- 02.14 What is pandas?
- 03.27 Why do I need pandas?
- 05.19 pandas’ competitors
- 07.24 pandas’ popularity
- 10.12 What’s your role with pandas?
- 12.39 How do you become a pandas maintainer?
- 13.50 From data scientist to open source maintainer
- 16.02 What is polars?
- 21.22 Can pandas and polars co-exist?
- 24.25 Performance benchmarks
- 26.21 The learning curve
- 29.11 Naming anecdotes
- 30.51 What is the Dataframe Consortium?
- 40.12 Marco’s role in the consortium
- 43.40 What’s next for polars?
- 46.50 How do you start contributing?
- 50.56 Further reading
- 53.33 Where can people find you online?
Resources
- pandas-dev/pandas on GitHub
- pola-rs/polars on GitHub
- Pandas Official Website
- Polars Official Website
- Dataframe API Consortium Official Website
- Mentioned Resources:
- Pandas User Guide
- Polars User Guide
- Dataframe API Standard definition
- What polars does for you - Video from EuroPython
- H2O Benchmark
- Polars - TPC-H Benchmark
- @MarcoGorelli on GitHub
- Marco Gorelli on LinkedIn