From Good AI to Good Data Engineering, or: how Responsible AI interplays with high data quality

Responsible AI depends on high-quality data engineering to ensure ethical, fair, and transparent AI systems.

A glimpse into the life of a data leader

Data leaders face pressure to balance AI hype with organizing their data landscape. Here’s how they stay focused, pragmatic, and strategic.

Data Stability with Python: How to Catch Even the Smallest Changes

As a data engineer, you’ll find that running your data pipelines every X minutes is nearly always the safest option. It allows you to sleep well at night…
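
The post’s own approach isn’t reproduced here, but as a minimal sketch of the idea, hashing a canonical representation of a dataset is one way to catch even the smallest change between runs (the file name below is made up):

```python
import hashlib
import pandas as pd

def dataset_fingerprint(df: pd.DataFrame) -> str:
    """Return a deterministic SHA-256 digest of a DataFrame's contents."""
    # Sort columns and rows so the digest does not depend on ordering.
    canonical = df.sort_index(axis=1).sort_values(list(df.columns)).to_csv(index=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Hypothetical usage: compare today's extract against yesterday's stored digest.
df = pd.read_csv("daily_extract.csv")  # made-up file name
print(dataset_fingerprint(df))         # any change in the data changes the digest
```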

Clear signals: Enhancing communication within a data team

Demystifying Device Flow

Implementing OAuth 2.0 Device Authorization Grant with AWS Cognito and FastAPI

Short feedback cycles on AWS Lambda

A Makefile that lets you iterate quickly
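
The Makefile itself isn’t shown here; as a hedged illustration of the kind of step such a target might wrap, this boto3 sketch pushes a freshly zipped build to a Lambda function (function name and zip path are hypothetical):

```python
import boto3

# Hypothetical names: adjust to your own function and build artifact.
FUNCTION_NAME = "my-function"
ZIP_PATH = "build/lambda.zip"

def deploy() -> None:
    """Upload a new code package to an existing Lambda function."""
    client = boto3.client("lambda")
    with open(ZIP_PATH, "rb") as f:
        client.update_function_code(FunctionName=FUNCTION_NAME, ZipFile=f.read())

if __name__ == "__main__":
    deploy()
```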

Prompt Engineering for Better SQL Code Generation With LLMs

Picture yourself as a marketing executive tasked with optimising advertising strategies to target different customer segments effectively…

Age of DataFrames 2: Polars Edition

In this publication, I showcase some Polars tricks and features.
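
The post’s specific tricks aren’t listed here; as a flavour of the API, a minimal Polars sketch with made-up data:

```python
import polars as pl

# Made-up example data.
df = pl.DataFrame({
    "shop": ["a", "a", "b", "b"],
    "revenue": [10, 20, 30, 40],
})

# Lazy evaluation: build the query first, then collect it in one optimized pass.
result = (
    df.lazy()
    .group_by("shop")
    .agg(pl.col("revenue").sum().alias("total_revenue"))
    .sort("shop")
    .collect()
)
print(result)
```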

Quack, Quack, Ka-Ching: Cut Costs by Querying Snowflake from DuckDB

How to leverage Snowflake’s support for interoperable open lakehouse technology — Iceberg — to save money.
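
The article’s full setup isn’t repeated here, but conceptually the trick is that Iceberg tables managed by Snowflake can be read directly by DuckDB. A hedged sketch with a made-up table location:

```python
import duckdb

con = duckdb.connect()
# The iceberg extension ships with DuckDB but must be installed and loaded.
con.execute("INSTALL iceberg")
con.execute("LOAD iceberg")
# In practice you would also load httpfs and configure S3 credentials here.

# Made-up S3 location of an Iceberg table managed by Snowflake.
result = con.execute(
    "SELECT count(*) FROM iceberg_scan('s3://my-bucket/warehouse/orders')"
).fetchall()
print(result)
```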

The building blocks of successful Data Teams

Based on my experience, I will elaborate on the key criteria for building successful data teams.

Querying Hierarchical Data with Postgres

Hierarchical data is prevalent and simple to store, but querying it can be challenging. This post will guide you through the process of…
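
The post’s full walkthrough isn’t repeated here; the classic building block is a recursive CTE. A hedged Python sketch, assuming psycopg 3 and a made-up employees(id, name, manager_id) table:

```python
import psycopg  # psycopg 3

# Hypothetical table: employees(id, name, manager_id), manager_id is NULL for the root.
QUERY = """
WITH RECURSIVE reports AS (
    SELECT id, name, manager_id, 1 AS depth
    FROM employees
    WHERE manager_id IS NULL
    UNION ALL
    SELECT e.id, e.name, e.manager_id, r.depth + 1
    FROM employees e
    JOIN reports r ON e.manager_id = r.id
)
SELECT * FROM reports ORDER BY depth;
"""

with psycopg.connect("dbname=mydb") as conn:  # made-up connection string
    for row in conn.execute(QUERY):
        print(row)
```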

How to organize a data team to get the most value out of data

To state the obvious: a data team is there to bring value to the company. But is it that obvious? Haven’t companies too often created a ...

Becoming Clout* certified

Hot takes about my experience with cloud certifications

You can use a supercomputer to send an email, but should you?

Discover the next evolution in data processing with DuckDB and Polars

Two Lifecycle Policies Every S3 Bucket Should Have

Abandoned multipart uploads and expired delete markers: what they are, and why bad AWS defaults mean you need to care about them.
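
As a hedged illustration of the two rules the post argues for (the bucket name is made up), they can be set with boto3 roughly like this:

```python
import boto3

# Made-up bucket name; both rules apply to the whole bucket.
BUCKET = "my-bucket"

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket=BUCKET,
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "abort-incomplete-multipart-uploads",
                "Filter": {"Prefix": ""},
                "Status": "Enabled",
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            },
            {
                "ID": "expire-delete-markers",
                "Filter": {"Prefix": ""},
                "Status": "Enabled",
                "Expiration": {"ExpiredObjectDeleteMarker": True},
            },
        ]
    },
)
```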

How we used GenAI to make sense of the government

We built a RAG chatbot with AWS Bedrock and GPT-4 to answer questions about the Flemish government.

How we reduced our Docker build times by 40%

This post describes two ways to speed up building your Docker images: caching build info remotely, and using the --link option when copying files.

Cross-DAG Dependencies in Apache Airflow: A Comprehensive Guide

Exploring four methods to effectively manage and scale your data workflow dependencies with Apache Airflow.
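
The guide’s four methods aren’t reproduced here; as one hedged example, an ExternalTaskSensor makes a downstream DAG wait for a task in an upstream DAG (DAG and task ids are made up):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator
from airflow.sensors.external_task import ExternalTaskSensor

with DAG(
    dag_id="downstream_dag",           # made-up dag id
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    # Wait until the upstream DAG's task for the same logical date has succeeded.
    wait_for_upstream = ExternalTaskSensor(
        task_id="wait_for_upstream",
        external_dag_id="upstream_dag",    # made-up upstream DAG
        external_task_id="publish_table",  # made-up task in that DAG
    )
    process = EmptyOperator(task_id="process")

    wait_for_upstream >> process
```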

Upserting Data using Spark and Iceberg

Use Spark and Iceberg’s MERGE INTO syntax to efficiently store daily, incremental snapshots of a mutable source table.
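
As a hedged sketch of the core statement (catalog, table, and column names are made up), the upsert boils down to a MERGE INTO executed from PySpark:

```python
from pyspark.sql import SparkSession

# Assumes a Spark session already configured with an Iceberg catalog named "demo".
spark = SparkSession.builder.appName("iceberg-upsert").getOrCreate()

spark.sql("""
    MERGE INTO demo.db.customers AS target        -- made-up Iceberg table
    USING staging_customers AS source             -- made-up staging view with today's snapshot
    ON target.customer_id = source.customer_id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```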


Belgium
Vismarkt 17, 3000 Leuven, Belgium
VAT BE 0667.976.246

Germany
Friedrichstraße 68, 10117 Berlin, Germany

© 2024 Dataminded. All rights reserved.
