Fetching latest headlines…
DBMerge: A database-agnostic Python UPSERT module to simplify ETL pipelines
NORTH AMERICA
🇺🇸 United StatesMarch 22, 2026

DBMerge: A database-agnostic Python UPSERT module to simplify ETL pipelines

0 views0 likes0 comments
Originally published byDev.to

Hi everyone, I’d like to share an open-source library I’ve been developing for the Python community: DBMerge. Consider it like an advanced version of pd.to_sql.

It is designed to simplify common but tedious task of syncing data to a SQL database.
Instead of writing custom MERGE, UPSERT or ON CONFLICT queries for different SQL dialects, DBMerge performs INSERT, UPDATE, and DELETE operations automatically in a single step.

Because it is built on top of SQLAlchemy Core, it is fully database-agnostic. It has currently been thoroughly tested with PostgreSQL, MySQL/MariaDB, SQLite and MS SQL Server.

How It Works Under the Hood

The underlying logic focuses on performance and reliability:

  1. Staging: The module first creates a temporary staging table in the database and loads your entire incoming dataset into it using a fast bulk INSERT.
  2. Reconciliation: It then executes optimized UPDATE, INSERT, and DELETE statements against the target table by comparing the target table with the temporary one. (In UPDATE operation it updates only rows that changed their values.)

Of course, real-world scenarios require flexibility. For example, DBMerge supports applying specific logical conditions during the delete phase—allowing you to perform partial data loads (like syncing just a single month's worth of data without accidentally deleting the rest of the table).

Supported Data Sources

DBMerge natively accepts three types of input formats:

  • Pandas DataFrames: Useful when you load data (e.g., from CSVs), perform your transformations or cleaning, and want to efficiently push the final result to a database.
  • Lists of Dictionaries: Useful when you prefer not to add pandas as a dependency, or when you are dealing with special data types like UUIDs or JSONB objects.
  • Existing Tables or Views: Useful when you have a computationally "heavy" database view and want to periodically materialize its results into a target table for faster querying. Think of it as a alternative to PostgreSQL’s materialized views, but with built-in support for partial updates.

Installation

pip install dbmerge

Quick Start Example

The library uses a context manager to safely handle database connection and drop temporary table in any case.

from sqlalchemy import create_engine
from datetime import date
from dbmerge import dbmerge

# 1. Initialize DB engine
engine = create_engine("sqlite://")

# 2. Prepare your source data
data = [
    {'Shop': '123', 'Product': 'A1', 'Date': date(2025, 1, 1), 'Qty': 2, 'Price': 50.10},
    {'Shop': '124', 'Product': 'A1', 'Date': date(2025, 1, 1), 'Qty': 1, 'Price': 100.50}
]

# 3. Execute the merge operation
# The table will be created automatically if it doesn't exist.
with dbmerge(engine=engine, data=data, table_name="Facts", 
             key=['Shop', 'Product', 'Date']) as merge:
    merge.exec()

(By default it auto-creates missing target table or its columns. Also it has parameters to add audit timestamps, or configure advanced deletion rules).

Performance Benchmark

Here is a rough performance comparison for synchronizing data of different sizes using DBMerge (measured on a standard developer laptop):

Database DBMerge (100k rows) DBMerge (1mil rows)
PostgreSQL ~2.0s ~19.8s
MySQL / MariaDB ~1.0s ~11.1s
SQLite ~0.7s ~7.6s
MS SQL Server* ~22.4s ~4m 23s

* Note: MS SQL Server bulk operations take longer due to inherent limitations in the pyodbc driver.

I included in this library all important features that I needed in over 10 years of various ETL experience, trying to make it as simple as it can be.
If you regularly work with data engineering or ETL tasks, I'd love to hear your feedback!
On the GitHub page you will also find database-specific technical details, examples, and detailed documentation of advanced parameters.

GitHub logo pavel-v-sobolev / dbmerge

A database-agnostic Python library for simple and fast UPSERT (Insert/Update/Delete) operations via SQLAlchemy.

DBMerge

PyPI version Python versions

PostgreSQL MariaDB SQLite MS SQL

DBMerge is a Python library that provides a simplified interface for performing UPSERT (Insert/Update/Delete) operations. Built on top of SQLAlchemy, it abstracts away the complexities of writing engine-specific SQL MERGE or ON CONFLICT statements.

Overview

DBMerge accepts multiple data sources as input:

  • Pandas DataFrames
  • Lists of dictionaries
  • Other database tables or views

DBMerge automates data update process by comparing your source data against the target table and automatically performing the required operations.

  • Insert new records that do not exist in the target table.
  • Update existing records only if their values have changed.
  • Delete (or mark) existing records in the target table that are no longer present in the source data.

To ensure optimal performance, the library loads your data into a temporary table first, and then executes bulk synchronization queries.

Supported Databases

Tested and verified with:

  • PostgreSQL
  • MariaDB / MySQL
  • SQLite
  • MS SQL Server

Installation

pip install dbmerge

Quick

Comments (0)

Sign in to join the discussion

Be the first to comment!