Hi everyone, I’d like to share an open-source library I’ve been developing for the Python community: DBMerge. Consider it like an advanced version of pd.to_sql.
It is designed to simplify common but tedious task of syncing data to a SQL database.
Instead of writing custom MERGE, UPSERT or ON CONFLICT queries for different SQL dialects, DBMerge performs INSERT, UPDATE, and DELETE operations automatically in a single step.
Because it is built on top of SQLAlchemy Core, it is fully database-agnostic. It has currently been thoroughly tested with PostgreSQL, MySQL/MariaDB, SQLite and MS SQL Server.
How It Works Under the Hood
The underlying logic focuses on performance and reliability:
-
Staging: The module first creates a temporary staging table in the database and loads your entire incoming dataset into it using a fast bulk
INSERT. -
Reconciliation: It then executes optimized
UPDATE,INSERT, andDELETEstatements against the target table by comparing the target table with the temporary one. (InUPDATEoperation it updates only rows that changed their values.)
Of course, real-world scenarios require flexibility. For example, DBMerge supports applying specific logical conditions during the delete phase—allowing you to perform partial data loads (like syncing just a single month's worth of data without accidentally deleting the rest of the table).
Supported Data Sources
DBMerge natively accepts three types of input formats:
- Pandas DataFrames: Useful when you load data (e.g., from CSVs), perform your transformations or cleaning, and want to efficiently push the final result to a database.
- Lists of Dictionaries: Useful when you prefer not to add pandas as a dependency, or when you are dealing with special data types like UUIDs or JSONB objects.
- Existing Tables or Views: Useful when you have a computationally "heavy" database view and want to periodically materialize its results into a target table for faster querying. Think of it as a alternative to PostgreSQL’s materialized views, but with built-in support for partial updates.
Installation
pip install dbmerge
Quick Start Example
The library uses a context manager to safely handle database connection and drop temporary table in any case.
from sqlalchemy import create_engine
from datetime import date
from dbmerge import dbmerge
# 1. Initialize DB engine
engine = create_engine("sqlite://")
# 2. Prepare your source data
data = [
{'Shop': '123', 'Product': 'A1', 'Date': date(2025, 1, 1), 'Qty': 2, 'Price': 50.10},
{'Shop': '124', 'Product': 'A1', 'Date': date(2025, 1, 1), 'Qty': 1, 'Price': 100.50}
]
# 3. Execute the merge operation
# The table will be created automatically if it doesn't exist.
with dbmerge(engine=engine, data=data, table_name="Facts",
key=['Shop', 'Product', 'Date']) as merge:
merge.exec()
(By default it auto-creates missing target table or its columns. Also it has parameters to add audit timestamps, or configure advanced deletion rules).
Performance Benchmark
Here is a rough performance comparison for synchronizing data of different sizes using DBMerge (measured on a standard developer laptop):
| Database | DBMerge (100k rows) | DBMerge (1mil rows) |
|---|---|---|
| PostgreSQL | ~2.0s | ~19.8s |
| MySQL / MariaDB | ~1.0s | ~11.1s |
| SQLite | ~0.7s | ~7.6s |
| MS SQL Server* | ~22.4s | ~4m 23s |
* Note: MS SQL Server bulk operations take longer due to inherent limitations in the pyodbc driver.
I included in this library all important features that I needed in over 10 years of various ETL experience, trying to make it as simple as it can be.
If you regularly work with data engineering or ETL tasks, I'd love to hear your feedback!
On the GitHub page you will also find database-specific technical details, examples, and detailed documentation of advanced parameters.
pavel-v-sobolev
/
dbmerge
A database-agnostic Python library for simple and fast UPSERT (Insert/Update/Delete) operations via SQLAlchemy.
DBMerge
DBMerge is a Python library that provides a simplified interface for performing UPSERT (Insert/Update/Delete) operations. Built on top of SQLAlchemy, it abstracts away the complexities of writing engine-specific SQL MERGE or ON CONFLICT statements.
Overview
DBMerge accepts multiple data sources as input:
- Pandas DataFrames
- Lists of dictionaries
- Other database tables or views
DBMerge automates data update process by comparing your source data against the target table and automatically performing the required operations.
- Insert new records that do not exist in the target table.
- Update existing records only if their values have changed.
- Delete (or mark) existing records in the target table that are no longer present in the source data.
To ensure optimal performance, the library loads your data into a temporary table first, and then executes bulk synchronization queries.
Supported Databases
Tested and verified with:
- PostgreSQL
- MariaDB / MySQL
- SQLite
- MS SQL Server
Installation
pip install dbmerge
Quick
…United States
NORTH AMERICA
Related News
Jeff Bezos Seeking $100 Billion to Buy Manufacturing Companies, 'Transform' Them With AI
9h ago
Firefox Announces Built-In VPN and Other New Features - and Introduces Its New Mascot
9h ago
Can Private Space Companies Replace the ISS Before 2030?
9h ago
Juicier Steaks Soon? The UK Approves Testing of Gene-Edited Cow Feed
9h ago
White House Unveils National AI Policy Framework To Limit State Power
9h ago