
# Crypto ETL Pipeline (Python + PostgreSQL)
## Overview
This project is a simple ETL (Extract, Transform, Load) pipeline that retrieves real-time cryptocurrency market data from the CoinPaprika API, processes it using Python (Pandas), and loads it into a PostgreSQL database.
The pipeline is designed for learning and demonstrating core data engineering concepts such as API ingestion, data transformation, and relational database storage.
## Architecture

Extract → Transform → Load

### Extract
- Data is fetched from the CoinPaprika REST API
- Cryptocurrencies include: Bitcoin, Ethereum, XRP, Solana
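A minimal sketch of the extract step (it assumes CoinPaprika's public `/v1/tickers/{coin_id}` endpoint and these coin IDs; adjust to the IDs the script actually uses):

```python
import requests

# Assumed CoinPaprika coin IDs for the tracked assets
COIN_IDS = ["btc-bitcoin", "eth-ethereum", "xrp-xrp", "sol-solana"]

def extract_tickers(coin_ids):
    """Fetch raw ticker JSON for each coin from the CoinPaprika REST API."""
    records = []
    for coin_id in coin_ids:
        resp = requests.get(
            f"https://api.coinpaprika.com/v1/tickers/{coin_id}", timeout=10
        )
        resp.raise_for_status()  # fail fast on HTTP errors
        records.append(resp.json())
    return records
```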
### Transform
- JSON response is normalized using Pandas
- Unnecessary fields are removed
- Columns are renamed for clarity
- A timestamp (`ingested_at`) is added
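A sketch of the transform step (the target column names come from the schema below; the raw CoinPaprika field paths such as `quotes.USD.price` are assumptions):

```python
from datetime import datetime, timezone

import pandas as pd

def transform(records):
    """Flatten raw ticker JSON into a tidy DataFrame matching the target schema."""
    df = pd.json_normalize(records)  # nested quotes.USD.* fields become dotted columns
    df = df.rename(columns={
        "id": "coin_id",
        "name": "coin_name",
        "symbol": "coin_symbol",
        "quotes.USD.price": "price_usd",
        "quotes.USD.volume_24h": "volume_24h_usd",
        "quotes.USD.volume_24h_change_24h": "volume_24h_change",
        "quotes.USD.market_cap": "market_cap_usd",
    })
    keep = ["coin_id", "coin_name", "coin_symbol", "price_usd",
            "volume_24h_usd", "volume_24h_change", "market_cap_usd"]
    df = df[keep]  # drop everything the schema doesn't need
    df["ingested_at"] = datetime.now(timezone.utc)  # ingestion timestamp
    return df
```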
### Load
- Cleaned data is inserted into PostgreSQL
- Table: `crypto_market_data`
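A sketch of the load step using SQLAlchemy and pandas' `to_sql`, with the connection details taken from the environment variables described under Setup:

```python
import os

from sqlalchemy import create_engine

def load(df):
    """Append the cleaned DataFrame to the crypto_market_data table."""
    url = (
        f"postgresql+psycopg2://{os.environ['DB_USER']}:{os.environ['DB_PASSWORD']}"
        f"@{os.environ['DB_HOST']}:{os.environ['DB_PORT']}/{os.environ['DB_NAME']}"
    )
    engine = create_engine(url)
    df.to_sql("crypto_market_data", engine, if_exists="append", index=False)
```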
## Tech Stack
- Python 3.x
- Pandas
- Requests
- SQLAlchemy
- Psycopg2
- PostgreSQL
- python-dotenv
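Given this stack, `requirements.txt` would contain roughly the following (versions unpinned here; pinning them is good practice):

```
pandas
requests
SQLAlchemy
psycopg2-binary  # or psycopg2 if building from source is acceptable
python-dotenv
```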
## Project Structure

```
crypto-etl/
│
├── crypto_etl.py        # Main ETL script
├── requirements.txt     # Python dependencies
├── .env                 # Environment variables (not pushed to GitHub)
├── .gitignore           # Ignored files (venv, .env, etc.)
└── README.md            # Project documentation
```
## Setup Instructions
1. Clone the repository

   ```bash
   git clone https://github.com/your-username/crypto-etl.git
   cd crypto-etl
   ```
2. Create and activate a virtual environment

   ```bash
   python3 -m venv venv
   source venv/bin/activate
   ```
3. Install dependencies

   ```bash
   pip install -r requirements.txt
   ```
4. Configure environment variables

   Create a `.env` file:

   ```
   DB_USER=your_db_user
   DB_PASSWORD=your_db_password
   DB_HOST=localhost
   DB_PORT=5432
   DB_NAME=crypto_db
   ```
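   The script can then load these with python-dotenv (a minimal sketch; the variable names match the `.env` file above):

   ```python
   import os

   from dotenv import load_dotenv

   load_dotenv()  # read key=value pairs from .env into the process environment

   db_user = os.getenv("DB_USER")
   db_password = os.getenv("DB_PASSWORD")
   db_host = os.getenv("DB_HOST", "localhost")
   db_port = os.getenv("DB_PORT", "5432")
   db_name = os.getenv("DB_NAME")
   ```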
5. Run the ETL pipeline

   ```bash
   python crypto_etl.py
   ```
## Database Schema

Table: `crypto_market_data`
| Column | Description |
|---|---|
| coin_id | Unique coin identifier |
| coin_name | Name of cryptocurrency |
| coin_symbol | Symbol (BTC, ETH, etc.) |
| price_usd | Current price in USD |
| volume_24h_usd | 24h trading volume |
| volume_24h_change | Volume change percentage |
| market_cap_usd | Market capitalization |
| ingested_at | Timestamp of ingestion |
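For reference, a possible DDL for this table (column types are assumptions; `to_sql` can also create the table automatically on first run):

```python
from sqlalchemy import create_engine, text

engine = create_engine(db_url)  # db_url built from the .env variables as in the load step

ddl = text("""
CREATE TABLE IF NOT EXISTS crypto_market_data (
    coin_id            TEXT,
    coin_name          TEXT,
    coin_symbol        TEXT,
    price_usd          NUMERIC,
    volume_24h_usd     NUMERIC,
    volume_24h_change  NUMERIC,
    market_cap_usd     NUMERIC,
    ingested_at        TIMESTAMPTZ
);
""")

with engine.begin() as conn:
    conn.execute(ddl)
```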
## Features
- Real-time crypto data ingestion
- Automated transformation using Pandas
- PostgreSQL data storage
- Environment variable configuration
- Modular and extensible ETL structure
## Future Improvements
- Add scheduling (cron/Airflow)
- Implement upsert logic to avoid duplicates (see the sketch after this list)
- Dockerize the pipeline
- Add logging and error handling
- Build dashboard for visualization
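For the upsert item above, PostgreSQL's `ON CONFLICT` clause is one option. A sketch, assuming a unique constraint on `(coin_id, ingested_at)` that the current schema does not yet define:

```python
from sqlalchemy import text

UPSERT = text("""
    INSERT INTO crypto_market_data
        (coin_id, coin_name, coin_symbol, price_usd,
         volume_24h_usd, volume_24h_change, market_cap_usd, ingested_at)
    VALUES
        (:coin_id, :coin_name, :coin_symbol, :price_usd,
         :volume_24h_usd, :volume_24h_change, :market_cap_usd, :ingested_at)
    ON CONFLICT (coin_id, ingested_at) DO UPDATE SET
        price_usd = EXCLUDED.price_usd,
        volume_24h_usd = EXCLUDED.volume_24h_usd,
        volume_24h_change = EXCLUDED.volume_24h_change,
        market_cap_usd = EXCLUDED.market_cap_usd
""")

# `engine` and `df` as in the load step; this executes one parameterised
# statement per DataFrame row
with engine.begin() as conn:
    conn.execute(UPSERT, df.to_dict(orient="records"))
```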