I started learning Python not because I wanted to become a software developer, but because I kept hitting walls in Excel. I work in data analytics and business intelligence, and my career has cut across fintech, payments platforms, KYC/AML compliance, and securities brokerage. In every one of those roles, there came a point where a spreadsheet could not do what I needed, whether that was cleaning 50,000 rows of messy transaction records, automating a daily reconciliation report, or pulling data from an API that only spoke JSON.
Python was the tool that removed those walls. This article is written for people in a similar position: you work with data, you are comfortable with spreadsheets, and you are wondering whether Python is worth the investment. It is.
What Python Actually Is
Python is a general-purpose programming language created by Guido van Rossum and first released in 1991. "General-purpose" means it was not designed exclusively for data work; people build web applications, automation scripts, machine learning models, and even games with it. But its design philosophy is what makes it stand out: readability and simplicity over cleverness.
Python code reads close to plain English. Where other languages require you to declare variable types, manage memory, or write dozens of lines of boilerplate before doing anything useful, Python lets you get to the point fast. This matters enormously for data work, where the goal is answering a question, not building production software.
Here is a concrete comparison. To read a CSV file and calculate an average in Python, the code looks like this:
import pandas as pd
data = pd.read_csv("transactions.csv")
average_amount = data["amount"].mean()
print(average_amount)
Four lines. No configuration files, no class declarations, no compilation step. You write it, you run it, you get a number. That directness is why Python dominates the data analytics space today.
Why Python Took Over Data Analytics
Python was not always the default choice. R held that position in academic statistics for years, and SAS dominated in corporate environments, particularly in banking and pharma. What shifted the balance was a combination of factors that compounded over the 2010s.
The library ecosystem matured. Pandas, NumPy, Matplotlib, and Scikit-learn went from experimental projects to production-grade tools used by companies like Netflix, Spotify, and JP Morgan. These libraries gave Python capabilities that previously required expensive licensed software.
The community grew around data problems. Stack Overflow, GitHub, and platforms like Kaggle created a feedback loop: more people used Python for data work, more questions got answered, more tutorials got written, more beginners chose Python because help was easy to find.
Industry adopted it. When companies like Google, Facebook, and Amazon built their data infrastructure around Python, the job market followed. Today, virtually every data analyst or data scientist job listing mentions Python as either required or preferred. This is not a passing trend; it has been the case for over a decade.
It bridges analysis and engineering. A SQL query can pull data. Excel can summarise it. But if you need to automate a pipeline that pulls data every morning, cleans it, runs calculations, and emails a report, that is a programming task. Python handles the entire chain without switching tools.
Python Libraries That Matter for Data Analytics
A library in Python is a collection of pre-written code that handles a specific type of task. You do not build everything from scratch; you import a library and use its functions. These are the ones I use regularly, and the ones any data analyst will encounter early.
Pandas
Pandas is the workhorse. It provides the DataFrame, a table structure similar to a spreadsheet or SQL table, along with an enormous set of functions for filtering, grouping, merging, reshaping, and summarising data. If you work with tabular data (and in analytics, you almost always do), Pandas is the first library you learn and the last one you stop using.
Real example: I have used Pandas to merge customer transaction records from separate payment systems, where the data came in different formats with different column names and date conventions. Pandas handled the column renaming, date parsing, and the merge on a common identifier in about 20 lines of code. In Excel, the same task involved VLOOKUP chains across three workbooks and broke every time the source files changed format.
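A minimal sketch of that pattern, with hypothetical file and column names standing in for the originals:
import pandas as pd
# File and column names here are illustrative, not the real ones
gateway = pd.read_csv("gateway_export.csv")
ledger = pd.read_csv("internal_ledger.csv")
# Align column names and date formats before merging
ledger = ledger.rename(columns={"txn_ref": "transaction_id", "txn_date": "date"})
gateway["date"] = pd.to_datetime(gateway["date"], dayfirst=True)
ledger["date"] = pd.to_datetime(ledger["date"])
# Merge on the shared identifier
merged = gateway.merge(ledger, on="transaction_id", suffixes=("_gw", "_ledger"))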
NumPy
NumPy handles numerical computation. It operates on arrays (ordered collections of numbers) and performs mathematical operations on them much faster than plain Python loops. Pandas is built on top of NumPy internally, so when you calculate a column average or a standard deviation in Pandas, NumPy is doing the actual maths underneath.
For most analytics work, you use NumPy indirectly through Pandas. It becomes essential when you are doing heavier numerical work: financial modelling, statistical simulations, or matrix operations.
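As a small illustration (the numbers are made up), here is the kind of vectorised arithmetic NumPy performs without an explicit loop:
import numpy as np
# One vectorised operation applies to every element at once, no loop needed
amounts = np.array([120.00, 85.50, 310.00, 47.25])
fees = amounts * 0.015          # a 1.5% fee on each transaction
net = amounts - fees
# Summary statistics come straight off the array
print(net.mean(), net.std())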
Matplotlib and Seaborn
Matplotlib is the foundational plotting library. It creates line charts, bar charts, scatter plots, histograms, and every other standard chart type. It is powerful but verbose: producing a polished chart can take 15-20 lines of configuration code.
Seaborn sits on top of Matplotlib and provides higher-level functions that produce better-looking statistical charts with less code. If Matplotlib is the engine, Seaborn is the dashboard.
I use Matplotlib for custom charts where I need precise control over every axis label and annotation. I use Seaborn for quick exploratory charts when I am trying to understand a dataset before doing detailed analysis.
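To make the contrast concrete, a quick exploratory chart in Seaborn is often a single call. This sketch assumes the data DataFrame and amount column from the earlier examples:
import seaborn as sns
import matplotlib.pyplot as plt
# One call produces a formatted distribution chart
sns.histplot(data=data, x="amount", bins=30)
plt.show()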
Openpyxl
Openpyxl reads and writes Excel files (.xlsx). This matters because in most organisations, the people who consume your analysis do not use Python. They use Excel. Openpyxl lets you generate formatted Excel reports programmatically (create sheets, set column widths, apply number formats, even add charts) and deliver output in the format your stakeholders actually open.
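A minimal sketch of that workflow; the sheet contents and file name are illustrative:
from openpyxl import Workbook
wb = Workbook()
ws = wb.active
ws.title = "Summary"
# Write a header row and a data row (contents are made up)
ws.append(["Region", "Total Amount"])
ws.append(["Nairobi", 125000.50])
# Apply a number format and widen a column before saving
ws["B2"].number_format = "#,##0.00"
ws.column_dimensions["A"].width = 20
wb.save("regional_report.xlsx")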
Requests
The Requests library handles HTTP communication, which in practice means pulling data from web APIs. Payment processors, financial data providers, government databases, and social media platforms all expose data through APIs. Requests lets you call those endpoints, receive the response (usually JSON), and feed it into Pandas for analysis.
In fintech and payments work specifically, I have used Requests to pull transaction status data from payment gateway APIs for reconciliation, comparing what the gateway reports against what our internal system recorded.
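The general pattern looks like this; the endpoint, parameters, and response structure below are placeholders, not any real gateway's API:
import requests
import pandas as pd
# The URL, parameters, and response shape are placeholders
response = requests.get(
    "https://api.example-gateway.com/v1/transactions",
    params={"status": "settled", "date": "2024-01-15"},
    timeout=30,
)
response.raise_for_status()
# Feed the JSON payload (here, assumed to be a list of records) into Pandas
transactions = pd.DataFrame(response.json())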
How Python Handles the Data Analytics Workflow
Data analytics is not a single task. It is a sequence: acquire data, clean it, explore it, analyse it, and communicate the findings. Python participates in every stage.
Data Acquisition
Data arrives from databases, CSV exports, Excel files, APIs, and web scraping. Python connects to all of these. The sqlite3 module talks to SQLite databases. The psycopg2 library connects to PostgreSQL. Pandas reads CSV, Excel, JSON, and Parquet files directly. Requests pulls from APIs. For web scraping, BeautifulSoup and Scrapy extract structured data from HTML pages.
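As a sketch, pulling a table from a local SQLite database into a DataFrame takes a few lines (the file name and query are illustrative):
import sqlite3
import pandas as pd
# Query a local database file straight into a DataFrame
conn = sqlite3.connect("transactions.db")
data = pd.read_sql("SELECT * FROM transactions WHERE amount > 1000", conn)
conn.close()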
Data Cleaning
This is where analysts spend the majority of their time, and where Python saves the most effort compared to manual spreadsheet work. Cleaning tasks include handling missing values, correcting data types (a column that should be numeric but was read as text), removing duplicates, standardising formats (dates in DD/MM/YYYY vs MM/DD/YYYY vs YYYY-MM-DD), and trimming whitespace.
Pandas provides direct functions for all of these:
# Drop rows where 'amount' is missing
data = data.dropna(subset=["amount"])
# Convert a text column to numeric
data["amount"] = pd.to_numeric(data["amount"], errors="coerce")
# Standardise date format
data["date"] = pd.to_datetime(data["date"], dayfirst=True)
# Remove duplicate rows
data = data.drop_duplicates()
In compliance and KYC work, data cleaning is not optional; it is regulatory. You cannot run a sanctions screening check on a name field full of trailing spaces, inconsistent capitalisation, and encoding artefacts. Python standardises that data before it reaches the screening system.
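A sketch of that kind of standardisation, assuming a hypothetical customer_name column:
# Standardise a name field before screening (column name is illustrative)
data["customer_name"] = (
    data["customer_name"]
    .str.strip()                            # remove leading/trailing spaces
    .str.replace(r"\s+", " ", regex=True)   # collapse internal whitespace
    .str.upper()                            # consistent capitalisation
)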
Analysis
Once data is clean, Pandas handles grouping, aggregation, and statistical summaries. SQL-like operations (GROUP BY, JOIN, WHERE) all have direct Pandas equivalents.
# Total transaction amount by region
regional_totals = data.groupby("region")["amount"].sum()
# Monthly trend
data["month"] = data["date"].dt.to_period("M")
monthly_trend = data.groupby("month")["amount"].sum()
Visualisation
Analysis without communication is incomplete. Matplotlib and Seaborn turn DataFrames into charts that tell a story:
import matplotlib.pyplot as plt
# Plot the monthly totals calculated in the analysis step
monthly_trend.plot(kind="bar", title="Monthly Transaction Volume")
plt.ylabel("Total Amount (KES)")
plt.tight_layout()
# Write the chart out as an image file
plt.savefig("monthly_trend.png")
For interactive dashboards, Python integrates with tools like Power BI (through scripted visuals or data preparation) and Looker Studio (through exported clean datasets).
IDEs: Where You Actually Write Python
An IDE, or Integrated Development Environment, is the application where you write, run, and debug your code. Choosing the right one matters because you will spend hours in it.
VS Code (Visual Studio Code)
VS Code is a free, open-source code editor made by Microsoft. It is not built exclusively for Python; it supports virtually every programming language through extensions. The Python extension for VS Code gives you syntax highlighting, error detection, code completion, and an integrated terminal to run scripts. It also has a built-in Git interface, which matters once you start version-controlling your work.
I use VS Code for writing Python scripts, building Flask applications, and managing project files. It is lightweight, fast, and does not force a specific workflow on you.
Jupyter Notebook
Jupyter Notebook is a browser-based environment that lets you write and run Python code in cells, with the output displayed directly below each cell. This is ideal for exploratory data analysis because you can run a block of code, see the result (a table, a chart, a summary statistic), and then decide what to do next.
Jupyter is not great for building applications or scripts that need to run automatically. But for sitting down with a new dataset and figuring out what it contains, nothing beats it. Most data analytics courses use Jupyter Notebooks as the primary teaching environment.
PyCharm
PyCharm is a full-featured Python IDE made by JetBrains. The Community Edition is free. It provides more advanced features than VS Code out of the box β refactoring tools, database integration, scientific mode for data work, and a built-in profiler. The tradeoff is that it is heavier and takes longer to load.
PyCharm suits people who work in Python full-time and want deep tooling support. For analysts who split their time between Python, SQL, and BI tools, VS Code is usually the more practical choice.
DBeaver
DBeaver is not a Python IDE; it is a database management tool. I include it here because in data analytics, you rarely work in Python alone. You pull data from databases using SQL, then move to Python for cleaning and analysis. DBeaver connects to PostgreSQL, MySQL, SQLite, and dozens of other database engines, and gives you a visual interface to write queries, browse tables, and export results. It pairs with Python rather than replacing it.
Google Colab
Google Colab is essentially a Jupyter Notebook that runs in Google's cloud. You do not install anything on your machine. You open a browser, write Python code, and Google provides the computing resources. The free tier gives you access to GPUs, which matters for machine learning work but is overkill for standard analytics. Colab is useful for learning and for sharing notebooks with colleagues who do not have Python installed locally.
Real-World Applications
Python in data analytics is not theoretical. Here are contexts where it does actual operational work.
Financial reconciliation. Payment platforms process thousands of transactions daily. Discrepancies between what a customer paid, what the payment gateway recorded, and what settled in the bank account need to be identified and resolved. Python scripts pull data from each source, match records on transaction IDs, and flag mismatches, turning a day-long manual process into a 10-minute automated run.
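A simplified sketch of the matching step, with invented file and column names. An outer merge with indicator=True labels each record by which source it appeared in, so anything not found in both sources can be flagged:
import pandas as pd
# File and column names are illustrative
gateway = pd.read_csv("gateway_settlements.csv")
internal = pd.read_csv("internal_ledger.csv")
# indicator=True adds a _merge column: 'both', 'left_only', or 'right_only'
matched = gateway.merge(internal, on="transaction_id", how="outer", indicator=True)
# Records missing from either source are the discrepancies to investigate
mismatches = matched[matched["_merge"] != "both"]
mismatches.to_csv("reconciliation_exceptions.csv", index=False)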
Compliance reporting. Regulatory bodies require periodic reports on transaction volumes, suspicious activity patterns, and customer risk profiles. Python automates the data extraction, applies the classification rules, and generates the report in the required format. This is not a convenience; late or inaccurate compliance reports carry real penalties.
Agricultural operations. On a tea farm, daily plucking records need to be tracked, labour costs calculated, and production trends analysed. Python reads receipt data (even via OCR on handwritten slips), stores it in a database, and feeds it into dashboards. The same pattern applies to any agricultural operation that still runs on paper records.
Customer segmentation. E-commerce and financial service providers use Python to group customers by behaviour: transaction frequency, average spend, product preferences. Pandas and Scikit-learn handle the grouping logic, and the output informs marketing strategy and product design.
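A rough sketch of that grouping logic using Scikit-learn's k-means clustering; the file and feature names are invented for illustration:
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
# File and feature names are placeholders
customers = pd.read_csv("customer_metrics.csv")
features = customers[["transaction_count", "avg_spend"]]
# Scale features so no single metric dominates the distance calculation
scaled = StandardScaler().fit_transform(features)
# Group customers into four behavioural segments
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
customers["segment"] = kmeans.fit_predict(scaled)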
Why Beginners Should Start With Python
If you are starting out in data analytics, Python gives you more return on your learning investment than any other single tool. Here is why:
It is free and open source: no licence fees, no subscription, no approval from your IT department.
It has one of the largest support communities of any programming language. Whatever error you encounter, someone has asked about it on Stack Overflow and received a detailed answer.
It transfers across industries. The Python skills you build doing financial analysis apply directly to healthcare analytics, logistics, agriculture, or any other domain. The syntax does not change because the industry changed.
It grows with you. Start with Pandas for basic analysis. Add Matplotlib for charts. Move to Scikit-learn for machine learning. Pick up PySpark for big data. The language is the same at every stage; you are just adding libraries.
And critically, it connects to everything. Databases, APIs, Excel files, cloud services, BI tools, web applications: Python has a library for each one. You never outgrow it because whatever new data source or output format your job throws at you, Python already has a way to handle it.
The barrier to starting is lower than it looks. Install Python, open VS Code, type import pandas as pd, and load your first CSV. The gap between that first line of code and a working automated report is shorter than most people expect.