Befriending an Elephant with a Snake: A Practical Guide to PL/Python
Overview
PostgreSQL has firmly established itself as one of the core technologies powering modern software systems. It has repeatedly been named DBMS of the Year by DB-Engines, and for good reason. One of PostgreSQL's most powerful traits is its extensibility, a feature that continues to set it apart in the world of relational databases.
Among the various ways to extend PostgreSQL, procedural language (PL) functions stand out as a practical and powerful mechanism. In this article, we’ll focus on one particular extension: PL/Python, which enables writing functions and procedures in Python.
Python itself needs little introduction. It’s a dominant force across software development, data science, and the AI/ML ecosystem. When combined with PostgreSQL, Python opens up new possibilities—from complex logic and text processing to integrations and experimentation—that go far beyond what traditional SQL can handle.
This article provides a hands-on technical introduction to PL/Python. Whether you're a backend engineer curious about database scripting or a data scientist looking to bring Python closer to your data, this guide will walk you through PL/Python's real-world use, its strengths, and its limitations.
Setting Things Up
PL/Python is available in PostgreSQL through the plpython3u extension, which is included in most standard PostgreSQL installations. You can enable it just like any other extension:
CREATE EXTENSION plpython3u;
Note: For plpython3u to work, Python 3 must be installed and available on the host system. PostgreSQL does not bundle its own Python runtime; it links to the system-installed version. If Python is missing or incompatible, the extension may fail to install or execute correctly.
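To check whether the extension is shipped with your server before enabling it, you can query the catalog (assuming you have a psql session open):

SELECT name, default_version, installed_version
FROM pg_available_extensions
WHERE name = 'plpython3u';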
The u at the end of plpython3u stands for "untrusted". This is an important detail. An untrusted procedural language means PostgreSQL can't safely restrict what the code inside a function can do. As a result, only superusers are allowed to create functions in untrusted languages. PostgreSQL's official documentation puts it clearly:
The writer of a function in untrusted PL/Python must take care that the function cannot be used to do anything unwanted, since it will be able to do anything that could be done by a user logged in as the database administrator.
In other words, PL/Python gives you a lot of power—but with that comes a strong responsibility. The database treats PL/Python code as potentially dangerous because it has access to the full capabilities of the Python interpreter, including the file system, network, and operating system.
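To make this concrete, here is a minimal sketch (the function name is made up) of how directly untrusted code reaches the host system:

CREATE OR REPLACE FUNCTION list_server_dir()
RETURNS SETOF TEXT
AS $$
import os
# Runs with the OS privileges of the PostgreSQL server process;
# the current working directory is typically the server's data directory.
return os.listdir(os.getcwd())
$$ LANGUAGE plpython3u;

Nothing in PostgreSQL itself stops a function like this from writing or deleting files instead, which is exactly why superuser privileges are required to create it.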
Hello, PL/Python
A PL/Python function or procedure is created using standard SQL syntax for stored functions and procedures. The only difference from PL/pgSQL is the body of the user-defined function.
CREATE OR REPLACE FUNCTION hello_py_world()
RETURNS text
AS $$
return "Big elephant and long snake"
$$ LANGUAGE plpython3u;
The hello_py_world function can be called with a SELECT query:
pguser=# SELECT hello_py_world();
hello_py_world
-----------------------------
Big elephant and long snake
(1 row)
pguser=#
The extension is responsible for mapping PostgreSQL and Python data types and transferring values from one environment to the other and back. plpython3u can be seen as a bridge between the database and the host-provided Python, thus enabling access to Python's standard library and installed third-party libraries.
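As a small illustration of that mapping (a made-up example), a PostgreSQL array arrives in Python as an ordinary list, and the returned Python integer is converted back automatically:

CREATE OR REPLACE FUNCTION sum_int_array(vals INTEGER[])
RETURNS BIGINT
AS $$
# integer[] is passed in as a Python list of ints;
# the returned int is mapped back to BIGINT.
return sum(vals)
$$ LANGUAGE plpython3u;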
Just like PL/pgSQL functions, PL/Python functions can accept input arguments and return values of any PostgreSQL-supported data types.
pguser=# CREATE OR REPLACE FUNCTION hello_py_world(message text)
pguser-# RETURNS text
pguser-# AS $$
pguser$# import textwrap
pguser$#
pguser$#
pguser$# return textwrap.shorten(message, width=11)
pguser$# $$ LANGUAGE plpython3u;
CREATE FUNCTION
pguser=#
pguser=# SELECT hello_py_world('Big elephant and long snake');
hello_py_world
----------------
Big [...]
(1 row)
pguser=#
The above example demonstrates an overloaded hello_py_world function, which accepts a TEXT parameter and uses the textwrap module from the standard library to shorten the input.
PL/Python can also be used in anonymous code blocks with the DO statement:
DO $$
# PL/Python code
$$ LANGUAGE plpython3u;
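For example, a one-off block that only emits a log message (the plpy module, described below, is automatically available inside the body):

DO $$
plpy.notice("running an anonymous PL/Python block")
$$ LANGUAGE plpython3u;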
To integrate user-defined Python code with the PostgreSQL execution engine, PL/Python exposes several special constructs:
- plpy module – Provides access to database operations like plpy.execute() and logging.
- TD dictionary – Used in trigger functions to access trigger context, such as TD["new"], TD["old"], or TD["event"].
- SD dictionary – A session dictionary for storing private data between repeated calls to the same function (see the sketch after this list).
- GD dictionary – A global session dictionary for public data that is available to all Python functions within a session; use with care.
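As a quick sketch of how SD helps in practice (a hypothetical function), the following compiles a regular expression once per session and reuses the compiled object on later calls:

CREATE OR REPLACE FUNCTION extract_hashtags(post TEXT)
RETURNS TEXT[]
AS $$
import re

# Compile once per session and cache in SD; subsequent calls
# to this function skip the compilation step.
if "hashtag_re" not in SD:
    SD["hashtag_re"] = re.compile(r"#\w+")
return SD["hashtag_re"].findall(post)
$$ LANGUAGE plpython3u;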
Arts and Crafts
The magical combination of powerful tools like PostgreSQL and Python, when placed in the hands of an engineer with boundless creativity, can lead to fascinating and sometimes unexpected results.
The following examples demonstrate what becomes possible when PostgreSQL and Python work together. All examples were tested on PostgreSQL 17.5 with Python 3.13.5.
Generating a Secure Random Token
Python's standard library includes a secrets module for generating cryptographically secure random numbers, ideal for managing sensitive tokens. The following function exposes this functionality in PostgreSQL:
CREATE OR REPLACE FUNCTION generate_token_urlsafe(nbytes INTEGER DEFAULT 16)
RETURNS TEXT
AS $$
from secrets import token_urlsafe
return token_urlsafe(nbytes)
$$ LANGUAGE plpython3u;
Usage:
pguser=# SELECT generate_token_urlsafe();
generate_token_urlsafe
------------------------
oPEKMe5KmPpMXFo3w6md6Q
(1 row)
pguser=# SELECT generate_token_urlsafe(32);
generate_token_urlsafe
---------------------------------------------
sP2sm4wTFmwHLfvXGbG2TNJZ06s0EZjgZKC6t7XW4ZY
(1 row)
pguser=#
The defined function can also be used to generate default values for columns.
CREATE TABLE IF NOT EXISTS cache (
key TEXT PRIMARY KEY DEFAULT generate_token_urlsafe(),
data JSONB
);
Example:
pguser=# INSERT INTO cache(data) VALUES ('{"countries": []}') RETURNING key;
key
------------------------
Sf3KTK53i8jf8pdIGErnkQ
(1 row)
INSERT 0 1
pguser=#
Note: The schema used for caching is only for demonstration purposes. Real-world caching solutions are typically more complex and implementation-specific.
Remote Datasets as Virtual Tables
PL/Python makes it possible to dynamically load remote data as a virtual table using a user-defined set-returning function. This example demonstrates fetching publicly available historical weather data for U.S. cities from a remote source and combining it with local data stored in PostgreSQL.
Schema:
CREATE TABLE usa_states (
id INTEGER PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
name TEXT
);
PL/Python Function:
CREATE OR REPLACE FUNCTION usa_state_weather()
RETURNS TABLE(city TEXT, state TEXT, temperature FLOAT)
AS $$
import csv
from urllib.request import urlopen

# Download the CSV, drop the header row, and parse the rest
csv_data = csv.reader(
    urlopen(
        'https://corgis-edu.github.io/corgis/datasets/csv/weather/weather.csv'
    ).read().decode().strip().split('\n')[1:]
)

# Emit one (city, state, temperature) row per CSV record
for row in csv_data:
    yield [row[5], row[8], float(row[9])]
$$ LANGUAGE plpython3u;
Usage:
pguser=# INSERT INTO usa_states(name) VALUES ('Washington'), ('Florida'), ('California'), ('Texas');
pguser=# SELECT * FROM usa_state_weather() LIMIT 10;
city | state | temperature
------------+---------+-------------
Birmingham | Alabama | 39
Huntsville | Alabama | 39
Mobile | Alabama | 46
Montgomery | Alabama | 45
Anchorage | Alaska | 34
Annette | Alaska | 38
Bethel | Alaska | 30
Bettles | Alaska | 22
Cold Bay | Alaska | 34
Cordova | Alaska | 38
(10 rows)
pguser=# SELECT
name,
city,
temperature
FROM
usa_states
INNER JOIN usa_state_weather() AS weather ON weather.state = usa_states.name
ORDER BY name
LIMIT 10;
name | city | temperature
------------+-------------+-------------
California | China Lake | 79
California | Bakersfield | 59
California | Bishop | 50
California | China Lake | 55
California | Concord | 59
California | Eureka | 55
California | Fresno | 58
California | Hayward | 59
California | Long Beach | 67
California | Los Angeles | 66
(10 rows)
pguser=#
While this approach allows you to pull external data directly into PostgreSQL, it comes with several significant limitations:
- Network reliability – HTTP requests from within PostgreSQL depend on external network availability.
- Security risks – Downloading and deserializing untrusted content inside the database can expose your system to malicious data or code.
- Performance – Remote calls are slow, and doing them per query may not be suitable for high-frequency or production workloads.
Use this technique wisely, preferably in controlled or internal environments, and avoid calling such functions in critical transactional code paths.
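One simple mitigation is to fetch the data once and materialize it locally, rather than hitting the network on every query; a sketch (the snapshot table name is made up):

-- Snapshot the remote dataset into a regular table, on demand or on a schedule
CREATE TABLE usa_state_weather_snapshot AS
SELECT * FROM usa_state_weather();

-- Later queries touch only local data
SELECT avg(temperature) FROM usa_state_weather_snapshot WHERE state = 'California';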
Sentiment Analysis
This example uses the third-party library textblob to perform sentiment analysis on user-submitted reviews. A trigger function written in PL/Python analyzes the sentiment of the review text on each INSERT or UPDATE and stores both the reaction (positive, negative, neutral) and the polarity score in the corresponding columns. For this to work, the textblob package must be installed and accessible to the PostgreSQL server's Python environment:
python3 -m pip install textblob
Schema:
CREATE TABLE IF NOT EXISTS reviews (
id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
email TEXT,
review TEXT,
reaction TEXT,
polarity FLOAT,
created TIMESTAMP WITHOUT TIME ZONE DEFAULT now()
);
Trigger function:
CREATE OR REPLACE FUNCTION analyze_sentiment()
RETURNS trigger
AS $$
from textblob import TextBlob

# Score the incoming review and store the results on the new row
if review := TD['new']['review']:
    polarity = TextBlob(review).sentiment.polarity
    TD['new']['polarity'] = polarity
    if polarity > 0:
        TD['new']['reaction'] = 'positive'
    elif polarity < 0:
        TD['new']['reaction'] = 'negative'
    else:
        TD['new']['reaction'] = 'neutral'

# Tell PostgreSQL to use the modified row
return "MODIFY"
$$ LANGUAGE plpython3u;
Trigger setup:
CREATE TRIGGER tg_analyze_sentiment BEFORE INSERT OR UPDATE ON reviews
FOR EACH ROW EXECUTE FUNCTION analyze_sentiment();
Usage:
pguser=# INSERT INTO reviews(email, review) VALUES
('john.dow@domain.org', 'I am really pleased with how your application works. Thank you!'),
('jane.dow@domain.org', 'I tried to use the app, but there are too many errors!'),
('', 'I absolutely hate it. Nothing works...');
INSERT 0 3
pguser=# SELECT * FROM reviews;
-[ RECORD 1 ]------------------------------------------------------------
id | 2
email | john.dow@domain.org
review | I am really pleased with how your application works. Thank you!
reaction | positive
polarity | 0.25
created | 2025-07-04 11:50:43.009374
-[ RECORD 2 ]------------------------------------------------------------
id | 3
email | jane.dow@domain.org
review | I tried to use the app, but there are too many errors!
reaction | positive
polarity | 0.625
created | 2025-07-04 11:50:43.009374
-[ RECORD 3 ]------------------------------------------------------------
id | 4
email |
review | I absolutely hate it. Nothing works...
reaction | negative
polarity | -0.8
created | 2025-07-04 11:50:43.009374
pguser=#
The sentiment analysis in this example is heavily simplified and serves illustrative purposes only. In real-world applications, natural language processing involves more sophisticated techniques, larger models, and nuanced interpretation. Many robust libraries and services based on machine learning or transformer models can deliver more accurate and context-aware results.
Abusing Anonymous Code Blocks
Sometimes, database migrations become more complex than originally intended. They may involve loading large amounts of data from the local file system or manipulating deeply nested JSON structures. While PL/pgSQL is capable of handling JSON, Python often allows for more concise and comprehensible solutions.
The following is a real-world example of a configuration update during a database migration. By using PL/Python in an anonymous DO block, it was possible to avoid writing and distributing a separate update script for execution on live installations.
DO $$
import os
import json
import plpy

network = {"csp": None, "cookie_samesite": "Strict"}
policy = {
    "connect-src": "",
    "script-src": "",
    "style-src": "",
    "img-src": "",
    "font-src": "",
    "frame-src": "",
    "frame-ancestors": ""
}

# Merge optional overrides shipped with the installation, then remove
# the file so the migration cannot be applied twice.
csp_filepath = "/opt/shared/csp.json"
if os.path.isfile(csp_filepath):
    with open(csp_filepath, 'r') as fd:
        policy.update(json.load(fd))
    os.unlink(csp_filepath)

network["csp"] = policy

# Acceptable for trusted, locally generated JSON; for anything user-supplied,
# prefer a prepared plan (plpy.prepare) with parameters over string formatting.
plpy.execute(f"INSERT INTO settings VALUES ('network', '{json.dumps(network)}', 'system')")
$$ LANGUAGE plpython3u;
Note: This kind of anonymous block is powerful but should be used with caution. It runs as superuser, touches the filesystem, and may introduce platform-specific or environment-specific dependencies.
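If a block needs to run several dependent statements, plpy.subtransaction() combined with try/except keeps them atomic and recoverable; a minimal sketch (the statements and column names are illustrative):

DO $$
# If any statement in the group fails, everything inside the
# subtransaction is rolled back together, and the block can continue.
try:
    with plpy.subtransaction():
        plpy.execute("DELETE FROM settings WHERE key = 'network_legacy'")
        plpy.execute("INSERT INTO settings VALUES ('network_migrated', '{}', 'system')")
except plpy.SPIError:
    plpy.warning("optional migration step failed; continuing")
$$ LANGUAGE plpython3u;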
DOOM in PostgreSQL
Running a full DOOM clone in your database? Not quite—but with enough Python and a brave spirit, who knows where this road leads?
PL/Python: Power and Pitfalls
Every technology comes with its own advantages and limitations. The universal answer to whether a particular technology should be used—or replaced—is: "it depends". It always depends on multiple factors, such as the environment, the type of software being developed, requirements, existing infrastructure, and available expertise. A solution might fit almost perfectly in one context but fail miserably in another. That doesn’t mean the solution is inherently bad or not reusable.
PL/Python is no exception. It can be a powerful tool in the right hands and under the right conditions—but in other cases, it might become a source of friction or even a liability.
One of the most powerful advantages of PL/Python is that it brings Python’s extensive ecosystem into the database without requiring the data to leave PostgreSQL. You can run Python logic—such as string manipulation, data validation, or even lightweight ML inference—directly where the data lives. Unless explicitly programmed to export or transmit data, everything stays inside the database engine. This approach enhances security and privacy, as it reduces the need to expose sensitive data to external services, scripts, or tools. This is particularly valuable in environments with strict data governance requirements or limited network access.
Below is a summary of the most important advantages and limitations to consider when deciding whether to use PL/Python in a PostgreSQL-based system.
Advantages of Using PL/Python
- Access to Python's extensive ecosystem – PL/Python brings Python's standard library and installed third-party packages (e.g. re, json, datetime, even numpy or pandas if available) directly into PostgreSQL. This opens the door to more expressive and powerful logic than what PL/pgSQL offers.
- Data Stays Inside the Database – Even while using Python's expressive features, data doesn't leave the PostgreSQL instance unless explicitly programmed to do so. This reduces data movement and exposure, helping improve security, privacy, and compliance with standards like GDPR or HIPAA.
- Rapid Prototyping – PL/Python allows fast prototyping of complex logic directly in the database. This is helpful in experimentation, data science, and proof-of-concept development without the need to fully externalize data or build API pipelines.
- Reuse of Existing Python Knowledge – Many developers and data scientists already know Python. PL/Python lets teams leverage existing skills without needing deep knowledge of PL/pgSQL or SQL-based procedural logic.
- Trigger Support and Context Awareness – PL/Python functions can be used as triggers and have access to PostgreSQL's special runtime dictionaries like TD and SD, allowing advanced workflows and stateful logic within the database.
Drawbacks of PL/Python
- Untrusted Language: Superuser Required – PL/Python is an untrusted language (plpython3u), meaning it can access the file system, network, and OS-level resources. Only superusers are allowed to create or modify PL/Python functions, which makes it unsuitable for multi-tenant or security-constrained environments.
- Limited SQL Integration – Unlike PL/pgSQL, you can't embed SQL statements directly inside the function body. Instead, you must use plpy.execute() or wrap SQL logic in separate functions (see the sketch after this list). This can make code more verbose and less integrated with PostgreSQL's declarative style.
- Performance Overhead – For simple or performance-critical logic, PL/Python may be slower than native SQL or PL/pgSQL due to context switching and interpreter overhead. It's better suited to logic complexity than to raw execution speed.
- Deployment and Portability Challenges – If a PL/Python function depends on external libraries, those must be installed and configured on every PostgreSQL server instance. This adds complexity to deployment, testing, and containerization workflows.
- Harder to Debug – Debugging errors inside PL/Python functions can be harder than debugging pure SQL or PL/pgSQL, especially since you're juggling two runtime environments (PostgreSQL and Python) with their own exception behaviors.
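To illustrate the SQL-integration point, this is roughly how a query looks from inside a PL/Python body, reusing the reviews table from the sentiment example (a sketch, with a made-up function name):

CREATE OR REPLACE FUNCTION count_recent_reviews(since TIMESTAMP)
RETURNS BIGINT
AS $$
# SQL can't be embedded directly; it goes through plpy instead.
# A prepared plan with a typed parameter also avoids manual quoting.
plan = plpy.prepare(
    "SELECT count(*) AS n FROM reviews WHERE created >= $1",
    ["timestamp"]
)
return plpy.execute(plan, [since])[0]["n"]
$$ LANGUAGE plpython3u;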
Conclusions
PL/Python is a powerful extension that unlocks a new dimension of flexibility within PostgreSQL. By bringing Python’s expressive syntax and rich ecosystem into the database engine, it enables developers and data scientists to implement complex logic, advanced text processing, and custom workflows directly where the data lives.
This tight integration can lead to more secure, maintainable, and efficient systems—especially when moving data outside the database is costly, risky, or unnecessary.
At the same time, this extension is not a silver bullet. It comes with important trade-offs: it requires superuser privileges, imposes restrictions on SQL integration and transaction control, and introduces potential deployment complexities. It’s a tool best used with intent, not by default.
In the right hands and the right context, PL/Python can simplify development, reduce friction, and open the door to creative solutions that would be difficult—or impossible—with SQL alone.
Whether you’re looking to experiment, prototype, or push the boundaries of what’s possible inside your database, PL/Python is a worthy tool to explore.