Befriending an Elephant with a Snake: A Practical Guide to PL/Python

Overview

PostgreSQL has firmly established itself as one of the core technologies powering modern software systems. It has repeatedly been named DBMS of the Year by DB-Engines, and for good reason. One of PostgreSQL’s most powerful traits is its extensibility—a feature that continues to set it apart in the world of relational databases.


Among the various ways to extend PostgreSQL, procedural language (PL) functions stand out as a practical and powerful mechanism. In this article, we’ll focus on one particular extension: PL/Python, which enables writing functions and procedures in Python.


Python itself needs little introduction. It’s a dominant force across software development, data science, and the AI/ML ecosystem. When combined with PostgreSQL, Python opens up new possibilities—from complex logic and text processing to integrations and experimentation—that go far beyond what traditional SQL can handle.


This article provides a hands-on technical introduction to PL/Python. Whether you're a backend engineer curious about database scripting or a data scientist looking to bring Python closer to your data, this guide will walk you through PL/Python’s real-world use, its strengths, and its limitations.


Setting Things Up

PL/Python is available in PostgreSQL through the plpython3u extension. It is part of the standard PostgreSQL distribution, although many operating system packagings ship it as a separate package (for example, postgresql-plpython3-<version> on Debian-based systems). You can enable it just like any other extension:

CREATE EXTENSION plpython3u;


Note: For plpython3u to work, Python 3 must be installed and available on the host system. PostgreSQL does not bundle its own Python runtime; it links to the system-installed version. If Python is missing or incompatible, the extension may fail to install or execute correctly.
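

Before going further, it's worth verifying the setup. The snippet below is a quick sanity check: the catalog query confirms the extension is available, and the anonymous DO block (a construct covered later in this article) reports which interpreter the server actually linked against:

-- Is the extension available on this server?
SELECT name, default_version
FROM pg_available_extensions
WHERE name = 'plpython3u';

-- After CREATE EXTENSION: which Python is the server using?
DO $$
    import sys
    plpy.notice(f"PL/Python runs on Python {sys.version.split()[0]}")
$$ LANGUAGE plpython3u;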


The u at the end of plpython3u stands for “untrusted”. This is an important detail. An untrusted procedural language means PostgreSQL can't safely restrict what the code inside a function can do. As a result, only superusers are allowed to create functions in untrusted languages. PostgreSQL’s official documentation puts it clearly:

The writer of a function in untrusted PL/Python must take care that the function cannot be used to do anything unwanted, since it will be able to do anything that could be done by a user logged in as the database administrator.


In other words, PL/Python gives you a lot of power—but with that comes a strong responsibility. The database treats PL/Python code as potentially dangerous because it has access to the full capabilities of the Python interpreter, including the file system, network, and operating system.


Hello, PL/Python

A PL/Python function or procedure is created using standard SQL syntax for stored functions and procedures. The only difference from PL/pgSQL is that the body of the user-defined function is written in Python.

CREATE OR REPLACE FUNCTION hello_py_world()
    RETURNS text
AS $$
    return "Big elephant and long snake"
$$ LANGUAGE plpython3u;


The hello_py_world function can be called with a SELECT query:

pguser=# SELECT hello_py_world();
       hello_py_world        
-----------------------------
 Big elephant and long snake
(1 row)

pguser=#


The extension is responsible for mapping PostgreSQL and Python data types and transferring values from one environment to the other and back. plpython3u can be seen as a bridge between the database and the host-provided Python, thus enabling access to Python’s standard library and installed third-party libraries.


Just like PL/pgSQL functions, PL/Python functions can accept input arguments and return values of any PostgreSQL-supported data types.


pguser=# CREATE OR REPLACE FUNCTION hello_py_world(message text)
pguser-#     RETURNS text
pguser-# AS $$
pguser$#     import textwrap
pguser$# 
pguser$# 
pguser$#     return textwrap.shorten(message, width=11)
pguser$# $$ LANGUAGE plpython3u;
CREATE FUNCTION
pguser=#
pguser=# SELECT hello_py_world('Big elephant and long snake');
 hello_py_world 
----------------
 Big [...]
(1 row)

pguser=#


The above example demonstrates a modified hello_py_world function, which accepts a TEXT type parameter and uses the textwrap module from the standard library to shorten the input.
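

The type bridge goes beyond scalars. As a minimal sketch (the function median_int is illustrative, not part of any standard), the following accepts a PostgreSQL integer array, which arrives in Python as a list, and returns a computed value that PL/Python converts back to NUMERIC:

CREATE OR REPLACE FUNCTION median_int(vals INTEGER[])
    RETURNS NUMERIC
AS $$
    # A PostgreSQL array arrives as a Python list; NULL elements become None
    from statistics import median

    numbers = [v for v in (vals or []) if v is not None]
    return median(numbers) if numbers else None
$$ LANGUAGE plpython3u;

-- SELECT median_int(ARRAY[3, 1, 4, 1, 5]);  -- returns 3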


PL/Python can also be used in anonymous code blocks with the DO statement:

DO $$
    # PL/Python code
$$ LANGUAGE plpython3u;


To integrate user-defined Python code with the PostgreSQL execution engine, PL/Python exposes several special constructs:

  • plpy module – Provides access to database operations like plpy.execute() and logging.
  • TD dictionary – Used for trigger-related functions to access trigger context, such as TD["new"], TD["old"], or TD["event"].
  • SD dictionary – A session dictionary for storing private data between repeated calls to the same function.
  • GD dictionary – A global session dictionary for public data that is available to all Python functions within a session; use with care.
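

As a small illustration of plpy and SD (the function and its caching scheme are sketches, not taken from the official examples), the following caches a compiled regular expression in SD so that repeated calls within a session skip recompilation, and uses plpy.notice() for logging:

CREATE OR REPLACE FUNCTION count_words(message TEXT)
    RETURNS INTEGER
AS $$
    import re

    # SD persists between calls in the same session, so the pattern
    # is compiled only once
    if 'word_re' not in SD:
        SD['word_re'] = re.compile(r'\w+')
        plpy.notice('compiled word pattern')

    return len(SD['word_re'].findall(message or ''))
$$ LANGUAGE plpython3u;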


Arts and Crafts

The magical combination of powerful tools like PostgreSQL and Python, when placed in the hands of an engineer with boundless creativity, can lead to fascinating and sometimes unexpected results.


The following examples demonstrate what becomes possible when PostgreSQL and Python work together. All examples were tested on PostgreSQL 17.5 with Python 3.13.5.


Generate Random Secure Token

Python's standard library includes a secrets module for generating cryptographically secure random numbers—ideal for managing sensitive tokens. The following function exposes this functionality in PostgreSQL:

CREATE OR REPLACE FUNCTION generate_token_urlsafe(nbytes INTEGER DEFAULT 16)
    RETURNS TEXT
AS $$
    from secrets import token_urlsafe


    return token_urlsafe(nbytes)
$$ LANGUAGE plpython3u;


Usage:

pguser=# SELECT generate_token_urlsafe();
 generate_token_urlsafe 
------------------------
 oPEKMe5KmPpMXFo3w6md6Q
(1 row)

pguser=# SELECT generate_token_urlsafe(32);
           generate_token_urlsafe            
---------------------------------------------
 sP2sm4wTFmwHLfvXGbG2TNJZ06s0EZjgZKC6t7XW4ZY
(1 row)

pguser=#


The defined function can also be used to generate default values for columns.

CREATE TABLE IF NOT EXISTS cache (
    key TEXT PRIMARY KEY DEFAULT generate_token_urlsafe(),
    data JSONB
);


Example:

pguser=# INSERT INTO cache(data) VALUES ('{"countries": []}') RETURNING key;
          key           
------------------------
 Sf3KTK53i8jf8pdIGErnkQ
(1 row)

INSERT 0 1
pguser=#


Note: The schema used for caching is only for demonstration purposes. Real-world caching solutions are typically more complex and implementation-specific.


Remote Datasets as Virtual Tables

PL/Python makes it possible to dynamically load remote data as a virtual table using a user-defined set-returning function. This example demonstrates fetching publicly available historical weather data for U.S. cities from a remote source and combining it with local data stored in PostgreSQL.


Schema:

CREATE TABLE usa_states (
    id INTEGER PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    name TEXT
);


PL/Python Function:

CREATE OR REPLACE FUNCTION usa_state_weather()
    RETURNS TABLE(city TEXT, state TEXT, temperature FLOAT)
AS $$
    import csv
    from urllib.request import urlopen


    # Fetch the CSV, drop the header row, and parse the rest
    csv_data = csv.reader(
        urlopen(
            'https://corgis-edu.github.io/corgis/datasets/csv/weather/weather.csv'
        ).read().decode().strip().split('\n')[1:]
    )
    # Each yielded sequence becomes one row of the result set
    for row in csv_data:
        yield row[5], row[8], float(row[9])
$$ LANGUAGE plpython3u;


Usage:

pguser=# INSERT INTO usa_states(name) VALUES ('Washington'), ('Florida'), ('California'), ('Texas');
INSERT 0 4
pguser=# SELECT * FROM usa_state_weather() LIMIT 10;
    city    |  state  | temperature 
------------+---------+-------------
 Birmingham | Alabama |          39
 Huntsville | Alabama |          39
 Mobile     | Alabama |          46
 Montgomery | Alabama |          45
 Anchorage  | Alaska  |          34
 Annette    | Alaska  |          38
 Bethel     | Alaska  |          30
 Bettles    | Alaska  |          22
 Cold Bay   | Alaska  |          34
 Cordova    | Alaska  |          38
(10 rows)

pguser=# SELECT
    name,
    city,
    temperature
FROM
    usa_states
    INNER JOIN usa_state_weather() AS weather ON weather.state = usa_states.name
ORDER BY name
LIMIT 10;
    name    |    city     | temperature 
------------+-------------+-------------
 California | China Lake  |          79
 California | Bakersfield |          59
 California | Bishop      |          50
 California | China Lake  |          55
 California | Concord     |          59
 California | Eureka      |          55
 California | Fresno      |          58
 California | Hayward     |          59
 California | Long Beach  |          67
 California | Los Angeles |          66
(10 rows)

pguser=#


While this approach allows you to pull external data directly into PostgreSQL, it comes with several significant limitations:

  • Network reliability
    HTTP requests from within PostgreSQL depend on external network availability.
  • Security risks
    Downloading and deserializing untrusted content inside the database can expose your system to malicious data or code.
  • Performance
    Remote calls are slow, and doing them per-query may not be suitable for high-frequency or production workloads.

Use this technique wisely, preferably in controlled or internal environments, and avoid calling such functions in critical transactional code paths.
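

If the remote dataset is needed repeatedly, one possible mitigation, sketched below with an illustrative table name, is to materialize it into a local table once and refresh it on a schedule you control rather than fetching it on every query:

-- Materialize the remote dataset once instead of fetching per query
CREATE TABLE usa_city_weather AS
    SELECT * FROM usa_state_weather();

-- Refresh explicitly, on your own schedule
TRUNCATE usa_city_weather;
INSERT INTO usa_city_weather SELECT * FROM usa_state_weather();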


Sentiment Analysis

This example uses the third-party library textblob to perform sentiment analysis on user-submitted reviews. A trigger function written in PL/Python analyzes the sentiment of the review text on each INSERT or UPDATE and stores both the reaction (positive, negative, neutral) and polarity score in the corresponding columns. For this to work, the textblob package must be installed and accessible to the PostgreSQL server’s Python environment.

python3 -m pip install textblob
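

Because pip may target a different interpreter than the one PostgreSQL links against, a quick check from inside the database confirms the package is importable (assuming the installed textblob exposes __version__, as current releases do):

DO $$
    import textblob
    plpy.notice(f"textblob {textblob.__version__} is importable")
$$ LANGUAGE plpython3u;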


Schema:

CREATE TABLE IF NOT EXISTS reviews (
    id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    email TEXT,
    review TEXT,
    reaction TEXT,
    polarity FLOAT,
    created TIMESTAMP WITHOUT TIME ZONE DEFAULT now()
);


Trigger function:

CREATE OR REPLACE FUNCTION analyze_sentiment()
    RETURNS trigger
AS $$
    from textblob import TextBlob


    if review := TD['new']['review']:
        polarity = TextBlob(review).sentiment.polarity
        TD['new']['polarity'] = polarity

        if polarity > 0:
            TD['new']['reaction'] = 'positive'
        elif polarity < 0:
            TD['new']['reaction'] = 'negative'
        else:
            TD['new']['reaction'] = 'neutral'

    return "MODIFY"
$$ LANGUAGE plpython3u;


Trigger setup:

CREATE TRIGGER tg_analyze_sentiment BEFORE INSERT OR UPDATE ON reviews
    FOR EACH ROW EXECUTE FUNCTION analyze_sentiment();


Usage:

pguser=# INSERT INTO reviews(email, review) VALUES
    ('john.dow@domain.org', 'I am really pleased with how your application works. Thank you!'),
    ('jane.dow@domain.org', 'I tried to use the app, but there are too many errors!'),
    ('', 'I absolutely hate it. Nothing works...');
INSERT 0 3
pguser=# SELECT * FROM reviews;
-[ RECORD 1 ]------------------------------------------------------------
id       | 2
email    | john.dow@domain.org
review   | I am really pleased with how your application works. Thank you!
reaction | positive
polarity | 0.25
created  | 2025-07-04 11:50:43.009374
-[ RECORD 2 ]------------------------------------------------------------
id       | 3
email    | jane.dow@domain.org
review   | I tried to use the app, but there are too many errors!
reaction | positive
polarity | 0.625
created  | 2025-07-04 11:50:43.009374
-[ RECORD 3 ]------------------------------------------------------------
id       | 4
email    | 
review   | I absolutely hate it. Nothing works...
reaction | negative
polarity | -0.8
created  | 2025-07-04 11:50:43.009374

pguser=#


The sentiment analysis in this example is heavily simplified and serves illustrative purposes only. In real-world applications, natural language processing involves more sophisticated techniques, larger models, and nuanced interpretation. Many robust libraries and services based on machine learning or transformer models can deliver more accurate and context-aware results.


Abusing Anonymous Code Blocks

Sometimes, database migrations become more complex than originally intended. They may involve loading large amounts of data from the local file system or manipulating deeply nested JSON structures. While PL/pgSQL is capable of handling JSON, Python often allows for more concise and comprehensible solutions.


The following is a real-world example of a configuration update during a database migration. By using PL/Python in an anonymous DO block, it was possible to avoid writing and distributing a separate update script for execution on live installations.

DO $$
import os
import json

import plpy


network = {"csp": None, "cookie_samesite": "Strict"}
policy = {
    "connect-src": "",
    "script-src": "",
    "style-src": "",
    "img-src": "",
    "font-src": "",
    "frame-src": "",
    "frame-ancestors": ""
}
csp_filepath = "/opt/shared/csp.json"

if os.path.isfile(csp_filepath):
    with open(csp_filepath, 'r') as fd:
        policy.update(json.load(fd))
    os.unlink(csp_filepath)

network["csp"] = policy

plpy.execute(f"INSERT INTO settings VALUES ('network', '{json.dumps(network)}', 'system')")
$$ LANGUAGE 'plpython3u';


Note: This kind of anonymous block is powerful but should be used with caution. It runs as superuser, touches the filesystem, and may introduce platform-specific or environment-specific dependencies.


DOOM in PostgreSQL

Running a full DOOM clone in your database? Not quite—but with enough Python and a brave spirit, who knows where this road leads?


PL/Python: Power and Pitfalls

Every technology comes with its own advantages and limitations. The universal answer to whether a particular technology should be used—or replaced—is: "it depends". It always depends on multiple factors, such as the environment, the type of software being developed, requirements, existing infrastructure, and available expertise. A solution might fit almost perfectly in one context but fail miserably in another. That doesn’t mean the solution is inherently bad or not reusable.


PL/Python is no exception. It can be a powerful tool in the right hands and under the right conditions—but in other cases, it might become a source of friction or even a liability.


One of the most powerful advantages of PL/Python is that it brings Python’s extensive ecosystem into the database without requiring the data to leave PostgreSQL. You can run Python logic—such as string manipulation, data validation, or even lightweight ML inference—directly where the data lives. Unless explicitly programmed to export or transmit data, everything stays inside the database engine. This approach enhances security and privacy, as it reduces the need to expose sensitive data to external services, scripts, or tools. This is particularly valuable in environments with strict data governance requirements or limited network access.


Below is a summary of the most important advantages and limitations to consider when deciding whether to use PL/Python in a PostgreSQL-based system.


Advantages of Using PL/Python

  1. Access to Python's extensive ecosystem
    PL/Python brings Python's standard library and installed third-party packages (e.g. re, json, datetime, even numpy or pandas if available) directly into PostgreSQL. This opens the door to more expressive and powerful logic than what PL/pgSQL offers.
  2. Data Stays Inside the Database
    Even while using Python’s expressive features, data doesn't leave the PostgreSQL instance unless explicitly programmed to do so. This reduces data movement and exposure, helping improve security, privacy, and compliance with standards like GDPR or HIPAA.
  3. Rapid Prototyping
    PL/Python allows fast prototyping of complex logic directly in the database. This is helpful in experimentation, data science, and proof-of-concept development without the need to fully externalize data or build API pipelines.
  4. Reuse of Existing Python Knowledge
    Many developers and data scientists already know Python. PL/Python lets teams leverage existing skills without needing deep knowledge of PL/pgSQL or SQL-based procedural logic.
  5. Trigger Support and Context Awareness
    PL/Python functions can be used as triggers and have access to PostgreSQL's special runtime dictionaries like TD and SD, allowing advanced workflows and stateful logic within the database.


Drawbacks of PL/Python

  1. Untrusted Language: Superuser Required
    PL/Python is an untrusted language (plpython3u), meaning it can access the file system, network, and OS-level resources. Only superusers are allowed to create or modify PL/Python functions, which makes it unsuitable for use in multi-tenant or security-constrained environments.
  2. Limited SQL Integration
    Unlike PL/pgSQL, you can’t embed SQL statements directly inside the function body. Instead, you must use plpy.execute() or wrap SQL logic in separate functions. This can make code more verbose and less integrated with PostgreSQL's declarative style; see the sketch after this list.
  3. Performance Overhead
    For simple or performance-critical logic, PL/Python may be slower than native SQL or PL/pgSQL due to context switching and interpreter overhead. It's better suited to logic complexity rather than raw execution speed.
  4. Deployment and Portability Challenges
    If a PL/Python function depends on external libraries, those must be installed and configured on every PostgreSQL server instance. This adds complexity to deployment, testing, and containerization workflows.
  5. Harder to Debug
    Debugging errors inside PL/Python functions can be harder than debugging pure SQL or PL/pgSQL, especially since you're juggling two runtime environments (PostgreSQL and Python) with their own exception behaviors.
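

To illustrate the second drawback, here is a minimal sketch of what SQL access looks like from inside a PL/Python function, reusing the reviews table from the sentiment example (the function name is illustrative):

CREATE OR REPLACE FUNCTION count_reviews_by_reaction(wanted TEXT)
    RETURNS BIGINT
AS $$
    # SQL cannot be embedded directly; it goes through plpy instead.
    # A prepared plan passes the value as a parameter, avoiding
    # hand-rolled quoting.
    plan = plpy.prepare(
        "SELECT count(*) AS n FROM reviews WHERE reaction = $1",
        ["text"],
    )
    return plpy.execute(plan, [wanted])[0]["n"]
$$ LANGUAGE plpython3u;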


Conclusions

PL/Python is a powerful extension that unlocks a new dimension of flexibility within PostgreSQL. By bringing Python’s expressive syntax and rich ecosystem into the database engine, it enables developers and data scientists to implement complex logic, advanced text processing, and custom workflows directly where the data lives.


This tight integration can lead to more secure, maintainable, and efficient systems—especially when moving data outside the database is costly, risky, or unnecessary.


At the same time, this extension is not a silver bullet. It comes with important trade-offs: it requires superuser privileges, imposes restrictions on SQL integration and transaction control, and introduces potential deployment complexities. It’s a tool best used with intent, not by default.


In the right hands and the right context, PL/Python can simplify development, reduce friction, and open the door to creative solutions that would be difficult—or impossible—with SQL alone.


Whether you’re looking to experiment, prototype, or push the boundaries of what’s possible inside your database, PL/Python is a worthy tool to explore.


Further Reading

  • PL/Python documentation (PostgreSQL official docs): https://www.postgresql.org/docs/current/plpython.html