Upgrade PostgreSQL with pg_ugprade

This guide outlines the steps to upgrade a PostgreSQL server instance to a newer version using the pg_upgrade utility.

These instructions are written specifically for upgrading from PostgreSQL 10 to PostgreSQL 15.1. However, the upgrade process is generic and you should be able to swap out the version numbers for your use-case and still follow this how-to guide without a hiccup.

Additionally, this guide is for PostgreSQL servers on Red Hat Enterprise Linux / Cent OS / Oracle Linux – more generally known as the Red Hat family of Linux distros, but again, the process is similar for other Linux distributions – just replace RPM repository and package with the corresponding package repository and package name.

Customary Note: Always ensure that files and databases are backed up properly before performing an upgrade.

On that note, start by backing up the existing client authentication configuration file – pg_hba.conf (HBA stands for Host-based Authentication).

sudo cp /var/lib/pgsql/10/data/pg_hba.conf ~/pg_hba.conf.bak

Install the repository, install the newer version of PostgreSQL and initialize the database:

# Install the repository RPM:
sudo yum install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm

# Install the newer version of PostgreSQL:
sudo yum install -y postgresql15-server

# Optionally initialize the database:
sudo /usr/pgsql-15/bin/postgresql-15-setup initdb

Change to postgres user:

sudo su - postgres

Stop all connections from the application (or elsewhere) to the database in the older version of PostgreSQL:

/usr/pgsql-10/bin/pg_ctl -D /var/lib/pgsql/10/data -mf stop
waiting for server to shut down.... done server stopped

Use the pg_upgrade utility (from the new installation) to do a dry run and see if the new database cluster is compatible with the old one using the --check flag.

Make sure to specify the bin and data directories for the old and the new database clusters respectively:

/usr/pgsql-15/bin/pg_upgrade --old-bindir /usr/pgsql-10/bin --new-bindir /usr/pgsql-15/bin --old-datadir /var/lib/pgsql/10/data --new-datadir /var/lib/pgsql/15/data --link --check
Performing Consistency Checks
-----------------------------
Checking cluster versions                                   ok
Checking database user is the install user                  ok
Checking database connection settings                       ok
Checking for prepared transactions                          ok
Checking for system-defined composite types in user tables  ok
Checking for reg* data types in user tables                 ok
Checking for contrib/isn with bigint-passing mismatch       ok
Checking for user-defined encoding conversions              ok
Checking for user-defined postfix operators                 ok
Checking for incompatible polymorphic functions             ok
Checking for tables WITH OIDS                               ok
Checking for invalid "sql_identifier" user columns          ok
Checking for presence of required libraries                 ok
Checking database user is the install user                  ok
Checking for prepared transactions                          ok
Checking for new cluster tablespace directories             ok
 
*Clusters are compatible*

If everything looks good, run the upgrade with the same command but without the --check flag.

/usr/pgsql-15/bin/pg_upgrade --old-bindir /usr/pgsql-10/bin --new-bindir /usr/pgsql-15/bin --old-datadir /var/lib/pgsql/10/data --new-datadir /var/lib/pgsql/15/data --link
Performing Consistency Checks
-----------------------------
Checking cluster versions                                   ok
Checking database user is the install user                  ok
Checking database connection settings                       ok
Checking for prepared transactions                          ok
Checking for system-defined composite types in user tables  ok
Checking for reg* data types in user tables                 ok
Checking for contrib/isn with bigint-passing mismatch       ok
Checking for user-defined encoding conversions              ok
Checking for user-defined postfix operators                 ok
Checking for incompatible polymorphic functions             ok
Checking for tables WITH OIDS                               ok
Checking for invalid "sql_identifier" user columns          ok
Creating dump of global objects                             ok
Creating dump of database schemas                          
                                                            ok
Checking for presence of required libraries                 ok
Checking database user is the install user                  ok
Checking for prepared transactions                          ok
Checking for new cluster tablespace directories             ok
 
If pg_upgrade fails after this point, you must re-initdb the
new cluster before continuing.
 
Performing Upgrade
------------------
Analyzing all rows in the new cluster                       ok
Freezing all rows in the new cluster                        ok
Deleting files from new pg_xact                             ok
Copying old pg_xact to new server                           ok
Setting oldest XID for new cluster                          ok
Setting next transaction ID and epoch for new cluster       ok
Deleting files from new pg_multixact/offsets                ok
Copying old pg_multixact/offsets to new server              ok
Deleting files from new pg_multixact/members                ok
Copying old pg_multixact/members to new server              ok
Setting next multixact ID and offset for new cluster        ok
Resetting WAL archives                                      ok
Setting frozenxid and minmxid counters in new cluster       ok
Restoring global objects in the new cluster                 ok
Restoring database schemas in the new cluster              
                                                            ok
Adding ".old" suffix to old global/pg_control               ok
 
If you want to start the old cluster, you will need to remove
the ".old" suffix from /var/lib/pgsql/10/data/global/pg_control.old.
Because "link" mode was used, the old cluster cannot be safely
started once the new cluster has been started.
 
Linking user relation files                                
                                                            ok
Setting next OID for new cluster                            ok
Sync data directory to disk                                 ok
Creating script to delete old cluster                       ok
Checking for extension updates                              ok
 
Upgrade Complete
----------------
Optimizer statistics are not transferred by pg_upgrade.
Once you start the new server, consider running:
    /usr/pgsql-15/bin/vacuumdb --all --analyze-in-stages
 
Running this script will delete the old cluster's data files:
    ./delete_old_cluster.sh

Exit out of the postgres user’s shell by employing one of these methods:

  • Press CTRL+D, or
  • Type logout and press Enter/Return key, or
  • Type exit and press Enter/Return key

Enable and start the new PostgreSQL service:

sudo systemctl enable postgresql-15 sudo systemctl start postgresql-15

Optionally, to check if the service is now running:

# optionally, to check
sudo systemctl status postgresql-15
● postgresql-15.service - PostgreSQL 15 database server
   Loaded: loaded (/usr/lib/systemd/system/postgresql-15.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2023-01-03 16:49:21 EST; 7s ago
     Docs: https://www.postgresql.org/docs/15/static/
  Process: 4482 ExecStartPre=/usr/pgsql-15/bin/postgresql-15-check-db-dir ${PGDATA} (code=exited, status=0/SUCCESS)
 Main PID: 4488 (postmaster)
    Tasks: 7
   CGroup: /system.slice/postgresql-15.service
           ├─4488 /usr/pgsql-15/bin/postmaster -D /var/lib/pgsql/15/data/
           ├─4492 postgres: logger
           ├─4493 postgres: checkpointer
           ├─4494 postgres: background writer
           ├─4497 postgres: walwriter
           ├─4498 postgres: autovacuum launcher
           └─4499 postgres: logical replication launcher
 
Jan 03 16:49:21 DA-WS-07 systemd[1]: Starting PostgreSQL 15 database server...
Jan 03 16:49:21 DA-WS-07 postmaster[4488]: 2023-01-03 16:49:21.136 EST [4488] LOG:  redirecting log output to logging collector process
Jan 03 16:49:21 DA-WS-07 postmaster[4488]: 2023-01-03 16:49:21.136 EST [4488] HINT:  Future log output will appear in directory "log".
Jan 03 16:49:21 DA-WS-07 systemd[1]: Started PostgreSQL 15 database server.

Once the service is started, change back to postgres user:

sudo su - postgres


Now, optimizer statistics can be generated with the vacuumdb utility (as they’re not transferred over by pg_upgrade utility):

# this could take a few minutes (especially when generating full optimizer statistics)
/usr/pgsql-15/bin/vacuumdb --all --analyze-in-stages
vacuumdb: processing database "test_db": Generating minimal optimizer statistics (1 target)
vacuumdb: processing database "postgres": Generating minimal optimizer statistics (1 target)
vacuumdb: processing database "template1": Generating minimal optimizer statistics (1 target)
vacuumdb: processing database "test_db": Generating medium optimizer statistics (10 targets)
vacuumdb: processing database "postgres": Generating medium optimizer statistics (10 targets)
vacuumdb: processing database "template1": Generating medium optimizer statistics (10 targets)
vacuumdb: processing database "test_db": Generating default (full) optimizer statistics
vacuumdb: processing database "postgres": Generating default (full) optimizer statistics
vacuumdb: processing database "template1": Generating default (full) optimizer statistics

At this point, you could restore the backed up pg_hba.conf file, or manually allow localhost to access the PostgreSQL database cluster over TCP by modifying the pg_hba.conf file:

vi /var/lib/pgsql/15/data/pg_hba.conf

Comment the following lines (the hash or pound sign # indicates a commented line):

# "local" is for Unix domain socket connections only
# local   all             all                                     peer
# IPv4 local connections:
# host    all             all             127.0.0.1/32            scram-sha-256
# IPv6 local connections:
# host    all             all             ::1/128                 scram-sha-256

Add the following lines to the end of the file, replacing test_db with the name of your database and test_user with the name of your database user:

# Allow via TCP from localhost
host    test_db         test_user   127.0.0.1/32            md5
host    test_db         test_user   ::1/32                  md5

Once again, exit out of the postgres user’s shell by employing one of these methods:

  • Press CTRL+D, or
  • Type logout and press Enter/Return key, or
  • Type exit and press Enter/Return key

Restart the database service:

sudo systemctl restart postgresql-15

Upgrade psql client by creating a symlink to the latest bundled version:

sudo ln -sf /usr/pgsql-15/bin/psql /usr/bin/psql

At this point, the upgrade is complete and the everything should run normally.

Once database functionality is checked and verified for a successful upgrade, delete the old cluster’s data files by running the following as postgres user, if needed:

sudo su - postgres ./delete_old_cluster.sh

Exit out of the postgres user shell for the final time. The upgrade is complete.

CAS Registry Number Validation using PL SQL

CAS Registry Numbers are universally used to provide a unique, unmistakable identifier for chemical substances. CAS stands for Chemical Abstracts Service, a division of the American Chemical Society, that assigns a unique numerical identifier to every chemical substance described in the open scientific literature.

The CAS Registry database contains more than 113 million unique organic and inorganic chemical substances, such as alloys, coordination compounds, minerals, mixtures, polymers and salts, and more than 66 million sequences. CAS Registry contains substances reported in the literature back to the early 1800s and over 15,000 substances are added daily. [1]

A CAS Registry Number itself has no inherent chemical significance but provides an unambiguous way to identify a chemical substance or molecular structure when there are many possible systematic, generic, proprietary or trivial names. [2] These numbers are assigned in a sequential, increasing order when the substance is identified by CAS scientists for inclusion in the CAS REGISTRY database,

A CAS number is separated by hyphens into three parts, the first consisting from two up to seven digits, the second consisting of two digits, and the third consisting of a single digit serving as a check digit. The check digit is calculated by taking the sum of last digit times 1, the previous digit times 2 and the still previous digit times 3 and so on, and finally computing the remainder upon dividing the sum by 10.

For example, the CAS number of water is 7732-18-5: the checksum 5 is calculated as (8×1 + 1×2 + 2×3 + 3×4 + 7×5 + 7×6) = 105; 105 mod 10 = 5.

I wrote a PL SQL function to validate CAS Registry Numbers for use in one of the database applications for a semiconductors company for managing the chemicals used in their chip manufacturing facilities.

The function f_is_valid_cas_number takes a CAS number as input and returns 1 if it is valid and 0 otherwise. The following examples show how the function can be used:

-- Valid CAS Number for Water, returns 1
SELECT f_is_valid_cas_number('7732-18-5') FROM dual;

-- Invalid syntax, returns 0
SELECT f_is_valid_cas_number('7A2-181-522') FROM dual;

-- Valid syntax but incorrect checksum, returns 0
SELECT f_is_valid_cas_number('1-11-5') FROM dual;

In an Oracle PL SQL program, the function can be used to set a flag as follows:

-- Valid CAS Number for Water, sets is_valid_cas_number flag to 1
is_valid_cas_number NUMBER(1) := f_is_valid_cas_number('7732-18-5');

The code for the function is available in the CAS Number Validator repository on Github. I hope you find it useful.

ORA-03113: end-of-file on communication channel

One of the lesser used features in an application using Oracle Database installed on a UNIX server stopped working recently throwing the ORA-03113 error. The oerr utility provided the following information:

$ oerr ora 3113

ORA-03113: end-of-file on communication channel

Cause: The connection between Client and Server process was broken.

Action: There was a communication error that requires further investigation.

This meant that one of these most probable things had happened:

  • The network connection was lost
  • The server process had crashed

(Note that ORA-03113 is a generic error and there are over 1000 bugs related to it on Oracle Support!)

First possibility was quickly eliminated as the error wasn’t encountered just once or twice which might happen due to that rare, momentary glitch in network connectivity. But it did happen every single time that particular feature was used.

This meant that something in the function invoked, when the feature in question was run, was causing the server process to die. Running through the code using a test script led to the point of failure.

A simple dynamic SQL query was being executed without event.

-- The following query is being used multiple times later on
v_sqlstr := 'SELECT column1, column2, SUM( column_qty ) AS proposed_qty
             FROM a_global_temporary_table	-- or any table
             WHERE column1 = ' || column1_value ||
             ' GROUP BY column1, column2' ||
             ' ORDER BY MOD( SUM( column_qty ) )';

-- Count the number of rows returned by the "ordered" query above
EXECUTE IMMEDIATE 'SELECT COUNT(*) FROM (' || v_sqlstr || ')' INTO v_count;

But why?

Playing around with bits and pieces of the query stored in v_sqlstr led to the finding that when trying to get COUNT(*) using v_sqlstr as the source, including the MOD() in the ORDER BY clause of the query was causing the server process to crash. Weird!

To fix the issue, and because you don’t need an ordered result set to take a count, I removed the ORDER BY clause from the original query and appended it back only after taking the count, as shown below:

-- The following query is being used multiple times
v_sqlstr := 'SELECT column1, column2, SUM( column_qty ) AS proposed_qty
             FROM a_global_temporary_table	-- or any table
             WHERE column1 = ' || column1_value ||
             ' GROUP BY column1, column2';

-- Count the number of rows returned by the query above
EXECUTE IMMEDIATE 'SELECT COUNT(*) FROM (' || v_sqlstr || ')' INTO v_count;

-- Append the ORDER BY clause to the original query
v_sqlstr := v_sqlstr || ' ORDER BY MOD( SUM( column_qty ) )';

And things were back on track!

This is just one of the many possible causes and fixes for this error. I hope you find it useful in case you get stuck in a similar situation. Cheers!