01 Nov, 2023

pg_experiments, Part 2: Adding a new data type

Exploring Postgres internals by adding a new data type.

Disclaimer: This is my second post on my experiment with Postgres internals. I'm still pretty new to this, and I have only explored a few things in the source code, so please let me know if I did things incorrectly.

Edit: I’ve pushed my changes here, feel free to check it out!

Introduction ¶

Two days in, I’ve made some tiny baby steps in understanding how this thing works. In my previous experiment, I was tinkering with + operator logic, so I decided it would be a nice time to explore what it takes to write my own data type.

After doing some research on this, I found out that the easiest way to do it was to develop an extension which would have the logic for the data type. You plug it in into Postgres, and it works as intended. But I wanted to explore further; building an extension didn’t feel like a challenge.

I wanted to add a new “in-built” data type by trying to tweak with Postgres’ backend. Why? Because it’s fun to break things and fix it. I took up the challenge and started finding things out….

Initial exploration ¶

As we saw in the previous experiment, logic for ADTs (abstract data types) is present in src/backend/utils/adt, so it made sense to explore this area once again. In order to not get intimidated by complex logic, I tried to search for an ADT which was easy to follow and had minimal logic.

I found bool.c to be a nice candidate for exploration; it was pretty easy to follow, and gave me an idea about how ADTs are structured. One pattern I found out was that all ADTs seem to implement an interface; most of them had function names with common suffixes like in, out, send, recv. On exploring further, it made sense:

in is meant for converting an external representation to your ADT’s internal representation
out is meant for converting your ADT’s internal representation to an external representation
send is meant for converting your ADT’s internal representation to a binary representation
recv is meant for converting an external binary representation to your ADT’s internal representation

ADTs must always have input and output functions. send and recv are optional and are used for providing binary input and output routines.

For this experiment, I wanted to build a custom color type which would store hex values and if possible, it should support operations on top of it. And since I haven’t looked into the internals of how colors are represented, this was the perfect opportunity.

Implementing the color type ¶

After multiple failed attempts and rewrites, I was finally happy with the internal representation for color - an uint32 integer:

/*-------------------------------------------------------------------------
 *
 * color.c
 *	  PostgreSQL type definition for a custom 'color' data type.
 *
 *-------------------------------------------------------------------------
 */

#include "postgres.h"
#include "fmgr.h"
#include "utils/builtins.h"
#include "libpq/pqformat.h"

typedef uint32 color;

#define COLOR_RED_SHIFT 16
#define COLOR_GREEN_SHIFT 8
#define COLOR_BLUE_SHIFT 0
#define COLOR_CHANNEL_MASK 0xFF

Next thing to do was to implement the input function. To implement it, I took a cstring as input and converted it to an uint32 by using scanf:

/*
 * color_in - Color reader. Accepts a hexadecimal string as the input, and converts it to a color value.
 */
PG_FUNCTION_INFO_V1(color_in);
Datum color_in(PG_FUNCTION_ARGS)
{
    char *str = PG_GETARG_CSTRING(0);
    uint32 c = 0;

    if (sscanf(str, "%06X", &c) != 1) {
        ereport(ERROR, (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
                        errmsg("invalid input syntax for type color: \"%s\"", str)));
    }

    PG_RETURN_UINT32(c);
}

The output function takes an uint32 as an argument, and converts it to a hexadecimal string using snprintf:

/*
 * color_out - Color output function. Converts the internal color value to a hexadecimal string representation.
 */
PG_FUNCTION_INFO_V1(color_out);
Datum color_out(PG_FUNCTION_ARGS)
{
    uint32 c = PG_GETARG_UINT32(0);
    char str[7];

    // Format color as a 6-character hexadecimal string
    snprintf(str, sizeof(str), "%06X", c);
    PG_RETURN_CSTRING(pstrdup(str));
}

Send and receive routines ¶

Implementing send and recv was tricky. I checked bool’s implementation and it seemed to use a few things from pg_format.c, which had an interesting comment:

(https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/libpq/pqformat.c;h=d712c80b5e76b7b488211291bbb63c51203c7cad;hb=HEAD#l2)

/*-------------------------------------------------------------------------
 *
 * pqformat.c
 *		Routines for formatting and parsing frontend/backend messages
 *
 * Outgoing messages are built up in a StringInfo buffer (which is expansible)
 * and then sent in a single call to pq_putmessage.  This module provides data
 * formatting/conversion routines that are needed to produce valid messages.
 * Note in particular the distinction between "raw data" and "text"; raw data
 * is message protocol characters and binary values that are not subject to
 * character set conversion, while text is converted by character encoding
 * rules.
 *
 * Incoming messages are similarly read into a StringInfo buffer, via
 * pq_getmessage, and then parsed and converted from that using the routines
 * in this module.

This is exactly what I needed to understand. So, I needed to store my message in a buffer of type StringInfo. But wait. Wasn’t the buffer of type StringInfoData in bool.c?

Let me check. Oh, okay.

(https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/include/lib/stringinfo.h;h=598ed093dc8757fd08f089b0a1c3cb15ec0c37d0;hb=HEAD#l22)

typedef struct StringInfoData
{
	char	   *data;
	int			len;
	int			maxlen;
	int			cursor;
} StringInfoData;

typedef StringInfoData *StringInfo;

The interface routines are also neatly specified in the code:

(https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/libpq/pqformat.c;h=d712c80b5e76b7b488211291bbb63c51203c7cad;hb=HEAD#l32)

/*
 * INTERFACE ROUTINES
 * Message assembly and output:
 *		pq_beginmessage - initialize StringInfo buffer
 *		pq_sendbyte		- append a raw byte to a StringInfo buffer
 *		pq_sendint		- append a binary integer to a StringInfo buffer
 *		pq_sendint64	- append a binary 8-byte int to a StringInfo buffer
 *		pq_sendfloat4	- append a float4 to a StringInfo buffer
 *		pq_sendfloat8	- append a float8 to a StringInfo buffer
 *		pq_sendbytes	- append raw data to a StringInfo buffer
 *		pq_sendcountedtext - append a counted text string (with character set conversion)
 *		pq_sendtext		- append a text string (with conversion)
 *		pq_sendstring	- append a null-terminated text string (with conversion)
 *		pq_send_ascii_string - append a null-terminated text string (without conversion)
 *		pq_endmessage	- send the completed message to the frontend
 * Note: it is also possible to append data to the StringInfo buffer using
 * the regular StringInfo routines, but this is discouraged since required
 * character set conversion may not occur.
 *
 * typsend support (construct a bytea value containing external binary data):
 *		pq_begintypsend - initialize StringInfo buffer
 *		pq_endtypsend	- return the completed string as a "bytea*"
 *
 * Special-case message output:
 *		pq_puttextmessage - generate a character set-converted message in one step
 *		pq_putemptymessage - convenience routine for message with empty body
 *
 * Message parsing after input:
 *		pq_getmsgbyte	- get a raw byte from a message buffer
 *		pq_getmsgint	- get a binary integer from a message buffer
 *		pq_getmsgint64	- get a binary 8-byte int from a message buffer
 *		pq_getmsgfloat4 - get a float4 from a message buffer
 *		pq_getmsgfloat8 - get a float8 from a message buffer
 *		pq_getmsgbytes	- get raw data from a message buffer
 *		pq_copymsgbytes - copy raw data from a message buffer
 *		pq_getmsgtext	- get a counted text string (with conversion)
 *		pq_getmsgstring - get a null-terminated text string (with conversion)
 *		pq_getmsgrawstring - get a null-terminated text string - NO conversion
 *		pq_getmsgend	- verify message fully consumed
 */

For my color type, I needed the following:

pq_begintypsend: for initializing the buffer
pq_endtypsend: for “finalizing” the buffer and converting the buffer contents to bytea
pq_sendint: for appending an integer to the buffer
pq_getmsgint: for getting a binary integer from the buffer

Cool. Let’s use these routines for building our send and receive functionality:

/*
 * color_send - converts color to binary format
 */
PG_FUNCTION_INFO_V1(color_send);
Datum color_send(PG_FUNCTION_ARGS)
{
    uint32 c = PG_GETARG_UINT32(0);
    StringInfoData buf;

    pq_begintypsend(&buf);
    pq_sendint(&buf, c, 4);
    PG_RETURN_BYTEA_P(pq_endtypsend(&buf));
}

/*
 * color_recv - converts external binary format back to the internal color type
 */
PG_FUNCTION_INFO_V1(color_recv);
Datum color_recv(PG_FUNCTION_ARGS)
{
    StringInfo buf = (StringInfo) PG_GETARG_POINTER(0);
    uint32 c;

    c = pq_getmsgint(buf, 4);
    PG_RETURN_UINT32(c);
}

Figuring out the system catalog ¶

Before testing it out, these functions should be registered in the pg_proc catalog (pg_proc.dat). Entries in pg_proc.dat define the functions associated with your custom data type, specify their return types, and declare the argument types they accept. Here’s what I added to pg_proc.dat for my color type:

{ oid => '7901', proname => 'color_in', prorettype => 'color',
  proargtypes => 'cstring', prosrc => 'color_in' },
{ oid => '7902', proname => 'color_out', prorettype => 'cstring',
  proargtypes => 'color', prosrc => 'color_out' },
{ oid => '7903', proname => 'color_send', prorettype => 'bytea',
  proargtypes => 'color', prosrc => 'color_send' },
{ oid => '7904', proname => 'color_recv', prorettype => 'color',
  proargtypes => 'internal', prosrc => 'color_recv' },

I’ll try to explain what these keys are (as per my understanding):

oid: This field specifies the OID (Object Identifier) of the function. OIDs are unique identifiers for objects in the PostgreSQL system catalog. (I had to find OIDs which aren’t referenced anywhere.) (There is a tool (unused_oids) to help simplify this process!)
proname: The name of the function.
prorettype: It specifies the return data type of the function. In this case, the function color_in returns the color data type.
proargtypes: Defines the argument types of the function. 'cstring' indicates that the input function color_in accepts a single argument of type cstring.
prosrc: The function’s C-language name (link symbol).

And now to register the color type, I added this to the type system catalog (pg_type.dat):

# custom type (color)
{ oid => '7012', array_type_oid => '7201',
  descr => 'color data type',
  typname => 'color', typlen => '4', typbyval => 't', typcategory => 'N',
  typinput => 'color_in', typoutput => 'color_out', typreceive => 'color_recv',
  typsend => 'color_send', typalign => 'i', typstorage => 'x' },

Each of these fields is explained nicely in the documentation. I’ve used typstorage as x because I didn’t know a suitable storage strategy, maybe p is the correct way to do it?

Edit: Thanks to Paul Jungwirth for letting me know that p (plain storage) should be the best choice for typstorage, as the color type is essentially a uint32 and is fixed-width in nature.

Solving compilation errors ¶

I was excited to test it out, so I set up everything:

$ ./configure --prefix=$(pwd)/test-install
$ make -j8 && make install
$ ./test-install/bin/initdb $(pwd)/test-data # Create a database
$ echo "port=8080" > $(pwd)/postgres.conf # This is the minimal postgres run config
$ mkdir test-run # Place for postgres to put lock files
$ ./test-install/bin/postgres --config-file=$(pwd)/postgres.conf -D $(pwd)/test-data -k $(pwd)/test-run

But on compiling, I encountered this error:

Use of uninitialized value in sprintf at genbki.pl line 1027.
unresolved OID reference "color_in" in pg_type.dat field typinput line
Use of uninitialized value in sprintf at genbki.pl line 1027.
unresolved OID reference "color_out" in pg_type.dat field typoutput line
Use of uninitialized value in sprintf at genbki.pl line 1027.
unresolved OID reference "color_recv" in pg_type.dat field typreceive line
Use of uninitialized value in sprintf at genbki.pl line 1027.
unresolved OID reference "color_send" in pg_type.dat field typsend line
make[2]: *** [Makefile:172: bki-stamp] Error 1
make[2]: Leaving directory '/home/burntcarrot/postgresql/src/backend/catalog'
make[1]: *** [Makefile:141: submake-catalog-headers] Error 2
make[1]: Leaving directory '/home/burntcarrot/postgresql/src/backend'
make: *** [src/Makefile.global:385: submake-generated-headers] Error 2

The error is originating from genbki.pl, which is a Perl script which generates postgres.bki:

(https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/catalog/genbki.pl;h=380bc23c82e76ec175efd39100f9f76403c2250d;hb=HEAD#l4)

#!/usr/bin/perl
#----------------------------------------------------------------------
#
# genbki.pl
#    Perl script that generates postgres.bki and symbol definition
#    headers from specially formatted header files and data files.
#    postgres.bki is used to initialize the postgres template database.
#
# Portions Copyright (c) 1996-2023, PostgreSQL Global Development Group
# Portions Copyright (c) 1994, Regents of the University of California
#
# src/backend/catalog/genbki.pl
#
#----------------------------------------------------------------------

And what is postgres.bki? On searching through the docs, I found this (link):

To create the catalog files and load this initial data into them, a backend running in bootstrap mode reads a BKI (Backend Interface) file containing commands and initial data. The postgres.bki file used in this mode is prepared from the aforementioned header and data files, while building a PostgreSQL distribution, by a Perl script named genbki.pl.

Wow. A TIL moment. But the problem wasn’t solved yet.

The origin of the error was on line 1027, which seemed to have logic for performing OID lookups:

(https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/catalog/genbki.pl;h=380bc23c82e76ec175efd39100f9f76403c2250d;hb=HEAD#l1025)

# ....

# Perform OID lookups on an array of OID names.
# If we don't have a unique value to substitute, warn and
# leave the entry unchanged.
# (We don't exit right away so that we can detect multiple problems
# within this genbki.pl run.)
sub lookup_oids
{
	my ($lookup, $catname, $attname, $lookup_opt, $bki_values, @lookupnames)
	  = @_;

	my @lookupoids;
	foreach my $lookupname (@lookupnames)
	{
		my $lookupoid = $lookup->{$lookupname};
		if (defined($lookupoid) and $lookupoid ne 'MULTIPLE')
		{
			push @lookupoids, $lookupoid;
		}
		else
		{
			push @lookupoids, $lookupname;
			if ($lookupname eq '-' or $lookupname eq '0')
			{
				if (!$lookup_opt)
				{
					warn sprintf
					  "invalid zero OID reference in %s.dat field %s line %s\n",
					  $catname, $attname, $bki_values->{line_number};
					$num_errors++;
				}
			}
			else
			{
				warn sprintf
				  "unresolved OID reference \"%s\" in %s.dat field %s line %s\n",
				  $lookupname, $catname, $attname, $bki_values->{line_number};
				$num_errors++;
			}
		}
	}
	return @lookupoids;
}

# ....

And the error message reflects that too:

unresolved OID reference "color_out" in pg_type.dat field typoutput line

“Wait. How is my function not accessible to the Perl script?”

After some debugging, I found out that there was a Meson config in the ADT directory, and it was missing my ADT’s source file, so I added it:

(https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/utils/adt/meson.build;h=8515cd93650492cb79c74f58a5123a4ca59d9e31;hb=HEAD)

# Copyright (c) 2022-2023, PostgreSQL Global Development Group

backend_sources += files(
  'acl.c',
  'amutils.c',
  'array_expanded.c',
  'array_selfuncs.c',
  'array_typanalyze.c',
  'array_userfuncs.c',
  'arrayfuncs.c',
  'arraysubs.c',
  'arrayutils.c',
  'ascii.c',
  'bool.c',
  'cash.c',
  'char.c',
  'color.c', <-- added my ADT
  'cryptohashfuncs.c',
  'date.c',
  'datetime.c',
  ....

I also needed to modify the Makefile inside src/backend/utils/adt to include the object file for my ADT:

(https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/utils/adt/Makefile;h=0de0bbb1b8ac7ccb60ea1ab39dbadf7004ea7586;hb=HEAD)

#
# Makefile for utils/adt
#
# src/backend/utils/adt/Makefile
#

subdir = src/backend/utils/adt
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global

override CPPFLAGS := -I. -I$(srcdir) $(CPPFLAGS)

# keep this list arranged alphabetically or it gets to be a mess
OBJS = \
	acl.o \
	amutils.o \
	array_expanded.o \
	array_selfuncs.o \
	array_typanalyze.o \
	array_userfuncs.o \
	arrayfuncs.o \
	arraysubs.o \
	arrayutils.o \
	ascii.o \
	bool.o \
	cash.o \
	char.o \
	color.o \  <-- added my ADT
	cryptohashfuncs.o \
	....

Now, on trying again, the compilation process worked perfectly. Time to test it out!

Edit: If you change the system catalog (.dat files), you would need to re-initalize the DB:

$ rm -rf test-data
$ ./test-install/bin/initdb $(pwd)/test-data # Re-create the database

Initial test with the color type ¶

I created a table with my color type:

postgres=# CREATE TABLE color (val color);
CREATE TABLE
postgres=# INSERT INTO color VALUES ('FF0000');
INSERT 0 1
postgres=# SELECT * FROM color;
  val
--------
 FF0000
(1 row)

postgres=# \d color
              Table "public.color"
 Column | Type  | Collation | Nullable | Default
--------+-------+-----------+----------+---------
 val    | color |           |          |

A sigh of relief. It worked!

This was a good start, but the type is of no use if we can’t perform operations on top of it. For getting started, I wanted to go ahead with an operator that “mixes” two colors and returns the result of type color.

Implementing operators for the color type ¶

Color addition is tricky, there are different techniques to perform it; but to keep things simple, I chose additive mixing as the algorithm for color addition. I was able to implement this by taking help from some articles I found online:

/*
 * color_add - adds two values of type color and returns the result
 */
PG_FUNCTION_INFO_V1(color_add);
Datum color_add(PG_FUNCTION_ARGS)
{
    color arg1 = PG_GETARG_UINT32(0);
    color arg2 = PG_GETARG_UINT32(1);

    // Extract the Red (R) channel component of arg1 and arg2 by shifting the bits
    // to the right and applying a bit mask to isolate the 8 bits representing the Red channel.
    uint32 r1 = (arg1 >> COLOR_RED_SHIFT) & COLOR_CHANNEL_MASK;
    uint32 r2 = (arg2 >> COLOR_RED_SHIFT) & COLOR_CHANNEL_MASK;

    // Extract the Green (G) channel component.
    uint32 g1 = (arg1 >> COLOR_GREEN_SHIFT) & COLOR_CHANNEL_MASK;
    uint32 g2 = (arg2 >> COLOR_GREEN_SHIFT) & COLOR_CHANNEL_MASK;

    // Extract the Blue (B) channel component.
    uint32 b1 = (arg1 >> COLOR_BLUE_SHIFT) & COLOR_CHANNEL_MASK;
    uint32 b2 = (arg2 >> COLOR_BLUE_SHIFT) & COLOR_CHANNEL_MASK;

    // Add the Red (R) channel components, ensuring that the result does not exceed
    // the maximum value represented by COLOR_CHANNEL_MASK (0xFF), which corresponds
    // to the upper limit of the Red channel. This prevents overflow.
    uint32 r = Min(r1 + r2, COLOR_CHANNEL_MASK);

    // Add the Green (G) channel components.
    uint32 g = Min(g1 + g2, COLOR_CHANNEL_MASK);

    // Add the Blue (B) channel components.
    uint32 b = Min(b1 + b2, COLOR_CHANNEL_MASK);

    // Reconstruct the color by left-shifting the Red (R), Green (G), and Blue (B)
    // components back to their original bit positions and combining them with the OR (|) operation.
    color result_color = (r << COLOR_RED_SHIFT) | (g << COLOR_GREEN_SHIFT) | (b << COLOR_BLUE_SHIFT);

    // Return the final color as a uint32 value.
    PG_RETURN_UINT32(result_color);
}

In addition to this, I thought it would be a good idea to add support for equality and inequality operators as the implementation would be fairly easy, due to the internal representation of color being a uint32:

/*
 * color_eq - checks for equality between two values of type color
 */
PG_FUNCTION_INFO_V1(color_eq);
Datum color_eq(PG_FUNCTION_ARGS)
{
    color arg1 = PG_GETARG_UINT32(0);
    color arg2 = PG_GETARG_UINT32(1);

    PG_RETURN_BOOL(arg1 == arg2);
}

/*
 * color_ne - checks for inequality between two values of type color
 */
PG_FUNCTION_INFO_V1(color_ne);
Datum color_ne(PG_FUNCTION_ARGS)
{
    color arg1 = PG_GETARG_UINT32(0);
    color arg2 = PG_GETARG_UINT32(1);

    PG_RETURN_BOOL(arg1 != arg2);
}

Nice! Now, I had support for these operators: =, <> and +.

To register these functions in the catalog (pg_proc.dat), I followed the same process:

{ oid => '7905', proname => 'color_eq', prorettype => 'bool',
  proargtypes => 'color color', prosrc => 'color_eq' },
{ oid => '7906', proname => 'color_ne', prorettype => 'bool',
  proargtypes => 'color color', prosrc => 'color_ne' },
{ oid => '7907', proname => 'color_add', prorettype => 'color',
  proargtypes => 'color color', prosrc => 'color_add' },

In order to register operators for the color type, I added the following to the pg_operator system catalog (pg_operator.dat):

# custom color type

{ oid => '7915', descr => 'equal',
  oprname => '=', oprcanmerge => 't', oprcanhash => 't', oprleft => 'color',
  oprright => 'color', oprresult => 'bool', oprcom => '=(color,color)',
  oprnegate => '<>(color,color)', oprcode => 'color_eq' },
{ oid => '7916', descr => 'not equal',
  oprname => '<>', oprcanmerge => 't', oprcanhash => 't', oprleft => 'color',
  oprright => 'color', oprresult => 'bool', oprcom => '<>(color,color)',
  oprnegate => '=(color,color)', oprcode => 'color_ne' },
{ oid => '7917', descr => 'add',
  oprname => '+', oprleft => 'color', oprright => 'color', oprresult => 'color',
  oprcom => '+(color,color)', oprcode => 'color_add' },

Here’s what these fields mean:

oid: OID (Object Identifier)
descr: A short description of the operator.
oprname: The name of the operator. It is the symbol or name used to invoke the operator, for example: =, <>, or +.
oprcanmerge: This is a boolean flag (t/f) indicating whether the operator supports merge joins. (I’ve kept it as t for now, but I’d be glad to know the suitable value for this.)
oprcanhash: This is a boolean flag (t/f) indicating whether the operator supports hash joins. (I’ve kept it as t for now.)
oprleft: The left operand data type of the operator.
oprright: The right operand data type of the operator.
oprresult: The data type of the result produced by the operator.
oprcom: This specifies the commutator operator. The commutator is another operator that has the same functionality but with the operand order reversed. For example, = and = are commutator operators of each other.
oprnegate: The negate operator for a comparison operator. It represents the opposite of the operator. For =, its negate is <>.
oprcode: This refers to the function implementing the operator.

The final test ¶

To test it out, I recompiled and was excited to see if the operators worked perfectly:

postgres=# CREATE TABLE color (val color);
CREATE TABLE
postgres=# INSERT INTO color VALUES ('FF0000');
INSERT 0 1
postgres=# SELECT * FROM color;
  val
--------
 FF0000
(1 row)

postgres=# SELECT val+'00FF00' FROM color;
 ?column?
----------
 FFFF00
(1 row)

postgres=# SELECT 'FF0000'::color + '00FF00'::color;
 ?column?
----------
 FFFF00
(1 row)


postgres=# SELECT * FROM color WHERE val='FF0000';
  val
--------
 FF0000
(1 row)

postgres=# SELECT * FROM color WHERE val<>'FF0000';
 val
-----
(0 rows)

Yay! The operators work as intended, and I was finally relieved to see how it turned out. It may not be perfect, but it was exciting to work on!

I’ve pushed these changes here, please feel free to check it out!

Conclusion ¶

Yes, I should have RTFM’d it. But honestly, this was more fun. If I had already knew about which routines to use, how to structure things, it wouldn’t have been interesting.

I spent 2 days on this, and I’d like to say: it was worth it. I’ll take a break, enjoy a few games and come back to hacking Postgres once again (yes, I have something planned).

Please email (contact <at> databases.systems) or DM on Twitter if you have any questions, corrections, or if you are hiring.