Postgres and other musings

Hibernating and Restoring Postgres Buffer Cache

With the introduction of pg_prewarm extension in Postgres, it has become very easy to save and restore the contents of Postgres server’s buffer cache across a server restart. The tables, indexes and other on-disk data structures (like TOAST data, Visibility Map, etc.) are cached in shared-buffers before they can be read or modified by user queries. Hence, by saving the shared-buffers before a server restart, and restoring those buffers after the restart, you can expect to reduce the ramp-up time, and hence expect peak performance almost right off the bat.

Following is a brain-dead way of implementing hibernation, and I am sure it can be optimized to reduce the number of calls to pg_prewarm, since it allows the caller to specify a range of blocks to be loaded, and I’m asking it to load just one block per call.

We’re going to use pg_buffercache to extract the list of buffers currently loaded in buffer cache, and after a server restart, pg_prewarm will be used to load those buffers back in.

Declare environment variables to be used by scripts

HIBERNATE_DESTINATION=$HOME/pg_hibernate/

export PGUSER=postgres
PSQL="psql"
PSQL_TEMPL_DB="$PSQL -d template1"
PSQL_PG_DB="$PSQL -d postgres"

PSQL_TEMPL_NOFLUFF="$PSQL_TEMPL_DB -A -t"
PSQL_PG_NOFLUFF="$PSQL_PG_DB -A -t"

Prepare the databases

This is a one time operation. We avoid installing anything into template0 database, since it is a read-only database. But we do consciously install the extension into template1 database; this is so that any new databases created after this point will get this extension pre-installed.

Install extensions

for db in $($PSQL_TEMPL_NOFLUFF -c 'select datname from pg_database where datname <> $$template0$$'); do
  echo Installing pg_prewarm in $db
  $PSQL_TEMPL_DB -c 'create extension if not exists pg_prewarm;'
done

echo Installing pg_buffercache extension in postgres database
$PSQL_PG_DB -c 'create extension if not exists pg_buffercache;'

Save buffer information

We are actually generating a psql script, that can be later fed to psql, as is.

mkdir -p $HIBERNATE_DESTINATION

for db in $($PSQL_TEMPL_NOFLUFF -c 'select datname from pg_database where datname <> $$template0$$'); do
  $PSQL_PG_NOFLUFF -c 'select    $q$select pg_prewarm((    select    oid
                                                         from      pg_class
                                                         where     pg_relation_filenode(oid) = $q$ || relfilenode || $q$)::regclass,
                                                     $$buffer$$, $q$
                                                     || case relforknumber
                                                        when 0 then $q$$$main$$$q$
                                                        when 1 then $q$$$fsm$$$q$
                                                        when 2 then $q$$$vm$$$q$
                                                        when 3 then $q$$$init$$$q$
                                                        end || $q$, $q$
                                                     || relblocknumber || $q$, $q$
                                                     || relblocknumber || $q$);$q$
                                    from     pg_buffercache
                                    where    reldatabase = (select    oid
                                                            from      pg_database
                                                            where     datname = $$'${db}'$$)
                                    order by relfilenode, relforknumber, relblocknumber;' > $HIBERNATE_DESTINATION/${db}.save
done

Restore the buffers

Ideally, this would be performed after a server restart, but there’s no harm in doing it without the restart either (except the performance implications, if doing this eveicts buffers currently in use by other connections ;). Note that if some table/index’s underlying file storage has been renamed by the server since we extracted the buffers list, some of these calls will return with ERROR.

We connect to each database, and simply feed the script generated earlier, into psql.

for db in $($PSQL_TEMPL_NOFLUFF -c 'select datname from pg_database where datname <> $$template0$$'); do
  if [ ! -e $HIBERNATE_DESTINATION/${db}.save ]; then
    continue
  fi
  $PSQL -d $db -f $HIBERNATE_DESTINATION/${db}.save
done

At this point, the Postgres shared-buffers should contain all the buffers that were present when we extracted the buffer list from pg_buffercache.

PS: When reviewing the pg_prewarm code, I did not think through the user-experience aspect of this extension. But after it was committed, the more I thought about how it’s going to be used, the less appealing this solution became. As is evident from above, the administrator needs the help of two extensions, and then a storage location where to store the list of buffers. Ideally, as a DBA, I would like to see a feature which doesn’t require me to muck around with catalogs etc. I have a design for such an extension, and may start coding it some time soon.

Comments