Christopher B. Browne's Home Page
cbbrowne@acm.org

2. Slony-I "Best Practices"

It is common for managers to have a desire to operate systems using some available, documented set of "best practices." Documenting that sort of thing is essential to ISO 9000, ISO 9001, and other sorts of organizational certifications.

It is worthwhile to preface a discussion of "best practices" by mentioning that each organization that uses Slony-I is unique, and there may be a need for local policies to reflect unique local operating characteristics. It is for that reason that Slony-I does not impose its own policies for such things as failover ; those will need to be determined based on the overall shape of your network, of your set of database servers, and of your usage patterns for those servers.

There are, however, a number of things that early adopters of Slony-I have discovered which can at least help to suggest the sorts of policies you might want to consider.

  1. Slony-I is a complex multi-client, multi-server system, with the result that there are almost an innumerable set of places where problems can arise.

    As a natural result, maintaining a clean, consistent environment is really valuable, as any sort of environmental "messiness" can either cause unexpected problems or mask the real problem.

    Numerous users have reported problems resulting from mismatches between Slony-I versions, local libraries, and PostgreSQL libraries. Details count: you need to be clear on what hosts are running what versions of what software.

    This is normally a matter of being disciplined about how your software is deployed, and the challenges represent a natural consequence of being a distributed system comprised of a large number of components that need to match.

  2. If a slonik script does not run as expected in a first attempt, it would be foolhardy to attempt to run it again until a problem has been found and resolved.

    There are a very few slonik commands such as SLONIK STORE PATH(7) that behave in a nearly idempotent manner; if you run SLONIK STORE PATH(7) again, that merely updates table sl_path with the same value.

    In contrast SLONIK SUBSCRIBE SET(7) behaves in two very different ways depending on whether the subscription has been activated yet or not; if initiating the subscription didn't work at a first attempt, submitting the request again won't help make it happen.

  3. Principle: Use an unambiguous, stable time zone such as UTC or GMT.

    Users have run into problems with slon(1) functioning properly when their system uses a time zone that PostgreSQL was unable to recognize such as CUT0 or WST. It is necessary that you use a timezone that PostgreSQL can recognize correctly. It is furthermore preferable to use a time zone where times do not shift around due to Daylight Savings Time.

    The "geographically unbiased" choice seems to be TZ=UTC or TZ=GMT, and to make sure that systems are "in sync" by using NTP to synchronize clocks throughout the environment.

    See also Section 3.4.

  4. Principle: Long running transactions are Evil

    The FAQ has an entry on growth of pg_listener which discusses this in a fair bit of detail; the long and short is that long running transactions have numerous ill effects. They are particularly troublesome on an "origin" node, holding onto locks, preventing vacuums from taking effect, and the like.

    In version 1.2, some of the "evils" should be lessened, because:

    • Events in pg_listener are only generated when replication updates are relatively infrequent, which should mean that busy systems won't generate many dead tuples in that table

    • The system will periodically rotate (using TRUNCATE to clean out the old table) between the two log tables, sl_log_1 and sl_log_2, preventing unbounded growth of dead space there.

  5. Failover policies should be planned for ahead of time.

    Most pointedly, any node that is expected to be a failover target must have its subscription(s) set up with the option FORWARD = YES. Otherwise, that node is not a candidate for being promoted to origin node.

    This may simply involve thinking about what the priority lists should be of what should fail to what, as opposed to trying to automate it. But knowing what to do ahead of time cuts down on the number of mistakes made.

    At Afilias, a variety of internal [The 3AM Unhappy DBA's Guide to...] guides have been created to provide checklists of what to do when certain "unhappy" events take place. This sort of material is highly specific to the environment and the set of applications running there, so you would need to generate your own such documents. This is one of the vital components of any disaster recovery preparations.

  6. SLONIK MOVE SET(7) should be used to allow preventative maintenance to prevent problems from becoming serious enough to require failover .

  7. VACUUM policy needs to be carefully defined.

    As mentioned above, "long running transactions are Evil." VACUUMs are no exception in this. A VACUUM on a huge table will open a long-running transaction with all the known ill effects.

  8. If you are using the autovacuum process in recent versions of PostgreSQL, you may wish to leave Slony-I tables out, as Slony-I is a bit more intelligent about vacuuming when it is expected to be conspicuously useful (e.g. - immediately after purging old data) to do so than autovacuum can be.

    See Section 6.1 for more details.

  9. Running all of the slon(1) daemons on a central server for each network has proven preferable.

    Each slon(1) should run on a host on the same local network as the node that it is servicing, as it does a lot of communications with its database, and that connection needs to be as reliable as possible.

    In theory, the "best" speed might be expected to come from running the slon(1) on the database server that it is servicing.

    In practice, strewing slon(1) processes and configuration across a dozen servers turns out to be inconvenient to manage.

  10. slon(1) processes should run in the same "network context" as the node that each is responsible for managing so that the connection to that node is a "local" one. Do not run such links across a WAN. Thus, if you have nodes in London and nodes in New York, the slon(1)s managing London nodes should run in London, and the slon(1)s managing New York nodes should run in New York.

    A WAN outage (or flakiness of the WAN in general) can leave database connections "zombied", and typical TCP/IP behaviour will allow those connections to persist, preventing a slon restart for around two hours.

    It is not difficult to remedy this; you need only kill SIGINT the offending backend connection. But by running the slon(1) locally, you will generally not be vulnerable to this condition.

  11. Before getting too excited about having fallen into some big problem, consider killing and restarting all the slon(1) processes. Historically, this has frequently been able to resolve "stickiness."

    With a very few exceptions, it is generally not a big deal to kill off and restart the slon(1) processes. Each slon(1) connects to one database for which it is the manager, and then connects to other databases as needed to draw in events. If you kill off a slon(1), all you do is to interrupt those connections. If a SYNC or other event is sitting there half-processed, there's no problem: the transaction will roll back, and when the slon(1) restarts, it will restart that event from scratch.

    The exception scenario where it is undesirable to restart a slon(1) is where a COPY_SET is running on a large replication set, such that stopping the slon(1) may discard several hours worth of load work.

    In early versions of Slony-I, it was frequently the case that connections could get a bit "deranged" which restarting slon(1)s would clean up. This has become much more rare, but it has occasionally proven useful to restart the slon(1). If there has been any "network derangement", this can clear up the issue of defunct database connections.

  12. The Database Schema Changes section outlines some practices that have been found useful for handling changes to database schemas.

  13. Handling of Primary Keys

    Discussed in the section on Replication Sets, it is ideal if each replicated table has a true primary key constraint; it is acceptable to use a "candidate primary key."

  14. Grouping tables into sets suggests strategies for determining how to group tables and sequences into replication sets.

  15. It should be obvious that actions that can delete a lot of data should be taken with great care; the section on Dropping things from Slony-I Replication discusses the different sorts of "deletion" that Slony-I supports.

  16. Locking issues

    Certain Slony-I operations, notably set add table , move set , lock set , and execute script require acquiring exclusive locks on the tables being replicated.

    Depending on the kind of activity on the databases, this may or may not have the effect of requiring a (hopefully brief) database outage.

  17. What to do about DDL.

    Slony-I operates via detecting updates to table data via triggers that are attached to those tables. That means that updates that take place via methods that do not fire triggers will not notice those updates. ALTER TABLE, CREATE OR REPLACE FUNCTION, CREATE TABLE, all represent SQL requests that Slony-I has no way to notice.

    A philosophy underlying Slony-I's handling of this is that competent system designers do not write self-modifying code, and database schemas that get modified by the application are an instance of this. It does not try hard to make it convenient to modify database schemas.

    There will be cases where that is necessary, so the execute script is provided which will apply DDL changes at the same location in the transaction stream on all servers.

    Unfortunately, this introduces a great deal of locking of database objects. Altering tables requires taking out an exclusive lock on them; doing so via execute script requires that Slony-I take out an exclusive lock on all replicated tables. This can prove quite inconvenient if applications are running when running DDL; Slony-I is asking for those exclusive table locks, whilst, simultaneously, some application connections are gradually relinquishing locks, whilst others are backing up behind the Slony-I locks.

    One particularly dogmatic position that some hold is that all schema changes should always be propagated using execute script. This guarantees that nodes will be consistent, but the costs of locking and deadlocking may be too high for some users.

    At Afilias, our approach has been less dogmatic; there are sorts of changes that must be applied using execute script, but we apply others independently.

    • Changes that must be applied using execute script

      • All instances of ALTER TABLE

    • Changes that are not normally applied using execute script

      • CREATE INDEX

      • CREATE TABLE

        Tables that are not being replicated do not require Slony-I "permission".

      • CREATE OR REPLACE FUNCTION

        Typically, new versions of functions may be done without Slony-I being "aware" of them. The obvious exception is when a new function is being deployed to accomodate a table alteration; in that case, the new version must be added in in a manner synchronized with the execute script for the table alteration.

        Similarly, CREATE TYPE, CREATE AGGREGATE , and such will commonly not need to be forcibly applied in "perfectly synchronized" manner across nodes.

      • Security management, such as CREATE USER, CREATE ROLE , GRANT, and such are largely irrelevant to Slony-I - it runs in an orthogonal role to that.

        Indeed, we have frequently found it useful to have different security arrangements on different nodes. Access to the "master" node should be restricted to applications that truly need access to it; "reporting" users commonly are restricted much more there than on subscriber nodes.

  18. Slony-I-specific user names.

    It has proven useful to define a slony user for use by Slony-I, as distinct from a generic postgres or pgsql user.

    If all sorts of automatic "maintenance" activities, such as vacuuming and performing backups, are performed under the "ownership" of a single PostgreSQL user, it turns out to be pretty easy to run into deadlock problems.

    For instance, a series of vacuums that unexpectedly run against a database that has a large SUBSCRIBE_SET event under way may run into a deadlock which would roll back several hours worth of data copying work.

    If, instead, different maintenance roles are performed by different users, you may, during vital operations such as SUBSCRIBE_SET, lock out other users at the pg_hba.conf level, only allowing the slony user in, which substantially reduces the risk of problems while the subscription is in progress.

  19. Path configuration

    The section on Path Communications discusses the issues surrounding what network connections need to be in place in order for Slony-I to function.

  20. Lowering Authority Usage

    Traditionally, it has been stated that "Slony-I needs to use superuser connections." It turns out that this is not actually true, and and if there are particular concerns about excessive use of superuser accounts, it is possible to reduce this considerably.

    It is simplest to have the replication management user be a superuser, as, in that case, one need not think about what permissions to configure.

    There is only actually one place where Slony-I requires superuser access, and that is during the subscription process, as it uses a "hack" to substantially improve performance, updating the system catalogue directly to shut off indices during the COPY. This functionality is restricted to two functions, as of version 2.1, with disable_indexes_on_table() and enable_indexes_on_table(), which may appropriate superuser permissions if configured with SECURITY DEFINER. If these functions are owned by a superuser, then that is all the database superuser access that is required.

    One could set up a "weak user" assigned to all SLONIK STORE PATH(7) requests. The minimal permissions that this user, let's call it weakuser, requires are as follows:

    • It must have read access to the Slony-I-specific namespace

    • It must have read access to all tables and sequences in that namespace

    • It must have write access to the Slony-I table sl_nodelock and sequence sl_nodelock_nl_conncnt_seq

    • At subscribe time, it must have read access to all of the replicated tables.

      Outside of subscription time, there is no need for access to access to the replicated tables.

    • There is some need for read access to tables in pg_catalog; it has not been verified how little access would be suitable.

    • As mentioned, the functions for disabling and re-enabling indexes during the subscription process must be owned by a superuser, and must be accessible to the "weak user."

    In version 1.3, the tests in the Section 23 support using a WEAKUSER so that testing can regularly confirm (for the most part) the minimal set of permissions needed to support replication.

  21. Further lowering authority usage.

    We may further reclassify required access privileges between three roles:

    • Privileges for the user that slon(1) uses to manage its own node.

      The slon(1) requires read and write access to the Slony-I tables and schema, and write access to the tables and schemas containing replicable/replicated data, on the local node that it is managing.

      At the time that SLONIK EXECUTE SCRIPT(7) runs, sufficient access to the local node is required to perform whatever the script requires.

    • Privileges for the user that slon(1) uses to access remote nodes.

      This is always exclusively read-only access, and a separate PostgreSQL role (and/or user) could be set up which only offers read access to:

      • Slony-I's own tables, in the schema created for that purpose.

      • Replicated tables, which is only used during the SLONIK SUBSCRIBE SET(7) event.

    • At installation time, slonik(1) requires sufficient access to create the new schema containing replication-specific tables.

    Note that despite Slony-I's agnosticism as to node roles, as to whether they are "masters" or slaves, the set of privileges needed should not need to vary.

  22. The section on listen paths discusses the issues surrounding the table sl_listen.

    As of Slony-I 1.1, its contents are computed automatically based on the communications information available to Slony-I which should alleviate the problems found in earlier versions where this had to be configured by hand. Many seemingly inexplicable communications failures, where nodes failed to talk to one another even though they technically could, were a result of incorrect listen path configuration.

  23. Run Section 5.1 frequently to discover configuration problems as early as possible.

    This is a Perl script which connects to a Slony-I node and then rummages through Slony-I configuration looking for quite a variety of conditions that tend to indicate problems, including:

    • Bloating of some config tables

    • Analysis of listen paths

    • Analysis of event propagation and confirmation

    If replication mysteriously "isn't working", this tool can run through many of the possible problems for you.

    It will also notice a number of sorts of situations where something has broken. Not only should it be run when problems have been noticed - it should be run frequently (e.g. - hourly, or thereabouts) as a general purpose "health check" for each Slony-I cluster.

  24. Configuring slon(1)

    As of version 1.1, slon(1) configuration may be drawn either from the command line or from configuration files. "Best" practices have yet to emerge from the two options:

It may be worth considering turning the PostgreSQL fsync functionality off during the copying of data, as this will improve performance, and if the database "falls over" during the COPY_SET event, you will be restarting the copy of the whole replication set.

Google

If this was useful, let others know by an Affero rating

Contact me at cbbrowne@acm.org