Backup MX

From FixForwarding

A backup MX, also called a secondary MX, is an optional server that accepts mail for the target domain on behalf of the primary MX. It does not store messages in users' mailboxes, but will attempt to relay them to a primary server.

Why have a backup MX

There are two main reasons why a primary MX may not be reachable: server outages and network outages.

Server outages also prevent users from retrieving mail via IMAP or POP. A backup MX therefore makes most sense in a fault-tolerant infrastructure where backup hosts are also available for retrieving mail. Without a backup MX, mail sent during a server outage is normally queued and retried by the sender, so only minor annoyance should result, mostly from non-compliant transmitters configured with an unacceptably short retry period. According to RFC 5321, the give-up time generally needs to be at least 4-5 days. A few postmasters configure much shorter give-up times, perhaps because they think their users find delayed DSNs hard to understand. Such senders cannot cope with a target host that is temporarily down for a few days.
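To illustrate why the give-up time matters, the sketch below models a sender that retries a deferred message at a fixed interval and bounces it once its give-up time has elapsed. The function name and the fixed-interval schedule are simplifying assumptions (real MTAs often back off between retries); the 30-minute interval and the 5-day lifetime follow the RFC 5321 guidance, while the 1-day lifetime stands for a non-compliant sender.

```python
from datetime import timedelta


def delivered_before_give_up(outage: timedelta,
                             retry_interval: timedelta,
                             give_up: timedelta) -> bool:
    """Return True if some delivery attempt falls after the outage ends
    and no later than the give-up time.

    Hypothetical model: the first attempt happens at t=0, when the outage
    starts, and the sender retries every `retry_interval` until `give_up`.
    """
    attempt = timedelta(0)
    while attempt <= give_up:
        if attempt >= outage:  # the target host is reachable again
            return True
        attempt += retry_interval
    return False  # the sender gives up and returns a DSN to the originator


three_days = timedelta(days=3)
print(delivered_before_give_up(three_days, timedelta(minutes=30), timedelta(days=5)))
print(delivered_before_give_up(three_days, timedelta(minutes=30), timedelta(days=1)))
```

With a compliant 5-day lifetime a 3-day outage is survived without any backup MX; with a 1-day lifetime the same outage produces a bounce.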

Network outages are more subtle. There was an era when backup MXes were useful for dealing with networks that had no routes to some parts of the net. That is a pretty small niche now. However, asymmetric routing faults are still fairly common, especially in developing countries. A conservative setup may place a backup MX somewhere else, with a backup link that can get mail to the primary when regular connectivity is not working. An additional concern is that backbone links may go down due to solar activity, terrorism, or war. For example, a backup MX in Australia may allow a message from the US to reach Europe in a timely manner even while the transatlantic links are down.

A third reason for setting up a "backup MX" is to deliberately provide connectivity even when no outage disrupts normal operations. For example, suppose the target host is on Mars, where other Martian hosts can reach it easily and normally. A host on Earth, however, will rarely be able to reach it via a normal TCP/SMTP connection, and the intermediate MX host may have the needed delay-tolerant machinery installed.[1] On Earth, a more common example is provided by (large) enterprises that operate dedicated boundary servers as their MX platform and handle final delivery on an entirely different set of servers, often totally invisible to the outside user.[2]

In any case, SMTP is the default for transferring mail from the backup MX to the target host. That is not a requirement, though: LMTP (RFC 2033), UUCP (RFC 976), or any other mutually acceptable transport mechanism may do.

Google

Digging for gmail.com's MX records turns up quite a few:

   ;; ANSWER SECTION:
   gmail.com.		473	IN	MX	30 alt3.gmail-smtp-in.l.google.com.
   gmail.com.		473	IN	MX	5 gmail-smtp-in.l.google.com.
   gmail.com.		473	IN	MX	10 alt1.gmail-smtp-in.l.google.com.
   gmail.com.		473	IN	MX	40 alt4.gmail-smtp-in.l.google.com.
   gmail.com.		473	IN	MX	20 alt2.gmail-smtp-in.l.google.com.
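A sending MTA does not use the records in the order dig prints them. RFC 5321 has the sender try the lowest preference value first and randomize among records of equal preference. The sketch below parses an answer section like the one above and produces that order; the helper names are hypothetical, and the parser only handles the simple one-record-per-line layout shown here.

```python
import random

DIG_ANSWER = """\
;; ANSWER SECTION:
gmail.com. 473 IN MX 30 alt3.gmail-smtp-in.l.google.com.
gmail.com. 473 IN MX 5 gmail-smtp-in.l.google.com.
gmail.com. 473 IN MX 10 alt1.gmail-smtp-in.l.google.com.
gmail.com. 473 IN MX 40 alt4.gmail-smtp-in.l.google.com.
gmail.com. 473 IN MX 20 alt2.gmail-smtp-in.l.google.com.
"""


def parse_mx(answer: str) -> list[tuple[int, str]]:
    """Extract (preference, host) pairs from dig-style MX answer lines."""
    records = []
    for line in answer.splitlines():
        fields = line.split()
        # name, TTL, class, type, preference, exchange
        if len(fields) == 6 and fields[3] == "MX":
            records.append((int(fields[4]), fields[5].rstrip(".")))
    return records


def delivery_order(records: list[tuple[int, str]], rng=random) -> list[str]:
    """Order hosts by preference; ties end up in random order.

    Shuffling first and then using a stable sort on the preference alone
    randomizes only among records with equal preference.
    """
    shuffled = list(records)
    rng.shuffle(shuffled)
    return [host for _, host in sorted(shuffled, key=lambda r: r[0])]


print(delivery_order(parse_mx(DIG_ANSWER)))
```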

Google publishes several MXes for two reasons. First, most mailers will retry immediately on some connection errors (even errors as late as STARTTLS or HELO), but only if there are more hosts or IPs for them to try. Behind load balancers, there is no other way to tell remote servers to try again quickly.
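The sender-side behavior described above can be sketched as a loop over the MX hosts in preference order that moves on to the next host as soon as an attempt fails transiently. `TransientError` and the `attempt` callable are hypothetical stand-ins for a real SMTP client; which errors actually trigger an immediate move to the next host differs between MTAs.

```python
from typing import Callable, Sequence


class TransientError(Exception):
    """Hypothetical stand-in for a connection error or a 4xx reply,
    e.g. one received at STARTTLS or HELO."""


def deliver(hosts: Sequence[str], attempt: Callable[[str], None]) -> str:
    """Try each MX host in order; on a transient error, move on to the next.

    Returns the host that accepted the message. If every host fails, raise,
    so the message stays queued for a later retry.
    """
    last_error = None
    for host in hosts:
        try:
            attempt(host)
            return host
        except TransientError as exc:
            last_error = exc  # remember why, then try the next host at once
    raise TransientError(f"all MX hosts failed: {last_error}")


def fake_attempt(host: str) -> None:
    # Simulate the primary failing during STARTTLS.
    if host == "gmail-smtp-in.l.google.com":
        raise TransientError("421 during STARTTLS")


print(deliver(["gmail-smtp-in.l.google.com",
               "alt1.gmail-smtp-in.l.google.com"], fake_attempt))
```

With a single published host, the same failure would leave the message queued until the sender's next scheduled retry.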

Second, Google can vend an IP address in answer to each DNS request based on load, availability, and proximity to the requester. The alt names vend other data centers further down the list, which helps move traffic during the first minutes of unavailability, before the automated systems catch up and the DNS TTL expires.

Google has also used multiple MXes to roll out changes. For example, when it rolled out IPv6 addresses, it added them to a higher-numbered MX first.

Project Tar and nolisting

The approach preferred by Grant Taylor:

  1. Point the primary MX at a server with nothing listening, so that it answers with TCP resets. This is known as "nolisting", a variant of greylisting. I have yet to see any negative side effects with this.
  2. Point the secondary MX at your main mail server. Business as usual.
  3. Optionally, point the tertiary MX at your backup mail server.
  4. Point the last MX at something like Project Tar.
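The list above relies on compliant senders walking down the MX list while some spam software does not. The toy model below makes that reasoning concrete. The zone, host names, and the two bot strategies (trying only the most preferred MX, or going straight for the least preferred one) are illustrative assumptions, not measurements.

```python
from dataclasses import dataclass


@dataclass
class MX:
    preference: int
    host: str
    behavior: str  # "reset" (nothing listening), "accept", or "tarpit"


# Hypothetical zone following the four steps above.
ZONE = [
    MX(10, "nolist.example.com", "reset"),
    MX(20, "mail.example.com", "accept"),
    MX(30, "backup.example.com", "accept"),
    MX(40, "tarpit.example.com", "tarpit"),
]


def compliant_sender(zone):
    """Walk the MXes in preference order, skipping hosts that reset."""
    for mx in sorted(zone, key=lambda m: m.preference):
        if mx.behavior != "reset":
            return mx.host
    return None


def primary_only_bot(zone):
    """A bot that tries only the most preferred MX and gives up on a reset."""
    first = min(zone, key=lambda m: m.preference)
    return first.host if first.behavior != "reset" else None


def last_mx_bot(zone):
    """A bot that goes straight for the least preferred MX."""
    last = max(zone, key=lambda m: m.preference)
    return last.host if last.behavior != "reset" else None


print(compliant_sender(ZONE))  # the main mail server
print(primary_only_bot(ZONE))  # None: nothing delivered
print(last_mx_bot(ZONE))       # the tarpit
```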

Another participant reports delays of some hours with nolisting and prefers (smart) greylisting.

Why not to have a backup MX

With multiple connections and good hardware, 99+% uptime is not difficult to achieve. If past performance is taken as an indicator of future results, a backup MX can be ruled out. However, the real reason for ruling it out is backscatter.
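A quick calculation puts "99+% uptime" in perspective: 99% still allows about 87.6 hours (roughly 3.65 days) of downtime per year. Even if all of it came in one stretch, that is shorter than the 4-5 day give-up time RFC 5321 recommends, so compliant senders would still deliver. The helper below is a hypothetical illustration and ignores leap years.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def yearly_downtime_hours(uptime: float) -> float:
    """Hours per year a server may be unreachable at the given uptime fraction."""
    return (1.0 - uptime) * HOURS_PER_YEAR


for uptime in (0.99, 0.999, 0.9999):
    hours = yearly_downtime_hours(uptime)
    print(f"{uptime:.2%} uptime: {hours:.1f} h/year ({hours / 24:.2f} days)")
```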

According to current standards and best practices, to set up a backup MX for example.com it is enough to

  1. configure the backup server, e.g. backup.example.org, to accept mail for example.com, and
  2. add a record to the example.com DNS zone stating that backup.example.org is an MX server with lower preference (higher number) than the primary.

In principle, a client should attempt to deliver mail to the target domain's most preferred (primary) server. However, spammers do not abide by that, and it is not easy for a secondary server to establish whether a client could have reached the primary at a given time. (If the secondary can reach the primary, the primary may attempt to trace its route to the client and report the result back to the secondary.) Hence, a backup MX has to accept mail addressed to any user of the backed-up domain (example.com), unless it has a copy of the user database.
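The acceptance rule in the last sentence can be sketched as a recipient check at RCPT TO time. The function and its parameters are hypothetical; they only show that, without a copy of the user database, the domain is the only thing a backup MX can verify.

```python
from typing import Optional, Set


def backup_accepts(recipient: str,
                   backed_up_domains: Set[str],
                   known_users: Optional[Set[str]] = None) -> bool:
    """Decide whether a backup MX accepts RCPT TO for `recipient`.

    known_users is None when the backup has no copy of the user database;
    in that case it can only check the domain and must accept any local part.
    """
    local, _, domain = recipient.rpartition("@")
    if domain.lower() not in backed_up_domains:
        return False  # not a domain we back up: refuse to relay
    if known_users is None:
        return True   # no user database: accept everything for the domain
    return local in known_users


print(backup_accepts("nobody@example.com", {"example.com"}))             # accepted
print(backup_accepts("nobody@example.com", {"example.com"}, {"alice"}))  # rejected
```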

Currently, organizations that coordinate several servers across various networks may arrange internally for a cache of the user database to be available at their backup MXes. Organizations that run their primary MX in-house but would have to outsource the backup MX cannot do that. They are better off avoiding a backup MX, because mail accepted by the backup for nonexistent users would result in backscatter. Thus, they may be in for a hard surprise if connectivity worsens, e.g. because of solar flares.[3]
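The backscatter mechanism can be shown with a small simulation. A backup MX without the user database has queued everything addressed to the domain; once the primary is back, it rejects unknown recipients, and the backup, having already accepted those messages, has to send DSNs to their envelope senders, which in spam are usually forged. The queue contents are made up for illustration; the rule that no DSN goes to a null reverse-path comes from RFC 5321.

```python
# Hypothetical queue at a backup MX that had no copy of the user database:
# (envelope sender, recipient local part).
QUEUE = [
    ("alice@sender.example", "alice"),
    ("victim1@innocent.example", "nosuchuser"),  # forged sender
    ("", "randomguess"),                         # null reverse-path "<>"
    ("victim2@innocent.example", "xyz123"),      # forged sender
]
PRIMARY_USERS = {"alice", "bob"}


def relay_queue(queue, primary_users):
    """Relay queued messages to the primary and collect the DSNs to send.

    The primary rejects unknown recipients (e.g. with 550). Since the backup
    already accepted those messages, it must bounce them to the envelope
    sender, except when the reverse-path is null.
    """
    relayed, dsns = [], []
    for sender, rcpt in queue:
        if rcpt in primary_users:
            relayed.append(rcpt)
        elif sender:
            dsns.append(sender)
    return relayed, dsns


relayed, dsns = relay_queue(QUEUE, PRIMARY_USERS)
print("relayed:", relayed)
print("backscatter to:", dsns)
```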

Alternatives and requirements

The solution proposed for forwarding may, to some extent, also lend itself to backup MXes. That scenario implies that a backup MX has implicit forwarding recipes, for all users, to all more preferred servers of the domain being backed up. The backup server would still have to negotiate a forwarding agreement whenever a message for a given address actually has to go through for the first time. (Existing recipes are searched before implicit ones.)
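The lookup order described above might be sketched as follows: explicit recipes are consulted first, and otherwise the implicit recipe forwards to every more preferred MX and flags that an agreement still has to be negotiated. The `BackupMX` class, its fields, and the one-target-per-recipe format are hypothetical simplifications, not the forwarding proposal's actual data model.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class BackupMX:
    """Hypothetical routing table of a backup MX."""
    my_preference: int                   # this server's MX preference
    mx_records: list[tuple[int, str]]    # (preference, host) of the backed-up domain
    recipes: dict[str, str] = field(default_factory=dict)  # address -> target host

    def route(self, address: str) -> tuple[list[str], bool]:
        """Return (target hosts, needs_agreement)."""
        # Existing (explicit) recipes are searched first.
        if address in self.recipes:
            return [self.recipes[address]], False
        # Implicit recipe: every more preferred MX, best first; an agreement
        # must be negotiated when the first message actually goes through.
        targets = [host for pref, host in sorted(self.mx_records)
                   if pref < self.my_preference]
        return targets, True


backup = BackupMX(
    my_preference=20,
    mx_records=[(10, "mx1.example.com"), (20, "backup.example.org"),
                (15, "mx2.example.com")],
    recipes={"bob@example.com": "mx2.example.com"},
)
print(backup.route("alice@example.com"))
print(backup.route("bob@example.com"))
```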

A neat trick to initialize the system, so that it does not have to reply with 4xx codes until agreements are finalized (which is likely to happen if the primary host is down), is to provide a list of hashed email addresses. That way, a backup MX can check that a given address actually exists and accept mail for it, confident that the primary server will accept it in turn. The forwarding agreement is negotiated when the primary (or a more preferred secondary) comes up. That moment more or less coincides with the moment the server learns the cleartext address corresponding to the hashed entry. Since the primary should already be set up to accept forwarding agreements from a backup MX, the short period during which the first message for a given address remains stored in the backup MX's cache is substantially irrelevant. (During that time, the server knows the email address without being bound by an agreement.)
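A minimal sketch of the hashed-address check, assuming SHA-256 over the address with the domain lowercased (the local part is left as-is, since RFC 5321 treats it as case-sensitive) and a per-domain salt. The salt, the hash choice, and the function names are all hypothetical. Note that a salt shipped along with the list does not stop anyone holding the list from testing guessed addresses against it.

```python
import hashlib


def address_digest(address: str, salt: bytes) -> str:
    """Hash an address after lowercasing only its domain part."""
    local, _, domain = address.rpartition("@")
    normalized = f"{local}@{domain.lower()}"
    return hashlib.sha256(salt + normalized.encode("utf-8")).hexdigest()


# Built at the primary and shipped to the backup MX; only the digests leave.
SALT = b"example.com-init"  # hypothetical per-domain salt
PRIMARY_USERS = ["alice@example.com", "bob@Example.COM"]
HASHES = {address_digest(a, SALT) for a in PRIMARY_USERS}


def hashed_lookup_accepts(address: str) -> bool:
    """At the backup: accept RCPT TO only if the address hashes to a known entry."""
    return address_digest(address, SALT) in HASHES


print(hashed_lookup_accepts("alice@EXAMPLE.com"))    # accepted
print(hashed_lookup_accepts("mallory@example.com"))  # rejected
```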

Besides the list of hashes, the initialization data may contain regular expressions to account for subaddresses, prefixes, catch-alls, honeypots, etc. More metadata may be needed. Provisions to periodically refresh the initialization data are also needed, and DNS changes that add or remove MXes, or just change preferences, have to be dealt with.
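The initialization data might combine exact hashes with rules like those listed above. The sketch assumes "+" as the subaddress separator and uses two made-up patterns, one for a prefix family of addresses and one for honeypot addresses; the data layout, patterns, and names are illustrative assumptions, and salting is omitted for brevity.

```python
import hashlib
import re


def digest(address: str) -> str:
    return hashlib.sha256(address.encode("utf-8")).hexdigest()


# Hypothetical initialization data shipped to the backup MX.
INIT_DATA = {
    "hashes": {digest("alice@example.com")},
    "subaddress_separator": "+",
    "patterns": [
        re.compile(r"list-[a-z0-9-]+@example\.com"),  # prefix family
        re.compile(r"trap[0-9]+@example\.com"),       # honeypot addresses
    ],
}


def accepts_with_init_data(address: str, data=INIT_DATA) -> bool:
    """Accept on an exact hash, a hash of the base address, or a pattern match."""
    if digest(address) in data["hashes"]:
        return True
    local, _, domain = address.rpartition("@")
    base, sep, _tag = local.partition(data["subaddress_separator"])
    if sep and digest(f"{base}@{domain}") in data["hashes"]:
        return True  # e.g. alice+news@example.com -> alice@example.com
    return any(p.fullmatch(address) for p in data["patterns"])


for a in ("alice+news@example.com", "list-dev@example.com", "bob@example.com"):
    print(a, accepts_with_init_data(a))
```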

End users should not be able to delete their forwarding agreement at a backup MX unless they also delete their mailbox at the primary MX. Blocking such agreements, i.e. leaving an existing recipe in place while configuring it to reply 551 User not local, may result in some disruption and noncompliance, and may be disallowed by domain policies. At any rate, privacy-conscious users will gain the ability to retrieve the effective list of servers where their email addresses are stored (a somewhat idealistic wish, until some marketing function says otherwise).


References

  1. John Klensin, ietf-smtp mailing list, Sat, 23 May 2009 11:10:48 -0400.
  2. Malcolm Weir, courier-users mailing list, Wed, 22 Sep 2010 16:00:19 -0700.
  3. "NASA-Funded Study Reveals Hazards of Severe Space Weather", 1 May 2009.