Using DNS TTL to control migrations

Moving a service from one piece of hardware or location to another often involves a DNS change. In my experience the DNS change is usually the final step, the one that actually moves the traffic.

DNS entries have TTLs. TTL means “time to live” and is how long a cached copy of the record stays valid. For a normally running website you could expect a TTL of 86400 seconds, or one day. This means that once a DNS server or other DNS client has requested the record, it’ll hold onto a cached copy for up to a day before re-requesting it.

If you were to leave your TTL at 86400 and change the DNS entry to point to your new server it could take up to a day for the changeover to happen.
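To put a number on that, here is a minimal sketch in Python. The function name `latest_cutover` and the dates are made up for illustration, and it assumes every cache honours the published TTL.

```python
from datetime import datetime, timedelta

def latest_cutover(change_time, ttl_seconds):
    """Latest moment a well-behaved cache could still serve the old record."""
    # A resolver that cached the old record just before the change
    # keeps it for one full TTL.
    return change_time + timedelta(seconds=ttl_seconds)

change = datetime(2024, 1, 10, 9, 0)   # hypothetical time of the DNS change
print(latest_cutover(change, 86400))   # 2024-01-11 09:00:00
print(latest_cutover(change, 60))      # 2024-01-10 09:01:00
```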

Let’s consider two common use cases:

You want the DNS switchover to happen quickly

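Assuming you have a TTL of 86400, make sure you reduce the TTL at least one full TTL period (here, a day) before you want to perform the migration. Any resolver that cached the record under the old TTL can keep it for that long, so a later reduction won’t have reached everyone by migration time.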

I would normally change the TTL to 3600 (1 hour) the day before the planned migration. Then, at least one hour before the migration time, reduce the TTL to 600 seconds (10 minutes). Then, at least 10 minutes before, reduce it to 60 seconds (1 minute).
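The step-down above can be worked out backwards from the migration time. The following is a rough sketch rather than a tool I actually use: `ttl_schedule`, `STEPS` and the example date are all hypothetical names and values, and it assumes caches honour the published TTL. Each reduction has to be published at least one old TTL before the migration, so that copies cached under the old value have expired by then.

```python
from datetime import datetime, timedelta

# (TTL being replaced, new TTL), both in seconds, in the order from the text.
STEPS = [(86400, 3600), (3600, 600), (600, 60)]

def ttl_schedule(migration_time):
    """Return (latest time to publish the change, new TTL) for each step."""
    # A record cached under old_ttl can live for old_ttl seconds, so the
    # reduction must go out at least that long before the migration.
    return [
        (migration_time - timedelta(seconds=old_ttl), new_ttl)
        for old_ttl, new_ttl in STEPS
    ]

migration = datetime(2024, 1, 10, 9, 0)  # hypothetical migration time
for when, ttl in ttl_schedule(migration):
    print(f"{when}  set TTL to {ttl}")
# 2024-01-09 09:00:00  set TTL to 3600
# 2024-01-10 08:00:00  set TTL to 600
# 2024-01-10 08:50:00  set TTL to 60
```

These are the latest safe times; making each change a little earlier does no harm.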

Then make the change. Your traffic should flip over to the new IP address pretty quickly, and if anything goes wrong you can change the entry back and all traffic should fail back within about a minute.

After you’re confident all is working well from the new host, you should increase the TTL back up to its normal setting (in steps if you want).

You want to move the traffic gradually over to a new service

If you’re not too fussed about which server the traffic hits (e.g. serving static content from file-synced servers) then you might want the traffic to move over gradually. This is a nice approach if you want more of a “soft launch” and don’t want to risk something bad happening to 100% of your traffic if there are problems on the new hardware.

In this case a larger TTL might be desirable. I’d probably go for an hour but it really depends on the situation.

You’d follow similar steps to the ones described above, slowly reducing the TTL until it’s at the value you want. Then modify the DNS record to point to your new service, and as you do this set a low TTL on the new record.

The low TTL on the new record won’t affect how quickly the new record rolls out (that depends on the TTL of the old record that’s already cached), but it does mean that if you need to fail back, entries should be re-cached quickly.
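As a very rough model of the gradual rollout, assume the cached copies of the old record were fetched at evenly spread times, so they expire evenly over one old-TTL window. That is a simplifying assumption of mine, not a measurement, and `fraction_on_new_record` is a hypothetical helper. Note that the new record’s TTL doesn’t appear anywhere in it.

```python
def fraction_on_new_record(seconds_since_change, old_ttl):
    """Rough share of caches that have picked up the new record.

    Assumes old cached copies expire evenly over one old-TTL window
    and that every cache honours the published TTL.
    """
    if seconds_since_change <= 0:
        return 0.0
    return min(seconds_since_change / old_ttl, 1.0)

# With the one-hour TTL suggested above:
for minutes in (15, 30, 60):
    share = fraction_on_new_record(minutes * 60, old_ttl=3600)
    print(f"{minutes} min: {share:.0%}")
# 15 min: 25%
# 30 min: 50%
# 60 min: 100%
```

Real traffic won’t be this tidy, but it gives a feel for how the old TTL sets the pace of the move.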

When everything’s failed over nicely increase the TTL again.

Why not keep my TTL low all the time then?

You can, but it’s generally not considered good practice. It’ll also generate much more DNS traffic to your authoritative DNS servers, as resolvers will need to re-fetch entries far more often.
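Some back-of-the-envelope arithmetic shows the difference. This sketch assumes a single busy resolver that re-fetches the record as soon as its cached copy expires; `max_refreshes_per_day` is a made-up name for illustration.

```python
SECONDS_PER_DAY = 86400

def max_refreshes_per_day(ttl_seconds):
    """Upper bound on how often one resolver re-fetches a record per day."""
    return SECONDS_PER_DAY // ttl_seconds

for ttl in (86400, 3600, 60):
    print(ttl, max_refreshes_per_day(ttl))
# 86400 1
# 3600 24
# 60 1440
```

Multiply that by every resolver that looks you up and the extra load on your authoritative servers adds up.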

Exceptions

These might catch you out:

  • Lots of OSes/browsers will cache a DNS entry for a minimum of 30 minutes, so TTLs lower than this might not be respected
  • Some caching name servers ignore the published TTL and apply their own minimum (this goes against the RFCs and is really frustrating)
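In both cases the effect is that the cache applies a floor to your TTL. A minimal sketch of that behaviour, where `effective_ttl` and `cache_minimum` are hypothetical names and the 1800 seconds is just the 30 minutes from the first bullet:

```python
def effective_ttl(published_ttl, cache_minimum=0):
    """TTL actually applied by a cache that enforces its own minimum."""
    # A well-behaved cache has no floor and simply uses the published value.
    return max(published_ttl, cache_minimum)

print(effective_ttl(60, cache_minimum=1800))    # 1800
print(effective_ttl(3600, cache_minimum=1800))  # 3600
print(effective_ttl(60))                        # 60
```

So a 60-second TTL only buys you a fast switchover for the clients whose caches play by the rules.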

About James Cohen
LAMP geek with interests in building scalable web applications

6 Responses to Using DNS TTL to control migrations

  1. Pingback: Migrating servers using DNS TTL for minimum downtime « Virendra's Blog

  2. Pingback: Migrating servers using DNS TTL for minimum downtime - Virendra's Blog

  3. Pingback: N00b to 1337-DNS

  4. jamesm says:

    Over 3 years old and still awesome! Thanks for the tips 🙂

  5. Jon says:

    Over four years old and still awesome. Great find for me, going into my snippets. Thanks for taking the time to write this.

  6. It should be noted that many ISPs “illegally” ignore TTLs and cache lookups for a minimum period of their own choosing, which can vary between minutes and hours. You *will* get overlap (old and new site accessed simultaneously) for more than a minute after the site IP changes, even with a TTL set to 60 seconds.

    The best thing to do is to put a maintenance page up on the old site once the DNS has changed (or disable all functionality that could change the Web tree/DB on the old site). Note that spiders (even legit ones) can cache the old site IP for days and will usually be the last cockroaches accessing the old site days after the change.

    Also, when migrating a site, if it’s been redesigned (often a reason for an IP move – redesign and re-hosting often go hand in hand), don’t forget to put in redirect mappings (e.g. RewriteRule in Apache) from the old relative URLs to the new ones. Google will often show 6 or so sub-categories for the old site – that’s the bare minimum you’ll need to re-map.
