Using DNS TTL to control migrations
October 12, 2010
Moving services from one piece of hardware or location to another often involves a DNS change. In my experience the DNS change is usually the final step, the one that actually moves the traffic.
DNS entries can have TTLs. TTL means “time to live” and is the expiry time of the record. For a normal running website you could expect a TTL of 86400 (seconds) or one day. This means that once a DNS server or other DNS client has requested the record it’ll hold onto a cached copy for up to a day before re-requesting it.
If you were to leave your TTL at 86400 and change the DNS entry to point to your new server it could take up to a day for the changeover to happen.
Let’s consider two common use cases:
You want the DNS switchover to happen quickly
Assuming you have a TTL of 86400, make sure you reduce the TTL at least a day (one full TTL) before you want to perform the migration. Otherwise resolvers may still be holding a copy cached with the old, long TTL.
I would normally change the TTL to 3600 (1 hour) the day before the planned migration. Then, at least one hour before the migration time, reduce the TTL to 600 seconds (10 minutes). Finally, at least 10 minutes before, reduce it to 60 seconds (1 minute).
Then make the change. Your traffic should flip over to the new IP address within about a minute, and if anything goes wrong you can change the entry back and all traffic should fail back within a minute.
After you’re confident all is working well from the new host, you should increase the TTL back up to its normal setting (in steps if you want).
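The step-down above is easy to get wrong by an hour, so here is a minimal sketch of the arithmetic. The function name `ttl_step_down_schedule` and the example migration time are my own, purely for illustration. It only calculates the latest time each reduction needs to be published and doesn’t talk to any DNS provider.

```python
from datetime import datetime, timedelta


def ttl_step_down_schedule(migration_time, ttls):
    """Return (latest_change_time, new_ttl) pairs for a staged TTL reduction.

    ttls lists the current TTL followed by each lower value, in seconds,
    e.g. [86400, 3600, 600, 60]. Each new TTL has to be published at least
    one *previous* TTL before the migration, so that every copy cached with
    the previous TTL has expired by the time the record changes.
    """
    schedule = []
    for previous_ttl, new_ttl in zip(ttls, ttls[1:]):
        latest = migration_time - timedelta(seconds=previous_ttl)
        schedule.append((latest, new_ttl))
    return schedule


# Hypothetical migration at midday, using the TTL steps from this post.
migration = datetime(2010, 10, 12, 12, 0)
for latest, ttl in ttl_step_down_schedule(migration, [86400, 3600, 600, 60]):
    print(f"by {latest}: set TTL to {ttl}")
```

For a midday migration this gives midday the day before for the 3600 step, 11:00 for the 600 step and 11:50 for the 60 step. These are the latest safe times, so making each change a little earlier does no harm.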
You want to move the traffic gradually over to a new service
If you’re not too fussed about which server the traffic hits (e.g. serving static content from file-synced servers) then you might want the traffic to move over gradually. This is a nice approach if you want more of a “soft launch” and don’t want to risk something bad happening to 100% of your traffic if there are problems on the new hardware.
In this case a larger TTL might be desirable. I’d probably go for an hour but it really depends on the situation.
You’d follow similar steps to the ones described above, slowly reducing the TTL until it’s at the value you want. Now modify the DNS record to point to your new service, and as you do this set a low TTL on the new record.
The low TTL on the new record won’t affect the speed at which the new record rolls out, since that depends on the TTL of the old record that resolvers already have cached. It does mean that if you need to fail back, entries should be re-cached quickly.
When everything’s failed over nicely increase the TTL again.
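To get a feel for how gradual the move is, here is a rough model. It rests on one big assumption of mine: resolvers cached the old record at evenly spread times, so their copies expire evenly across the old TTL window. Real traffic won’t be this tidy, and `fraction_switched` is just a name I’ve made up for the sketch.

```python
def fraction_switched(seconds_since_change, old_ttl):
    """Estimate the share of caches that have picked up the new record.

    Assumes cache expiries are spread uniformly over the old TTL, so the
    share grows linearly and reaches 1.0 once a full old TTL has passed.
    """
    if old_ttl <= 0:
        raise ValueError("old_ttl must be positive")
    return min(1.0, max(0.0, seconds_since_change / old_ttl))


# With a one hour TTL on the old record, 15 minutes after the change:
print(fraction_switched(15 * 60, 3600))
```

Under that assumption, a one hour TTL means roughly a quarter of caches have moved after 15 minutes and all of them after an hour, which is why the TTL of the old record acts as the dial for the soft launch.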
Why not keep my TTL low all the time then?
You can, but it’s generally not considered good practice. It also generates much more DNS traffic to your authoritative DNS servers, as other resolvers need to re-cache entries more often.
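As a back-of-the-envelope illustration of the extra traffic, this sketch counts how many times a day a single busy resolver would have to re-fetch one record. It assumes the resolver is asked for the record constantly and re-fetches the moment its copy expires, which is the worst case. The function name is mine.

```python
SECONDS_PER_DAY = 86400


def max_refreshes_per_day(ttl_seconds):
    """Upper bound on re-fetches per day by one resolver for one record."""
    if ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be positive")
    return SECONDS_PER_DAY // ttl_seconds


for ttl in (86400, 3600, 60):
    print(ttl, max_refreshes_per_day(ttl))
```

Going from a one day TTL to a one minute TTL takes that resolver from 1 lookup a day to as many as 1440, and the same multiplier applies to every resolver that queries you.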
These might catch you out:
- Lots of OSes/browsers will cache a DNS entry for a minimum of 30 minutes, so TTLs lower than this might not be respected
- Some caching name servers ignore the published TTL and apply their own minimum (this goes against the RFCs and is really frustrating)
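Both of these mean the TTL you publish is not necessarily the TTL you get. A tiny sketch of the effect, assuming a cache simply applies its own floor to whatever you publish (the 30 minute figure is the one from the list above, and `effective_ttl` is a hypothetical helper, not part of any DNS library):

```python
def effective_ttl(published_ttl, cache_minimum=0):
    """TTL a cache actually honours if it enforces its own minimum."""
    return max(published_ttl, cache_minimum)


THIRTY_MINUTES = 30 * 60

# A 60 second TTL seen through a cache with a 30 minute floor:
print(effective_ttl(60, THIRTY_MINUTES))
# A one hour TTL is above the floor, so it is left alone:
print(effective_ttl(3600, THIRTY_MINUTES))
```

So when planning a fail back, it’s safer to assume some clients will take as long as the largest floor you expect to meet, not the 60 seconds you published.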