Polling SRV Records for mongos Discovery
- Status: Accepted
- Minimum Server Version: N/A
Abstract
Currently the Initial DNS Seedlist Discovery functionality provides a static seedlist when a MongoClient is constructed. Periodically polling the DNS SRV records would allow for the mongos proxy list to be updated without having to change client configuration.
This specification builds on top of the original Initial DNS Seedlist Discovery specification, and modifies the Server Discovery and Monitoring specification's definition of monitoring a set of mongos servers in a Sharded TopologyType.
META
The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
Specification
Terms
rescan, rescanning
A rescan is the periodic scan of all DNS SRV records to discover a new set of mongos hosts.
rescanSRVIntervalMS
An internal value representing how often the DNS SRV records should be queried.
Implementation
If the initial topology was created through a mongodb+srv:// URI, then drivers MUST implement this specification by periodically rescanning the SRV DNS records. There MUST NOT be an option to turn this behaviour off.
Drivers MUST NOT implement this specification if they do not adhere fully to the Initial DNS Seedlist Discovery specification.
This feature is only available when Server Discovery and Monitoring has determined that the TopologyType is Sharded or Unknown. Drivers MUST NOT rescan SRV DNS records when the TopologyType is neither Sharded nor Unknown (i.e. Single, ReplicaSetNoPrimary, or ReplicaSetWithPrimary).
The discovery of a set of mongos servers is explained in the seedlist discovery section of the original specification. The behaviour of the periodic rescan is similar, but not identical, to the behaviour of initial seedlist discovery. The periodic rescan MUST follow these rules:
- The driver will query the DNS server for SRV records on {hostname}.{domainname}, prefixed with the SRV service name and protocol. The SRV service name is provided in the srvServiceName URI option and defaults to mongodb. The protocol is always tcp. After prefixing, the URI should look like: _{srvServiceName}._tcp.{hostname}.{domainname}.
- A driver MUST verify that the host names returned through SRV records have the same parent {domainname}. When this verification fails, a driver:
  - MUST NOT add such a non-compliant host name to the topology
  - MUST NOT raise an error
  - SHOULD log the non-compliance, including the host name
  - MUST NOT initiate a connection to any such host
- If the DNS request returns no verified hosts in SRV records, no SRV records at all, or a DNS error happens, the driver:
  - MUST NOT change the topology
  - MUST NOT raise an error
  - SHOULD log this situation, including the reason why the DNS records could not be found, if possible
  - MUST temporarily set rescanSRVIntervalMS to heartbeatFrequencyMS until at least one verified SRV record is obtained.
- For all verified host names, as returned through the DNS SRV query, the driver:
  - MUST remove all hosts that are part of the topology, but are no longer in the returned set of valid hosts
  - MUST NOT remove all hosts, and then re-add the ones that were returned. Hosts that have not changed MUST be left alone and unchanged.
  - If srvMaxHosts is zero or greater than or equal to the number of valid hosts, each valid new host MUST be added to the topology as Unknown.
  - If srvMaxHosts is greater than zero and less than the number of valid hosts, valid new hosts MUST be randomly selected and added to the topology as Unknown until the topology has srvMaxHosts hosts. Drivers MUST use the same randomization algorithm as they do for initial selection.
- Priorities and weights in SRV records MUST continue to be ignored, and MUST NOT dictate which mongos server is used for new connections.
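The rules above can be sketched roughly as follows. This is an illustrative sketch, not a reference implementation: the function names, the representation of hosts as "host:port" strings, and the use of a plain suffix comparison for the parent-domain check are all assumptions made for the example. The shuffle stands in for whatever randomization algorithm a driver already uses for initial selection.

```python
import logging
import random

log = logging.getLogger("srv_polling_sketch")


def srv_query_name(hostname, domainname, srv_service_name="mongodb"):
    """Build the DNS name queried for SRV records (protocol is always tcp)."""
    return f"_{srv_service_name}._tcp.{hostname}.{domainname}"


def has_parent_domain(host, domainname):
    """Simplified check (assumption): host must end with '.<domainname>'."""
    host = host.rstrip(".").lower()
    return host.endswith("." + domainname.rstrip(".").lower())


def apply_rescan(topology, srv_hosts, domainname, srv_max_hosts=0, rng=random):
    """Return the host set after one rescan; the input set is not mutated.

    topology  -- set of "host:port" strings currently in the topology
    srv_hosts -- list of "host:port" strings taken from the SRV response
    """
    valid = set()
    for address in srv_hosts:
        host = address.rsplit(":", 1)[0]
        if has_parent_domain(host, domainname):
            valid.add(address)
        else:
            # Non-compliant host: log it, never add it, never raise.
            log.warning("ignoring SRV host outside %s: %s", domainname, host)

    if not valid:
        # No verified hosts (or no records / DNS error): leave topology as is.
        log.warning("SRV rescan returned no verified hosts")
        return set(topology)

    kept = topology & valid          # unchanged hosts are left alone
    new = sorted(valid - topology)   # candidates to add as Unknown

    if srv_max_hosts == 0 or srv_max_hosts >= len(valid):
        return kept | set(new)

    # srvMaxHosts limits the topology: add random new hosts until it is full.
    rng.shuffle(new)
    room = max(srv_max_hosts - len(kept), 0)
    return kept | set(new[:room])
```

For example, with a topology of a.example.com and b.example.com and an SRV response listing b.example.com, c.example.com and x.evil.com, the sketch keeps b, adds c, drops a, and only logs x.evil.com.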
The rescan needs to happen periodically. As SRV records contain a TTL value, this value can be used to indicate when a rescan needs to happen. Different SRV records can have different TTL values. The rescanSRVIntervalMS value MUST be set to the lowest of the individual TTL values associated with the different SRV records in the most recent rescan, but MUST NOT be lower than 60 seconds. If a driver is unable to access the TTL values of SRV records, it MUST rescan every 60 seconds.
Drivers SHOULD endeavour to rescan and obtain a new list of mongos servers every rescanSRVIntervalMS milliseconds. The rescanSRVIntervalMS period SHOULD be calculated from the end of the previous rescan (or the end of the initial DNS seedlist discovery scan).
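The interval rules above, together with the fallback to heartbeatFrequencyMS when a rescan yields no verified records, can be illustrated with a small helper. The function name and parameters are hypothetical and chosen only for this sketch; TTLs are assumed to be given in seconds, with None or an empty list meaning the driver cannot access them.

```python
MIN_RESCAN_INTERVAL_MS = 60_000  # the 60-second floor from the rules above


def next_rescan_interval_ms(ttls_seconds, heartbeat_frequency_ms, found_verified_hosts):
    """Compute rescanSRVIntervalMS after a rescan attempt.

    ttls_seconds           -- TTLs of the SRV records from the latest rescan,
                              or None/empty if TTLs are unavailable
    heartbeat_frequency_ms -- the client's heartbeatFrequencyMS
    found_verified_hosts   -- False if the rescan failed or returned no
                              verified SRV records
    """
    if not found_verified_hosts:
        # Temporary fallback until a verified SRV record is obtained again.
        return heartbeat_frequency_ms
    if not ttls_seconds:
        # TTLs are not accessible: rescan every 60 seconds.
        return MIN_RESCAN_INTERVAL_MS
    # Lowest TTL among the records, but never below 60 seconds.
    return max(min(ttls_seconds) * 1000, MIN_RESCAN_INTERVAL_MS)
```

With TTLs of 300 and 120 seconds the helper yields 120000; with a lowest TTL of 30 seconds it is clamped to 60000.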
Multi-Threaded Drivers
A threaded driver MUST use a separate monitoring thread for scanning the DNS records so that DNS lookups don't block other operations.
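One possible shape for such a monitoring thread is sketched below. The class name SrvMonitor and the rescan callable (assumed to perform one rescan and return the next rescanSRVIntervalMS) are hypothetical. Because the wait starts only after the rescan callable returns, the interval is measured from the end of the previous rescan.

```python
import threading


class SrvMonitor(threading.Thread):
    """Dedicated thread that rescans SRV records without blocking other work."""

    def __init__(self, rescan, initial_interval_ms):
        super().__init__(daemon=True)
        self._rescan = rescan                  # callable: () -> next interval in ms
        self._interval_ms = initial_interval_ms
        self._stop_event = threading.Event()

    def run(self):
        # Event.wait returns True once close() was called, which ends the loop;
        # otherwise it times out after the current interval and we rescan.
        while not self._stop_event.wait(self._interval_ms / 1000.0):
            self._interval_ms = self._rescan()

    def close(self):
        """Ask the monitor to stop; it exits at the next wake-up."""
        self._stop_event.set()
```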
Single-Threaded Drivers
The rescan MUST happen before scanning all servers as part of the normal scanning functionality, but only if rescanSRVIntervalMS has passed.
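A single-threaded driver could order the two steps along these lines. The class and callable names are hypothetical; rescan_srv is assumed to perform one rescan and return the next rescanSRVIntervalMS, and the injectable clock exists only to make the sketch easy to exercise.

```python
import time


class SingleThreadedScanner:
    """Runs the SRV rescan, when due, before the normal scan of all servers."""

    def __init__(self, rescan_srv, scan_servers, rescan_interval_ms, clock=time.monotonic):
        self._rescan_srv = rescan_srv        # callable: () -> next interval in ms
        self._scan_servers = scan_servers    # callable: () -> None
        self._interval_ms = rescan_interval_ms
        self._clock = clock
        # The interval counts from the end of the initial seedlist discovery.
        self._last_rescan_end = clock()

    def scan(self):
        elapsed_ms = (self._clock() - self._last_rescan_end) * 1000
        if elapsed_ms >= self._interval_ms:
            # rescanSRVIntervalMS has passed: rescan DNS first.
            self._interval_ms = self._rescan_srv()
            self._last_rescan_end = self._clock()
        # Then scan all servers as usual.
        self._scan_servers()
```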
Test Plan
See README.md in the accompanying test directory.
Motivation for Change
The original Initial DNS Seedlist Discovery specification only regulates the initial list of mongos hosts to be used instead of a single hostname from a connection URI. Although this makes the initial configuration of a set of mongos servers a lot easier, it does not provide a method for updating the list of mongos servers in the topology.
Since the introduction of the mongodb+srv:// scheme to provide an initial seedlist, some users have requested additional functionality to be able to update the configured list of mongos hosts that make up the initially seeded topology.
Design Rationale
From the scope document
Should DNS polling use heartbeatFrequencyMS or DNS cache TTLs?
We have chosen to use the lowest TTL among all DNS SRV records, with the caveat that the rescan interval is never shorter than 60 seconds.
Should DNS polling also have a "fast polling" mode when no servers are available?
We have not opted for a "fast polling" mode, but we did include a provision for rescanning when DNS records are not available. In that case, a rescan happens every heartbeatFrequencyMS. The rationale is that polling DNS very frequently does not make much sense because of DNS caching, which often honours the TTL already, but when we have no TTL values to reference we still need a fallback frequency.
For the design
No option to turn off periodic rescanning
The design does not allow for an option to turn off the periodic rescanning of SRV records on the basis that we try to have as few options as possible: the "no knobs" philosophy.
Backwards Compatibility
This specification changes the behaviour of server monitoring by introducing a repeating DNS lookup of the SRV records.
Although this is an improvement in the mongodb+srv:// scheme, it can nonetheless break the expectations of users who were familiar with the old behaviour. We do not expect this to negatively impact users.
Reference Implementation
Reference implementations are made for the following drivers:
- Perl
- C#
Security Implication
This specification has no security implications beyond the ones associated with the original Initial DNS Seedlist Discovery specification.
Future work
No future work is expected.
Changelog
- 2024-08-22: Migrated from reStructuredText to Markdown.
- 2022-10-05: Revise spec front matter and reformat changelog.
- 2021-10-14: Specify behavior for srvMaxHosts MongoClient option.
- 2021-09-15: Clarify that service name only defaults to mongodb, and should be defined by the srvServiceName URI option.