Client Side Operations Timeout

  • Status: Accepted
  • Minimum Server Version: 2.6

Abstract

This specification outlines a new timeoutMS option to govern the amount of time that a single operation can execute before control is returned to the user. This timeout applies to all of the work done to execute the operation, including but not limited to server selection, connection checkout, and server-side execution.

META

The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Specification

Terms

min(a, b)
Shorthand for "the minimum of a and b" where a and b are numeric values. For any cases where 0 means "infinite" (e.g. timeoutMS), min(0, other) MUST evaluate to other.

MongoClient Configuration

This specification introduces a new configuration option and deprecates some existing options.

timeoutMS

This 64-bit integer option specifies the per-operation timeout value in milliseconds. The default value is unset which means this feature is not enabled, i.e. the existing timeout behavior is unchanged (including serverSelectionTimeoutMS, connectTimeoutMS, socketTimeoutMS etc..). An explicit value of 0 means infinite, though some client-side timeouts like serverSelectionTimeoutMS will still apply. Drivers MUST error if a negative value is specified. This value MUST be configurable at the level of a MongoClient, MongoDatabase, MongoCollection, or of a single operation. However, if the option is specified at any level, it cannot be later changed to unset. At each level, the value MUST be inherited from the previous level if it is not explicitly specified. Additionally, some entities like ClientSession and GridFSBucket either inherit timeoutMS from their parent entities or provide options to override it. The behavior for these entities is described in individual sections of this specification.

Drivers for languages that provide an idiomatic API for expressing durations of time (e.g. TimeSpan in .NET) MAY choose to leverage these APIs for the timeoutMS option rather than using int64. Drivers that choose to do so MUST also follow the semantics for special values defined by those types. Such drivers MUST also ensure that there is a way to explicitly set timeoutMS to infinite in the API.

See timeoutMS cannot be changed to unset once it's specified.

Backwards Breaking Considerations

This specification deprecates many existing timeout options and introduces a new exception type that is used to communicate timeout expiration. If drivers need to make backwards-breaking changes to support timeoutMS, the backwards breaking behavior MUST be gated behind the presence of the timeoutMS option. If the timeoutMS option is not set, drivers MUST continue to honor existing timeouts such as socketTimeoutMS. Backwards breaking changes include any changes to exception types thrown by stable API methods or changes to timeout behavior. Drivers MUST document these changes.

In a subsequent major release, drivers SHOULD drop support for legacy timeout behavior and only continue to support the timeout options that are not deprecated by this specification. Once legacy options are removed, drivers MUST make the backwards-breaking behavioral changes described in this specification regardless of whether or not timeoutMS is set by the application.

See the Errors section for explanations of the backwards-breaking changes to error reporting.

Deprecations

The following configuration timeout options MUST be deprecated in favor of timeoutMS:

  • socketTimeoutMS
  • waitQueueTimeoutMS
  • wTimeoutMS

The following options for CRUD methods MUST be deprecated in favor of timeoutMS:

  • maxTimeMS
  • maxCommitTimeMS

Timeout Behavior

The timeoutMS option specifies the best-effort maximum amount of time a single operation can take before control is returned to the application. Drivers MUST keep track of the remaining time before the timeout expires as an operation progresses.

Operations

The timeoutMS option applies to all operations defined in the following specifications:

In addition, it applies to all operations on cursor objects that may perform blocking work (e.g. methods to iterate or close a cursor, any method that reads documents from a cursor into an array, etc).

Validation and Overrides

When executing an operation, drivers MUST ignore any deprecated timeout options if timeoutMS is set on the operation or is inherited from the collection/database/client levels. In addition to being set at these levels, the timeout for an operation can also be expressed via an explicit ClientSession (see Convenient Transactions API). In this case, the timeout on the session MUST be used as the timeoutMS value for the operation. Drivers MUST raise a validation error if an explicit session with a timeout is used and the timeoutMS option is set at the operation level for operations executed as part of a withTransaction callback.

See timeoutMS overrides deprecated timeout options.

Errors

If the timeoutMS option is not set and support for deprecated timeout options has not been dropped but a timeout is encountered (e.g. server selection times out), drivers MUST continue to return existing errors. This ensures that error-handling code in existing applications does not break unless the user opts into using timeoutMS.

If the timeoutMS option is set and the timeout expires, drivers MUST abort all blocking work and return control to the user with an error. This error MUST be distinguished in some way (e.g. custom exception type) to make it easier for users to detect when an operation fails due to a timeout. If the timeout expires during a blocking task, drivers MUST expose the underlying error returned from the task from this new error type. The stringified version of the new error type MUST include the stringified version of the underlying error as a substring. For example, if server selection expires and returns a ServerSelectionTimeoutException, drivers must allow users to access that exception from this new error type. If there is no underlying error, drivers MUST add information about when the timeout expiration was detected to the stringified version of the timeout error.

Error Transformations

When using the new timeout error type, drivers MUST transform timeout errors from external sources into the new error. One such error is the MaxTimeMSExpired server error. When checking for this error, drivers MUST only check that the error code is 50 and MUST NOT check the code name or error message. This error can be present in a top-level response document where the ok value is 0, as part of an error in the writeErrors array, or in a nested writeConcernError document. For example, all three of the following server responses would match this criteria:

{ok: 0, code: 50, codeName: "MaxTimeMSExpired", errmsg: "operation time limit exceeded"}

{ok: 1, writeErrors: [{code: 50, codeName: "MaxTimeMSExpired", errmsg: "operation time limit exceeded"}]}

{ok: 1, writeConcernError: {code: 50, codeName: "MaxTimeMSExpired"}}

Timeouts from other sources besides MongoDB servers MUST also be transformed into this new exception type. These include socket read/write timeouts and HTTP request timeouts.

Blocking Sections for Operation Execution

The following pieces of operation execution are considered blocking:

  1. Implicit session acquisition if an explicit session was not provided for the operation. This is only considered blocking for drivers that perform server selection to determine session support when acquiring implicit sessions.
  2. Server selection
  3. Connection checkout - If maxPoolSize has already been reached for the selected server, this is the amount of time spent waiting for a connection to be available.
  4. Connection establishment - If the pool for the selected server is empty and a new connection is needed, the following pieces of connection establishment are considered blocking:
    1. TCP socket establishment
    2. TLS handshake
      1. All messages sent over the socket as part of the TLS handshake
      2. OCSP verification - HTTP requests sent to OCSP responders.
    3. MongoDB handshake (i.e. initial connection hello)
    4. Authentication
      1. SCRAM-SHA-1, SCRAM-SHA-256, PLAIN: Execution of the command required for the SASL conversation.
      2. GSSAPI: Execution of the commands required for the SASL conversation and requests to the KDC and TGS.
      3. MONGODB-AWS: Execution of the commands required for the SASL conversation and all HTTP requests to ECS and EC2 endpoints.
      4. MONGODB-X509: Execution of the commands required for the authentication conversation.
  5. Client-side encryption
    1. Execution of listCollections commands to get collection schemas.
    2. Execution of find commands against the key vault collection to get encrypted data keys.
    3. Requests to non-local key management servers (e.g. AWS KMS) to decrypt data keys.
    4. Requests to mongocryptd servers.
  6. Socket write to send a command to the server
  7. Socket read to receive the server’s response

The timeoutMS option MUST apply to all blocking sections. Drivers MUST document any exceptions. For example, drivers that do not have full control over OCSP verification might not be able to set timeouts for HTTP requests to responders and would document that OCSP verification could result in an execution time greater than timeoutMS.

Server Selection

If timeoutMS is set, drivers MUST use min(serverSelectionTimeoutMS, remaining timeoutMS), referred to as computedServerSelectionTimeout as the timeout for server selection and connection checkout. The server selection loop MUST fail with a timeout error once the timeout expires.

After a server has been selected, drivers MUST use the remaining computedServerSelectionTimeout value as the timeout for connection checkout. If a new connection is required, min(connectTimeoutMS, remaining computedServerSelectionTimeout) MUST be used as the timeout for TCP socket establishment. Any network requests required to create or authenticate a connection (e.g. HTTP requests to OCSP responders) MUST use min(operationTimeout, remaining computedServerSelectionTimeout) as a timeout, where operationTimeout is the specified default timeout for the network request. If there is no specified default, these operations MUST use the remaining computedServerSelectionTimeout value. All commands sent during the connection’s handshake MUST use the remaining computedServerSelectionTimeout as their timeoutMS value. Handshake commands MUST also set timeouts per the Command Execution section.

If timeoutMS is not set and support for waitQueueTimeoutMS has not been removed, drivers MUST continue to exhibit the existing timeout behavior by honoring serverSelectionTimeoutMS for server selection and waitQueueTimeoutMS for connection checkout. If a new connection is required, drivers MUST use connectTimeoutMS as the timeout for socket establishment and socketTimeoutMS as the socket timeout for all handshake commands.

See serverSelectionTimeoutMS is not deprecated and connectTimeoutMS is not deprecated.

Command Execution

If timeoutMS is set, drivers MUST append a maxTimeMS field to commands executed against a MongoDB server using the minRoundTripTime field of the selected server. Note that this value MUST be retrieved during server selection using the servers field of the same TopologyDescription that was used for selection before the selected server's description can be modified. Otherwise, drivers may be subject to a race condition where a server is reset to the default description (e.g. due to an error in the monitoring thread) after it has been selected but before the RTT is retrieved.

If the minRoundTripTime is less than the remaining timeoutMS, the value of this field MUST be remaining timeoutMS - minRoundTripTime. If not, drivers MUST return a timeout error without attempting to send the message to the server. This is done to ensure that an operation is not routed to the server if it will likely fail with a socket timeout as that could cause connection churn. The maxTimeMS field MUST be appended after all blocking work is complete.

After wire message construction, drivers MUST check for timeout before writing the message to the server. If the timeout has expired or the amount of time remaining is less than the selected server's minimum RTT, drivers MUST return the connection to the pool and raise a timeout exception. Otherwise, drivers MUST set the connection’s write timeout to the remaining timeoutMS value before writing a message to the server. After the write is complete, drivers MUST check for timeout expiration before reading the server’s response. If the timeout has expired, the connection MUST be closed and a timeout exception MUST be propagated to the application. If it has not, drivers MUST set the connection’s read timeout to the remaining timeoutMS value. The timeout MUST apply to the aggregate of all reads done to receive a server response, not to individual reads. If any read or write calls on the socket fail with a timeout, drivers MUST transform the error into the new timeout exception as described in the Error Transformations section.

If timeoutMS is not set and support for socketTimeoutMS has not been removed, drivers MUST honor socketTimeoutMS as the timeout for socket reads and writes.

See maxTimeMS accounts for server RTT.

Batching

If an operation must be sent to the server in multiple batches (e.g. collection.bulkWrite()), the timeoutMS option MUST apply to the entire operation, not to each individual batch.

Retryability

If an operation requires a retry per the retryable reads or writes specifications and timeoutMS is set, drivers MUST retry operations as many times as possible before the timeout expires or a retry attempt returns a non-retryable error. Once the timeout expires, a timeout error MUST be raised.

See Why don’t drivers use backoff/jitter between retry attempts?.

Client Side Encryption

If automatic client-side encryption or decryption is enabled, the remaining timeoutMS value MUST be used as the timeoutMS when executing listCollections commands to retrieve collection schemas, find commands to get data from the key vault, and any commands against mongocryptd. It MUST also be used as the request timeout for HTTP requests against KMS servers to decrypt data keys. When sending a command to mongocryptd, drivers MUST NOT append a maxTimeMS field. This is to ensure that a maxTimeMS field can be safely appended to the command after it has been marked by mongocryptd and encrypted by libmongocrypt. To determine whether or not the server is a mongocryptd, drivers MUST check that the iscryptd field in the server's description is true.

For explicit encryption and decryption, the ClientEncryptionOpts options type used to construct ClientEncryption instances MUST support a new timeoutMS option, which specifies the timeout for all operations executed on the ClientEncryption object.

See maxTimeMS is not added for mongocryptd.

Background Connection Pooling

Connections created as part of a connection pool’s minPoolSize maintenance routine MUST use connectTimeoutMS as the timeout for connection establishment. After the connection is established, if timeoutMS is set at the MongoClient level, it MUST be used as the timeout for all commands sent as part of the MongoDB or authentication handshakes. The timeout MUST be refreshed after each command. These commands MUST set timeouts per the Command Execution section. If timeoutMS is not set, drivers MUST continue to honor socketTimeoutMS as the socket timeout for handshake and authentication commands.

Server Monitoring

Drivers MUST NOT use timeoutMS for commands executed by the server monitoring and RTT calculation threads.

See Monitoring threads do not use timeoutMS.

Cursors

For operations that create cursors, timeoutMS can either cap the lifetime of the cursor or be applied separately to the original operation and all next calls. To support both of these use cases, these operations MUST support a timeoutMode option. This option is an enum with possible values CURSOR_LIFETIME and ITERATION. The default value depends on the type of cursor being created. Drivers MUST error if timeoutMode is set and timeoutMS is not.

When applying the timeoutMS option to next calls on cursors, drivers MUST ensure it applies to the entire call, not individual commands. For drivers that send getMore requests in a loop when iterating tailable cursors, the timeout MUST apply to the totality of all getMore’s, not to each one individually. If a resume is required for a next call on a change stream, the timeout MUST apply to the entirety of the initial getMore and all commands sent as part of the resume attempt.

For close methods, drivers MUST allow timeoutMS to be overridden if doing so is possible in the language. If explicitly set for the operation, it MUST be honored. Otherwise, if timeoutMS was applied to the operation that created the cursor, it MUST be refreshed for the killCursors command if one is required. Note that this means timeoutMS will be refreshed for the close call even if the cursor was created with a timeoutMode of CURSOR_LIFETIME and the timeout associated with the cursor has expired. The calculated timeout MUST apply to explicit close methods that can be invoked by users as well as implicit destructors that are automatically invoked when exiting resource blocks.

See Cursor close() methods refresh timeoutMS.

Non-tailable Cursors

For non-tailable cursors, the default value of timeoutMode is CURSOR_LIFETIME. If timeoutMS is set, drivers MUST apply it to the original operation and the lifetime of the created cursor. For example, if a find is executed at time T, the find and all getMore’s on the cursor must finish by time T + timeoutMS. When executing next calls on the cursor, drivers MUST use the remaining timeout as the timeoutMS value for the operation but MUST NOT append a maxTimeMS field to getMore commands. If there are documents remaining in a previously retrieved batch, the next method MUST return them even if the timeout has expired and MUST only return a timeout error if a getMore is required.

If timeoutMode is set to ITERATION, drivers MUST raise a client-side error if the operation is an aggregate with a $out or $merge pipeline stage. If the operation is not an aggregate with $out or $merge, drivers MUST honor the timeoutMS option for the initial command but MUST NOT append a maxTimeMS field to the command sent to the server. After the operation has executed, the original timeoutMS value MUST also be applied to each next call on the created cursor. Drivers MUST NOT append a maxTimeMS field to getMore commands.

See Non-tailable cursor behavior.

Tailable Cursors

Tailable cursors only support the ITERATION value for the timeoutMode option. This is the default value and drivers MUST error if the option is set to CURSOR_LIFETIME.

Tailable non-awaitData Cursors

If timeoutMS is set, drivers MUST apply it separately to the original operation and to all next calls on the resulting cursor but MUST NOT append a maxTimeMS field to any commands.

Tailable awaitData Cursors

If timeoutMS is set, drivers MUST apply it to the original operation. Drivers MUST also apply the original timeoutMS value to each next call on the resulting cursor but MUST NOT use it to derive a maxTimeMS value for getMore commands. Helpers for operations that create tailable awaitData cursors MUST also support the maxAwaitTimeMS option. Drivers MUST error if this option is set, timeoutMS is set to a non-zero value, and maxAwaitTimeMS is greater than or equal to timeoutMS. If this option is set, drivers MUST use it as the maxTimeMS field on getMore commands.

See Tailable cursor behavior for rationale regarding both non-awaitData and awaitData cursors.

Change Streams

Driver watch helpers MUST support both timeoutMS and maxAwaitTimeMS options. Drivers MUST error if maxAwaitTimeMS is set, timeoutMS is set to a non-zero value, and maxAwaitTimeMS is greater than or equal to timeoutMS. These helpers MUST NOT support the timeoutMode option as change streams are an abstraction around tailable-awaitData cursors, so they implicitly use ITERATION mode. If set, drivers MUST apply the timeoutMS option to the initial aggregate operation. Drivers MUST also apply the original timeoutMS value to each next call on the change stream but MUST NOT use it to derive a maxTimeMS field for getMore commands. If the maxAwaitTimeMS option is set, drivers MUST use it as the maxTimeMS field on getMore commands.

If a next call fails with a timeout error, drivers MUST NOT invalidate the change stream. The subsequent next call MUST perform a resume attempt to establish a new change stream on the server. Any errors from the aggregate operation done to create a new change stream MUST be propagated to the application. Drivers MUST document that users can either call next again or close the existing change stream and create a new one if a previous next call times out. The documentation MUST suggest closing and re-creating the stream with a higher timeout if the timeout occurs before any events have been received because this is a signal that the server is timing out before it can finish processing the existing oplog.

See Change stream behavior.

Sessions

The SessionOptions used to construct explicit ClientSession instances MUST accept a new defaultTimeoutMS option, which specifies the timeoutMS value for the following operations executed on the session:

  1. commitTransaction
  2. abortTransaction
  3. withTransaction
  4. endSession

If this option is not specified for a ClientSession, it MUST inherit the timeoutMS of its parent MongoClient.

Session checkout

As noted in Blocking Sections for Operation Execution, implicit session checkout can be considered a blocking process for some drivers. Such drivers MUST apply the remaining timeoutMS value to this process when executing an operation. For explicit session checkout, drivers MUST apply the timeoutMS value of the MongoClient to the startSession call if set. Drivers MUST NOT allow users to override timeoutMS for startSession operations.

See timeoutMS cannot be overridden for startSession calls.

Convenient Transactions API

If timeoutMS is set, drivers MUST apply it to the entire withTransaction call. To propagate the timeout to the user-supplied callback, drivers MUST store the timeout as a field on the ClientSession object. This field SHOULD be private to ensure that a user can not modify it while a withTransaction call is in progress. Drivers that cannot make this field private MUST signal that the field should not be accessed or modified by users if there is an idiomatic way to do so in the language (e.g. underscore-prefixed variable names in Python) and MUST document that modification of the field can cause unintended correctness issues for applications. Drivers MUST document that the remaining timeout will not be applied to callback operations that do not use the ClientSession. Drivers MUST also document that overriding timeoutMS for operations executed using the explicit session inside the provided callback will result in a client-side error, as defined in Validation and Overrides. If the callback returns an error and the transaction must be aborted, drivers MUST refresh the timeoutMS value for the abortTransaction operation.

If timeoutMS is not set, drivers MUST continue to exhibit the existing 120 second timeout behavior. Drivers MUST NOT change existing implementations to use timeoutMS=120000 for this case.

See withTransaction communicates timeoutMS via ClientSession and withTransaction refreshes the timeout for abortTransaction.

GridFS API

GridFS buckets MUST inherit timeoutMS from their parent MongoDatabase instance and all methods in the GridFS Bucket API MUST support the timeoutMS option. For methods that create streams (e.g. open_upload_stream), the option MUST cap the lifetime of the entire stream. This MUST include the time taken by any operations executed during stream construction, reads/writes, and close/abort calls. For example, if a stream is created at time T, the final close call on the stream MUST finish all blocking work before time T + timeoutMS. Methods that interact with a user-provided stream (e.g. upload_from_stream) MUST use timeoutMS as the timeout for the entire upload/download operation. If the user-provided streams do not support timeouts, drivers MUST document that the timeout for these methods may be breached if calls to interact with the stream take longer than the remaining timeout. If timeoutMS is set, all cursors created for GridFS API operations MUST internally set the timeoutMode option to CURSOR_LIFETIME.

See GridFS streams behavior.

RunCommand

The behavior of runCommand is undefined if the provided command document includes a maxTimeMS field and the timeoutMS option is set. Drivers MUST document the behavior of runCommand for this case and MUST NOT attempt to check the command document for the presence of a maxTimeMS field.

See runCommand behavior.

Test Plan

See the README.rst in the tests directory.

Motivation for Change

Users have many options to set timeouts for various parts of operation execution including, but not limited to, serverSelectionTimeoutMS, socketTimeoutMS, connectTimeoutMS, maxTimeMS, and wTimeoutMS. As a result, users are often unsure which timeout to use. Because some of these timeouts are additive, it is difficult to set a combination which ensures control will be returned to the user after a specified amount of time. To make timeouts more intuitive, changes are required to the drivers API to deprecate some of the existing timeouts and add a new one to specify the maximum execution time for an entire operation from start to finish.

In addition, automatically retrying reads and writes that failed due to transient network blips or planned maintenance scenarios has improved application resiliency but the original behavior of only retrying once still allowed some errors to be propagated to applications. Supporting a timeout for an entire operation allows drivers to retry operations multiple times while still guaranteeing that an application can get back control once the specified amount of time has elapsed.

Design Rationale

timeoutMS cannot be changed to unset once it's specified

If timeoutMS is specified at any level, it cannot be later changed to unset at a lower level. For example, a user cannot do:

client = MongoClient(uri, timeoutMS=1000)
db = client.database("foo", timeoutMS=None)

This is because drivers return existing exception types if timeoutMS is not specified, but will return new exception types and use new timeout behaviors if it is. Once the user has opted into this behavior, we should not allow them to opt out of it at a lower level. If a user wishes to set the timeout to infinite for a specific database, collection, or operation, they can explicitly set timeoutMS to 0.

serverSelectionTimeoutMS is not deprecated

The original goal of the project was to expose a single timeout and deprecate all others. This was not possible, however, because executing an operation consists of two distinct parts. The first is selecting a server and checking out a connection from its pool. This should have a default timeout because failure to do this indicates that the deployment is not in a healthy state or that there was a configuration error which prevents the driver from successfully connecting. The second is server-side operation execution, which cannot have a default timeout. Some operations finish in a few milliseconds, while others can run for many hours. Adding a default would inevitably break applications. To accomplish both of these goals, serverSelectionTimeoutMS was preserved and is used to timeout the client-side section of operation execution.

connectTimeoutMS is not deprecated

Similar to the reasoning for not deprecating serverSelectionTimeoutMS, socket establishment should have a default timeout because failure to create a socket likely means that the target server is not healthy or there is a network issue. To accomplish this, the connectTimeoutMS option is not deprecated by this specification. Drivers also use connectTimeoutMS to derive a socket timeout for monitoring connections, which are not subject to timeoutMS.

timeoutMS overrides deprecated timeout options

Applying both timeoutMS and a deprecated timeout option like socketTimeoutMS at the same time would lead to confusing semantics that are difficult to document and understand. When first writing this specification, we considered having drivers error in this situation to catch mismatched timeouts as early as possible. However, because timeoutMS can be set at any level, this behavior could lead to unanticipated runtime errors if an application set timeoutMS for a specific operation and the MongoClient used in production was configured with a deprecated timeout option. To have clear semantics and avoid unexpected errors in applications, we decided that timeoutMS should override deprecated timeout options.

maxTimeMS is not added for mongocryptd

The mongocryptd server annotates the provided command to indicate encryption requirements and returns the marked up result. If the command sent to mongocryptd contained maxTimeMS, the final command sent to MongoDB would contain two maxTimeMS fields: one added by the regular MongoClient and another added by the mongocryptd client. To avoid this complication, drivers do not add this field when sending commands to mongocryptd at all. Doing so does not sacrifice any functionality because mongocryptd always runs on localhost and does not perform any blocking work, so execution or network timeouts cannot occur.

maxTimeMS accounts for server RTT

When constructing a command, drivers use the timeoutMS option to derive a value for the maxTimeMS command option and the socket timeout. The full time to round trip a command is (network RTT + server-side execution time). If both maxTimeMS and socket timeout were set to the same value, the server would never be able to respond with a MaxTimeMSExpired error because drivers would hit the socket timeout first and close the connection. This would lead to connection churn if the specified timeout is too low. To allow the server to gracefully error and avoid churn, drivers must account for the network round trip in the maxTimeMS calculation.

Monitoring threads do not use timeoutMS

Using timeoutMS in the monitoring and RTT calculation threads would require another special case in the code that derives maxTimeMS from timeoutMS because the awaitable hello requests sent to 4.4+ servers already have a maxAwaitTimeMS field. Adding maxTimeMS also does not help for non-awaitable hello commands because we expect them to execute quickly on the server. The Server Monitoring spec already mandates that drivers set and dynamically update the read/write timeout of the dedicated connections used in monitoring threads, so we rely on that to time out commands rather than adding complexity to the behavior of timeoutMS.

runCommand behavior

The behavior of runCommand varies across drivers. If the provided command document includes a maxTimeMS field and the timeoutMS option is set, some drivers would overwrite the maxTimeMS field with the value derived from timeoutMS, while others would append a second maxTimeMS field, which would cause a server error on versions 3.4+. To be prescriptive, we could mandate that drivers raise a client-side error in this case, but this would require a potentially expensive lookup in the command document. To avoid this additional cost, drivers are only required to document the behavior and suggest that timeoutMS be used instead of including a manual maxTimeMS field.

Why don't drivers use backoff/jitter between retry attempts?

Earlier versions of this specification proposed adding backoff and/or jitter between retry attempts to avoid connection storming or overloading the server, but we later deemed this unnecessary. If multiple concurrent operations select the same server for a retry and its connection pool is empty, we rely on the maxConnecting parameter introduced in DRIVERS-781 to rate limit new connection attempts, which mitigates the risk of connection storms. Even if the new server has enough connections in its pool to service the operations, recent server versions do very little resource-intensive work until execution reaches the storage layer, which is already guarded by read/write tickets, so we don’t expect the server to be overwhelmed. If we later decide that adding jitter would be useful, it may be easier to do so in the server itself via a ticket-based admission system earlier in the execution stack.

Cursor close() methods refresh timeoutMS

If a cursor times out client-side (e.g. a non-tailable cursor created with timeoutMode=CURSOR_LIFETIME), it’s imperative that drivers make a good-faith effort to close the server-side cursor even though the timeout has expired because failing to do so would leave resources open on the server for a potentially long time. It was decided that timeoutMS will be refreshed for close operations to allow the cursor to be killed server-side.

Non-tailable cursor behavior

There are two usage patterns for non-tailable cursors. The first is to read documents from a cursor into an iterable object, either by explicitly iterating the cursor in a loop or using a language construct like Python list comprehensions. To supply a timeout for the entire process, drivers use timeoutMS to cap the execution time for the initial command and all required getMore’s. This use case also matches the server behavior; if maxTimeMS is set for an operation that creates a non-tailable cursor, the server will use the time limit to cap the total server-side execution time for future getMore’s. Because this type of usage matches the server behavior and is the more common case, this is the default behavior.

The second use case is batch processing, where the user takes advantage of the lazy nature of cursors to process documents from a large collection. In this case, the user does not want all documents from the collection to be in an array because that would require too much memory. To accommodate this use case, drivers support a new timeoutMode option. Users can set the value for this option to ITERATION to have timeoutMS apply to the original command and then separately to each next call. When this option is used, drivers do not set maxTimeMS on the initial command to avoid capping the cursor lifetime in the server.

Tailable cursor behavior

Once a tailable cursor is created, it conceptually lives forever. Therefore, it only makes sense to support timeoutMode=ITERATION for these cursors and drivers error if timeoutMode=CURSOR_LIFETIME is specified.

There are two types of tailable cursors. The first, tailable non-awaitData cursors, support maxTimeMS for the original command but not for any getMore requests. However, setting maxTimeMS on the original command also incorrectly caps the server-side execution time for future getMore’s (SERVER-51153). This is undesirable behavior because it does not match the guarantees made by timeoutMode=ITERATION. To work around this, drivers honor timeoutMS for both the original operation and all getMore’s but only use it to derive client-side timeouts and do not append a maxTimeMS field to any commands. The server-side execution time is enforced via socket timeouts.

The second type is tailable awaitData cursors. The server supports the maxTimeMS option for the original command. For getMore’s, the option is supported, but instead of limiting the server-side execution time, it specifies how long the server should wait for new data to arrive if it reaches the end of the capped collection and the batch is still empty. If no new data arrives within that time limit, the server will respond with an empty batch. For these cursors, drivers support both the timeoutMS and maxAwaitTimeMS options. The timeoutMS option is used to derive client-side timeouts, while the maxAwaitTimeMS option is used as the maxTimeMS field for getMore commands. These values have distinct meanings, so supporting both yields a more robust, albeit verbose, API. Drivers error if maxAwaitTimeMS is greater than or equal to timeoutMS because in that case, getMore requests would not succeed if the batch was empty: the server would wait for maxAwaitTimeMS, but the driver would close the socket after timeoutMS.

Change stream behavior

Change streams internally behave as tailable awaitData cursors, so the behavior of the timeoutMS option is the same for both. The main difference is that change streams are resumable and drivers automatically perform resume attempts when they encounter transient errors. This allows change streams to be resilient to timeouts. If timeoutMS expires during a next call, drivers can’t auto-resume, but they can make sure the change stream is not invalidated so the user can call next again. In this case, the subsequent call would perform the resume without doing a getMore first.

withTransaction communicates timeoutMS via ClientSession

Because the withTransaction API doesn’t allow drivers to plumb down the remaining timeout into the user-provided callback, this spec requires the remaining timeout to be stored on the ClientSession. Operations in the callback that run under that ClientSession can then extract the timeout from the session and apply it. To avoid confusing validation semantics, operations error if there is a timeout on the session but also an overridden timeout for the operation. It’s possible that the ability to communicate timeouts for a block of operations via a ClientSession is useful as a general purpose API, but we’ve decided to make it private until there are other known use cases.

withTransaction refreshes the timeout for abortTransaction

If the user-provided callback to withTransaction times out, it could leave a transaction running on the server. It’s imperative that drivers make an effort to abort the open transaction because failing to do so could result in the collections and databases affected by the transaction being locked for a long period of time, which could cause applications to stall. Because timeoutMS has expired before drivers attempt to abort the transaction, we require drivers to refresh it and apply the original value to the execution of the abortTransaction operation. This can cause the entire withTransaction call to take up to 2*timeoutMS, but it was decided that this risk is worthwhile given the importance of transaction cleanup.

GridFS streams behavior

Streams created by GridFS API operations (e.g. by open_upload_stream and open_download_stream) present a challenge for this specification. These types of streams execute multiple operations, but there can be artificial gaps between operations if the application does not invoke the stream functions for long periods of time. Generally, we expect users to upload or download an entire file as quickly as possible, so we decided to have timeoutMS cap the lifetime of the created stream. The other option was to apply the entire timeoutMS value to each operation executed by the stream, but streams perform many hidden operations, so this approach could cause an upload/download to take much longer than expected.

timeoutMS cannot be overridden for startSession calls

In general, users can override timeoutMS at the level of a single operation. The startSession operation, however, only inherits timeoutMS from the MongoClient and does not allow the option to be overridden. This was a conscious API design decision because drivers are moving towards only supporting MongoDB versions 3.6 and higher, so sessions will always be supported. Adding an override for startSession would introduce a new knob and increase the API surface of drivers without providing a significant benefit.

Drivers use minimum RTT to short circuit operations

A previous version of this spec used the 90th percentile RTT to short circuit operations that might otherwise fail with a socket timeout. We decided to change this logic to avoid canceling operations that may have a high chance of succeeding and also remove a dependency on t-digest. Instead, drivers use the minimum RTT from the last 10 samples, or 0 until at least 2 samples have been recorded.

Future work

Modify GridFS streams behavior via new options

As explained in the design rationale, drivers use timeoutMS to cap the entire lifetime of streams created by GridFS operations. If we find that users are often encountering timeout errors when using these APIs due to the time spent during non-MongoDB operations (e.g. streaming data read from a GridFS stream into another data store), we could consider toggling GridFS behavior via an option similar to timeoutMode for cursors. To avoid backwards-breaking behavioral changes, the default would continue to cap the stream lifetime but there could be another mode that refreshes the timeout for each database operation. This would mimic using timeoutMode=ITERATION for cursors.

Changelog

  • 2023-12-07: Migrated from reStructuredText to Markdown.
  • 2022-11-17: Use minimum RTT for maxTimeMS calculation instead of 90th percentile RTT.
  • 2022-10-05: Remove spec front matter.
  • 2022-01-19: Initial version.