Bulk Write
- Status: Accepted
- Minimum Server Version: 8.0
Abstract
This specification defines the driver API for the bulkWrite
server command introduced in MongoDB 8.0. The API defined
in this specification allows users to perform insert, update, and delete operations against mixed namespaces in a
minimized number of round trips, and to receive detailed results for each operation performed. This API is distinct from
the collection-level bulkWrite method defined in the
CRUD specification.
Specification
[!NOTE] The
BulkWriteOptions
,BulkWriteResult
, andBulkWriteException
types defined in this specification are similar to those used for theMongoCollection.bulkWrite
method. Statically typed drivers MUST NOT reuse their existing definitions for these types for theMongoClient.bulkWrite
API and MUST introduce new types. If naming conflicts arise, drivers SHOULD prepend "Client" to the new type names (e.g.ClientBulkWriteOptions
).
MongoClient.bulkWrite
Interface
interface MongoClient {
/**
* Executes a list of mixed write operations.
*
* @throws BulkWriteException
*/
bulkWrite(models: NamespaceWriteModelPair[], options: Optional<BulkWriteOptions>): BulkWriteResult;
}
Write Models
A WriteModel
defines a single write operation to be performed as part of a bulk write.
/**
* Unifying interface for the various write model types. Drivers may also use an enum with
* variants for each write model for this type.
*/
interface WriteModel {}
class InsertOneModel implements WriteModel {
/**
* The document to insert.
*/
document: Document;
}
class UpdateOneModel implements WriteModel {
/**
* The filter to apply.
*/
filter: Document;
/**
* The update document or pipeline to apply to the selected document.
*/
update: (Document | Document[]);
/**
* A set of filters specifying to which array elements an update should apply.
*
* This option is sent only if the caller explicitly provides a value.
*/
arrayFilters: Optional<Document[]>;
/**
* Specifies a collation.
*
* This option is sent only if the caller explicitly provides a value.
*/
collation: Optional<Document>;
/**
* The index to use. Specify either the index name as a string or the index key pattern. If
* specified, then the query system will only consider plans using the hinted index.
*
* This option is only sent if the caller explicitly provides a value.
*/
hint: Optional<Document | String>;
/**
* When true, creates a new document if no document matches the query.
*
* This option is only sent if the caller explicitly provides a value. The server's default
* value is false.
*/
upsert: Optional<Boolean>;
/**
* Specify which document the operation updates if the query matches multiple
* documents. The first document matched by the sort order will be updated.
*
* This option is only sent if the caller explicitly provides a value.
*/
sort: Optional<Document>;
}
class UpdateManyModel implements WriteModel {
/**
* The filter to apply.
*/
filter: Document;
/**
* The update document or pipeline to apply to the selected documents.
*/
update: (Document | Document[]);
/**
* A set of filters specifying to which array elements an update should apply.
*
* This option is sent only if the caller explicitly provides a value.
*/
arrayFilters: Optional<Document[]>;
/**
* Specifies a collation.
*
* This option is sent only if the caller explicitly provides a value.
*/
collation: Optional<Document>;
/**
* The index to use. Specify either the index name as a string or the index key pattern. If
* specified, then the query system will only consider plans using the hinted index.
*
* This option is only sent if the caller explicitly provides a value.
*/
hint: Optional<Document | String>;
/**
* When true, creates a new document if no document matches the query.
*
* This option is only sent if the caller explicitly provides a value. The server's default
* value is false.
*/
upsert: Optional<Boolean>;
}
class ReplaceOneModel implements WriteModel {
/**
* The filter to apply.
*/
filter: Document;
/**
* The replacement document.
*/
replacement: Document;
/**
* Specifies a collation.
*
* This option is sent only if the caller explicitly provides a value.
*/
collation: Optional<Document>;
/**
* The index to use. Specify either the index name as a string or the index key pattern. If
* specified, then the query system will only consider plans using the hinted index.
*
* This option is only sent if the caller explicitly provides a value.
*/
hint: Optional<Document | String>;
/**
* When true, creates a new document if no document matches the query.
*
* This option is only sent if the caller explicitly provides a value. The server's default
* value is false.
*/
upsert: Optional<Boolean>;
/**
* Specify which document the operation replaces if the query matches multiple
* documents. The first document matched by the sort order will be replaced.
*
* This option is only sent if the caller explicitly provides a value.
*/
sort: Optional<Document>;
}
class DeleteOneModel implements WriteModel {
/**
* The filter to apply.
*/
filter: Document;
/**
* Specifies a collation.
*
* This option is sent only if the caller explicitly provides a value.
*/
collation: Optional<Document>;
/**
* The index to use. Specify either the index name as a string or the index key pattern. If
* specified, then the query system will only consider plans using the hinted index.
*
* This option is only sent if the caller explicitly provides a value.
*/
hint: Optional<Document | String>;
}
class DeleteManyModel implements WriteModel {
/**
* The filter to apply.
*/
filter: Document;
/**
* Specifies a collation.
*
* This option is sent only if the caller explicitly provides a value.
*/
collation: Optional<Document>;
/**
* The index to use. Specify either the index name as a string or the index key pattern. If
* specified, then the query system will only consider plans using the hinted index.
*
* This option is only sent if the caller explicitly provides a value.
*/
hint: Optional<Document | String>;
}
Each write model provided to MongoClient.bulkWrite
in the models
parameter MUST have a corresponding namespace that
defines the collection on which the operation should be performed. Drivers SHOULD design this pairing in whichever way
is most idiomatic for its language. For example, drivers may:
- Include a required
namespace
field on eachWriteModel
variant and accept a list ofWriteModel
objects for themodels
parameter. - Accept a list of
(Namespace, WriteModel)
tuples formodels
. - Define the following pair class:
class NamespaceWriteModelPair {
/**
* The namespace on which to perform the write.
*/
namespace: Namespace;
/**
* The write to perform.
*/
model: WriteModel;
}
Drivers MUST throw an exception if the list provided for models
is empty.
Update vs. replace document validation
Update documents provided in UpdateOne
and UpdateMany
write models are required only to contain atomic modifiers
(i.e. keys that start with $
). Drivers MUST throw an error if an update document is empty or if the document's first
key does not start with $
. Drivers MUST rely on the server to return an error if any other entries in the update
document are not atomic modifiers. Drivers are not required to perform validation on update pipelines.
Replacement documents provided in ReplaceOne
write models are required not to contain atomic modifiers. Drivers MUST
throw an error if a replacement document is nonempty and its first key starts with $
. Drivers MUST rely on the server
to return an error if any other entries in the replacement document are atomic modifiers.
Options
class BulkWriteOptions {
/**
* Whether the operations in this bulk write should be executed in the order in which they were
* specified. If false, writes will continue to be executed if an individual write fails. If
* true, writes will stop executing if an individual write fails.
*
* Defaults to true.
*/
ordered: Optional<Boolean>;
/**
* If true, allows the writes to opt out of document-level validation.
*
* This option is only sent if the caller explicitly provides a value. The server's default
* value is false.
*/
bypassDocumentValidation: Optional<Boolean>;
/**
* A map of parameter names and values to apply to all operations within the bulk write. Value
* must be constant or closed expressions that do not reference document fields. Parameters can
* then be accessed as variables in an aggregate expression context (e.g. "$$var").
*
* This option is only sent if the caller explicitly provides a value.
*/
let: Optional<Document>;
/**
* The write concern to use for this bulk write.
*/
writeConcern: Optional<WriteConcern>;
/**
* Enables users to specify an arbitrary comment to help trace the operation through
* the database profiler, currentOp and logs.
*
* This option is only sent if the caller explicitly provides a value.
*/
comment: Optional<BSON value>;
/**
* Whether detailed results for each successful operation should be included in the returned
* BulkWriteResult.
*
* Defaults to false. This value corresponds inversely to the errorsOnly field in the bulkWrite
* command.
*/
verboseResults: Optional<Boolean>;
}
Result
class BulkWriteResult {
/**
* Indicates whether this write result was acknowledged.
*
* NOT REQUIRED TO IMPLEMENT. See below for guidance on modeling unacknowledged results.
*/
acknowledged: Boolean;
/**
* Indicates whether this result contains verbose results.
*
* NOT REQUIRED TO IMPLEMENT. See below for guidance on modeling verbose results.
*/
hasVerboseResults: Boolean;
/**
* The total number of documents inserted across all insert operations.
*/
insertedCount: Int64;
/**
* The total number of documents upserted across all update operations.
*/
upsertedCount: Int64;
/**
* The total number of documents matched across all update operations.
*/
matchedCount: Int64;
/**
* The total number of documents modified across all update operations.
*/
modifiedCount: Int64;
/**
* The total number of documents deleted across all delete operations.
*/
deletedCount: Int64;
/**
* The results of each individual insert operation that was successfully performed.
*
* NOT REQUIRED TO IMPLEMENT. See below for guidance on modeling verbose results.
*/
insertResults: Map<Index, InsertOneResult>;
/**
* The results of each individual update operation that was successfully performed.
*
* NOT REQUIRED TO IMPLEMENT. See below for guidance on modeling verbose results.
*/
updateResults: Map<Index, UpdateResult>;
/**
* The results of each individual delete operation that was successfully performed.
*
* NOT REQUIRED TO IMPLEMENT. See below for guidance on modeling verbose results.
*/
deleteResults: Map<Index, DeleteResult>;
}
class InsertOneResult {
/**
* The _id of the inserted document.
*/
insertedId: Any;
}
class UpdateResult {
/**
* The number of documents that matched the filter.
*/
matchedCount: Int64;
/**
* The number of documents that were modified.
*/
modifiedCount: Int64;
/**
* The _id field of the upserted document if an upsert occurred.
*
* It MUST be possible to discern between a BSON Null upserted ID value and this field being
* unset. If necessary, drivers MAY add a didUpsert boolean field to differentiate between
* these two cases.
*/
upsertedId: Optional<BSON value>;
}
class DeleteResult {
/**
* The number of documents that were deleted.
*/
deletedCount: Int64;
}
Unacknowledged results
Users MUST be able to discern whether a BulkWriteResult
contains acknowledged results without inspecting the
configured write concern. Drivers should follow the guidance in the CRUD specification
here to determine how to model unacknowledged results.
If drivers expose the acknowledged
field, they MUST document what will happen if a user attempts to access a result
value when acknowledged
is false
(e.g. an undefined value is returned or an error is thrown).
Summary vs. verbose results
When a user does not set the verboseResults
option to true
, drivers MUST NOT populate the insertResults
,
updateResults
, and deleteResults
fields. Users MUST be able to discern whether a BulkWriteResult
contains these
verbose results without inspecting the value provided for verboseResults
in BulkWriteOptions
. Drivers can implement
this in a number of ways, including:
- Expose the
hasVerboseResults
field inBulkWriteResult
as defined above. Document what will happen if a user attempts to access theinsertResults
,updateResults
, ordeleteResults
values whenhasVerboseResults
is false. Drivers MAY raise an error if a user attempts to access one of these values whenhasVerboseResults
is false. - Embed the verbose results in an optional type:
class BulkWriteResult {
/**
* The results of each individual write operation that was successfully performed.
*
* This value will only be populated if the verboseResults option was set to true.
*/
verboseResults: Optional<VerboseResults>;
/* rest of fields */
}
class VerboseResults {
/**
* The results of each individual insert operation that was successfully performed.
*/
insertResults: Map<Index, InsertOneResult>;
/**
* The results of each individual update operation that was successfully performed.
*/
updateResults: Map<Index, UpdateResult>;
/**
* The results of each individual delete operation that was successfully performed.
*/
deleteResults: Map<Index, DeleteResult>;
}
- Define separate
SummaryBulkWriteResult
andVerboseBulkWriteResult
types.SummaryBulkWriteResult
MUST only contain the summary result fields, andVerboseBulkWriteResult
MUST contain both the summary and verbose result fields. ReturnVerboseBulkWriteResult
whenverboseResults
was set to true andSummaryBulkWriteResult
otherwise.
Individual results
The InsertOneResult
, UpdateResult
, and DeleteResult
classes are the same as or similar to types of the same name
defined in the CRUD specification. Drivers MUST redefine these classes if their existing result classes
deviate from the definitions in this specification (e.g. if they contain acknowledgement information, which is not
applicable for individual bulk write operations). Drivers MAY reuse their existing types for these classes if they match
the ones defined here exactly.
Exception
class BulkWriteException {
/**
* A top-level error that occurred when attempting to communicate with the server or execute
* the bulk write. This value may not be populated if the exception was thrown due to errors
* occurring on individual writes.
*/
error: Optional<Error>;
/**
* Write concern errors that occurred while executing the bulk write. This list may have
* multiple items if more than one server command was required to execute the bulk write.
*/
writeConcernErrors: WriteConcernError[];
/**
* Errors that occurred during the execution of individual write operations. This map will
* contain at most one entry if the bulk write was ordered.
*/
writeErrors: Map<Index, WriteError>;
/**
* The results of any successful operations that were performed before the error was
* encountered.
*/
partialResult: Optional<BulkWriteResult>;
}
Index Types
The insertResults
, updateResults
, and deleteResults
maps in BulkWriteResult
and the writeErrors
map in
BulkWriteException
specify Index
as their key type. This value corresponds to the index of the operation in the
writeModels
list that was provided to MongoClient.bulkWrite
. Drivers SHOULD use their language's standard numeric
type for indexes for this type (e.g. usize
in Rust). If no standard index type exists, drivers MUST use Int64
.
Building a bulkWrite
Command
The bulkWrite
server command has the following format:
{
"bulkWrite": 1,
"ops": <Array>,
"nsInfo": <Array>,
"errorsOnly": Optional<Boolean>,
"ordered": Optional<Boolean>,
"bypassDocumentValidation": Optional<Boolean>,
"comment": Optional<BSON value>,
"let": Optional<Document>,
...additional operation-agnostic fields
}
Drivers MUST use document sequences (OP_MSG
payload type 1) for the ops
and nsInfo
fields.
The bulkWrite
command is executed on the "admin" database.
Operations
The ops
field is a list of write operation documents. The first entry in each document has the name of the operation
(i.e. "insert", "update", or "delete") as its key and the index in the nsInfo
array of the namespace on which the
operation should be performed as its value. The documents have the following format:
Insert
{
"insert": <Int32>,
"document": <Document>
}
If the document to be inserted does not contain an _id
field, drivers MUST generate a new
ObjectId
and add it as the _id
field at the beginning of the document.
Update
{
"update": <Int32>,
"filter": <Document>,
"updateMods": <Document | Array>,
"multi": Optional<Boolean>,
"upsert": Optional<Boolean>,
"arrayFilters": Optional<Array>,
"hint": Optional<Document | String>,
"collation": Optional<Document>
}
The update
command document is used for update and replace operations. For update operations, the updateMods
field
corresponds to the update
field in UpdateOneModel
and UpdateManyModel
. For replace operations, the updateMods
field corresponds to the replacement
field in ReplaceOneModel
.
Delete
{
"delete": <Int32>,
"filter": <Document>,
"multi": Optional<Boolean>,
"hint": Optional<Document | String>,
"collation": Optional<Document>
}
Namespace Information
The nsInfo
field is an array containing the namespaces on which the write operations should be performed. Drivers MUST
NOT include duplicate namespaces in this list. The documents in the nsInfo
array have the following format:
{
"ns": <String>
}
errorsOnly
and verboseResults
The errorsOnly
field indicates whether the results cursor returned in the bulkWrite
response should contain only
errors and omit individual results. If false, both individual results for successful operations and errors will be
returned. This field is optional and defaults to false on the server.
errorsOnly
corresponds inversely to the verboseResults
option defined on BulkWriteOptions
. If the user specified a
value for verboseResults
, drivers MUST define errorsOnly
as the opposite of verboseResults
. If the user did not
specify a value for verboseResults
, drivers MUST define errorsOnly
as true
.
Drivers MUST return a client-side error if verboseResults
is true with an unacknowledged write concern containing the
following message:
Cannot request unacknowledged write concern and verbose results
ordered
The ordered
field defines whether writes should be executed in the order in which they were specified, and, if an
error occurs, whether the server should halt execution of further writes. It is optional and defaults to true on the
server. Drivers MUST explicitly define ordered
as true
in the bulkWrite
command if a value is not specified in
BulkWriteOptions
. This is required to avoid inconsistencies between server and driver behavior if the server default
changes in the future.
Drivers MUST return a client-side error if ordered
is true (including when default is applied) with an unacknowledged
write concern containing the following message:
Cannot request unacknowledged write concern and ordered writes
Auto-Encryption
If MongoClient.bulkWrite
is called on a MongoClient
configured with AutoEncryptionOpts
, drivers MUST return an
error with the message: "bulkWrite does not currently support automatic encryption".
This is expected to be removed once DRIVERS-2888 is implemented.
Command Batching
Drivers MUST accept an arbitrary number of operations as input to the MongoClient.bulkWrite
method. Because the server
imposes restrictions on the size of write operations, this means that a single call to MongoClient.bulkWrite
may
require multiple bulkWrite
commands to be sent to the server. Drivers MUST split bulk writes into separate commands
when the user's list of operations exceeds one or more of these maximums: maxWriteBatchSize
, maxBsonObjectSize
(for
OP_MSG
payload type 0), and maxMessageSizeBytes
(for OP_MSG
payload type 1). Each of these values can be retrieved
from the selected server's hello
command response. Drivers MUST merge results from multiple batches into a single
BulkWriteResult
or BulkWriteException
to return from MongoClient.bulkWrite
.
When constructing the nsInfo
array for a bulkWrite
batch, drivers MUST only include the namespaces that are
referenced in the ops
array for that batch.
Number of Writes
maxWriteBatchSize
defines the total number of writes allowed in one command. Drivers MUST split a bulk write into
multiple commands if the user provides more than maxWriteBatchSize
operations in the argument for models
.
Total Message Size
Drivers MUST ensure that the total size of the OP_MSG
built for each bulkWrite
command does not exceed
maxMessageSizeBytes
.
The upper bound for the size of an OP_MSG
includes opcode-related bytes (e.g. the OP_MSG
header) and
operation-agnostic command field bytes (e.g. txnNumber
, lsid
). Drivers MUST limit the combined size of the
bulkWrite
command document (excluding command-agnostic fields), ops
document sequence, and nsInfo
document
sequence to maxMessageSizeBytes - 1000
to account for this overhead. The following pseudocode demonstrates how to
apply this limit in batch-splitting logic:
MESSAGE_OVERHEAD_BYTES = 1000
bulkWriteCommand = Document { "bulkWrite": 1 }
bulkWriteCommand.appendOptions(bulkWriteOptions)
maxOpsNsInfoBytes = maxMessageSizeBytes - (MESSAGE_OVERHEAD_BYTES + bulkWriteCommand.numBytes())
while (writeModels.hasNext()) {
ops = DocumentSequence {}
nsInfo = DocumentSequence {}
while (true) {
if (!writeModels.hasNext()) {
break
}
model = writeModels.next()
modelDoc = writeModel.toOpsDoc()
bytesAdded = modelDoc.numBytes()
nsInfoDoc = null
if (!nsInfo.contains(model.namespace)) {
nsInfoDoc = model.namespace.toNsInfoDoc()
bytesAdded += nsInfoDoc.numBytes()
}
newSize = ops.numBytes() + nsInfo.numBytes() + bytesAdded
if (newSize > maxOpsNsInfoBytes) {
break
} else {
ops.push(modelDoc)
if (nsInfoDoc != null) {
nsInfo.push(nsInfoDoc)
}
}
}
// construct and send OP_MSG
}
See this Q&A entry for more details on how the overhead allowance was determined.
Drivers MUST return an error if there is not room to add at least one operation to ops
.
Handling the bulkWrite
Server Response
The server's response to bulkWrite
has the following format:
{
"ok": <0 | 1>,
"cursor": {
"id": <Int64>,
"firstBatch": <Array>,
"ns": <String>
},
"nErrors": <Int32>,
"nInserted": <Int32>,
"nUpserted": <Int32>,
"nMatched": <Int32>,
"nModified": <Int32>,
"nDeleted": <Int32>,
...additional command-agnostic fields
}
Drivers MUST record the summary count fields in a BulkWriteResult
to be returned to the user or embedded in a
BulkWriteException
if the response indicates that at least one write was successful:
- For ordered bulk writes, at least one write was successful if
nErrors
is 0 or if theidx
value for the write error returned in the results cursor is greater than 0. - For unordered bulk writes, at least one write was successful if
nErrors
is less than the number of operations that were included in thebulkWrite
command.
Drivers MUST NOT populate the partialResult
field in BulkWriteException
if it cannot be determined that at least one
write was successfully performed.
Drivers MUST attempt to consume the contents of the cursor returned in the server's bulkWrite
response before
returning to the user. This is required regardless of whether the user requested verbose or summary results, as the
results cursor always contains any write errors that occurred. If the cursor contains a nonzero cursor ID, drivers MUST
perform getMore
until the cursor has been exhausted. Drivers MUST use the same session used for the bulkWrite
command for each getMore
call. When connected to a load balancer, drivers MUST use the connection used for the
bulkWrite
command to create the cursor to ensure the same server is targeted.
The documents in the results cursor have the following format:
{
"ok": <0 | 1>,
"idx": Int32,
"code": Optional<Int32>,
"errmsg": Optional<String>,
"errInfo": Optional<Document>,
"n": <Int32>,
"nModified": Optional<Int32>,
"upserted": Optional<Document with "_id" field>
}
If an error occurred (i.e. the value for ok
is 0), the code
, errmsg
, and optionally errInfo
fields will be
populated with details about the failure.
If the write succeeded, (i.e. the value for ok
is 1), n
, nModified
, and upsertedId
will be populated with the
following values based on the type of write:
Response Field | Insert | Update | Delete |
---|---|---|---|
n | The number of documents that were inserted. | The number of documents that matched the filter. | The number of documents that were deleted. |
nModified | Not present. | The number of documents that were modified. | Not present. |
upserted | Not present. | A document containing the _id value for the upserted document. Only present if an upsert took place. | Not present. |
Note that the responses do not contain information about the type of operation that was performed. Drivers may need to
maintain the user's list of write models to infer which type of result should be recorded based on the value of idx
.
Handling Insert Results
Unlike the other result types, InsertOneResult
contains an insertedId
field that is generated driver-side, either by
recording the _id
field present in the user's insert document or creating and adding one. Drivers MUST only record
these insertedId
values in a BulkWriteResult
when a successful response for the insert operation (i.e.
{ "ok": 1, "n": 1 }
) is received in the results cursor. This ensures that drivers only report an insertedId
when it
is confirmed that the insert succeeded.
Handling Errors
Top-Level Errors
A top-level error is any error that occurs that is not the result of a single write operation failing or a write concern
error. Examples include network errors that occur when communicating with the server, command errors ({ "ok": 0 }
)
returned from the server, client-side errors, and errors that occur when attempting to perform a getMore
to retrieve
results from the server.
When a top-level error is caused by a command error (i.e. an { "ok": 0 }
server response), drivers MUST provide access
to the raw server reply in the error returned to the user.
When a top-level error is encountered and individual results and/or errors have already been observed, drivers MUST
embed the top-level error within a BulkWriteException
as the error
field to retain this information. Otherwise,
drivers MAY throw an exception containing only the top-level error.
Encountering a top-level error MUST halt execution of a bulk write for both ordered and unordered bulk writes. This
means that drivers MUST NOT attempt to retrieve more responses from the cursor or execute any further bulkWrite
batches and MUST immediately throw an exception. If the results cursor has not been exhausted on the server when a
top-level error occurs, drivers MUST send the killCursors
command to attempt to close it. The result returned from the
killCursors
command MAY be ignored.
Write Concern Errors
Write concern errors are recorded in the writeConcernErrors
field on BulkWriteException
. When a write concern error
is encountered, it should not terminate execution of the bulk write for either ordered or unordered bulk writes.
However, drivers MUST throw an exception at the end of execution if any write concern errors were observed.
Individual Write Errors
Individual write errors retrieved from the cursor are recorded in the writeErrors
field on BulkWriteException
. If an
individual write error is encountered during an ordered bulk write, drivers MUST record the error in writeErrors
and
immediately throw the exception. Otherwise, drivers MUST continue to iterate the results cursor and execute any further
bulkWrite
batches.
Test Plan
The majority of tests for MongoClient.bulkWrite
are written in the
Unified Test Format and reside in the CRUD unified tests directory.
Additional prose tests are specified here. These tests require constructing very large documents to test batch splitting, which is not feasible in the unified test format at the time of writing this specification.
Future Work
Retry bulkWrite
when getMore
fails with a retryable error
When a getMore
fails with a retryable error when attempting to iterate the results cursor, drivers could retry the
entire bulkWrite
command to receive a fresh cursor and retry iteration. This work was omitted to minimize the scope of
the initial implementation and testing of the new bulk write API, but may be revisited in the future.
Q&A
Is bulkWrite
supported on Atlas Serverless?
No. See CLOUDP-256344
Why are we adding a new bulk write API rather than updating the MongoCollection.bulkWrite
implementation?
The new bulkWrite
command is only available in MongoDB 8.0+, so it cannot function as a drop-in replacement for the
existing bulk write implementation that uses the insert
, update
, and delete
commands. Additionally, because the
new bulkWrite
command allows operations against multiple collections and databases, MongoClient
is a more
appropriate place to expose its functionality.
Why can't drivers reuse existing bulk write types?
This specification introduces several types that are similar to existing types used in the MongoCollection.bulkWrite
API. Although these types are similar now, they may diverge in the future with the introduction of new options and
features to the bulkWrite
command. Introducing new types also provides more clarity to users on the existing
differences between the collection-level and client-level bulk write APIs. For example, the verboseResults
option is
only available for MongoClient.bulkWrite
.
Why are bulk write operation results returned in a cursor?
Returning results via a cursor rather than an array in the bulkWrite
response allows full individual results and
errors to be returned without the risk of the response exceeding the maximum BSON object size. Using a cursor also
leaves open the opportunity to add findAndModify
to the list of supported write operations in the future.
Why was the verboseResults
option introduced, and why is its default false
?
The bulkWrite
command returns top-level summary result counts and, optionally, individual results for each operation.
Compiling the individual results server-side and consuming these results driver-side is less performant than only
recording the summary counts. We expect that most users are not interested in the individual results of their operations
and that most users will rely on defaults, so verboseResults
defaults to false
to improve performance in the common
case.
Why is providing access to the raw server response when a command error occurs required?
This allows users to access new error fields that the server may add in the future without needing to upgrade their driver version. See DRIVERS-2385 for more details.
Why are drivers required to send nsInfo
as a document sequence?
nsInfo
could exceed maxBsonObjectSize
if a user is doing maxWriteBatchSize
operations, each operation is on a
unique namespace, and each namespace is near the
maximum length
allowed for namespaces given the values for these limits at the time of writing this specification. Providing nsInfo
as a document sequence reduces the likelihood that a driver would need to batch split a user's bulk write in this
scenario.
How was the OP_MSG
overhead allowance determined?
The Command Batching Total Message Size section uses a 1000 byte overhead allowance to
approximate the number of non-bulkWrite
-specific bytes contained in an OP_MSG
sent for a bulkWrite
batch. This
number was determined by constructing OP_MSG
messages with various fields attached to the command, including
startTransaction
, autocommit
, and apiVersion
. Additional room was allocated to allow for future additions to the
OP_MSG
structure or the introduction of new command-agnostic fields.
Drivers are required to use this value even if they are capable of determining the exact size of the message prior to batch-splitting to standardize implementations across drivers and simplify batch-splitting testing.
Why is there no requirement to validate the size of a BSON document?
Following "Where possible, depend on server to return errors", drivers should rely on the server to return errors about exceeded size limits. Such reliance is not possible for unacknowledged writes. This specification previously required drivers to check size limits for unacknowledged writes. The requirement has since been removed. Checking size limits complicates some driver implementations. Returning a driver error in this specific situation does not seem helpful enough to require size checks.
Changelog
-
2024-11-05: Updated the requirements regarding the size validation.
-
2024-10-07: Error if
w:0
is used withordered=true
orverboseResults=true
. -
2024-10-01: Add sort option to
replaceOne
andupdateOne
. -
2024-09-30: Define more options for modeling summary vs. verbose results.
-
2024-09-25: Add
collation
field toupdate
document and clarify usage ofupdateMods
. -
2024-09-25: Update the
partialResult
population logic to account for ordered bulk writes. -
2024-09-18: Relax numeric type requirements for indexes.
-
2024-05-17: Update specification status to "Accepted".
-
2024-05-10: Improve rendered format for JSON-like code blocks.
-
2024-05-08: Bulk write specification created.