
Bulk Operations


A bulk operation is a process of bringing all the data from the application into the database server at once. It bypasses certain database-specific activities (i.e., logging, audits, data-type checks, constraints, etc.) behind the scenes, thus giving you maximum performance during the operation.

Basically, you mostly use the normal Delete, Insert, Merge and Update operations when interacting with the database. Through these, the data is processed in an atomic way. If you call the batch operations instead, they only execute multiple single operations together; they do not completely eliminate the round-trips between your application and the database.
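
For instance, the batch operations (i.e., InsertAll, MergeAll, UpdateAll and DeleteAll) pack multiple statements together and execute them per batch. Below is a minimal sketch; the batchSize value is illustrative.

using (var connection = new SqlConnection(connectionString))
{
    var customers = GetCustomers();
    // the statements are packed and executed per batch, but each batch
    // still round-trips between the application and the database
    var rows = connection.InsertAll<Customer>(customers, batchSize: 100);
}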

With the bulk operations, as mentioned above, all the data is brought from the client application into the database in one go. Once the data is on the server, it is processed together within the database (server), maximizing the performance.

The image below shows the data flow of the BulkInsert operation.

The bulk operations can improve the performance by more than 90% when processing a large dataset.

This feature only supports the SQL Server RDBMS data provider.

How does it work?

It leverages the ADO.NET SqlBulkCopy class of both the System.Data.SqlClient and the Microsoft.Data.SqlClient namespaces.

For the BulkInsert operation, it simply calls the WriteToServer() method to bring all the data into the database. Unless you would like to bring the newly generated identities back to the application after the execution, no additional logic is applied.
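
Conceptually, the BulkInsert call maps to the simplified SqlBulkCopy usage below. This is an illustrative sketch, not the library's exact internal code; the table name is assumed.

using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    using (var bulkCopy = new SqlBulkCopy(connection))
    {
        // point the bulk copy at the target table and stream all rows at once
        bulkCopy.DestinationTableName = "[dbo].[Customer]";
        // dataTable: any populated System.Data.DataTable matching the target columns
        bulkCopy.WriteToServer(dataTable); // single trip for the entire data set
    }
}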

For the BulkDelete, BulkMerge and BulkUpdate operations, additional logic and techniques are utilized.

The image below shows the data flow of the BulkMerge operation.

Basically, a pseudo-temporary table is created in the database under the transaction context. The library then uses the BulkInsert operation to target that pseudo-temporary table, and processes the data afterwards. Through this technique, all the data is brought together from the client application into the database server (in one go) and processed together at the same time.

You can maximize the execution by targeting your underlying table indexes via qualifiers; simply pass a list of Field objects. The library will then create a CLUSTERED INDEX on the pseudo-temporary table using the passed qualifiers, and do the actual joins to the original table using that index.

If you have not passed any qualifiers, the primary key column will be used by default. If the primary key column is not present, the identity column will be used instead.
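
As an illustration, the simplified statements below sketch the pseudo-temporary table and the clustered index created from the passed qualifiers (all names are illustrative).

> CREATE TABLE [PseudoTempTable] ( ... );
> CREATE CLUSTERED INDEX [CIX_PseudoTempTable]
> ON [PseudoTempTable] ([QualifierField1], [QualifierField2]);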

Supported Objects

Below are the objects supported by the bulk operations (see the sketch after this list).

  • System.Data.DataTable
  • System.Data.Common.DbDataReader
  • IEnumerable<T>
  • ExpandoObject
  • IDictionary<string, object>
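
The sketch below bulk-inserts dynamic rows via ExpandoObject against the table name; the column names are assumed to match the [dbo].[Customer] table.

using (var connection = new SqlConnection(connectionString))
{
    var customers = new List<ExpandoObject>();
    var customer = (IDictionary<string, object>)new ExpandoObject();
    customer["LastName"] = "Doe";
    customer["BirthDate"] = new DateTime(1990, 1, 1);
    customers.Add((ExpandoObject)customer);
    var rows = connection.BulkInsert("[dbo].[Customer]", customers);
}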

Operation SQL Statements

Once all the data is in the pseudo-temporary table, the correct SQL statement will be used to cascade the changes to the original table.

For BulkDelete

> DELETE T
> FROM [dbo].[OriginalTable] T
> INNER JOIN [PseudoTempTable] TMP ON TMP.QualifierField1 = T.Field1 AND TMP.QualifierField2 = T.Field2;

For BulkMerge

> MERGE [dbo].[OriginalTable] T
> USING [PseudoTempTable] S ON S.QualifierField1 = T.Field1 AND S.QualifierField2 = T.Field2
> WHEN NOT MATCHED THEN
>    INSERT (...) VALUES (...)
> WHEN MATCHED THEN
>    UPDATE
>    SET (...);

For BulkUpdate

> UPDATE T
> SET T.Field3 = TMP.Field3
>     , T.Field4 = TMP.Field4
>     , ...
> FROM [OriginalTable] T
> INNER JOIN [PseudoTempTable] TMP ON TMP.QualifierField1 = T.Field1 AND TMP.QualifierField2 = T.Field2;

Special Arguments

The arguments qualifiers, isReturnIdentity and usePhysicalPseudoTempTable are provided in the BulkDelete, BulkMerge and BulkUpdate operations.

The qualifiers argument is used to define the qualifier fields to be used in the operation. It usually refers to the WHERE expression of the SQL statement. If not given, the primary key (or identity) column will be used.

The isReturnIdentity argument defines the behaviour of the execution, i.e. whether the newly generated identities will be set back to the data entities. By default, it is disabled.

The usePhysicalPseudoTempTable argument defines whether a physical pseudo-table will be created during the operation. By default, a session-based temporary table (i.e., #TableName) is used.

Please note that it is not recommended to enable the usePhysicalPseudoTempTable argument when working with parallelism. Always use the session-based non-physical pseudo-temporary table when working in parallel.
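
Below is a sketch combining the three arguments in a single BulkMerge call; treat it as an illustration of the argument names rather than the full method signature.

using (var connection = new SqlConnection(connectionString))
{
    var customers = GetCustomers();
    var qualifiers = Field.Parse<Customer>(e => new { e.LastName, e.BirthDate });
    var rows = connection.BulkMerge<Customer>(customers,
        qualifiers: qualifiers,
        isReturnIdentity: true, // set the generated identities back to the entities
        usePhysicalPseudoTempTable: false); // keep the session-based #temp table
}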

Identity Setting Alignment

The library enforces additional logic to ensure the identity setting alignment when isReturnIdentity is enabled during the calls. This affects both the BulkInsert and BulkMerge operations.

Basically, a new column named __RepoDb_OrderColumn is added to the pseudo-temporary table if an identity field is present on the underlying target table. This column holds the actual index of the entity model within the IEnumerable<T> object.

During the bulk operation, a dedicated DbParameter object is created that targets this additional column with the value of the entity model's index, ensuring that the column value really matches the index of the entity data within the IEnumerable<T> object. The result sets of the pseudo-temporary table are ordered by this newly generated column prior to the actual merge into the underlying table.

When the newly generated identity value is set back to the data model, the value of the __RepoDb_OrderColumn column is used to look up the proper index of the corresponding entity model in the IEnumerable<T> object; then, the compiled identity-setter function is used to assign the identity value back to the identity property.
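
In practice, the identity read-back looks like the sketch below; it assumes the Customer class has an identity property named Id.

using (var connection = new SqlConnection(connectionString))
{
    var customers = GetCustomers(); // Id values are not yet set
    connection.BulkInsert<Customer>(customers, isReturnIdentity: true);
    // the identities generated by the database are now set back, aligned
    // by the __RepoDb_OrderColumn positional index
    var firstId = customers.First().Id;
}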

Caveats

The library automatically sets the value of the options argument to SqlBulkCopyOptions.KeepIdentity when calling the BulkDelete, BulkMerge and BulkUpdate operations if you have not passed any qualifiers and your table has an identity key column.

In addition, when calling the BulkDelete, BulkMerge and BulkUpdate operations, the library creates a pseudo-temporary table behind the scenes. Your user must have the CREATE TABLE privilege in the database, otherwise a SqlException will be thrown.
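
If needed, you can also pass the options argument explicitly. The sketch below forces KeepIdentity on a BulkInsert call, assuming the identity values already present in your entities should be preserved.

using (var connection = new SqlConnection(connectionString))
{
    var customers = GetCustomers();
    // preserve the source identity values instead of letting the server generate them
    var rows = connection.BulkInsert<Customer>(customers,
        options: SqlBulkCopyOptions.KeepIdentity);
}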

Calling the Operations

Below are the ways to call the operations.

For BulkDelete

The code snippets below showcase the BulkDelete operation via IEnumerable<T>.

using (var connection = new SqlConnection(connectionString))
{
    var customers = GetCustomers();
    var rows = connection.BulkDelete<Customer>(customers);
}

Or with qualifiers.

using (var connection = new SqlConnection(connectionString))
{
    var customers = GetCustomers();
    var qualifiers = Field.Parse<Customer>(e => new { e.LastName, e.BirthDate });
    var rows = connection.BulkDelete<Customer>(customers, qualifiers: qualifiers);
}

Or via primary keys.

using (var connection = new SqlConnection(connectionString))
{
    var keys = new object[] { 10045, ..., 11211 };
    var rows = connection.BulkDelete<Customer>(keys);
}

You can also target via table-name.

using (var connection = new SqlConnection(connectionString))
{
    var customers = GetCustomers();
    var rows = connection.BulkDelete("[dbo].[Customer]", customers);
}

For BulkInsert

The code snippets below showcase the BulkInsert operation via IEnumerable<T>.

using (var connection = new SqlConnection(connectionString))
{
    var customers = GetCustomers();
    var rows = connection.BulkInsert<Customer>(customers);
}

Or via table-name.

using (var connection = new SqlConnection(connectionString))
{
    var customers = GetCustomers();
    var rows = connection.BulkInsert("[dbo].[Customer]", customers);
}

For BulkMerge

The code snippets below showcase the BulkMerge operation via IEnumerable<T>.

using (var connection = new SqlConnection(connectionString))
{
    var customers = GetCustomers();
    var rows = connection.BulkMerge<Customer>(customers);
}

Or with qualifiers.

using (var connection = new SqlConnection(connectionString))
{
    var customers = GetCustomers();
    var qualifiers = Field.Parse<Customer>(e => new { e.LastName, e.BirthDate });
    var rows = connection.BulkMerge<Customer>(customers, qualifiers: qualifiers);
}

You can also target via table-name.

using (var connection = new SqlConnection(connectionString))
{
    var customers = GetCustomers();
    var rows = connection.BulkMerge("[dbo].[Customer]", customers);
}

For BulkUpdate

The code snippets below showcase the BulkUpdate operation via IEnumerable<T>.

using (var connection = new SqlConnection(connectionString))
{
    var customers = GetCustomers();
    var rows = connection.BulkUpdate<Customer>(customers);
}

Or with qualifiers.

using (var connection = new SqlConnection(connectionString))
{
    var customers = GetCustomers();
    var qualifiers = Field.Parse<Customer>(e => new { e.LastName, e.BirthDate });
    var rows = connection.BulkUpdate<Customer>(customers, qualifiers: qualifiers);
}

You can also target via table-name.

using (var connection = new SqlConnection(connectionString))
{
    var customers = GetCustomers();
    var rows = connection.BulkUpdate("[dbo].[Customer]", customers);
}

When to use the Batch and Bulk Operations?

There is no standard for when to use which. It all depends on your situation (i.e., network latency, number of columns, type of data, etc.).

The advantage of using a bulk operation is maximum performance; however, it keeps the underlying table blocked while under the context of the bulk-operation transaction. It might therefore trigger a deadlock if not handled properly by the developers.

We highly recommend using the bulk operations if the data set you are working with is beyond 1,000 rows; otherwise, just use the batch operations.
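
As a rough guide, a dispatch like the sketch below can be used; the 1,000-row threshold is illustrative, so profile against your own data and environment.

using (var connection = new SqlConnection(connectionString))
{
    var customers = GetCustomers();
    var rows = customers.Count() > 1000
        ? connection.BulkInsert<Customer>(customers) // bulk: one trip, maximum throughput
        : connection.InsertAll<Customer>(customers); // batch: packed statements, fewer locks
}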