Quickly insert millions of rows in SQL Server in .NET
A few days ago, I needed to insert about 3 million rows into a table. This is not a frequent operation, but I will do it about once a month, and of course, I want it to be as fast as possible so the table isn't locked for too long. The data comes from several CSV files, to which I apply some processing using .NET code. This means I cannot use the great BULK INSERT statement. Instead, I must use the SqlBulkCopy class from the .NET Framework. Let's see how to use it with .NET objects!
Let's use this table:
CREATE TABLE [dbo].[Customer](
    [Id] [uniqueidentifier] NOT NULL,
    [FirstName] [nvarchar](50) NULL,
    [LastName] [nvarchar](50) NULL,
    [DateOfBirth] [datetime2](7) NULL
)
And the corresponding class:
public class Customer
{
    public Guid Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public DateTime DateOfBirth { get; set; }
}
For testing purposes, I've created a method that generates as many customer instances as needed:
public static IEnumerable<Customer> Generate(int count)
{
    for (int i = 0; i < count; i++)
    {
        yield return new Customer
        {
            Id = Guid.NewGuid(),
            FirstName = "FirstName" + i,
            LastName = "LastName" + i,
            DateOfBirth = DateTime.UtcNow
        };
    }
}
At this step, you have an IEnumerable<Customer>. If you look at the available overloads of the SqlBulkCopy.WriteToServerAsync method, you can use an array of DataRow, a DataTable, or a DbDataReader. Only the last one allows you to stream the data. This is very important for the high volume of data I want to insert (a few million rows). Indeed, you do not want to load everything in memory before sending the first row to SQL Server. So the important question is: how do you transform an IEnumerable<T> into a DbDataReader?
The DbDataReader class has many methods: Read, GetOrdinal, IsDBNull, GetValue, GetByte, GetChar, GetString, and so on. In practice, you don't need to implement all of them; the first four are enough. Also, DbDataReader uses ordinals (the column index) to get a value, whereas in .NET we use a property name. This means we have to add a mapping between property names and ordinals. Property values are also read many times, so instead of using reflection, which may be slow, you can create compiled expressions.
public class ObjectDataReader<T> : DbDataReader
{
    private IEnumerator<T> _iterator;
    private IDictionary<string, int> _propertyNameToOrdinal = new Dictionary<string, int>();
    private IDictionary<int, string> _ordinalToPropertyName = new Dictionary<int, string>();
    private Func<T, object>[] _getPropertyValueFuncs;

    public ObjectDataReader(IEnumerator<T> items)
    {
        _iterator = items ?? throw new ArgumentNullException(nameof(items));
        Initialize();
    }

    private void Initialize()
    {
        var properties = typeof(T).GetProperties();
        _getPropertyValueFuncs = new Func<T, object>[properties.Length];

        int ordinal = 0;
        foreach (var property in properties)
        {
            string propertyName = property.Name;
            _propertyNameToOrdinal.Add(propertyName, ordinal);
            _ordinalToPropertyName.Add(ordinal, propertyName);

            // Lambda: x => (object)x.PropertyName
            var parameterExpression = Expression.Parameter(typeof(T), "x");
            var func = (Func<T, object>)Expression.Lambda(
                    Expression.Convert(
                        Expression.Property(parameterExpression, propertyName),
                        typeof(object)),
                    parameterExpression)
                .Compile();

            _getPropertyValueFuncs[ordinal] = func;
            ordinal++;
        }
    }
}
Once this is done, the next methods are very easy to implement:
public override bool Read()
{
    return _iterator.MoveNext();
}

public override int GetOrdinal(string name)
{
    if (_propertyNameToOrdinal.TryGetValue(name, out var ordinal))
        return ordinal;

    return -1;
}

public override object GetValue(int ordinal)
{
    var func = _getPropertyValueFuncs[ordinal];
    return func(_iterator.Current);
}

public override bool IsDBNull(int ordinal)
{
    return GetValue(ordinal) == null;
}
// code omitted for brevity
// https://gist.github.com/meziantou/174e2791dec966be837746750b87d069
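For reference, here is a minimal sketch of a few of the other members that matter, based on the same fields as above (the full implementation is in the gist):

// FieldCount is used by SqlBulkCopy to enumerate the columns
public override int FieldCount => _getPropertyValueFuncs.Length;

// GetName is the reverse operation of GetOrdinal
public override string GetName(int ordinal) => _ordinalToPropertyName[ordinal];

// Most of the remaining members are not used during a bulk copy, so they can simply throw
public override bool NextResult() => throw new NotSupportedException();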
And that's enough. Now, you can test the code:
using (var connection = new SqlConnection())
{
    connection.ConnectionString = "Server=(local);Database=Sample;Trusted_Connection=True;";
    await connection.OpenAsync(ct); // "ct" is a CancellationToken defined elsewhere

    using (var bulkCopy = new SqlBulkCopy(connection, SqlBulkCopyOptions.Default, null))
    {
        var customers = Customer.Generate(5_000_000); // digit separator (C#7)
        using (var enumerator = customers.GetEnumerator())
        using (var customerReader = new ObjectDataReader<Customer>(enumerator))
        {
            // Mapping PropertyName - ColumnName
            bulkCopy.DestinationTableName = "Customer";
            bulkCopy.ColumnMappings.Add(nameof(Customer.Id), "Id");
            bulkCopy.ColumnMappings.Add(nameof(Customer.FirstName), "FirstName");
            bulkCopy.ColumnMappings.Add(nameof(Customer.LastName), "LastName");
            bulkCopy.ColumnMappings.Add(nameof(Customer.DateOfBirth), "DateOfBirth");

            bulkCopy.EnableStreaming = true;
            bulkCopy.BatchSize = 10000;
            bulkCopy.NotifyAfter = 1000;
            bulkCopy.SqlRowsCopied += (sender, e) => Console.WriteLine("RowsCopied: " + e.RowsCopied);

            await bulkCopy.WriteToServerAsync(customerReader, ct);
        }
    }
}
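Depending on your scenario, you may also want to tweak the SqlBulkCopyOptions. For example, SqlBulkCopyOptions.TableLock takes a bulk update lock for the duration of the copy, which may improve throughput when nothing else writes to the table at the same time (a sketch, not benchmarked here):

// Hypothetical variant: trade concurrency for speed with a table-level lock
using (var bulkCopy = new SqlBulkCopy(connection, SqlBulkCopyOptions.TableLock, null))
{
    // same configuration and WriteToServerAsync call as above
}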
Conclusion
Using SqlBulkCopy, we can insert a few million rows in a few seconds, whereas with plain INSERT statements it would take a few minutes. To avoid using too much memory, you can stream the data to the database by creating your own DbDataReader.
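For comparison, here is a minimal sketch of the row-by-row version, where each row pays the cost of a round trip to the server (assuming the same connection and customers variables as above):

using (var command = connection.CreateCommand())
{
    command.CommandText = "INSERT INTO [Customer] ([Id], [FirstName], [LastName], [DateOfBirth]) " +
                          "VALUES (@Id, @FirstName, @LastName, @DateOfBirth)";

    foreach (var customer in customers)
    {
        // One round trip per row: this is what makes this version much slower
        command.Parameters.Clear();
        command.Parameters.AddWithValue("@Id", customer.Id);
        command.Parameters.AddWithValue("@FirstName", customer.FirstName);
        command.Parameters.AddWithValue("@LastName", customer.LastName);
        command.Parameters.AddWithValue("@DateOfBirth", customer.DateOfBirth);
        await command.ExecuteNonQueryAsync(ct);
    }
}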
A sample code is available on Gist: https://gist.github.com/meziantou/174e2791dec966be837746750b87d069
Do you have a question or a suggestion about this post? Contact me!