Batch Updates
Batching is an automatic process. It is an optimization by EF Core to prevent multiple round trips to the database server. The EF Core engine uses its entity-tracking feature to make this optimization.
Consider this code. Let me explain it line by line. First, we fetch an item and update its price. This translates to an SQL UPDATE statement.
// EF Core uses entity tracking to
// automatically batch multiple updates

// UPDATE
var item = _ctx.Items.Single(x => x.ID == 2);
item.Price = 200;

// INSERT
_ctx.Items.Add(new Item() { Name = "...", Price = 300 });

// INSERT
_ctx.Items.Add(new Item() { Name = "...", Price = 200 });

// batching here - one trip to server
_ctx.SaveChanges();
After that we add two items to the items table. This translates to two SQL INSERT statements.
These changes could need three different trips to the database. But EF Core uses entity tracking and batching to make all of them in a single trip to the server. Hence you can sequence your statements so that a single SaveChanges call takes place at the most optimal point.
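To contrast the two approaches, here is a sketch (the entity names and values are illustrative, not from the original post) of what to avoid: calling SaveChanges after every change forces a separate round trip each time, while deferring it lets EF Core send everything in one batch.

```csharp
// Anti-pattern: three separate round trips
var item = _ctx.Items.Single(x => x.ID == 2);
item.Price = 200;
_ctx.SaveChanges(); // trip 1

_ctx.Items.Add(new Item() { Name = "Pen", Price = 300 });
_ctx.SaveChanges(); // trip 2

_ctx.Items.Add(new Item() { Name = "Pad", Price = 200 });
_ctx.SaveChanges(); // trip 3

// Better: let entity tracking accumulate the changes,
// then send them in one batched trip
var other = _ctx.Items.Single(x => x.ID == 3);
other.Price = 250;
_ctx.Items.Add(new Item() { Name = "Ink", Price = 100 });
_ctx.SaveChanges(); // one trip for both changes
```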
MaxBatchSize and MinBatchSize
EF Core clubs a number of statements together for batch updates. The actual number of statements clubbed together depends on the database provider that you are using. You should refer to that provider's documentation for its recommendations and defaults.
It is possible to set a batch size for your specific needs. Consider the code shown here.
// 1. check your database provider documentation
// 2. tweak only after extensive testing and benchmarking
builder.Services.AddDbContext<MyProjectContext>(
    opt => opt.UseSqlServer(
        "Data Source=(localdb)\\ProjectModels;" +
        "Initial Catalog=mydb;Integrated Security=True;",
        so => so.MinBatchSize(1)
                .MaxBatchSize(20)
    )
    .LogTo(Console.WriteLine, LogLevel.Information)
);
In this call to AddDbContext we have set the minimum and maximum batch sizes through the provider options.
But the EF Core team recommends that such tweaking be done only on the basis of extensive testing and benchmarking; the values are better left at their defaults. Still, the option is always there when you need it.
Case study of bulk updates
Finally, let us take a case study of bulk updates.
Consider this simple code on the C# side. The objective is to update the price of all items by a certain amount, say, 10.
// Performance Penalty 1:
// must fetch all the columns for tracking
// even though only price is to be updated
var items = _ctx.Items;

foreach (var item in items)
{
    item.Price += 10;
}

// second trip (thanks to batching!)
// Performance Penalty 2:
// however, one SQL UPDATE statement
// executes for each item at the database end
// this means 20 items = 20 different UPDATEs
_ctx.SaveChanges();
The first problem is that we must fetch all the columns for tracking purposes, even though we need only the price. This is a performance penalty.
The foreach loop runs to update each item. But EF Core's batching feature ensures that there is a single round trip to the database server for all these updates. That part is good; we are safe here!
But there is a problem at the database end. Each item has to be updated independently, because a dedicated UPDATE statement executes for every item.
Let me give you a simple solution to this.
Why don't we use ExecuteSqlRaw to execute a direct SQL query: "UPDATE items SET price = price + 10"?
// one step solution!
string q = "UPDATE items SET price = price + 10";
_ctx.Database.ExecuteSqlRaw(q);
// think about it!
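One caution worth adding here (a sketch, not part of the original post): if the increment comes from user input, avoid concatenating it into the SQL string. EF Core's ExecuteSqlInterpolated sends the interpolated value as a database parameter rather than raw SQL text, which protects against SQL injection.

```csharp
// parameterized: 'amount' is sent as a DbParameter,
// not concatenated into the SQL text
decimal amount = 10;
_ctx.Database.ExecuteSqlInterpolated(
    $"UPDATE items SET price = price + {amount}");
```

If you are on EF Core 7 or later, the strongly typed ExecuteUpdate method achieves the same single-statement update without raw SQL: `_ctx.Items.ExecuteUpdate(s => s.SetProperty(i => i.Price, i => i.Price + 10));`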
Think about it! Thank you!
This Blog Post/Article "(EFCore Best Practices) Batching in EF Core and Planning your Bulk Updates" by Parveen is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.