Nested loops versus List Operations and effect on performance tuning

I was asked to help a fellow Mendix developer with some workflow issues and something stood out: There were a LOT of nested loops. Though preached (commanded?) by computer science professors as bad practice for very good reasons, we all know in the 'real world' outside of the walls of academia that sometimes you require the kind of result that can only be achieved by comparing multiple lists as they relate to each other. This creates a massive amount of overhead on a single-threaded application and can render an application useless if those lists are sizable. Sure, domain modeling is often to blame, but tools like what Mendix provides are supposed to make it easier for non-developers to do this kind of thing. There is a better way, and Mendix provides a set of actions to help you deal with these in a more efficient way: List Operations.

One of the most common business problems to deal with in a workflow is looking for a specific attribute in List B for each record in List A. There are two primary ways you can deal with this: Nested Loops that iterate through each record, or the use of List Operations.

Here is an example of how to solve the problem using either a Nested Loop or a set of List Operations. One requires an explicit procedural thread (Nested Loop) that iterates through unnecessary data to find the intersections, and the other lets the platform perform the function in-memory on only relevant information.

Suppose you have a business requirement that asks you sort an email account list based on some user pre-selected "favorites" in order to put them on the top of the potential email recipient list for ease of choice. Assume you built an association between an account to another account (self-referencing) as a Reference Set, because an account can have more than one favorite. Another option would be to create a new persistent entity that stored just the name and email of the individuals they choose as favorites and relate it back to the account. (On a side note, this second option is not a good domain model decision because you are duplicating data.) I present it only to showcase that there is always more than one way to solve a problem.

Now, in you microflow, if you went the nested loop route, it would look something like this:

Nested Loop Example

In this example, the two lists (EmailFavorites and Accounts) are retrieved first. Then the processor has to retrieve the first record of the AcccountList, retrieve the first record of the EmailFavorites List, compare the attributes that should match, and then toggle a Boolean 'Favorites' accordingly, then while on the same Account record, iterate through the next EmailFavorites record and repeat the process. What this flow does is write these records to a persistent entity during the loop (that gets wiped at the start of this process to ensure no duplicates are created), retrieves that list where the Boolean 'Favorites' resides, sorts it according to that Boolean to put the 'Favorites' at the top of the list, then returns that list.

Does it work? Yes, it absolutely does. And because this particular app only has a handful of accounts, it processes before you can blink. Imagine, however, that it's time to scale this application to tens of thousands of accounts each with their own unique lists of favorites. While this will still work, it will potentially bring the application to it's knees trying to process the loops, especially if even 1% of that population of accounts tries to send an email and run this microflow at the same time!

Obviously I would first address the domain model to stop all of this duplication of data into various entities; EmailFavorites should be that Account self-association reference set mentioned earlier and you shouldn't be creating records in this process at all. I'm not going to correct the first issue directly in this post in order to focus on fixing the second issue and evaluating the solution using an extreme case of List Operations. Here's what that looks like:

List Operation Example

Besides looking less busy, what are all of these list operations doing to replace Nested Loops?

  1. Notice the same two lists get retrieved. No difference here.
  2. Using an 'Intersect' list operation, I'm able to return a list with only Accounts that are in both lists. No needs to iterate and compare, the 'Intersect' function does it for me. Behind the scenes these functions have been tuned to operate much more efficiently than iterating through every record.
  3. Next I sort that intersected list.
  4. I removed the Accounts on the Intersected list from the original Account list. I do this because I'm going to "append" the Account list to the Intersected List. This ensures that the "Favorites", or the intersected list, are at the top of the list.
  5. This is the append action mentioned.
  6. Finally I return this list.

I don't have performance metrics to test the processing time gained with the second approach, but I know from experience the difference these kinds of actions have on the performance tuning of an application. At the very least you eliminate a bunch of data duplication and unnecessary I/O to the database.

Next time you face the prospect of using a loop, think long and hard to see if a List Operation is a better choice and save yourself some rework later on down the road!