How can code execute correctly sometimes, and incorrectly others?


This past week I encountered a problem that I was at my wits' end to solve. The problem was this: randomly, a process wouldn't work. The same code was being executed at the press of a button, but sometimes it just wouldn't work. Was something wrong with the engine in the Cloud, since it always worked in local testing? After a brief discussion with a few members of Mendix’s own Success Team, we were able to find the root cause and redesign to address the issue. Read on and hopefully I'll save you from some similar headaches.

Early on in the development of one of the apps I worked on, we built a microflow that needed to be executed each time a user pressed a certain button. As additional features were added, we needed to execute this microflow after other button presses or automated background processes, so we turned it into a sub-microflow to make the code easier to manage. For those who don’t already know, any time you need to perform a specific set of process steps in a microflow more than once, the best practice is to select those steps, right-click and select ‘Create Submicroflow’, and give it a name like ‘SUB_...’. Doing this saves you from maintaining those steps in multiple places; instead you just modify the ‘SUB_...’ microflow.

A few months ago we were debugging another process and realized three of these ‘SUB_’ microflows were simply changing an attribute in an entity based on other attributes. We removed these submicroflows from the parent microflows and made the attributes ‘calculated’ instead of ‘stored’, letting the calculations execute whenever the data changed.

Well, not long into testing we realized a few things. First of all, these calculations were updated upon every refresh, which is not what we wanted for these particular attributes; we wanted them to update only when the data was committed, because of what they triggered. Second, we figured out that we actually didn’t want two of them to trigger upon commit of the entity they were currently in, but upon commit of an associated object that then updated that entity. So we knew we had a problem with these fields and that event handlers were really the direction we needed to go.
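To make the difference concrete, here is a minimal Python sketch of the two behaviors described above. It is only an analogy, not Mendix code: the `OrderLine` class, its attribute names, and the `commit` method are all made up for illustration. The "calculated" value is recomputed every time it is read, while the "stored" value only changes when the object is committed.

```python
class OrderLine:
    """Hypothetical stand-in for an entity; not a Mendix API."""

    def __init__(self, quantity, price):
        self.quantity = quantity
        self.price = price
        self.calc_runs = 0        # how many times the calculation has run
        self.stored_total = None  # only updated on commit

    @property
    def calculated_total(self):
        # Analogy for a 'calculated' attribute: recomputed on every read.
        self.calc_runs += 1
        return self.quantity * self.price

    def commit(self):
        # Analogy for a 'stored' attribute maintained at commit time.
        self.stored_total = self.quantity * self.price


line = OrderLine(quantity=2, price=10)
for _ in range(3):            # three "refreshes" of the page
    _ = line.calculated_total
line.commit()
line.quantity = 5             # changed, but not committed yet
```

In this sketch the calculation runs once per read, whether or not anything changed, and the stored value stays at its last committed state until `commit` is called again. That is the trade-off we ran into.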

We split the fix into a separate story for each of the three microflows. That was a mistake; we should have made it one story with three tasks. Because it was split, multiple developers picked up the stories and each rewrote what was needed. The problem was that while each passed its unit testing, there wasn’t a test setup to regression test this against historical data. We ended up with a situation where the two entities had events being triggered that depended on and called each other, creating a loop that didn’t work.

We fixed that and the result was this:

  • An after commit event in the ‘many’ associated entity.
  • Two after commit events in the ‘one’ parent entity.

Everything was going to work now, right? Yes, and no. It would work sometimes, and then other times it wouldn’t work. I was completely stumped. How does code execute properly running a scenario, and then execute improperly running the identical scenario? I knew something must be different, but I also knew the scenarios were executing flawlessly. Also, it always worked running locally, but the cloud environments would work sometimes and sometimes they wouldn’t.

I hooked up the debugger to the cloud environment and wrote a bunch of trace logs. As I stepped through the scenario it worked perfectly every time. As soon as I detached the debugger it would fail. I was forced to contact my partners at Mendix to help me solve this one, and if you know me at all, you know how hard it is for me to throw my hands up and ask for that kind of help!

Within minutes, Robert Bond (Mendix) stated that it could be an issue with the order of the event handlers. Huh? He then explained that event handlers aren’t run in any particular order. The list that you see when creating them is not in any particular order (notice there are no ‘Move Up’ or ‘Move Down’ buttons?). That’s why they recommend you sequence all of your processes into one microflow for each event type: After Commit, Before Commit, Before Delete, and After Delete. This way you can guarantee that your order of operations is intact.
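Here is a small Python sketch of why that matters. It is an analogy only, not Mendix code, and every name in it (`set_subtotal`, `set_total`, `after_commit_sequenced`, the order fields) is hypothetical. Two handlers depend on each other, and the sketch runs them in every possible order to show that the result changes, while a single handler that calls the steps in a fixed sequence always gives the same answer.

```python
from itertools import permutations


def set_subtotal(order):
    order["subtotal"] = order["quantity"] * order["price"]


def set_total(order):
    # Depends on set_subtotal having already run.
    order["total"] = order["subtotal"] + order["shipping"]


def after_commit_sequenced(order):
    # One handler per event type, with the steps in an explicit order.
    set_subtotal(order)
    set_total(order)


def new_order():
    return {"quantity": 2, "price": 10, "shipping": 5, "subtotal": 0, "total": 0}


def totals_for_every_order(handlers):
    """Run the handlers in every possible order and collect the totals."""
    results = set()
    for ordering in permutations(handlers):
        order = new_order()
        for handler in ordering:
            handler(order)
        results.add(order["total"])
    return results


separate = totals_for_every_order([set_subtotal, set_total])
sequenced = totals_for_every_order([after_commit_sequenced])
```

With two separate handlers, `separate` contains two different totals, one for each order; with the single sequenced handler, `sequenced` contains only the correct one. That matches what I saw: same scenario, same code, different outcome depending on which handler happened to fire first.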

Sure enough, that was the issue. So what have we learned here?

  • Event Handlers aren’t processed in any particular order. Script them accordingly.
  • Calculated fields execute when the data is refreshed, so be careful about their use or you’ll create a lot of overhead on your application. In fact, they are virtually worthless in my opinion except in very specific situations (like a mathematical equation such as ‘TOTAL AMOUNT’, which is based on Quantity in one entity and Price in another).
  • Consider Event Handlers in your design from the beginning. I’ve been guilty of just writing a bunch of microflows and submicroflows and calling them during each individual process instead of asking when the impacted data should be updated. If it depends on a commit or a delete, you should make those part of an Event Handler.
  • Don’t forget you can commit without events. This is very important to avoid infinite loop issues where the commit calls the microflow that commits again in perpetuity.
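The last point is easiest to see in a sketch. The Python below is an analogy, not Mendix code: the `Entity` class, its `commit(with_events=...)` method, and `make_pair` are hypothetical stand-ins for two associated entities whose after-commit handlers each commit the other one.

```python
class Entity:
    """Hypothetical stand-in for an entity with after-commit handlers."""

    def __init__(self, name):
        self.name = name
        self.after_commit = []  # handlers to run after a commit with events
        self.commit_count = 0

    def commit(self, with_events=True):
        self.commit_count += 1
        if with_events:
            for handler in self.after_commit:
                handler(self)


def make_pair(handlers_commit_with_events):
    """Build a parent and child whose handlers commit each other."""
    parent = Entity("parent")
    child = Entity("child")
    child.after_commit.append(
        lambda _: parent.commit(with_events=handlers_commit_with_events)
    )
    parent.after_commit.append(
        lambda _: child.commit(with_events=handlers_commit_with_events)
    )
    return parent, child


# Handlers that commit without events: the chain stops after one hop.
parent, child = make_pair(handlers_commit_with_events=False)
child.commit()

# Handlers that commit with events: each commit triggers the other forever.
looping_parent, looping_child = make_pair(handlers_commit_with_events=True)
try:
    looping_child.commit()
    looped = False
except RecursionError:
    looped = True
```

In the first pair each entity is committed exactly once. In the second pair Python gives up with a `RecursionError`, which is this sketch's version of the loop we created between our two entities.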

As I suspected, it was indeed a problem with the code and not the engine. It’s always the code.