A bad thing happened the other day. Here was the sequence:
- Batch job started.
- The start() method returned 5000+ Opportunity rows.
- Database.Stateful was used to record an internal log of activity for subsequent posting in the finish() method.
- Each batch of 200 was passed to the execute() method.
- The execute() method added bits to the stateful log (a string variable).
- Batch 21 (out of 28) blew up on a Limits heap size exception. The blow-up continued on batches 22-28.
- The finish() method started, took the value from the Database.Stateful variable, and persisted it to sObject Log__c(s).
- AND HERE IS WHERE BAD THINGS HAPPENED… The finish() method started a “finalize” batch job, passing a list of sObject IDs that had exceptions in any previous batch execute. The finalize batch job (i.e., the chained batch job) made updates to all Opportunities that weren’t selected by the previous batch job’s start() method and weren’t already marked as exceptions. In my case, these Opportunities were marked as closed lost.
So, because the Opportunities in batches 21-28 were never processed and never marked with an exception (because of the uncatchable Limits exception), the chained (second) batch job blithely assumed that the Opportunities in batches 21-28 had never been fetched in the previous batch job’s start() method. Hence, perfectly good Opportunities got marked as closed lost.
Uh-oh.
So, what should I have done differently?
First, I wrongly assumed that a Limits exception would terminate the entire batch job, not just the currently running execute() batch.
Because of this misconception, my finish() method did its work unconditionally, without knowing whether all of the batches had completed without uncaught exceptions. Any DML-type work the finish() method performs, including scheduling a subsequent chained job, may then lead to incorrect behavior.
- The finish() method has access to the BatchableContext and can get, via getJobId(), the AsyncApexJob that represents the batch job. AsyncApexJob has a field, NumberOfErrors, that identifies how many batches are in error. If it is greater than zero, appropriate business logic should be applied.
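Here is a minimal sketch of that check, not the code from the job described above. The class name GuardedFinishBatch and the commented-out FinalizeBatch follow-up are hypothetical names used only for illustration; the AsyncApexJob fields queried are standard ones.

```apex
// Hypothetical batch class: illustrates guarding finish() on NumberOfErrors.
public class GuardedFinishBatch implements Database.Batchable<SObject> {

    public Database.QueryLocator start(Database.BatchableContext bc) {
        // Illustrative query; the real job would select its own rows.
        return Database.getQueryLocator('SELECT Id FROM Opportunity');
    }

    public void execute(Database.BatchableContext bc, List<SObject> scope) {
        // Per-batch business logic goes here.
    }

    public void finish(Database.BatchableContext bc) {
        // Look up the job record for this batch run.
        AsyncApexJob job = [
            SELECT Id, Status, NumberOfErrors, JobItemsProcessed, TotalJobItems
            FROM AsyncApexJob
            WHERE Id = :bc.getJobId()
        ];

        if (job.NumberOfErrors > 0) {
            // At least one execute() batch failed, possibly on an uncatchable
            // Limits exception. Do not chain work that assumes every batch ran.
            System.debug(LoggingLevel.ERROR,
                job.NumberOfErrors + ' of ' + job.TotalJobItems +
                ' batches failed; skipping the follow-up job.');
            return;
        }

        // All batches completed, so the chained job can be started safely.
        // FinalizeBatch is a hypothetical follow-up class, left commented out:
        // Database.executeBatch(new FinalizeBatch());
    }
}
```

What to do when NumberOfErrors is greater than zero is a business decision: skipping the chained job, as here, is the conservative choice, but notifying an administrator or rerunning the failed scope may fit better in other cases.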
Of course, the Limits exception needs to be avoided in the first place by taking a different approach to stateful logging. I’ll investigate this in a subsequent post (but don’t stay up waiting for it!).
This is great stuff Eric!!!!
Yikes! Perhaps another good argument for moving to queueable classes?