September 21, 2021

HELP REQUIRED for Lost Update problem in multiple cloud instances of an app accessing the same database table

HELP REQUIRED for Lost Update problem in multiple cloud instances of an app accessing the same database table

So the situation is something like this:

In a large multitenant cloud product including multiple microservices, a migration of existing customer data to a new format needs to happen. The following is the flow:

  1. Spring Boot App 1, which will have exactly 1 instance in a live environment, at the time of starting up, will read the tenant details from a shared database , let's call the table as "Tenants".
  2. Spring Boot App 1 will launch a Kafka Message for all tenant records which indicate that the data for that tenant has not been migrated yet. The Kafka Message body will contain the tenant name.
  3. Spring Boot App 2, which will have at least 3 instances running in the cloud at any given time, will consume this Kafka Message and launch a Spring State Machine instance. The Spring State Machine is configured to execute multiple steps as "actions" to transform the tenant's data (which is stored in tenant specific databases)
  4. The steps relevant to the problem are the first and last step, which are supposed to query the the "Tenants" table with the tenant name from the Kafka Message, in the shared database (same one that App 1 read from) and update a column value indicating that the migration has been started or finished (as applicable).

The Problem:

Everything from Spring Boot App 1 to the Kafka Consumer and State Machine of App 2 has been developed and tested in cloud. But even before I could implement the actual steps of the migration, I tested out the code in cloud (we have a seperate test environment in cloud), I noticed that a non 100% number of tenant records have been updated in the shared database table "Tenants". But if I put loggers in my code and check the application log, I observe that SQL statements have been executed by the application to do the update. I am using Spring Data JPA repository find(id) and save(entity) method calls from a transactional method to do these updates. Every time I execute this flow a different number of tenant records reflect the updates, but never all of them.

This problem does not exist in my local development environment, which is the exact same as the cloud environment except it only runs 1 instance of any app at any given time. Which has led me to postulate that this is a typical Lost Update problem situation, where multiple instances overwrite each other's changes when they flush their persistence context simultenously. Am I correct? If yes, what would be the solution? Just to note, Parallel Kafka queue proccessing is required to enhance the performance as this flow happens at the startup of the application, and there are expected to be hundreds, if not thousands of tenants in the live environment.

What can I do to make this system function correctly and at the same time not have a hude performance trade off.

submitted by /u/enkayjee2
[link] [comments]