How often do enterprise companies face data consistency issues? As it turns out, quite often. Many companies have adopted APIs as the primary way to exchange data between processes, services, and departments. Whether the interfaces are internal or external, these API mashups need careful analysis and auditing to avoid data consistency and integrity issues. This article describes how we built a product and integrated it with existing systems to fix data consistency problems. I will share what worked for us and why, so you can see which instruments and tools enabled us to highlight problematic data sources and build a tool to fix them. We call these tools API Flatteners or API Aggregators.
When many systems produce data and quite a few departments work with those systems, data modification becomes a delicate operation that frequently creates consistency and integrity issues. The more features we add to our platforms, the harder the data becomes to manage. Many people assume that once their systems are built, data consistency issues will not happen or are not even possible. We know the opposite: it is better to plan for slow and unreliable network connections, bad data, and interruptions, and design systems for fault tolerance from the start.
There are quite a few challenges on the way to fixing data consistency issues. First, you need to get permissions and allocate resources. Second, you need to define boundaries and clearly understand the ramifications of every change you make. Either way, someone has to review the data proactively to understand it. You need to find a person who knows all the systems involved and is diligent enough to keep every change in sync with the rest of the system. That is not an easy task, especially when many services, APIs, and data sources are involved. It becomes even more dangerous if you have poorly designed public APIs exposed to your customers without proper versioning and security in place, but that is a separate topic.
We love technologies that enable us to achieve our goals more easily and efficiently. Nothing beats techniques and tools that boost productivity while solving specific thorny issues. We like to say: the right tool for the right job. The challenge, however, is figuring out what the right tool for that specific job actually is.
Our customer had a lot of data and a lot of data integrity issues to go with it. The data came in various shapes and forms, so to speak: several Salesforce instances, a few data centers, and data sources spread across multiple platforms. We were privileged to work with numerous backend systems, numerous APIs, external data sources, and a Salesforce installation.
We started by analyzing the APIs, doing a high-level review of the endpoints exposed to the various systems: the root data sources. We built a tree of data and mappings so that we knew how to combine large chunks of data. This step by itself does not solve anything, but it is necessary: before you can tell what causes data integrity issues, you need to fetch a large portion of the data, map it, and analyze it; only then can you understand how the whole infrastructure works.
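The mapping idea can be sketched in a few lines. This is a minimal illustration, not our production code: the source names and the `email` field are hypothetical, and real record shapes varied per system. The point is simply to key every copy of a record by a shared ID so that disagreements between sources surface side by side.

```javascript
// Combine records from several (hypothetical) data sources into a tree
// keyed by a shared ID, so each ID maps to all copies of that record.
function buildMappingTree(sources) {
  const tree = new Map();
  for (const [sourceName, records] of Object.entries(sources)) {
    for (const record of records) {
      if (!tree.has(record.id)) tree.set(record.id, {});
      tree.get(record.id)[sourceName] = record;
    }
  }
  return tree;
}

// Flag IDs whose copies disagree on a given field across sources.
function findInconsistencies(tree, field) {
  const issues = [];
  for (const [id, copies] of tree) {
    const values = new Set(
      Object.values(copies)
        .map((r) => r[field])
        .filter((v) => v !== undefined)
    );
    if (values.size > 1) issues.push(id);
  }
  return issues;
}
```

With a structure like this, "analysis" becomes a set of queries over the tree rather than ad-hoc spelunking through each system.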
We needed a robust aggregation tool, one that would fetch data quickly and build a detailed view of it. We put together a ReactJS-based application and started talking to dozens of APIs to query the data; ReactJS was a perfect fit for the job. The tool can also modify data quickly, on the fly, and re-initiate the business processes associated with that data. In other words, we enabled our customer to fix the data and re-submit it to the various underlying systems with real-time feedback on the results.
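When you query dozens of endpoints, one slow or broken API must not blank out the whole view. A common way to get that behavior, and a reasonable sketch of the aggregation step under the hood, is to fan out requests in parallel and keep partial results. The endpoint URLs and the injected `fetchJson` client below are placeholders, not our actual API surface:

```javascript
// Query many endpoints in parallel; keep whatever succeeded and report
// whatever failed, instead of failing the whole aggregation.
async function aggregate(endpoints, fetchJson) {
  const results = await Promise.allSettled(
    endpoints.map(async (url) => ({ url, data: await fetchJson(url) }))
  );
  return {
    data: results
      .filter((r) => r.status === 'fulfilled')
      .map((r) => r.value),
    errors: results
      .filter((r) => r.status === 'rejected')
      .map((r) => String(r.reason)),
  };
}
```

Surfacing the `errors` list in the UI also turned out to be useful on its own: a flaky endpoint is itself a data-consistency signal.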
We also added a few internal API endpoints to fetch intermediate data from existing legacy Java systems and created proxy routines to talk to the APIs efficiently. That meant working with legacy Spring Boot Java systems and others: adding new code, modifying existing code, returning appropriate API response status codes, producing better-structured JSON payloads, and so on. We also talked to a Salesforce (SFDC) instance via its RESTful API, both to pull data for analysis and to push data back to SFDC. We placed a NodeJS proxy between our ReactJS app and the SFDC installation, which avoided information disclosure in the user's browser and improved performance by giving us caching capabilities inside the proxy.
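The caching side of such a proxy can be sketched in a few lines. This is a simplified model, assuming an injected upstream client and a flat TTL policy (the real proxy also handled authentication and per-endpoint rules, which are omitted here):

```javascript
// In-memory, TTL-based read cache in front of an upstream API: repeated
// reads within the TTL are served locally instead of hitting upstream.
function createCachingProxy(fetchUpstream, ttlMs) {
  const cache = new Map(); // url -> { expires, body }
  return async function proxy(url) {
    const hit = cache.get(url);
    if (hit && hit.expires > Date.now()) {
      return { fromCache: true, body: hit.body };
    }
    const body = await fetchUpstream(url);
    cache.set(url, { expires: Date.now() + ttlMs, body });
    return { fromCache: false, body };
  };
}
```

Keeping the cache (and the SFDC credentials) on the server side is what prevents the browser from ever seeing upstream details directly.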
We built this API aggregation app from scratch, and here is how it looks and works:
This may feel like putting duct tape over the problem, and it is true that it only reduces the number of open, known issues; the data consistency issues themselves are still in place. However, we dramatically improved the situation: we gave our client the ability to "fix" data, look at the problems closely, and start attacking those issues.
When you talk to many endpoints and change data, here is our recommendation: pay close attention to the Cache-Control HTTP header. Sometimes the browser may cache your AJAX requests; we found this issue during intensive testing. Save yourself a few hours and make sure your app handles cache control properly.
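One simple way to guard against stale cached reads, sketched here as an illustration rather than our exact implementation, is to build every AJAX request with an explicit no-cache policy. The `_ts` query parameter is a hypothetical cache-busting fallback for intermediaries that ignore request headers:

```javascript
// Build a request that bypasses stale browser/proxy caches:
// fetch() cache mode, a Cache-Control request header, and a
// cache-busting timestamp query parameter as a fallback.
function noCacheRequest(url) {
  const sep = url.includes('?') ? '&' : '?';
  return {
    url: `${url}${sep}_ts=${Date.now()}`,
    options: {
      cache: 'no-store',
      headers: { 'Cache-Control': 'no-cache' },
    },
  };
}
```

Usage would look like `const { url, options } = noCacheRequest('/api/orders'); fetch(url, options);`. For write-then-read flows, the server should also send proper `Cache-Control` response headers rather than relying on the client alone.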
Building a data-analysis tool let us quickly come up with a solution and a road map for our customer, shedding light on their data issues. Upon its final release, our tool was talking to dozens of APIs and Salesforce platform endpoints, extracting data for analysis in one secure place: a fast and snappy web application!