Data Warehousing/Business Intelligence With OutSystems

A fairly common question is “how do I get my data warehouse (DW) or business intelligence (BI) application to work with my OutSystems application?” Anyone who has looked at the underlying database of OutSystems knows that it can be confusing. The magic that the system does with the Entities, module rollbacks, and more comes at the cost of a data model which is not a “just write queries against it!” design. That makes working with a DW and BI a bit less fun. Even worse, the underlying table names corresponding to the Entities can often be different in each environment, making it challenging to write queries and test them.

Pull Data Out

I feel that it is easier to expose data externally and allow a DW or BI solution to pull the data as-needed. That keeps the logic they need out of the OutSystems solution. It also means that you can make changes on that end without changing your application.

I may be biased (I am one of the maintainers, but not the original author) but I think that the View4Entity component (V4E) will be at the center of most successful DW strategies. V4E is easy to use: it lets you see all of your Entities in your projects, select some, and generate the DDL to create views for that specific environment that have the same names as the Entities, rather than the tricky names from the actual tables. As a bonus, V4E lets you tweak the names as-needed (let’s say you have two modules with an Entity named “Customer”, you need to resolve that conflict… and V4E even highlights the conflict for you!), save the configuration, and load it in a different environment. That gives you the ability to easily gets your views in each environment exactly the same. From there, you write your logic in an ETL system, the DW or BI tool, or whatever else you need to have access the data to use the views, and you are good to go.

If you want something that does not require the external system to have access to your database, you can set up Web Services that expose the data, or run pre-defined queries or reports against the database to the other end. The advantage is that this is a very universal system. The downside is that you now have a full set of services to maintain and grow as your data model and needs change.

Push Data Out

The next approach is to push the data out to a DW/BI system.

One way to do this is with a BPT process. This has the advantage of automatic retries in case the system is down, will automatically be done asynchronously, and requires no “bookkeeping” on your end to track what has and has not yet been synced. It won’t quite be real-time (BPT has a polling system and a queue) but it will be mighty close (well under a second if the BPT system is running smoothing and not soaked). But… the concern with using BPT for this, is that you may saturate the BPT engine with Activities, and quickly fill your database with records around BPT.

If you still want an asynchronous way of getting the data over, you can use timers for this. A regularly-scheduled push can work well if your DW/BI requirements do not need real-time data, especially if your application has a natural time of day where it is less-busy and a big timer can run without problems. You would need to write a system to track what has changed for the day and something like a dirty bit to make sure you only sync the bare minimum and do not re-sync the same data over and over again in case of failure.

I am a big fan of architecture where the Entity Actions (CreateXYZ, UpdateXYZ, DeleteXYZ, and CreateOrUpdateXYZ) are never called outside of a business logic layer, and even within the business logic layer are never called outside of bootstrap logic and a set of standardized actions that expose CRUD functionality. This architectural technique allows you to do all sorts of great things, and one of them is ensuring that any changes to the data get pushed to a DW/BI system along the way. The downside is that you are now making a synchronous call to an external system on every write to the database, which can slow things down (make sure to set a low timeout threshold). And you will still need to back this with a timer or BPT to clean up anything that fails to sync on the initial try. But it will guarantee real-time data syncing without slamming the BPT system.

Which One?

The approach you take will depend greatly on your needs. I personally believe that using V4E is the absolutely best way to do this. If V4E does not meet your needs, and if you need real-time access to the data, either a synchronous approach through the business logic, or BPT is the answer. If it will be very high-volume, BPT will probably not be a good choice, and it should either be done with a timer, or synchronously through business logic. Business logic should probably be your last choice, just because it will have a performance penalty and need the overhead of tracking synchronization. I would use Web Services only if needed for security or interoperability reasons.

J.Ja

OutSystems MVP & longtime technical writer