Leading Wall Street funds run a number of trading adapters that connect the trading systems of various brokers and dealers to exchanges around the world in order to support their algorithmic or manual trading. Adapters are designed and customized in different ways: many use the FIX protocol, while others use the native protocol of an individual exchange.
So what is the best way to monitor the status of these highly distributed adapters? How can you know if an adapter is down or is producing unusual data? And how do you control the performance impact of monitoring your performance-critical systems?
Trading adapters are deployed on different hosts, which connect to each other over a LAN. The monitoring system watches both the status of the adapter application and the status of its connections to multiple internal and external systems. Your adapters probably connect to these systems through various types of middleware such as Tibco, ZeroMQ, RabbitMQ, etc. This variety can cause monitoring headaches.
At Bleum, when we encounter this situation, we develop an independent library that serves as a common interface to the different types of middleware. Each adapter contains several transports, which are callback threads provided by that library. Our upstream systems cannot know what type of middleware they are dealing with, so to monitor the connections with internal systems, the monitor needs to communicate with the library to gather and consolidate information from each middleware system. A major consideration of any monitoring system related to trading systems is that the performance impact on the adapters themselves should be minimal.
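To illustrate the idea of a common interface over several kinds of middleware, here is a minimal Python sketch. It is not Bleum's library: the names `Transport`, `TransportStatus`, `FakeZeroMQTransport`, `FakeRabbitMQTransport` and `collect_statuses` are hypothetical, and the two transport classes are stand-ins that do no real network I/O.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class TransportStatus:
    name: str          # hypothetical transport label, e.g. "orders-in"
    middleware: str    # which middleware backs the transport
    connected: bool    # current connection state


class Transport(ABC):
    """Common interface seen by adapters and by the monitor (assumed design)."""

    def __init__(self, name: str) -> None:
        self.name = name

    @abstractmethod
    def status(self) -> TransportStatus:
        """Report connection state without exposing middleware details."""


class FakeZeroMQTransport(Transport):
    # Stand-in for a ZeroMQ-backed transport; holds a flag instead of a socket.
    def __init__(self, name: str, connected: bool = True) -> None:
        super().__init__(name)
        self._connected = connected

    def status(self) -> TransportStatus:
        return TransportStatus(self.name, "zeromq", self._connected)


class FakeRabbitMQTransport(Transport):
    # Stand-in for a RabbitMQ-backed transport; holds a flag instead of a channel.
    def __init__(self, name: str, connected: bool = True) -> None:
        super().__init__(name)
        self._connected = connected

    def status(self) -> TransportStatus:
        return TransportStatus(self.name, "rabbitmq", self._connected)


def collect_statuses(transports: Iterable[Transport]) -> List[TransportStatus]:
    """The monitor only calls status(); it never touches the middleware itself."""
    return [t.status() for t in transports]
```

Because the monitor depends only on `status()`, supporting another middleware in this sketch means adding one more subclass rather than changing the monitor.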
To shorten the project lifecycle and limit potential problems, we reuse the current system architecture as much as possible. We then build a monitoring agent that has a specialized transport between the adapters and the monitoring system. A customized distributed key-value store is used to manage all monitored data. The agent reads and keeps all LAN information for each adapter instance during its lifecycle, so it can immediately identify when a transport fails to initialize or is disconnected for an unknown reason. The agent also collects and summarizes all other transport information and health data, and sends it to the key-value store of the monitoring cluster. Messages are formatted as JSON, which is simple and readable.
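As a rough illustration of what such a JSON health message could look like, the sketch below summarizes per-transport state into one document. The field names (`adapter_id`, `host`, `failed_transports`, and so on) and the function `build_health_message` are assumptions made for this example; the article does not specify the actual message schema.

```python
import json
import time
from typing import Dict, List, Optional


def build_health_message(
    adapter_id: str,
    host: str,
    transports: List[Dict[str, object]],
    now: Optional[float] = None,
) -> str:
    """Summarize transport state into a JSON string (hypothetical schema).

    Each entry in `transports` is assumed to look like
    {"name": "orders-in", "connected": True}.
    """
    timestamp = time.time() if now is None else now
    # A transport that is not connected is reported as failed.
    failed = [t["name"] for t in transports if not t["connected"]]
    message = {
        "adapter_id": adapter_id,
        "host": host,
        "timestamp": timestamp,
        "healthy": not failed,          # healthy only if nothing failed
        "failed_transports": failed,
        "transports": transports,
    }
    # sort_keys keeps the output stable, which helps when diffing messages.
    return json.dumps(message, sort_keys=True)
```

A summary flag such as `healthy` lets a dashboard show overall state at a glance, while the per-transport list keeps enough detail to find the failing connection.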
Each adapter sits on its own server between the order routing system and the exchange/broker system. The monitoring agent for each server reports back to the cluster, which aggregates data to be viewed by users.
It is important that the monitoring system be flexible in case it needs to support multiple platforms. For this reason, a common transfer protocol like HTTP is a good choice.
- All messages reach the store through an HTTP service and can later be easily retrieved from a browser, a shell script, or a GUI.
- If a port in any adapter is down or missing from the local network, the monitor detects this within a configurable timeout.
- An adapter can be administratively closed even though the physical adapter is up. This is necessary from a business perspective because sometimes the remote side is not available (for example, the exchange market is not open) or users want to put the adapter in a mode that allows only cancel requests (so that no new position will be opened).
All of these events and statuses can now be monitored on the server side.
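The timeout and administrative-state rules in the list above can be sketched in a few lines of Python. This is an illustration under assumptions, not the actual server logic: `AdminMode`, `AdapterRecord`, `classify`, `accepts`, and the request type strings `"new"` and `"cancel"` are all hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class AdminMode(Enum):
    OPEN = "open"                # normal trading
    CANCEL_ONLY = "cancel_only"  # only cancel requests are allowed
    CLOSED = "closed"            # administratively closed


@dataclass
class AdapterRecord:
    last_seen: float                      # time of the last report, in seconds
    admin_mode: AdminMode = AdminMode.OPEN


def classify(record: AdapterRecord, now: float, timeout_s: float) -> str:
    """Return the state a monitor would display for this adapter."""
    # No report within the configurable timeout: treat the adapter as down.
    if now - record.last_seen > timeout_s:
        return "down"
    # The adapter is physically up; report its administrative state.
    if record.admin_mode is AdminMode.CLOSED:
        return "admin_closed"
    if record.admin_mode is AdminMode.CANCEL_ONLY:
        return "cancel_only"
    return "up"


def accepts(record: AdapterRecord, request_type: str) -> bool:
    """Decide whether a request type is allowed in the current admin mode."""
    if record.admin_mode is AdminMode.CLOSED:
        return False
    if record.admin_mode is AdminMode.CANCEL_ONLY:
        return request_type == "cancel"
    return True
```

Checking the timeout first means a silent adapter is shown as down regardless of its last known administrative mode, which keeps a stale "closed" flag from hiding a real outage.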
Security is also very important because the adapter information may be confidential. The server has authentication and authorization components that help to separate rights and responsibilities, increasing the usability of the monitor.
There are several benefits to the Bleum solution:
Monitoring real-time trading systems is vital.
You need visibility into current performance, both to provide assurance and to identify bottlenecks so that you can target your development efforts.
With a scalable, flexible system that has a minimal performance impact on adapter operations, you can collect the valuable data that you need to drive improvements in performance and reliability.