2021 - 2023
~2881 words, 15 min read
This comprehensive cryptocurrency trading framework, written in Python, allows for multi-instrument, multi-exchange, and multi-strategy trading. By launching multiple independent processes for various tasks and leveraging the capabilities of Redis for data management and inter-process communication, the framework provides an efficient and robust platform for sophisticated crypto trading activities.
The project's architecture follows a modular approach, organized into packages that each serve a specific role within the broader framework. The framework ingests up to 2.5 TB of data per month; it can record level 2 order book data, compress and clean it, and store it in Parquet format, a highly efficient columnar storage file format optimized for speed and storage. It also supports running semi-high-frequency trading (semi-HFT) strategies in parallel across multiple instruments and exchanges.
The redis_manager package serves as the central communication hub for the system. It uses the Redis pub/sub feature for inter-process communication: when a process needs to communicate a particular event to other processes, it publishes a message to a channel, and other processes subscribed to that channel receive the message and react accordingly.
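As a rough sketch, the pattern looks like the following; the channel name and message shape here are illustrative, not the framework's actual schema:

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379, db=0)

# Publisher side: announce an event to every listening process.
r.publish("events:orderbook", json.dumps({"exchange": "bybit", "symbol": "BTC-USDT"}))

# Subscriber side: block on the channel and react to each message.
sub = r.pubsub()
sub.subscribe("events:orderbook")
for msg in sub.listen():
    if msg["type"] == "message":          # skip the subscribe confirmation
        event = json.loads(msg["data"])
        print("received", event)
```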
Real-time data from different exchanges is managed through the feed_handler package. This package maintains active WebSocket connections with the exchanges to stream real-time market data, then normalizes and dispatches this data for use by other components of the system.
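A hedged sketch of such a streaming loop, using the websockets library; Binance's public depth stream is used purely as an example endpoint, not necessarily one the framework consumes:

```python
import asyncio
import json
import websockets

async def stream(url: str) -> None:
    async with websockets.connect(url) as ws:
        async for raw in ws:              # one frame per market update
            update = json.loads(raw)
            # a real feed handler would normalize and republish to Redis here
            print(update)

asyncio.run(stream("wss://stream.binance.com:9443/ws/btcusdt@depth"))
```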
The exchange_api package contains modules for interacting with various cryptocurrency exchanges, including Binance, OKEx, and FTX. These modules handle API communication with the respective exchanges, providing the ability to fetch market data, place orders, handle private data, and more.
The db_handler package manages all interactions with the database. It uses Redis as an in-memory database for fast access and retrieval of data, which is critical for efficient trading operations. It stores various types of data, including market data, order data, and balance data.
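For illustration, per-instrument state might be kept in Redis hashes along these lines; the key and field names are hypothetical:

```python
import redis

r = redis.Redis()

# Write the latest balance snapshot for one exchange account.
r.hset("balance:bybit", mapping={"BTC": "0.52", "USDT": "10400.0"})

# Any process can read it back in a single round trip.
balances = r.hgetall("balance:bybit")
print({k.decode(): float(v) for k, v in balances.items()})
```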
The watchers package contains components that monitor market conditions and trading opportunities. It houses seekers and triggers that actively look for potential trades based on predefined strategies and initiate trades when specific conditions are met or when the user manually triggers a trade.
The standalone_tools package provides standalone tools for various functionalities, such as showing balances, showing trades, and reloading the data API. These tools offer supplementary features that aid in managing and monitoring the system.
The recorders package handles recording and reading trading data. It plays a crucial role in the framework by logging trading activity and market data. This data can be used for backtesting strategies, auditing, and performing data analysis.
The framework starts multiple standalone processes, each carrying out a specific function.
The feed_handler opens WebSocket connections with the exchanges and begins streaming real-time market data. This data is normalized and published to various Redis channels using the pub/sub feature of Redis.
Different processes subscribe to the relevant channels and react to the incoming data. For example, the watchers process listens for market data and identifies potential trades when market conditions match a predefined strategy.
When a potential trade is identified, an order is created and published to a Redis channel.
The exchange_api process listens for orders on this channel; when it receives an order, it communicates with the relevant exchange's API to execute the trade.
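A sketch of what such a consumer loop could look like; the ExchangeClient class below is a stand-in for the framework's real per-exchange modules, not its actual API:

```python
import json
import redis

class ExchangeClient:
    """Hypothetical wrapper around one exchange's REST API."""
    def place_order(self, symbol: str, side: str, qty: float) -> None:
        print(f"placing {side} {qty} {symbol}")  # real code would sign and POST

r = redis.Redis()
client = ExchangeClient()
sub = r.pubsub()
sub.subscribe("orders:new")

# Consume order messages published by the watchers and execute them.
for msg in sub.listen():
    if msg["type"] == "message":
        order = json.loads(msg["data"])
        client.place_order(order["symbol"], order["side"], order["qty"])
```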
All the trading data, along with the market data, is logged by the recorders process for future reference and analysis.
Additionally, the db_handler process constantly updates the Redis database with relevant data, ensuring all processes have access to the most recent and relevant data for their operation.
Given the technologies and architectures employed, this crypto trading framework is designed to operate with high efficiency and reliability.
The framework is currently hosted on a dedicated Debian bare-metal server. This is the preferred option because it provides the flexibility and control needed to run the system efficiently. The server is equipped with 64 GB of RAM and 12 CPU cores, which is sufficient for the framework's current needs. Working on a dedicated host also allows easy control of the system's resources and configuration.
The system exhibits robust performance attributes:
Concurrency and Parallelism: By leveraging Python's multiprocessing capabilities and launching separate processes for each major function, the framework ensures parallel execution. This design helps maximize CPU utilization, enhances throughput, and improves the overall responsiveness of the system.
Real-time Data Processing: The use of WebSockets for data feed handling allows the system to stream and process data in real time. This capability is crucial for capturing and acting on quick market movements, providing a significant advantage in crypto trading.
In-Memory Database: Redis, as an in-memory data structure store, offers low latency and high throughput. It ensures fast data access and manipulation, which is vital for handling real-time trading data and executing high-frequency strategies.
Efficient Data Storage: For data that needs to be persisted, the framework utilizes Parquet, a columnar storage file format optimized for speed and efficient storage. It not only reduces the storage space requirement but also allows for faster read/write operations.
The data consumption of the system is primarily driven by the amount of real-time market data it processes, the number of trades executed, and the data logging level.
Redis Data: The framework extensively uses Redis for storing various types of data like market data, order data, balance data, etc. Despite Redis's in-memory nature, it provides options for persisting data on the disk, thereby giving the system a good balance between speed and persistence.
Data Ingestion: The system ingests up to 2.5 TB of data monthly. It's worth noting that the data ingestion is dependent on several factors like the number of exchanges connected, the amount of market data streamed, and the number of trades executed.
Data Recording: The recorders package records trading data for future analysis and auditing. Depending on the logging level and the volume of trades, the size of these records can be significant. However, by employing Parquet for storing this data, the system ensures efficient use of storage resources.
Keep in mind that the actual performance and data consumption can vary based on the system's configuration, the volume of trading, the number of connected exchanges, and other factors.
In the realm of efficient data storage, the framework stands out for its use of Parquet, a columnar storage file format that's optimized for speed and efficient storage. Parquet not only reduces the storage space requirement but also allows for faster read/write operations due to its columnar nature.
One of the best examples of Parquet's efficiency in this framework is the storage of full level 2 order book (L2_Book) data. Between March 1st and July 11th, the framework recorded full L2_Book data (10 levels on each side) for Bybit perpetual contracts, amounting to approximately 350 GB. Given the granularity of L2_Book data, this is a significant demonstration of how the Parquet format can lead to substantial storage optimization.
By managing high-volume data efficiently, the system ensures quick access to historical data, which is crucial for backtesting strategies, performing detailed market analysis, and carrying out in-depth audits. Efficient data handling also contributes to the framework's overall performance, ensuring that it can continue to operate at high speed even as the volume of stored data grows over time.
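As an illustration of the write path, a minimal pandas/pyarrow sketch for persisting L2 book snapshots might look like this; the schema and compression choice are assumptions, not the framework's exact layout:

```python
import pandas as pd

# A couple of illustrative book snapshot rows.
snapshots = pd.DataFrame({
    "timestamp": pd.to_datetime(["2023-07-11 12:00:00.000", "2023-07-11 12:00:00.100"]),
    "side": ["bid", "ask"],
    "level": [0, 0],
    "price": [30123.5, 30124.0],
    "size": [1.25, 0.80],
})

# Columnar layout plus compression keeps months of book data compact.
snapshots.to_parquet("bybit_BTCUSDT_l2.parquet", compression="zstd", index=False)

# Reading back only the columns you need is fast and cheap.
prices = pd.read_parquet("bybit_BTCUSDT_l2.parquet", columns=["timestamp", "price"])
```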
In addition to Redis and Parquet, the framework also uses PostgreSQL for storing certain types of data. PostgreSQL is a powerful, open-source object-relational database system known for its robustness, flexibility, and performance. It offers advanced features like full ACID compliance, multi-version concurrency control, and extensive indexing capabilities.
For the storage of order data and strategy events, the framework takes advantage of the TimescaleDB extension in PostgreSQL. TimescaleDB is an open-source time-series SQL database optimized for fast ingest and complex queries. It leverages the power of SQL with the performance advantages of time-series data.
Time-series data is a sequence of data points indexed in time order, which is a common form of data in many fields, including finance, where time-stamped data points (like orders, trades, or price updates) are generated continuously.
Efficiency: TimescaleDB helps handle large volumes of time-series data efficiently. It automatically partitions time-series data into smaller, more manageable time-based chunks, exposed through a single virtual table called a hypertable. This makes data writes and queries faster and more efficient than in a traditional relational table.
Scalability: TimescaleDB provides excellent scalability. You can continue to add more data to your database without experiencing significant performance degradation. It supports up to petabytes of data.
Complex Queries: TimescaleDB enables the execution of complex SQL queries on time-series data. You can perform time-series analytics using standard SQL, without needing to learn any new query language.
Compatibility: TimescaleDB is an extension to PostgreSQL, which means it's fully compatible with existing PostgreSQL tools and interfaces. This makes it easy to integrate into applications that already use PostgreSQL.
In summary, by integrating TimescaleDB, the framework greatly enhances its ability to handle time-series data efficiently, perform complex analyses, and scale as data volumes grow, all while leveraging the familiar and powerful features of PostgreSQL.
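For a concrete feel, here is a hedged sketch of creating such a hypertable from Python with psycopg2; the table and column names are illustrative, not the framework's actual schema:

```python
import psycopg2

conn = psycopg2.connect("dbname=trading user=trader")
with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS strategy_events (
            time        TIMESTAMPTZ NOT NULL,
            strategy    TEXT        NOT NULL,
            instrument  TEXT        NOT NULL,
            payload     JSONB
        );
    """)
    # create_hypertable() is TimescaleDB's entry point: it partitions the
    # table into time-based chunks behind a single virtual table.
    cur.execute(
        "SELECT create_hypertable('strategy_events', 'time', if_not_exists => TRUE);"
    )
    cur.execute(
        "INSERT INTO strategy_events VALUES (now(), 'TargetSeeker', 'BTC-USDT', '{}');"
    )
```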
The framework's modular architecture and use of scalable technologies like Redis provide a solid foundation for scaling. You can further optimize the system's performance and data consumption by fine-tuning Redis configurations, managing the level of data logging, optimizing the database schema, and making use of efficient data structures and algorithms in your strategies.
Lastly, monitoring tools should be used to keep track of the system's performance and data consumption. These tools can provide valuable insights that can help in identifying bottlenecks and areas for improvement.
The framework employs a comprehensive logging system built on the custom arb-logger package, which serves multiple purposes: tracking application flow, providing insights during debugging, and alerting on any unexpected behaviour. Let's delve into its structure and functionalities.
The arb-logger package comprises four key components:
__init__.py: Initializes the package.
alert_message.py: Contains the AlertMessage data class used to handle log messages. It includes methods for converting between log records (Python's base logging module format) and AlertMessage instances, enabling a more flexible logging experience.
arb_alerts.py: Responsible for handling log messages at different levels and sending macOS notifications in case of error logs. It uses the pub/sub feature of Redis to subscribe to different logging channels.
logger.py: The core of the package, which sets up the logging environment. It configures various handlers, including a RedisHandler for sending log messages to Redis and a FileHandler for writing log files. It also supports outputting log messages to stdout.
Here's a high-level overview of how the logging system operates:
The logger is set up by calling get_logger(). It takes multiple parameters to customize the logger according to the user's needs, including log level, output file path, and Redis configuration. It also automatically sets up a handler for uncaught exceptions.
When an event occurs in the system, a log message is created with all the necessary information, such as the name of the logger, the severity level of the event, and the actual message. The logger forwards this message to all its handlers.
The RedisHandler receives the log message, converts it into a dictionary using AlertMessage, and publishes it to the Redis channel.
The arb_alerts module listens for these messages on the Redis channels. When a message arrives, it is converted back into an AlertMessage instance and logged locally. If the message level is ERROR or higher, a macOS notification is sent.
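The RedisHandler idea can be sketched as a standard logging.Handler that serializes each record and publishes it to a channel; the field names and channel below are assumptions about arb-logger's internals, not its actual code:

```python
import json
import logging
import redis

class RedisHandler(logging.Handler):
    """Publish every log record to a Redis channel as JSON."""
    def __init__(self, channel: str = "logs:alerts"):
        super().__init__()
        self.channel = channel
        self.redis = redis.Redis()

    def emit(self, record: logging.LogRecord) -> None:
        payload = {
            "name": record.name,
            "level": record.levelname,
            "message": record.getMessage(),
            "created": record.created,
        }
        self.redis.publish(self.channel, json.dumps(payload))

logger = logging.getLogger("arb")
logger.setLevel(logging.INFO)
logger.addHandler(RedisHandler())
logger.error("feed disconnected")   # a subscriber like arb_alerts picks this up
```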
The arb-logger package is designed to be highly flexible and extensible, providing several advantages:
Redis integration: Log messages are sent to Redis in real time, allowing them to be consumed by other services for alerting or monitoring.
Multi-output support: Log messages can be output to stdout or to log files, which helps with debugging and maintains a record of events.
Error notifications: Critical errors trigger macOS notifications, ensuring immediate attention to issues.
Scalability and Flexibility: The logger is easy to set up and configure, and can handle a large volume of log messages. Its design allows for easy extension and integration into any Python application.
The framework takes advantage of Grafana, an open-source platform for monitoring and observability, to oversee the state of the Redis data store and track important metrics. This provides real-time insights into the system's health and behavior, which can guide decision-making and troubleshooting efforts.
Here's an overview of how Grafana is used in the system:
Grafana connects to Redis, subscribing to the necessary channels and fetching key performance indicators. It then visualizes these data points on a customizable, user-friendly dashboard. Here are some of the critical metrics being tracked:
Exchanges feed state: This metric allows the user to keep an eye on the operational status of the various exchange feeds. The state could be up (working correctly), down (not operational), or unknown.
Number of keys: This tracks the total count of keys stored in the Redis database. A significant change in this number could indicate new data inflow or data deletion.
Uptime: This is the amount of time that the Redis server has been up and running. It can help identify any unexpected shutdowns or restarts.
RAM Consumption: This shows how much memory Redis is using. A sudden increase could indicate a problem like a memory leak.
Slow Queries: This metric tracks queries that take a long time to execute, which could impact the system's performance and response time.
Data Ingestion Rate: This is the rate at which new data is being added to Redis. It can help identify spikes in data inflow.
Number of Clients: This shows the number of client connections to the Redis server.
Command Count: This shows the total number of commands processed by the Redis server. For example, the system is currently at approximately 17 billion publish (PUBLISH) commands and ~2 billion hash set (HSET) commands.
Grafana's customizable nature allows you to add or remove metrics based on your needs, providing a flexible tool for system monitoring. By tracking these metrics, you can quickly identify when and where issues are occurring, making it easier to maintain the system's health and performance.
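Many of these metrics come straight from Redis's INFO command, which can also be polled directly from Python; a small sketch:

```python
import redis

r = redis.Redis()
info = r.info()   # parsed output of the INFO command

print("uptime (s):       ", info["uptime_in_seconds"])
print("memory used:      ", info["used_memory_human"])
print("connected clients:", info["connected_clients"])
print("total commands:   ", info["total_commands_processed"])
```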
The TargetSeeker strategy is a perfect illustration of how the arb-trops-phoenix framework can be applied to create, manage, and execute a trading strategy. This guide will walk you through the primary functionalities of this strategy and how it integrates with the larger framework.
The TargetSeeker strategy is designed to adjust trading positions based on a set of predefined target quantities for each instrument, which are defined in a computed_targets.json file. The strategy retrieves the current target quantities from the JSON file and then adjusts the current position quantities to match them.
The strategy depends on a target file (computed_targets.json), which contains a dictionary mapping instrument names to their target quantities. The target file is periodically updated with new target quantities. The TargetSeeker strategy uses the watchdog package to monitor changes to this file and adjusts the trading positions accordingly.
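A minimal sketch of this file-watching setup with the watchdog package; the callback wiring is simplified relative to the real strategy:

```python
import json
import time
from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

class TargetFileHandler(FileSystemEventHandler):
    def on_modified(self, event):
        if event.src_path.endswith("computed_targets.json"):
            with open(event.src_path) as f:
                targets = json.load(f)
            print("new targets:", targets)   # the real code calls update_positions

observer = Observer()
observer.schedule(TargetFileHandler(), path=".", recursive=False)
observer.start()
try:
    while True:
        time.sleep(1)
finally:
    observer.stop()
    observer.join()
```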
The strategy waits for all orderbooks of the relevant instruments to be available before initiating trading operations. This is achieved using the _wait_all_orderbooks function.
The update_positions method is the heart of this strategy. It is called whenever the target file is updated, and it compares the current position quantity for each instrument with the target quantity. If there is a mismatch, it generates an order to adjust the position quantity.
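The core diffing logic can be sketched as follows; the data structures are illustrative rather than the strategy's actual implementation:

```python
# Diff current positions against targets and emit the adjusting orders.
def compute_adjustments(current: dict[str, float], targets: dict[str, float]) -> list[dict]:
    orders = []
    for instrument, target_qty in targets.items():
        delta = target_qty - current.get(instrument, 0.0)
        if abs(delta) > 1e-9:                 # skip already-matched positions
            orders.append({
                "instrument": instrument,
                "side": "buy" if delta > 0 else "sell",
                "qty": abs(delta),
            })
    return orders

print(compute_adjustments({"BTC-USDT": 0.5}, {"BTC-USDT": 0.8, "ETH-USDT": -2.0}))
# -> buy 0.3 BTC-USDT, sell 2.0 ETH-USDT
```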
The subscribe_to_events method is called to listen for changes to the target file. Any modifications to this file are picked up by the event handler, which subsequently calls the update_positions method to adjust the trading positions.
Finally, the strategy is initiated by calling the run method. This sets up the necessary listeners for file changes and kickstarts the event loop to listen for these changes.
By using the arb-trops-phoenix framework, the TargetSeeker strategy can efficiently handle trading operations with low latency and high reliability. The framework's robust architecture handles much of the low-level detail, allowing users to focus on the high-level trading logic. This leads to code that is easier to write, understand, and maintain.
Moreover, the framework's integration with tools like Redis and Grafana allows for efficient data management and easy monitoring of the system's state and performance. The arb-trops-phoenix framework's use of modern technologies and design principles makes it a reliable and flexible choice for implementing automated trading strategies.