I sometimes trade Bitcoin on Coinbase Pro -- as a New York state resident I don't have many choices thanks to the BitLicense -- and while the Web interface is decent, it does not provide all the functionality I want. I also want to brush up on my C++ and Web development, both of which I last did professionally 20 years ago, and want to learn more about systematic trading by doing it hands-on; I am a registered equities trader but professionally do not trade.
With that as background, I am starting an open source project to develop a crypto trading system, Serenity, and will write about it along the way. When complete, Serenity will include:
- feed handlers for multiple crypto exchanges
- exchange connectivity for Coinbase Pro
- algorithmic execution
- realtime risk and P&L calculation
- data capture for research
- tax reporting
plus an HTML5 Web interface for the trader. What it will not offer is any warranty of usefulness for any purpose whatsoever, or any guarantee of reliability. It will not be a hedge fund-grade trading system, and if you choose to invest serious money using it and have losses due to a bug, that's on you. This is a hobby project currently developed by an individual who is working in programming languages he does not use at his employer; it's a toy for learning. That said, if you are interested in learning as well, I hope you find it useful.
Architectural choices
Building software requires you to see the work in progress at two levels at once: the narrow path from where you are right now to the next viable, testable iteration, and the larger view of the finished project. Weeks ago I did some research into a couple of key choices for Serenity, and made a simple diagram of how I saw it fitting together.
The first choice was to build it around a loosely-coupled messaging bus. Each functional component from the list above will be implemented as one or more standalone services that communicate via a common bus based on ZeroMQ, using Cap'n Proto for the message format. This particular choice -- as is often the case -- has consequences. In addition to not being useful for anything (see above) Serenity is specifically not suited for microsecond-level high-frequency trading. The tick-to-trade path will cross ZeroMQ's bus multiple times and result in multiple message serializations and deserializations. You could re-compose those same components into a single, much faster monolithic component, but that would then be your system, and not Serenity.
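To make the decoupling concrete, here is a minimal in-process sketch of a topic-based bus that uses only the standard library. TopicBus, its methods and the topic names are hypothetical and are not ZeroMQ's API. In Serenity each publish would also serialize a Cap'n Proto message and cross a socket, which is the per-hop cost described above. Also note that ZeroMQ SUB sockets filter on topic prefixes, while this sketch matches topics exactly.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-process stand-in for the ZeroMQ bus: publishers and
// subscribers share only a topic name and a serialized payload.
class TopicBus {
public:
    using Handler = std::function<void(const std::string& payload)>;

    // Register a handler for one topic; several handlers may share a topic.
    void subscribe(const std::string& topic, Handler handler) {
        handlers_[topic].push_back(std::move(handler));
    }

    // Deliver payload to every handler on topic; returns how many received it.
    std::size_t publish(const std::string& topic, const std::string& payload) const {
        const auto it = handlers_.find(topic);
        if (it == handlers_.end()) {
            return 0;
        }
        for (const auto& handler : it->second) {
            handler(payload);
        }
        return it->second.size();
    }

private:
    std::map<std::string, std::vector<Handler>> handlers_;
};
```

A publisher only knows a topic name and a payload. Either side could therefore be replaced by a component in another language without the other side noticing.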
The second choice was to make the codebase heterogeneous with respect to language: C++ in the critical path, and Python and HTML5 everywhere else. This obviously better serves the learning purposes of the project, but it has other benefits as well. Python has a very rich ecosystem for general development and for data science specifically, so using it where possible will speed up development. It also reinforces the loose coupling: by mandating from the start that components can be written in more than one language, it becomes less likely that later choices will couple one component to another.
Finally, the third choice is, where possible, to build little libraries that implement the most common functionality. This matters especially on the C++ side, where to my knowledge no such library is available. It will manifest as standalone C++ libraries for parsing cryptocurrency marketdata feeds and for exchange connectivity, and these should be usable without any part of Serenity.
In the end, I expect it will look something like this, where the green components are expected to be in Python and the orange ones in C++.
Back to the code: extending libcrypto-mktdata
In the last post on streaming marketdata we stopped after implementing RawFeedClient sub-classes for multiple exchanges. Those sub-classes basically take care of WebSocket subscription mechanisms and bind WebSockets to a callback, and that's it. In this next iteration we are starting down the path to a "minimum viable product": in this case, one focused on Coinbase Pro only that successfully routes a tick from a marketdata feed handler to an algorithmic execution engine to an order gateway. That means we need to go from a raw feed to an "event" feed of typed objects by parsing the raw feed.
Since one of our guiding choices is to package low-level crypto trading functionality as libraries, we'll add this new functionality to libcrypto-mktdata. We'll leave out Serenity-specific aspects like the main program or the Cap'n Proto messaging over ZeroMQ; these will go in a new serenity-fh project on GitHub.
CoinbaseEventClient
The event client, like the raw client, is callback-based for simplicity. The raw JSON string that RawFeedClient passes to its WebSocket callback gets turned into a call to a void function that takes a reference to the CoinbaseEvent base class:
using OnCoinbaseEventCallback = std::function<void(const CoinbaseEvent&)>;
For performance reasons we embed an enum in that base class, rather than using RTTI (e.g. dynamic_cast) to check the concrete class of each instance. Callers can then filter and, if necessary, cast events:
class CoinbaseEvent {
public:
    enum EventType {
        match,
        status,
        ticker
    };
    virtual ~CoinbaseEvent() = default;
    [[nodiscard]] virtual EventType getCoinbaseEventType() const = 0;
};
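To illustrate the filter-then-cast pattern, here is a sketch that reuses the base class above, with a virtual destructor added, plus two hypothetical stripped-down subclasses. The real MatchEvent is built from a rapidjson::Document and carries many more fields. The helper trade_id_or() is also hypothetical.

```cpp
class CoinbaseEvent {
public:
    enum EventType { match, status, ticker };
    virtual ~CoinbaseEvent() = default;
    [[nodiscard]] virtual EventType getCoinbaseEventType() const = 0;
};

// Hypothetical minimal subclasses; the real events carry many more fields.
class MatchEvent : public CoinbaseEvent {
public:
    explicit MatchEvent(long long trade_id) : trade_id_(trade_id) {}
    [[nodiscard]] EventType getCoinbaseEventType() const override { return match; }
    [[nodiscard]] long long getTradeId() const { return trade_id_; }

private:
    long long trade_id_;
};

class ProductStatusEvent : public CoinbaseEvent {
public:
    [[nodiscard]] EventType getCoinbaseEventType() const override { return status; }
};

// A subscriber interested only in trades: check the embedded tag first, and
// only then static_cast, which is safe because the tag identifies the class.
inline long long trade_id_or(const CoinbaseEvent& event, long long fallback) {
    if (event.getCoinbaseEventType() != CoinbaseEvent::match) {
        return fallback;
    }
    return static_cast<const MatchEvent&>(event).getTradeId();
}
```

The tag check costs one virtual call and an integer comparison. dynamic_cast instead has to consult RTTI at runtime.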
We then take the common code we developed last time for defining a Subscription with one or more channels. We use it to construct an embedded CoinbaseRawFeedClient inside an interface very similar to RawFeedClient's, this time for events:
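class CoinbaseEventClient {
public:
    CoinbaseEventClient(const Subscription& subscription, const OnCoinbaseEventCallback& callback);
    // the raw client is owned through a raw pointer, so a copy would lead to a double delete
    CoinbaseEventClient(const CoinbaseEventClient&) = delete;
    CoinbaseEventClient& operator=(const CoinbaseEventClient&) = delete;
    void connect() {
        this->raw_feed_client_->connect();
    }
    void disconnect() {
        this->raw_feed_client_->disconnect();
    }
    ~CoinbaseEventClient() {
        delete this->raw_feed_client_;
    }
private:
    CoinbaseRawFeedClient* raw_feed_client_;
};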
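The owning raw pointer above works, but std::unique_ptr expresses the same ownership with less code. The sketch below uses a hypothetical FakeRawFeedClient in place of CoinbaseRawFeedClient; it replays canned messages on connect(). The message length stands in as a placeholder for a parsed event, so the sketch needs neither a WebSocket nor a JSON library.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for CoinbaseRawFeedClient: connect() replays canned
// messages into the callback instead of opening a WebSocket.
class FakeRawFeedClient {
public:
    using RawCallback = std::function<void(const std::string&)>;

    FakeRawFeedClient(std::vector<std::string> canned, RawCallback callback)
        : canned_(std::move(canned)), callback_(std::move(callback)) {}

    void connect() {
        for (const auto& message : canned_) callback_(message);
    }
    void disconnect() {}

private:
    std::vector<std::string> canned_;
    RawCallback callback_;
};

// Same shape as CoinbaseEventClient, but the raw client is held by
// std::unique_ptr: no hand-written destructor, and since unique_ptr is
// move-only, the compiler deletes the copy operations automatically.
class EventClient {
public:
    // Placeholder "event": the raw message length stands in for a parsed object.
    using EventCallback = std::function<void(std::size_t)>;

    EventClient(std::vector<std::string> canned, EventCallback on_event)
        : raw_feed_client_(std::make_unique<FakeRawFeedClient>(
              std::move(canned),
              [on_event](const std::string& raw) { on_event(raw.size()); })) {}

    void connect() { raw_feed_client_->connect(); }
    void disconnect() { raw_feed_client_->disconnect(); }

private:
    std::unique_ptr<FakeRawFeedClient> raw_feed_client_;
};

static_assert(!std::is_copy_constructible_v<EventClient>, "owning client must not be copyable");
```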
With this done, we can go about creating subclasses of CoinbaseEvent with custom constructors that take care of parsing. That takes us to the next technical choice: using the high-performance RapidJSON library for JSON parsing.
Parsing JSON in C++
RapidJSON offers multiple interfaces for parsing JSON: DOM-like, SAX-like and even a JSON Pointer implementation that lets you extract elements based on a path-like construct. To start, we're going to use the DOM API, which works with raw char* types, not std::string. We'll take the get_raw_json() output from the RawFeedMessage and use RapidJSON's Document.Parse() method to convert it into a DOM:
CoinbaseEventClient::CoinbaseEventClient(const Subscription& subscription, const OnCoinbaseEventCallback& callback) {
    OnRawFeedMessageCallback raw_callback = [callback](const RawFeedMessage& message) {
        rapidjson::Document d;
        const auto& raw_json = message.get_raw_json();
        d.Parse(raw_json.c_str());
        // skip malformed messages and anything without a string "type" field
        if (d.HasParseError() || !d.IsObject() || !d.HasMember("type") || !d["type"].IsString()) {
            return;
        }
        // dispatch on the exact "type" value; unhandled types are ignored for now
        const char* event_type = d["type"].GetString();
        if (strcmp("status", event_type) == 0) {
            callback(ProductStatusEvent(d));
        } else if (strcmp("match", event_type) == 0) {
            callback(MatchEvent(d));
        }
    };
    this->raw_feed_client_ = new CoinbaseRawFeedClient(subscription, raw_callback);
}
At this point we can index into the JSON field(s) we want with the [] operator, much like a std::map. We then extract each value with the type-appropriate method. For the "type" field in Coinbase's JSON protocol that method is GetString, and the field tells us which kind of event we are about to process. This in turn drives a simple if/else block that decides which of the implemented event object types to construct from the Document.
If we drill into one of those Document-to-CoinbaseEvent constructors, the mapping of fields to C++ is straightforward. For example, we use GetInt64() instead of GetString() to extract a 64-bit integer field like trade_id or sequence:
MatchEvent::MatchEvent(const rapidjson::Document& json) {
    this->trade_id_ = json["trade_id"].GetInt64();
    this->sequence_num_ = json["sequence"].GetInt64();
    this->maker_order_id_ = new std::string(json["maker_order_id"].GetString());
    this->taker_order_id_ = new std::string(json["taker_order_id"].GetString());
    // ...
}
But what do we do about size and price? RapidJSON offers a GetDouble() method, but for precision reasons the JSON protocol in Coinbase Pro's WebSocket feed encodes every decimal-valued field as a string. RapidJSON will assert if we try to extract such a value with GetDouble(). We therefore need a good, fast way to parse double values from strings, because a marketdata feed is going to produce a lot of them.
Parsing doubles
Tino Didriksen's blog post on string-to-double conversion in C++ recommended boost::spirit::qi::parse() as the best-performing correct double parser. As a side note, the distinction about correct parsing is an important one. If we really wanted to reduce latency, we could go one step further and relax the "correct" constraint, taking advantage of the fact that the JSON protocol does not use every possible permutation of double string formatting.
Wrapped up as a couple of utility functions, it looks like this:
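#include <cstring>
#include <boost/spirit/include/qi.hpp>
double fastparse_double(const char* val_txt) {
    double val = 0.0;
    // qi::parse returns false if no double could be parsed; that result is ignored here
    boost::spirit::qi::parse(val_txt, &val_txt[std::strlen(val_txt)], boost::spirit::qi::double_, val);
    return val;
}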
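On the relaxed-correctness idea above, a well-known trick gets much of the speed without giving up correctness for typical prices and sizes. Clinger's fast path relies on one observation: if all the digits fit in an integer no larger than 2^53 and there are at most 22 fractional digits, then both that integer and the power of ten are exact doubles, so a single IEEE division is correctly rounded. Below is a sketch of that idea with a hypothetical parse_decimal() helper that falls back to std::strtod for everything else. It is not what the Boost.Spirit code above does, and it has not been benchmarked against it.

```cpp
#include <cstdint>
#include <cstdlib>
#include <optional>

// Hypothetical helper: parse a plain decimal string such as a price or size.
// Fast path (Clinger): mantissa <= 2^53 and at most 22 fractional digits means
// mantissa and 10^k are exact doubles, so one division is correctly rounded.
// Everything else (exponents, long mantissas) falls back to std::strtod.
inline std::optional<double> parse_decimal(const char* s) {
    static constexpr double kPow10[] = {
        1e0,  1e1,  1e2,  1e3,  1e4,  1e5,  1e6,  1e7,  1e8,  1e9,  1e10, 1e11,
        1e12, 1e13, 1e14, 1e15, 1e16, 1e17, 1e18, 1e19, 1e20, 1e21, 1e22};
    constexpr std::uint64_t kMaxExactMantissa = std::uint64_t{1} << 53;

    const char* p = s;
    const bool negative = (*p == '-');
    if (negative) ++p;

    std::uint64_t mantissa = 0;
    int frac_digits = 0;
    bool seen_point = false;
    bool any_digit = false;
    bool fast = true;
    for (; *p != '\0'; ++p) {
        if (*p >= '0' && *p <= '9') {
            // bail out before mantissa * 10 + digit could exceed 2^53
            if (mantissa > (kMaxExactMantissa - 9) / 10) { fast = false; break; }
            mantissa = mantissa * 10 + static_cast<std::uint64_t>(*p - '0');
            any_digit = true;
            if (seen_point) ++frac_digits;
        } else if (*p == '.' && !seen_point) {
            seen_point = true;
        } else {
            fast = false;  // exponent, misplaced sign or any other character
            break;
        }
    }
    if (fast && any_digit && frac_digits <= 22) {
        const double value = static_cast<double>(mantissa) / kPow10[frac_digits];
        return negative ? -value : value;
    }

    // Slow path: strtod handles the general case but is locale-sensitive
    // (this assumes the "C" locale); reject input it does not consume fully.
    char* end = nullptr;
    const double value = std::strtod(s, &end);
    if (end == s || *end != '\0') return std::nullopt;
    return value;
}
```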
Note there's a bit of messiness, which I wasn't able to sort out, with the templates that represent Documents vs. member values in Documents. It requires a separate overload for each:
double cloudwall::core::marketdata::json_string_to_double(
        rapidjson::GenericObject<true, rapidjson::GenericValue<rapidjson::UTF8<char>>> json, const char* field_name) {
    const char* val_txt = json[field_name].GetString();
    return fastparse_double(val_txt);
}
double cloudwall::core::marketdata::json_string_to_double(const rapidjson::Document& json, const char* field_name) {
    const char* val_txt = json[field_name].GetString();
    return fastparse_double(val_txt);
}
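rapidjson::Document derives from rapidjson::Value (GenericDocument inherits from GenericValue). A single overload taking const rapidjson::Value& should therefore accept both a Document and a member value, provided callers pass the value itself rather than the GetObject() wrapper. A function template is another route. The sketch below shows the template version against hypothetical FakeJsonValue and FakeJsonObject stand-ins. They expose only the two calls used above, operator[] and GetString(), and std::strtod stands in for fastparse_double.

```cpp
#include <cstdlib>
#include <map>
#include <string>

// Hypothetical stand-ins exposing the same two calls the helpers above use on
// RapidJSON types: operator[] with a field name, and GetString().
struct FakeJsonValue {
    std::string text;
    const char* GetString() const { return text.c_str(); }
};

struct FakeJsonObject {
    std::map<std::string, FakeJsonValue> members;
    const FakeJsonValue& operator[](const char* name) const { return members.at(name); }
};

// One template in place of per-type overloads: it compiles for any type whose
// operator[] yields something with GetString(). std::strtod stands in for
// fastparse_double so the sketch has no Boost dependency.
template <typename JsonLike>
double json_string_to_double(const JsonLike& json, const char* field_name) {
    const char* val_txt = json[field_name].GetString();
    return std::strtod(val_txt, nullptr);
}
```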
Parsing enums
Another corner case for parsing is how to convert enumerated types from JSON strings to C++ types. For this I used a simple lookup map built with Boost's boost::assign::map_list_of:
enum Side {
    buy,
    sell
};
/// @brief simple lookup table that maps side names to Side enums
inline std::map<std::string, Side> kSideByName = // NOLINT(cert-err58-cpp)
    boost::assign::map_list_of("buy", buy)("sell", sell);
giving you code like this in the constructor:
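// look up Side enum by name; at() throws std::out_of_range on an unexpected value,
// whereas operator[] would silently insert a default (buy) entry
auto side_txt = json["side"].GetString();
this->side_ = &kSideByName.at(side_txt);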
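Here is a standard-library-only alternative, assuming C++17. A small lookup function over std::string_view avoids the static std::map, and with it the cert-err58-cpp warning about static initialization that might throw. It also reports unknown names as std::nullopt. The function side_from_name() and the scoped enum are hypothetical and are not part of libcrypto-mktdata.

```cpp
#include <optional>
#include <string_view>

enum class Side { buy, sell };

// Hypothetical replacement for the kSideByName map: nothing to construct at
// startup, and unknown names come back as std::nullopt instead of being
// inserted into a table.
constexpr std::optional<Side> side_from_name(std::string_view name) {
    if (name == "buy") return Side::buy;
    if (name == "sell") return Side::sell;
    return std::nullopt;
}
```

With only two sides, a pair of comparisons is probably cheaper than a tree lookup, and the caller decides how to treat an unexpected value.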
Parsing dates
Finally, we need to deal with the "time" field in Coinbase's "match" event. For this I decided to use Howard Hinnant's date library, which became the basis for the calendar and time zone extensions to <chrono> in C++20:
// returns a heap-allocated time_point; the caller takes ownership
const std::chrono::system_clock::time_point* MatchEvent::parse_timstamp() const {
    auto tp = new std::chrono::system_clock::time_point();
    std::istringstream ss{*this->timestamp_txt_};
    ss >> date::parse("%FT%TZ", *tp);
    return tp;
}
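For comparison, here is roughly what date::parse("%FT%TZ", ...) does for this one format, sketched with the standard library only. parse_iso8601_utc() and read_digits() are hypothetical helpers, and days_from_civil() is Howard Hinnant's public-domain civil-date algorithm. The sketch keeps microsecond precision and does not check the day against the month's length. In C++20, std::chrono::parse should cover this case without a third-party library, wherever the standard library implements it.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

using UtcMicros = std::chrono::time_point<std::chrono::system_clock, std::chrono::microseconds>;

// Days since 1970-01-01 in the proleptic Gregorian calendar
// (Howard Hinnant's public-domain days_from_civil algorithm).
constexpr std::int64_t days_from_civil(std::int64_t y, unsigned m, unsigned d) {
    y -= m <= 2;
    const std::int64_t era = (y >= 0 ? y : y - 399) / 400;
    const auto yoe = static_cast<unsigned>(y - era * 400);                 // [0, 399]
    const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;  // [0, 365]
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;            // [0, 146096]
    return era * 146097 + static_cast<std::int64_t>(doe) - 719468;
}

// Reads exactly n ASCII digits starting at pos; returns -1 on any non-digit.
inline int read_digits(const std::string& s, std::size_t pos, std::size_t n) {
    int value = 0;
    for (std::size_t i = pos; i < pos + n; ++i) {
        if (i >= s.size() || s[i] < '0' || s[i] > '9') return -1;
        value = value * 10 + (s[i] - '0');
    }
    return value;
}

// Hypothetical stand-in for date::parse("%FT%TZ", ...) on "YYYY-MM-DDTHH:MM:SS[.fraction]Z".
// Fractional digits beyond microseconds are truncated.
inline std::optional<UtcMicros> parse_iso8601_utc(const std::string& s) {
    if (s.size() < 20 || s[4] != '-' || s[7] != '-' || s[10] != 'T' || s[13] != ':' || s[16] != ':') {
        return std::nullopt;
    }
    const int year = read_digits(s, 0, 4), month = read_digits(s, 5, 2), day = read_digits(s, 8, 2);
    const int hour = read_digits(s, 11, 2), minute = read_digits(s, 14, 2), second = read_digits(s, 17, 2);
    if (year < 0 || month < 1 || month > 12 || day < 1 || day > 31 || hour < 0 || hour > 23 ||
        minute < 0 || minute > 59 || second < 0 || second > 60) {
        return std::nullopt;
    }
    std::size_t pos = 19;
    std::int64_t micros = 0;
    if (s[pos] == '.') {
        const std::size_t first = ++pos;
        std::int64_t weight = 100000;  // microseconds represented by the next digit
        while (pos < s.size() && s[pos] >= '0' && s[pos] <= '9') {
            micros += (s[pos] - '0') * weight;
            weight /= 10;  // reaches 0 after six digits, truncating the rest
            ++pos;
        }
        if (pos == first) return std::nullopt;  // "." with no digits
    }
    if (pos + 1 != s.size() || s[pos] != 'Z') return std::nullopt;

    const std::int64_t days = days_from_civil(year, static_cast<unsigned>(month), static_cast<unsigned>(day));
    const std::int64_t seconds = days * 86400 + hour * 3600 + minute * 60 + second;
    return UtcMicros{std::chrono::microseconds{seconds * 1000000 + micros}};
}
```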
And that's it! We've gone from a raw WebSocket feed to JSON to C++ objects. Next we can turn our attention to wiring up those callbacks to a marketdata distribution mechanism, the subject of the next blog post.