Serenity's tick-by-tick backtester relies on a tick database called Behemoth, which in turn sits on top of Azure blob storage. Serenity's crypto market data feed handlers run 24x7, and every day at midnight they publish the tick data to both local RAID disk and Azure blob storage. The only problem: in the existing implementation, a researcher backtesting on his or her desktop needs one of the storage account's two access keys to read from blob storage, a huge security hole. If nothing else, it puts the integrity of the tick database at risk: there's no way to grant read-only access via access keys, so in theory a researcher could corrupt the production tick database. As one of my 2021 ambitions is to bring more people into Serenity development, this hole had to be fixed urgently.
Directories, tenants & client apps
Azure AD is the cloud-hosted edition of Microsoft's Active Directory product, and you can work with multiple directories, each of which is a separate tenant with its own tenant ID. As I already had a directory set up, most of the heavy lifting was in configuring the application and its entitlements, which we'll cover next.
Registering an application
The first step is to go to Azure Active Directory > App registrations. I created an application called Serenity, and its Overview page shows both the Application (client) ID and the Directory (tenant) ID.
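The tenant ID decides which directory you sign in against, and it appears in every Microsoft identity platform URL. As a small illustration (not Serenity code; the function name tenant_endpoints is hypothetical), here is how the two IDs from the Overview page map to the v2.0 authority and the endpoints a device code login talks to:

```python
import uuid

AUTHORITY_HOST = "https://login.microsoftonline.com"


def tenant_endpoints(tenant_id: str, client_id: str) -> dict:
    """Build the Microsoft identity platform (v2.0) endpoints for one directory.

    Both IDs are GUIDs copied from the app registration's Overview page;
    uuid.UUID raises ValueError if either one is malformed, and str() normalizes
    the value to the lowercase hyphenated form.
    """
    tenant = str(uuid.UUID(tenant_id))
    client = str(uuid.UUID(client_id))
    authority = f"{AUTHORITY_HOST}/{tenant}"
    return {
        "client_id": client,
        "authority": authority,
        "device_code_url": f"{authority}/oauth2/v2.0/devicecode",
        "token_url": f"{authority}/oauth2/v2.0/token",
    }
```

Validating the GUIDs up front turns a copy-paste mistake into an immediate ValueError rather than a confusing sign-in error later.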
Most of the defaults will work fine except for Authentication. Here there are a couple of gotchas to watch out for, in particular the "Allow public client flows" setting, which defaults to No for security. We need to turn it on because Python desktop applications use the "Device Code Flow" login mechanism, which we'll see at work later; a desktop app can't keep a client secret, so it has to authenticate as a public client. The other thing is that we need to tick the standard OAuth2 client redirect URI, as shown at the top of the page. Don't add a platform; you don't need one.
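To see what "Allow public client flows" unlocks, here is a stripped-down sketch of the OAuth 2.0 device authorization grant (RFC 8628) that DeviceCodeCredential performs for you. It is not Serenity or azure-identity code: the post helper is a hypothetical injected function (so the flow can run without a network), and the default scope https://storage.azure.com/.default is my assumption about what a storage client requests.

```python
import time
from typing import Callable

DEVICE_CODE_GRANT = "urn:ietf:params:oauth:grant-type:device_code"


def device_code_login(post: Callable[[str, dict], dict], device_code_url: str, token_url: str,
                      client_id: str, scope: str = "https://storage.azure.com/.default",
                      sleep: Callable[[float], None] = time.sleep) -> dict:
    """Run the device authorization grant as a public client and return the token response.

    `post(url, form)` is a hypothetical helper that POSTs form data and returns parsed JSON.
    """
    # Step 1: request a device code. No client secret is sent, which is exactly
    # why the app registration must allow public client flows.
    grant = post(device_code_url, {"client_id": client_id, "scope": scope})
    print(f"Go to {grant['verification_uri']} and enter the code {grant['user_code']}")

    interval = grant.get("interval", 5)
    deadline = time.monotonic() + grant.get("expires_in", 900)
    # Step 2: poll the token endpoint until the user finishes signing in.
    while time.monotonic() < deadline:
        sleep(interval)
        reply = post(token_url, {"grant_type": DEVICE_CODE_GRANT,
                                 "client_id": client_id,
                                 "device_code": grant["device_code"]})
        error = reply.get("error")
        if error is None:
            return reply  # success: carries access_token, expires_in, ...
        if error == "slow_down":
            interval += 5  # RFC 8628: add 5 seconds to the polling interval
        elif error != "authorization_pending":
            raise RuntimeError(f"device code login failed: {error}")
    raise TimeoutError("device code expired before the user signed in")
```

In practice you never write this loop yourself; DeviceCodeCredential handles the polling, but knowing its shape makes the portal setting much less mysterious.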
Granting access
In the Microsoft security model we need to grant permissions to both the application and the user, because what we give the application is the right to act on the user's behalf (delegated access) with whatever storage account permissions the user holds. Miss either one and it won't work! Let's start by setting up the application's entitlements under Serenity > API permissions, where the relevant delegated permission is Azure Storage > user_impersonation.
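When only one half is configured, the failure surfaces as an authorization error at request time. One debugging aid (my own suggestion, not part of Serenity): decode the access token and check that it carries the Azure Storage audience and the delegated user_impersonation scope granted under API permissions. aud and scp are standard Azure AD access-token claims; since I'm not certain whether the audience always has a trailing slash, the check accepts both forms. The user's IAM role is not in the token, so a token that passes can still be refused by storage.

```python
import base64
import json


def token_claims(access_token: str) -> dict:
    """Decode the payload of a JWT access token for debugging.

    This does NOT verify the signature; it only lets you inspect the claims.
    """
    payload = access_token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore the base64url padding JWTs strip
    return json.loads(base64.urlsafe_b64decode(payload))


def check_storage_delegation(claims: dict) -> list:
    """List problems with a token meant for Azure Storage (empty list if it looks right)."""
    problems = []
    if claims.get("aud") not in ("https://storage.azure.com", "https://storage.azure.com/"):
        problems.append(f"unexpected audience: {claims.get('aud')!r}")
    # scp is a space-separated list of delegated scopes
    if "user_impersonation" not in claims.get("scp", "").split():
        problems.append("missing delegated scope user_impersonation")
    return problems
```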
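The other place we need to set up access is under Storage accounts > cloudwall > Access Control (IAM), where the user gets a role assignment on the storage account. It has to be a data role such as Storage Blob Data Reader, which grants read-only access to blob data; management roles like Owner or Contributor don't by themselves let a user read blobs with an Azure AD token.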
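The portal steps also have a scriptable equivalent, which is handy when onboarding several researchers. A hedged sketch that builds the matching Azure CLI command; the subscription ID, resource group and assignee are placeholders, the function name is hypothetical, and Storage Blob Data Reader is my suggestion for read-only researchers rather than necessarily the role Serenity's account uses:

```python
import shlex


def role_assignment_command(assignee: str, subscription_id: str, resource_group: str,
                            account: str = "cloudwall",
                            role: str = "Storage Blob Data Reader") -> list:
    """Build an `az role assignment create` command scoped to one storage account."""
    # ARM resource ID of the storage account; the role applies to this account only
    scope = (f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
             f"/providers/Microsoft.Storage/storageAccounts/{account}")
    return ["az", "role", "assignment", "create",
            "--assignee", assignee, "--role", role, "--scope", scope]


if __name__ == "__main__":
    cmd = role_assignment_command("researcher@example.com", "<subscription-id>", "<resource-group>")
    print(shlex.join(cmd))  # run in a shell already logged in via `az login`
```

Scoping the assignment to the single storage account, rather than the whole subscription, keeps the grant as narrow as the use case.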
Integrating device login
Though I struggled for a while to find the right Python APIs, in the end the code involved was quite simple. With this latest patch, the tickstore's blob storage client accepts two kinds of credentials: connection strings (the original mechanism) and Azure Identity credential objects:
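```python
from pathlib import Path

from azure.storage.blob import BlobServiceClient


class AzureBlobTickstore:  # excerpt: base classes and other members omitted
    # get_global_defaults() is Serenity's own config helper; its import is omitted here.
    def __init__(self, credential, db_name: str, cache_dir: Path = Path('/var/tmp/abs_lru_cache'),
                 timestamp_column: str = 'date'):
        """
        Creates an instance of AzureBlobTickstore using either an Azure Identity credential class,
        typically DeviceCodeCredential, or a connection string.
        """
        if isinstance(credential, str):
            # original mechanism: the connection string embeds the account access key
            self.storage = BlobServiceClient.from_connection_string(credential)
        else:
            # token-based: an Azure Identity credential plus the account's blob endpoint URL
            self.storage = BlobServiceClient(account_url=get_global_defaults()['azure']['blob_account_url'],
                                             credential=credential)
```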
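To make the difference between the two modes concrete, here is a sketch (helper names hypothetical, not part of Serenity or the Azure SDK) that pulls apart a classic connection string. The AccountKey inside it is the shared key that grants full access to the whole account, while the blob account URL is all the token-based constructor needs; for cloudwall it follows Azure's standard https://<account>.blob.core.windows.net pattern.

```python
def parse_connection_string(conn_str: str) -> dict:
    """Split an Azure Storage connection string into its Key=Value fields."""
    fields = {}
    for part in conn_str.strip().strip(";").split(";"):
        if not part:
            continue
        key, _, value = part.partition("=")  # split on the first '=' only: keys end in '=' padding
        fields[key.strip()] = value.strip()
    return fields


def blob_account_url(conn_str: str) -> str:
    """Derive the blob endpoint URL that the credential-based constructor expects."""
    f = parse_connection_string(conn_str)
    if "BlobEndpoint" in f:
        return f["BlobEndpoint"].rstrip("/")
    protocol = f.get("DefaultEndpointsProtocol", "https")
    suffix = f.get("EndpointSuffix", "core.windows.net")
    return f"{protocol}://{f['AccountName']}.blob.{suffix}"


def redacted(conn_str: str) -> str:
    """Mask the AccountKey so the string is safe to log."""
    f = parse_connection_string(conn_str)
    if "AccountKey" in f:
        f["AccountKey"] = "***"
    return ";".join(f"{k}={v}" for k, v in f.items())
```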
Note that you'll need to set blob_account_url to match the storage account, cloudwall; you can find this setting in src/serenity/defaults.cfg. Now all we need to do is create the credential in AzureHistoricMarketdataService, replacing the older AZURE_CONNECT_STR environment variable:
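```python
from azure.identity import DeviceCodeCredential

# inside AzureHistoricMarketdataService (excerpt): replaces the AZURE_CONNECT_STR lookup
self.credential = DeviceCodeCredential(client_id=get_global_defaults()['azure']['client_id'],
                                       tenant_id=get_global_defaults()['azure']['tenant_id'])
```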
Here you'll need two configuration parameters, which again you can find (or override) in src/serenity/defaults.cfg. As described above, the client ID identifies the application and the tenant ID identifies the specific directory instance we want to target.
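Since get_global_defaults()['azure'][...] reads like a section-then-key lookup, here is a hedged sketch of loading these settings with the standard library's configparser. It assumes defaults.cfg is INI-formatted with an [azure] section; the SERENITY_AZURE_* environment-variable override is a hypothetical convention for illustration, not necessarily how Serenity implements overrides.

```python
import configparser
import os
from pathlib import Path

REQUIRED = ("client_id", "tenant_id", "blob_account_url")


def load_azure_defaults(cfg_path: Path) -> dict:
    """Read the [azure] section of a defaults.cfg-style INI file (assumed format)."""
    parser = configparser.ConfigParser(interpolation=None)  # don't treat '%' in values specially
    if not parser.read(cfg_path):
        raise FileNotFoundError(cfg_path)
    azure = dict(parser["azure"])
    # hypothetical overrides, e.g. SERENITY_AZURE_TENANT_ID for a different directory
    for key in REQUIRED:
        override = os.environ.get(f"SERENITY_AZURE_{key.upper()}")
        if override:
            azure[key] = override
    missing = [k for k in REQUIRED if not azure.get(k)]
    if missing:
        raise KeyError(f"missing [azure] settings: {', '.join(missing)}")
    return azure
```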
With these changes pulled together, we can run the backtester again. This time we're prompted to go to the Microsoft device login page (https://microsoft.com/devicelogin) and given a one-time code to enter there.
If everything works, once you enter the code you'll see a Serenity-branded confirmation screen,
and you'll start reading ticks from Behemoth!
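By default the credential just prints the login instructions to the console. If you want a friendlier prompt for researchers, DeviceCodeCredential accepts a prompt_callback that receives the verification URI, the user code and the expiry time. A minimal sketch (the function name is my own); it treats a naive expiry as UTC because I'm not certain every azure-identity version passes a timezone-aware datetime:

```python
from datetime import datetime, timezone


def show_device_code(verification_uri: str, user_code: str, expires_on: datetime) -> str:
    """Print device login instructions; intended for DeviceCodeCredential(prompt_callback=...)."""
    if expires_on.tzinfo is None:
        expires_on = expires_on.replace(tzinfo=timezone.utc)  # assume naive values are UTC
    minutes = max(0, int((expires_on - datetime.now(timezone.utc)).total_seconds() // 60))
    message = (f"Serenity needs you to sign in: open {verification_uri} and enter {user_code} "
               f"(code expires in about {minutes} min).")
    print(message)
    return message

# Usage (requires azure-identity):
# credential = DeviceCodeCredential(client_id=..., tenant_id=..., prompt_callback=show_device_code)
```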
msal vs. azure-identity
One word of warning: Microsoft's authentication and authorization documentation focuses primarily on the msal package for Python, a lower-level API for acquiring access tokens. Rather late in the game I discovered that the azure-storage-blob package doesn't work with MSAL directly; instead it takes credential objects from azure-identity, a package that sits on top of MSAL.
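What the storage client actually wants is a credential object with a get_token(*scopes) method returning an access token and its expiry (the TokenCredential protocol from azure-core); azure-identity's credentials implement it on top of MSAL. The sketch below illustrates that relationship with a hypothetical adapter and a local AccessToken stand-in so it runs without the SDK. It assumes the acquire function returns an MSAL-style result dict; in real code, just use azure-identity.

```python
import time
from typing import Callable, NamedTuple


class AccessToken(NamedTuple):
    """Local stand-in mirroring azure.core.credentials.AccessToken (token, expires_on)."""
    token: str
    expires_on: int  # POSIX timestamp


class MsalTokenCredential:
    """Adapt an MSAL-style acquire function to the get_token() shape Azure SDK clients call.

    `acquire(scopes)` returns an MSAL result dict: "access_token" and "expires_in"
    on success, "error" (and usually "error_description") on failure.
    """

    def __init__(self, acquire: Callable[[list], dict]):
        self._acquire = acquire

    def get_token(self, *scopes: str, **kwargs) -> AccessToken:
        result = self._acquire(list(scopes))
        if "access_token" not in result:
            reason = result.get("error_description", result.get("error"))
            raise RuntimeError(f"token request failed: {reason}")
        # convert MSAL's relative lifetime into the absolute expiry the SDK expects
        return AccessToken(result["access_token"], int(time.time()) + int(result["expires_in"]))
```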
Side note: what's in a name?
Behemoth (Russian: кот Бегемот) is the enormous, demonic black cat featured in Mikhail Bulgakov's Master and Margarita. Per Wikipedia, "[h]e has a penchant for chess, vodka, pistols, and obnoxious sarcasm."