Once a year I need to pay taxes for the capital gains on my Bitcoin trades, which in prior years has meant also paying the Bitcoin Taxes service to generate a TXF file with the necessary inputs for TurboTax to do its thing and generate a statement. Unfortunately this year Bitcoin Taxes could not handle the complexity of trading across multiple currencies and exchanges, and I decided to build the function into Serenity.
This in turn means it's time to get a proper database for Serenity. It had to be free and open source, and preferably scalable enough to grow with Serenity. Since I had prior experience with PostgreSQL, I decided to evaluate PostgreSQL 11 with the TimescaleDB extension, as this would give me both a powerful relational database and the option to play around with more time-series database functionality as a supplement to Serenity's Behemoth tickstore developed previously.
Installing TimescaleDB
Initially I did a from-scratch local install of PostgreSQL and the TimescaleDB extension:
# install postgresql
$ sudo apt-get install postgresql-server-dev-11
$ sudo apt-get install postgresql-11
# compile and install timescaledb extension
$ git clone https://github.com/timescale/timescaledb.git
$ cd timescaledb
$ git checkout 1.5.1
$ ./bootstrap -DREGRESS_CHECKS=OFF -DAPACHE_ONLY=1
$ cd ./build
$ make
$ sudo make install
# tune the server
$ sudo apt-get install golang-1.11-go
$ export PATH=/usr/lib/go-1.11/bin:$PATH
$ export PATH=$HOME/go/bin:$PATH
$ go get github.com/timescale/timescaledb-tune/cmd/timescaledb-tune
$ sudo timescaledb-tune --quiet --yes
# start postgresql
$ sudo pg_ctlcluster 11 main start
# install the extension
$ sudo -u postgres psql -c "CREATE EXTENSION timescaledb"
WARNING:
WELCOME TO
 _____ _                               _     ____________
|_   _(_)                             | |    |  _  \ ___ \
  | |  _ _ __ ___   ___  ___  ___ __ _| | ___| | | | |_/ /
  | | | | '_ ` _ \ / _ \/ __|/ __/ _` | |/ _ \ | | | ___ \
  | | | | | | | | |  __/\__ \ (_| (_| | |  __/ |/ /| |_/ /
  |_| |_|_| |_| |_|\___||___/\___\__,_|_|\___|___/ \____/
               Running version 1.5.1
For more information on TimescaleDB, please visit the following links:
1. Getting started: https://docs.timescale.com/getting-started
2. API reference documentation: https://docs.timescale.com/api
3. How TimescaleDB is designed: https://docs.timescale.com/introduction/architecture
Note: TimescaleDB collects anonymous reports to better understand and assist our users.
For more information and how to disable, please see our docs https://docs.timescaledb.com/using-timescaledb/telemetry.
CREATE EXTENSION
But as we've been doing a lot with Docker on this project, it turns out you can replicate the above much more simply using the published TimescaleDB Docker image. Running the following:
$ docker pull timescale/timescaledb:1.5.1-pg11-oss
$ docker run -d --name serenity-timescaledb -p 5432:5432 -e POSTGRES_PASSWORD=******** timescale/timescaledb:1.5.1-pg11-oss
$ docker run -it --net=host --rm timescale/timescaledb:1.5.1-pg11-oss psql -h localhost -U postgres
Password for user postgres:
psql (11.5)
Type "help" for help.
postgres=#
pulls an image with the same versions (and the open source extensions only), runs it, and logs you into the psql command line once the container is up.
Designing the database
To ease the process of designing what I anticipated to be a complex schema, I decided to invest in Raphael Araújo e Silva's pgModeler tool, and it was worth every penny. It not only provides a great visual designer, but it also generates the full SQL / DDL needed to bootstrap the database.
With that done, it's time to create the database objects for the first time. From the serenity source directory, run (this assumes an empty serenity database has already been created):
psql -U postgres -h localhost -d serenity -f serenity-db/serenity.sql
The magic of TimescaleDB gets activated with the create_hypertable() function, so let's take the exchange_trade table and make it a hypertable, partitioned by timestamp:
SELECT create_hypertable('exchange_trade', 'trade_time');
We will leave a fuller exploration of what TimescaleDB has to offer for the next blog post.
Full SQL Alchemist
Once the database has been chosen and set up, the next question is how to integrate it with Python. If we want to isolate the database layer and introduce object-relational mapping (ORM) features, sqlalchemy is the standard. But I am no longer keen on ORM after having been burned by it a number of times, and in my view Docker and containers change the equation in terms of the value of portable database layers. After all, if your target database image is just a docker pull away, the argument that your software should make it easy to switch databases is at least a bit weaker. So to start we'll go with raw psycopg2 and use the full range of Postgres SQL features directly from code.
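As a minimal sketch of what that looks like (the connection parameters and password here are placeholders for illustration, not Serenity's actual configuration):

import psycopg2

# connection parameters are placeholders -- substitute your own
conn = psycopg2.connect(host='localhost', port=5432, dbname='serenity',
                        user='postgres', password='********')
cur = conn.cursor()

# query one of the schema's type code tables directly with plain SQL
cur.execute("SELECT exchange_id, exchange_code FROM serenity.exchange")
for exchange_id, exchange_code in cur.fetchall():
    print(f'{exchange_id}: {exchange_code}')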
We don't need to give up on object mapping entirely. For example, the database model has a number of "type code" tables with a consistent schema, and we can easily build a dynamic cache around them:
class TypeCodeCache:
    """
    Helper that pre-caches all typecodes in the database and lets the user look up the ID by code.
    """
    def __init__(self, cur):
        self.cur = cur
        self.type_code_map = {
            Exchange: self._load("exchange", Exchange),
            InstrumentType: self._load("instrument_type", InstrumentType),
            Side: self._load("side", Side),
            OrderType: self._load("order_type", OrderType),
            DestinationType: self._load("destination_type", DestinationType),
            TimeInForce: self._load("time_in_force", TimeInForce),
            MarkType: self._load("mark_type", MarkType),
            ExchangeTransferMethod: self._load("exchange_transfer_method", ExchangeTransferMethod),
            ExchangeTransferType: self._load("exchange_transfer_type", ExchangeTransferType)
        }

    def get_by_code(self, klass, type_code):
        return self.type_code_map[klass][0][type_code]

    def get_by_id(self, klass, type_id):
        return self.type_code_map[klass][1][type_id]

    def _load(self, table_name, klass):
        id_column_name = table_name + '_id'
        code_column_name = table_name + '_code'
        type_by_code = {}
        type_by_id = {}
        self.cur.execute("SELECT " + id_column_name + ", " + code_column_name + " FROM serenity." + table_name)
        for row in self.cur.fetchall():
            type_obj = klass(row[0], row[1])
            type_by_code[row[1]] = type_obj
            type_by_id[row[0]] = type_obj
        return type_by_code, type_by_id
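Usage then boils down to dictionary lookups, e.g. (the 'GEMINI' code is an assumption about the exchange table's contents, purely for illustration):

# pre-cache every type code table, then look entries up by code or by ID;
# 'GEMINI' is an assumed value for illustration
type_code_cache = TypeCodeCache(cur)
gemini = type_code_cache.get_by_code(Exchange, 'GEMINI')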
Tax calculation
With a database in place, plus the coinbasepro and gemini modules to access the exchange REST APIs, we can populate the database with all the historical trades and query them. In our schema:
SELECT eo.side_id, eo.exchange_instrument_id, ef.fill_price, ef.quantity, ef.fees, ef.create_time
FROM serenity.exchange_fill ef
INNER JOIN serenity.exchange_order eo ON eo.exchange_order_id = ef.exchange_order_id
ORDER BY ef.create_time ASC
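Each fill row then gets bucketed into an in-memory structure of buys and sells. A sketch of that mapping (the side decoding and variable names are paraphrased, but the dict keys are exactly what the matching code below expects):

import decimal
from collections import defaultdict

# BUY_SIDE_ID is a placeholder -- in reality the side_id would be decoded
# through TypeCodeCache; 'remaining' tracks quantity not yet matched to a sell
all_trades = defaultdict(list)
for side_id, instrument_id, fill_price, quantity, fees, create_time in cur.fetchall():
    side = 'Buy' if side_id == BUY_SIDE_ID else 'Sell'
    all_trades[side].append({
        'px': decimal.Decimal(fill_price),
        'qty': decimal.Decimal(quantity),
        'remaining': decimal.Decimal(quantity),
        'fee': decimal.Decimal(fees),
        'ts': create_time
    })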
Per the IRS, Bitcoin and other cryptocurrencies are treated as property and subject to capital gains taxes. Given most of my trading is short-term rather than buy-and-hold, that means short-term capital gains or losses computed on a FIFO (First In, First Out) basis. In other words, for every BTC-USD sell we need to find all buys before it that have not yet been allocated to previous sells:
# walk earlier buys in FIFO order, allocating their unmatched quantity
# against this sell until the sell is fully covered
for candidate_buy in all_trades['Buy']:
    if candidate_buy['ts'] <= sell['ts'] \
            and candidate_buy['remaining'] > 0.0 \
            and sell['remaining'] > 0.0:
        if candidate_buy['remaining'] < sell['remaining']:
            # this buy is smaller than what remains of the sell: consume it all
            qty_bought = candidate_buy['remaining']
            sell['remaining'] -= qty_bought
            candidate_buy['remaining'] = decimal.Decimal(0.0)
        else:
            # this buy covers the rest of the sell: consume only what's needed
            qty_bought = sell['remaining']
            candidate_buy['remaining'] -= qty_bought
            sell['remaining'] = decimal.Decimal(0.0)

        # guard against allocating more than the sell's total quantity
        if qty_bought > sell['qty']:
            qty_bought = sell['qty']

        cost_basis += qty_bought * candidate_buy['px']
        last_buy = candidate_buy
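For context, the matching loop above runs once per sell; the enclosing computation looks roughly like this (a paraphrased sketch that ignores fees; see the full code for the real version):

for sell in all_trades['Sell']:
    cost_basis = decimal.Decimal(0.0)
    last_buy = None
    # ... run the FIFO matching loop above ...
    proceeds = sell['qty'] * sell['px']
    gain_or_loss = proceeds - cost_basis  # short-term capital gain or loss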
Furthermore, based on my trade history we also need to contend with cross-currency trades (e.g. ETH-BTC, ZEC-BTC). I elected to handle these by decomposing each cross-currency trade into a corresponding buy/sell pair of USD trades: a sell of ETH-BTC becomes a sell of ETH-USD plus a buy of BTC-USD, with the BTC quantity equal to the trade quantity multiplied by the cross-currency price. For example, selling 10 ETH-BTC at 0.03 decomposes into selling 10 ETH-USD and buying 0.3 BTC-USD, each priced at its USD mark at the trade time:
# for each sell, buy short instrument and sell long instrument
for sell in self.trades[instrument_code]['Sell']:
    self.trades[short]['Buy'].append({
        'px': self.lookup_instrument_mark(short, sell['ts']),
        'qty': sell['qty'] * sell['px'],
        'remaining': sell['qty'] * sell['px'],
        'fee': 0.0,
        'ts': sell['ts']
    })
    self.trades[long]['Sell'].append({
        'px': self.lookup_instrument_mark(long, sell['ts']),
        'qty': sell['qty'],
        'remaining': sell['qty'],
        'fee': 0.0,
        'ts': sell['ts']
    })
Finally, the whole calculation needs to be wrapped up to emit TXF, the TurboTax import format. TXF is a simple line-oriented text format: a version and application header followed by one caret-terminated detail record per taxable sale.
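A sketch of what such a writer can look like (the field layout and the N321 reference number for short-term gains reflect my reading of the TXF V042 spec, and the matched_sales structure is hypothetical, so double-check against the spec):

import datetime

def emit_txf(matched_sales, out_path):
    # minimal TXF writer sketch: a header record, then one caret-terminated
    # detail record per sale; N321 (short-term gain/loss, security) is an
    # assumption about the right reference number
    with open(out_path, 'w') as txf:
        txf.write('V042\n')                                  # TXF version
        txf.write('ASerenity\n')                             # generating application
        txf.write('D' + datetime.date.today().strftime('%m/%d/%Y') + '\n')
        txf.write('^\n')
        for sale in matched_sales:
            txf.write('TD\n')                                # detail record
            txf.write('N321\n')                              # short-term gain/loss
            txf.write('C1\n')
            txf.write('L1\n')
            txf.write('P{}\n'.format(sale['description']))   # e.g. '0.5 BTC-USD'
            txf.write('D{}\n'.format(sale['buy_date']))      # date acquired
            txf.write('D{}\n'.format(sale['sell_date']))     # date sold
            txf.write('${}\n'.format(sale['cost_basis']))
            txf.write('${}\n'.format(sale['proceeds']))
            txf.write('^\n')

You can see the full code here.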