The Z800 Needs a Friend (And a Power Supply)
In the springtime the dual-GPU setup did in more than one power supply, including an 850W and an 1100W unit. Back in May I finally concluded that the combination of NYC electricity costs and the recurring cost of buying refurbished power supplies made ETH mining uneconomical. In light of Ethereum's crash after the DAO debacle, that was probably the right call.
So it's now sitting in the living room, and with the New Year around the corner and a couple of books on machine learning and CUDA on my desk, I'm thinking about what else I can do with it. For well over a year I've been mulling a project to build an open source timeseries database for storing Bitcoin market data, so I've been looking at database storage engines and network communication.
While it is true all the cool kids have their own Hadoop clusters in AWS, the higher-spec hardware options are very costly per hour, which makes the living room option more appealing. And I think for certain use cases a tightly-integrated commodity cluster which just uses the cloud for history and backup is worth exploring.
The high performance computing community has been doing a lot of work on:
- reliable terabyte-scale data transfer
- GPU/RDMA hybrids
which taken together offer ways to move data between tightly-coupled clusters and the public cloud.
As others have pointed out, TCP/IP can be inefficient when more than one layer is trying to ensure reliable delivery. If you are delivering a large query result to a client, you may be willing to sacrifice reliable, in-order delivery depending on how you plan to traverse the data. For efficiency's sake we ideally want as few layers as possible -- and would write the data transfer protocol on a more efficient base.
The question is: which base? RDMA and its WAN implementations (iWARP and RoCE) offer alternatives to TCP/IP sockets, and Mellanox's Accelio library creates an abstraction layer over them. The JXIO library provides a further layer on top, through JNI. Google's QUIC and the Cronet implementation for Android offer another approach, as does UDT, which has a pure-Java implementation.
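Whatever the transport, the key idea behind giving up in-order delivery is that the ordering information can live in the protocol rather than the stream: tag each chunk with its offset and the receiver can reassemble the result no matter how the chunks arrive. A toy sketch (pure Python, no real sockets -- arrival order is simulated by shuffling):

```python
import random

def reassemble(chunks, total_size):
    """Rebuild a result buffer from (offset, payload) chunks that may
    arrive in any order -- the offset tag, not the transport, carries
    the ordering information."""
    buf = bytearray(total_size)
    received = 0
    for offset, payload in chunks:
        buf[offset:offset + len(payload)] = payload
        received += len(payload)
    # In a real protocol, a shortfall here would trigger retransmit requests.
    assert received == total_size
    return bytes(buf)

# Simulate out-of-order arrival of a 1 KiB result split into 64-byte chunks.
data = bytes(range(256)) * 4
chunks = [(i, data[i:i + 64]) for i in range(0, len(data), 64)]
random.shuffle(chunks)
assert reassemble(chunks, len(data)) == data
```

If the client is going to scan the whole result anyway, nothing forces the network layer to pay the latency cost of strict ordering.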
Given a second server, a pair of Mellanox Connect-IB dual-port cards and a pair of Infiniband cables, it should be possible to create a roughly 100 Gb/s aggregate connection between the machines, with the I/O workload offloaded to the cards.
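A quick back-of-the-envelope check of that bandwidth figure, assuming the cards run FDR Infiniband (14.0625 Gb/s signaling per lane, 64b/66b encoding, 4 lanes per port, 2 ports per card):

```python
# Rough effective-bandwidth estimate for a dual-port FDR 4x card.
# The per-lane rate and encoding overhead are the standard FDR figures.
lanes_per_port = 4
ports = 2
signal_rate_gbps = 14.0625       # raw signaling rate per lane
encoding_efficiency = 64 / 66    # 64b/66b line coding overhead

effective_gbps = ports * lanes_per_port * signal_rate_gbps * encoding_efficiency
print(round(effective_gbps))  # ~109 Gb/s aggregate across both ports
```

So "about 100 Gb/s" across the two ports is plausible before protocol overhead, though a single flow over one port tops out at roughly half that.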
Given local SSDs, or better yet NVMe cards, and a fast enough bus, you can move a lot of data. If you have a good method to distribute queries then you can execute them locally on different storage nodes and quickly merge the results: classic map/reduce.
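That map/reduce pattern for timeseries queries can be sketched in a few lines -- the node and row shapes below are illustrative, not from any real system:

```python
import heapq

def local_query(partition, low, high):
    """'Map' step: each storage node scans only its own partition and
    returns matching rows, sorted by timestamp."""
    return sorted(row for row in partition if low <= row[0] <= high)

def merged_query(partitions, low, high):
    """'Reduce' step: merge the sorted per-node results in one pass."""
    partials = (local_query(p, low, high) for p in partitions)
    return list(heapq.merge(*partials))

# Three hypothetical storage nodes, each holding (timestamp, price) rows.
nodes = [
    [(1, 100.0), (4, 101.5)],
    [(2, 100.2), (5, 101.0)],
    [(3, 100.9), (6, 102.3)],
]
print(merged_query(nodes, 2, 5))
# [(2, 100.2), (3, 100.9), (4, 101.5), (5, 101.0)]
```

Because each node returns its rows pre-sorted, the merge step is a single linear pass rather than a full re-sort of the combined result.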
If this works it should be possible to form a larger production-scale cluster with an Infiniband switch on the rack. RoCE or UDT could be used to extend the data transfer beyond the local cluster and up to the cloud. That also gives you a means to manage different tiers of data temperature: hot in memory, an SSD/NVMe local disk cache, local spinning disk, and finally the public cloud.
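A tiering policy like that can start out as nothing more than an age-based routing table. The tier names and thresholds below are made up for illustration:

```python
# Sketch of an age-based tiering policy: route reads/writes to a storage
# tier by how old the data is. Thresholds here are illustrative only.
TIERS = [
    ("memory",     60),           # hot: the last minute
    ("nvme",       3600),         # warm: the last hour
    ("local_disk", 86400 * 30),   # cool: the last month
    ("cloud",      float("inf")), # cold: everything older
]

def tier_for(age_seconds):
    """Return the first tier whose age ceiling covers the data."""
    for name, max_age in TIERS:
        if age_seconds <= max_age:
            return name

assert tier_for(5) == "memory"
assert tier_for(7200) == "local_disk"      # past the 1-hour NVMe window
assert tier_for(86400 * 365) == "cloud"
```

In practice the policy would also weigh access frequency, not just age, but an ordered threshold list is a reasonable starting point.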
One of the reasons I became interested in RDMA is an NVIDIA technology I came across called GPUDirect. Just as RDMA lets you pin host memory and read and write it remotely, GPUDirect lets the network cards read and write GPU memory directly, linking GPUs across Infiniband.
Tim Dettmers covers some of this in this blog post (from which the above image is taken), and this article looks at using Docker to containerize the necessary software, including TensorFlow.
From some searching around, it looks like there is an open request on the TensorFlow GitHub to add GPUDirect support. So it's possible that somewhere down the line this same setup could be used for distributed deep learning on the training data being shared across the cluster, possibly using h2o.ai's Deep Water.
Bottom line: this setup could probably serve a dual purpose, storing market data and applying machine learning algorithms to the timeseries.
Before bringing the Z800 back, I'm just reading and reviving the VirtualBox setup on my laptop with the latest BunsenLabs Hydrogen, a lightweight Debian Jessie distro derived from the late, great Crunchbang Linux.