MySBDS - Steem Blockchain Data Service in MySQL
What is the project about?
The Steem Blockchain Data Service is an open source project managed by Steemit that allows you to read the Steem blockchain and store it in a local MySQL database. The project opens with an obvious Notice:
This is prerelease software, not yet suitable for production use. Proceed at your own risk.
I'm working under this warning to get SBDS operational in the most stable way possible today, while watching and supporting the project as it moves forward. At this point, I've spent 100 hours (that I tracked, so more really) digging into SBDS and working to understand it. I'm not gonna give myself the Complete Understanding Badge yet, but I'm on my way.
MySBDS.com
Today, MySBDS.com and blervin/mysbds exist to provide three key things:
- Proven deployment scripts to get up and running with SBDS quickly
- Documentation of the challenges identified thus far (both on Steemit.com) and on MySBDS.com
- A public endpoint offering an always up-to-date full
mysqldump
of the entire Steem blockchain
I've dumped a pretty significant amount of time and energy into SBDS and I'm barely scratching the surface so I want to share everything I've found (today and ongoing) to get a dialog going with everyone else who's looking to get a local MySQL database running. And also to provide this public documentation of implementation results for the team working on updating the code for steemit/sbds.
My near-term goal is to get a public instance up and running for those who aren't quite ready to deploy and manage an entire environment themselves, but still wanna dig into the blockchain data.
Technology Stack
Everything outlined here is deployed in docker
on CentOS 7.4 using common bash commands.
I will specifically outline a DigitalOcean deployment along with some useful functionality available there.
From what I see around here, it looks like Ubuntu is pretty popular so I'm planning to put together another version for that as well.
Roadmap
In order of complexity (and likely implementation):
- Prepare MySBDS.com to highlight current scripts, documentation, and the most relevant Steemit posts about SBDS
- Develop an Ubuntu deployment script
- Document and outline challenges with deployments
- Document my monitoring infrastructure
- Deploy a public MySQL server
- Explore interest in fully managed deployments or even replication
- Continue performance monitoring to identify refinements and share those insights
How to contribute?
Comment here and any other posts about SBDS, catch me on steemit.chat, or just let me know how you can help make the blockchain easily accessible in MySQL!
Deploy MySBDS
Overview
I was excited to play with the SteemSQL service that was originally public, so when it recently became a paid service a little beyond my budget I began to look for alternatives and I realized I could easily maintain my own copy of the blockchain.
At the time of this writing, the size of the MySQL database of the entire blockchain is 403GB, and growing rapidly. Because of this growth, I wanted to look at ways to deploy this on flexible infrastructure and I found a solution with Block Storage at DigitalOcean. This allows us to deploy storage at any size for $0.10/GB per month and increase that size at any time later, however, storage can never be decreased.
We'll walk through the steps to create a script that will deploy a cloud server at DigitalOcean running the Steem Blockchain Data Service and MySQL in docker.
Requirements
You will need an account at DigitalOcean to deploy a cloud server and related storage. Also, it is presumed that you are comfortable in the shell and understand how to connect to the server via SSH and perform basic systems admin work on a Linux platform.
Initial Setup
We need to ensure we deploy our droplet in a region that supports block storage. The supported regions for volumes are:
- NYC1
- NYC3
- SFO2
- FRA1
- SGP1
- TOR1
- BLR1
- LON1
- AMS3
You'll need to consider the size of the droplet you want to deploy, or the real question, whether you want to download the full mysqldump
of the blockchain. I have written the low_mem.sh
and high_mem.sh
for both scenarios.
If you want to just get an instance up and slowly download everything, you can deploy a droplet with 4GB of memory and use the low_mem.sh
script.
To download and restore the entire blockchain takes a fair amount of resources so the high_mem.sh
script is written for 16GB of memory.
I'd give an estimate for times, but in my testing it varies a great deal depending on a lot of factors, including the node we're connecting to grab blocks. In general, it is always many hours but with the low_mem.sh
deployment it may easily become a few days before it completes.
Prepare Script
Start by opening your favorite text editor such as Sublime Text
or just Notepad
and grab either the high_mem.sh
or low_mem.sh
script.
The first modification is to change the mysql_password
in your copy of the script. Make that change and save your copy of the script. We'll modify this a bit more shortly.
mysql_password="mystrongcomplexpassword"
Create Droplet and Volume
We start by clicking the Create
button and selecting Droplets
Then, we select CentOS and 7.4 x64 from the drop-down:
As we look to Choose a size
you'll notice there are Standard Droplets
and Optimized Droplets
available. We can later expand our instance, however that expansion is only within this droplet type. You cannot deploy a Standard and later convert it to an Optimized instance, or vice versa.
Next, click Add Volume
to add block storage.
Given the current size of the Steem blockchain, I recommend using at least 500GB, especially if you plan to download the blockchain as you'll need the extra space.
Next, we'll pick a region.
As you'll see, only the supported regions are selectable with all others grayed out.
Everything here around the volume name presumes a lot and those presumptions may be wrong. By default, DigitalOcean names a Volume as volume-{region}-{volume_index}
where the index is essentially the count of Volumes you have in that region. Our approach here works in most scenarios, but if you have issues mounting your volume just look to the actual name displayed in DigitalOcean once it is created.
That said, we look to the following line in our script setting the volume_name
variable:
volume_name="volume-tor1-01"
There are two parts to this name, the region (tor1
here) and the index of the volume (01
in this case) for that region.
If you do not have any volumes in the region you are using, this should deploy with the 01
index. If you have existing volumes in that region, it will increment to the next number. Update the volume_name
accordingly.
Next, we check the User data
box and paste in our modified script:
Now, we can scroll down and click the Create
button.
The page will refresh and bring us back to the list of droplets and you'll see this one building. Once it's complete, the IP address will appear and you can click the Copy
link to grab that and paste into your SSH client such as Putty.
Known Issues
One thing I discovered quickly is that some of the sizes set for database fields are too small and we will see Data too long for column 'raw'
errors.
Fortunately, the community stepped in and there is a pull request at github that fixes everything. My scripts here manually checkout that pull request rather than the master
branch.
I know, I saw the signs that advised against hiking off-trail, but c'mon, this is an adventure so let's explore and see what we find! Dry humor aside, this branch has proven reliable in my testing so I'm comfortable with it but please advise of any issues you encounter.
The SBDS project still references steemd.steemit.com
as the RPC node it connects to, but this was discontinued on January 6, 2018 so I've got a little sed
in here to update that to api.steemit.com
Confirm Script Execution
The deploy will take a few minutes and you should check for the install email from DigitalOcean to get the password. Once you are logged in, check the script execution status by tailing the log we created.
tail -fn5 /var/log/mysbds_install.log
Somewhere between a few hours and a few days, you'll finally see:
END install
A simple way to check progress
mysql_ip=`docker inspect --format "{{ .NetworkSettings.IPAddress }}" steem_mysql`
mysql -h $mysql_ip -p -e "SELECT max(block_num) AS lastblock FROM sbds_core_blocks" steem
And you'll see something like this, with a much smaller block number.
+-----------+
| lastblock |
+-----------+
| 20065928 |
+-----------+
Then, just sit back patiently until everything fully syncs up.
You can watch the head_block_number
shown at https://api.steemjs.com/getDynamicGlobalProperties to see how far along you are.
Conclusion
I plan to continue working with SBDS and updating this project with new scripts, tricks, and tools that I discover along the way.
I'm not seeing a whole lot of posts about SBDS so I'm hoping to open the discussion so we can all work together to get this running well. If you're already using SBDS or I just opened this door for you please comment and share your experiences! I'm gonna document everything I personally discover as well as any insights shared by the community.
My goal is to make it as simple as possible to push the blockchain to a MySQL database and I feel like I'm off to a good start.
Posted on Utopian.io - Rewarding Open Source Contributors