
RE: MySBDS - Steem Blockchain Data Service in MySQL

in #utopian-io, 7 years ago

I'm glad it's coming together! Understanding everything really does take learning a lot of little pieces of the puzzle like this. I'm still putting together that tutorial to walk through all of these steps in a little more detail.

One tip: renaming the volume could cause issues with Docker (though it sounds like it worked here), so here's a trick in case it comes up again.

Inside the volume folder you'll see another folder named _data containing all of the databases. First run docker stop steem_mysql so nothing is running, then move your _data folder with all the existing data over the _data folder of the running container. You can't just replace the steem/ folder, because this is InnoDB and it needs the ib... files as well.

Then just run docker start steem_mysql and it'll see the database.
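If it helps, here's a rough Python sketch of that stop/move/start sequence. It assumes you already know both host paths (the old _data folder and the current volume folder) and that the script runs as a user allowed to touch them. `swap_in_data_dir` is a hypothetical helper, not part of SBDS; it renames the current _data to _data.bak instead of deleting it, so there's a way back if MySQL doesn't come up cleanly.

```python
import shutil
import subprocess
from pathlib import Path


def swap_in_data_dir(source_data: Path, target_volume: Path,
                     container: str = "steem_mysql", use_docker: bool = True) -> Path:
    """Move source_data into target_volume/_data while the MySQL container is stopped.

    The existing target _data is renamed to _data.bak rather than deleted.
    """
    target_data = target_volume / "_data"
    backup = target_volume / "_data.bak"
    if backup.exists():
        raise FileExistsError(f"{backup} already exists; remove it first")

    if use_docker:
        # Stop MySQL so none of the InnoDB files (ibdata..., ib_logfile...) are open.
        subprocess.run(["docker", "stop", container], check=True)

    if target_data.exists():
        target_data.rename(backup)
    # shutil.move renames when possible and falls back to copying across filesystems.
    shutil.move(str(source_data), str(target_data))

    if use_docker:
        subprocess.run(["docker", "start", container], check=True)
    return target_data
```

The whole _data folder moves as one unit, so the steem/ folder and the ib... files always stay together.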


I was wondering about zipping or tarring up the _data directory to populate sbds instead of using mysqldumps. For example, instead of only offering latest.tar for download on your website, would it also work to offer latest_data.tar? There is probably a reason mysqldumps are what people typically use, but this might be a special case where the _data directory is the faster solution: the database restore took around 8 hours on a 6-virtual-core machine with 16GB, and downloading the _data directory and overwriting the existing one could be much faster.

Haha, the .tar download itself was a whole other battle for me.

The short answer is that the ibdata... and ib_logfile... files are required as well as the steem/ folder, but those files contain data about all the databases on that server. Even so, this approach should work if you back up the entire _data folder, but nothing short of that. That would almost assuredly go faster than dealing with the dump.
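A minimal sketch of packaging the whole _data folder (not just steem/) as a download. The archive name latest_data.tar.gz and the gzip compression are assumptions; use mode "w" for a plain .tar. steem_mysql has to be stopped first, since copying live InnoDB files won't give a usable copy, and the server that reads it should probably run the same MySQL version.

```python
import tarfile
from pathlib import Path


def archive_data_dir(data_dir: Path, out_path: Path) -> list[str]:
    """Pack an entire MySQL _data directory (ibdata..., ib_logfile..., steem/, ...).

    Run only while the MySQL container is stopped. Returns the archived member names.
    """
    with tarfile.open(out_path, "w:gz") as tar:
        # arcname="_data" keeps the top-level folder name, so the archive can be
        # extracted straight into a (stopped) container's volume folder.
        tar.add(Path(data_dir), arcname="_data")
    with tarfile.open(out_path, "r:gz") as tar:
        return tar.getnames()
```

On the receiving side the idea is the same as the swap above: stop steem_mysql, extract into the volume folder so it replaces _data, then start it again.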

Creating my .tar is no walk in the park either. I loop through the tables, run mysqldump on them one at a time, and append each dump to the .tar file. This database is so huge that a standard mysqldump of the entire thing never finished once in my testing.
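Roughly what that loop could look like in Python. This is a sketch, not the script described above: the database name steem is an assumption, credentials are assumed to come from a ~/.my.cnf option file, and each table is streamed to a temp file so even sbds_core_blocks never has to fit in memory.

```python
import os
import subprocess
import tarfile
import tempfile
from typing import BinaryIO, Callable, Iterable

DumpFn = Callable[[str, str, BinaryIO], None]


def list_tables(database: str) -> list[str]:
    """List tables via the mysql client (-N: no header row, -B: tab-separated batch output)."""
    out = subprocess.run(["mysql", "-N", "-B", "-e", "SHOW TABLES", database],
                         check=True, capture_output=True, text=True).stdout
    return [line for line in out.splitlines() if line]


def mysqldump_table(database: str, table: str, out: BinaryIO) -> None:
    """Dump one table; --single-transaction reads InnoDB from a snapshot without locking it."""
    subprocess.run(["mysqldump", "--single-transaction", database, table],
                   stdout=out, check=True)


def append_table_dumps(tar_path: str, database: str, tables: Iterable[str],
                       dump: DumpFn = mysqldump_table) -> None:
    """Dump tables one at a time and append each as <table>.sql to an uncompressed tar."""
    for table in tables:
        # Stream the dump to disk instead of holding it in memory.
        with tempfile.NamedTemporaryFile(suffix=".sql", delete=False) as tmp:
            dump(database, table, tmp)
            tmp_path = tmp.name
        try:
            # Mode "a" appends to an uncompressed tar, creating it if it doesn't exist.
            with tarfile.open(tar_path, "a") as tar:
                tar.add(tmp_path, arcname=f"{table}.sql")
        finally:
            os.remove(tmp_path)
```

Usage would be something like `append_table_dumps("latest.tar", "steem", list_tables("steem"))`. Since every table gets its own mysqldump run, each one is a separate snapshot, so tables dumped minutes apart won't line up on exactly the same block.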

It is difficult and expensive to work with such large databases, which got me thinking about a way to potentially reduce the size of the database by more than 50% without too much of a sacrifice. I noticed that the sbds_core_blocks table accounts for around two-thirds of the entire database size, and I'm guessing that its mediumtext column named "raw" (probably for raw blockchain data) accounts for the vast majority of that space. That got me wondering whether it might be possible to simply replace the "raw" entries with a single character or something similar to save space. If so, I think the entire database might fit comfortably on a single $40/month droplet with room to spare.
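To check those proportions before changing anything, something like this might do. It reads per-table sizes from information_schema through the mysql client; `projected_total` is a hypothetical helper, and `column_share` is a guess you supply, because information_schema doesn't break sizes down by column. The arithmetic: if sbds_core_blocks is 2/3 of the total, blanking "raw" saves more than half of the database only if raw is more than 3/4 of that table (2/3 × 3/4 = 1/2).

```python
import subprocess

# Per-table size (data + indexes) for one schema; {db} is filled in below.
SIZE_QUERY = (
    "SELECT table_name, data_length + index_length "
    "FROM information_schema.tables "
    "WHERE table_schema = '{db}' AND table_type = 'BASE TABLE' "
    "ORDER BY 2 DESC"
)


def parse_sizes(batch_output: str) -> dict[str, int]:
    """Parse `mysql -N -B` output: one 'table<TAB>bytes' line per table."""
    sizes = {}
    for line in batch_output.splitlines():
        name, size = line.split("\t")
        sizes[name] = int(size)
    return sizes


def table_sizes(database: str) -> dict[str, int]:
    # The database name is pasted into the SQL, so only pass trusted names.
    out = subprocess.run(["mysql", "-N", "-B", "-e", SIZE_QUERY.format(db=database)],
                         check=True, capture_output=True, text=True).stdout
    return parse_sizes(out)


def projected_total(sizes: dict[str, int], table: str, column_share: float) -> int:
    """Total bytes if `column_share` (0..1) of `table` were freed, e.g. by blanking "raw"."""
    return sum(sizes.values()) - round(sizes[table] * column_share)
```

One caveat, as far as I know: InnoDB doesn't hand freed space back to the OS on its own. The table would need a rebuild afterwards (e.g. OPTIMIZE TABLE), and that only shrinks files when innodb_file_per_table is on; otherwise the space stays inside ibdata1.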

I've been asking myself this exact same question; your approach just may work. I'll definitely post if I find some answers.

Another thing that may be worth considering is that the comments table is the second biggest in storage space, so it may be worthwhile to save space by removing older edited versions of the "body" mediumtext entries. For example, I sometimes edit a big long post a few times to correct a few typos; this kind of editing may be fairly common, so removing the old versions could reduce the database size a lot.

Yeah, great point! The body is fully duplicated on edits so that really could save some significant space.
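A sketch for estimating how much space those edits take before deleting anything. It assumes, as said above, that every edit stores the full body, and it uses hypothetical column names (author, permlink, block_num, body) that should be checked against the real comments table. It only finds superseded versions; it doesn't remove them.

```python
from typing import Iterable, NamedTuple


class CommentOp(NamedTuple):
    # Hypothetical columns; match them to the actual comments table first.
    author: str
    permlink: str
    block_num: int
    body: str


def superseded_versions(ops: Iterable[CommentOp]) -> list[CommentOp]:
    """Return every comment op except the newest one for each (author, permlink)."""
    latest: dict[tuple[str, str], CommentOp] = {}
    older: list[CommentOp] = []
    for op in ops:
        key = (op.author, op.permlink)
        current = latest.get(key)
        if current is None:
            latest[key] = op
        elif op.block_num > current.block_num:
            older.append(current)  # previous newest is now an old edit
            latest[key] = op
        else:
            older.append(op)  # same or earlier block: treat as the older version
    return older


def reclaimable_bytes(ops: Iterable[CommentOp]) -> int:
    """UTF-8 bytes of body text held only by superseded edits."""
    return sum(len(op.body.encode("utf-8")) for op in superseded_versions(ops))
```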

It still seems to be working, and the lastblock seems to stay very close to the current head_block_number.

Thanks for the tip about the _data folder. I was definitely not confident about what I did, but it is good to know the proper way to do things going forward.

Note: What I did fortunately had the same effect as overwriting the _data folder in the running container. There were two directories, /old64characterdirectory/_data/ and /new64characterdirectory/_data/. I saw that the old one was about 360G but only the new one was being written to, so I stopped both steem_mysql and steem_sbds, removed the new one, and renamed the old one to the new one's name. That was a roundabout way of overwriting the _data folder in the running container, but it could have been problematic if the new directory had contained anything other than the _data folder.

Awesome! To test, I run this at least once a day:

SELECT MAX(block_num) AS lastblock FROM sbds_core_blocks

Then I check the result against the head_block_number here: https://api.steemjs.com/getDynamicGlobalProperties
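The daily check could be scripted along these lines. It assumes the endpoint returns head_block_number as a top-level JSON field (and that it is still online), that the database is named steem, and that the mysql client can log in without prompting. Steem's 3-second block time converts the lag into seconds.

```python
import json
import subprocess
import urllib.request

API_URL = "https://api.steemjs.com/getDynamicGlobalProperties"
BLOCK_INTERVAL_SECONDS = 3  # Steem produces one block every 3 seconds


def head_block_number(url: str = API_URL) -> int:
    """Fetch head_block_number (assumed to be a top-level field of the JSON reply)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return int(json.load(resp)["head_block_number"])


def last_block(database: str = "steem") -> int:
    """Same query as above: the highest block SBDS has stored."""
    out = subprocess.run(
        ["mysql", "-N", "-B", "-e", "SELECT MAX(block_num) FROM sbds_core_blocks", database],
        check=True, capture_output=True, text=True).stdout
    return int(out.strip())


def lag(head: int, last: int) -> tuple[int, int]:
    """Return (blocks behind, approximate seconds behind)."""
    behind = max(head - last, 0)
    return behind, behind * BLOCK_INTERVAL_SECONDS
```

For example, `lag(head_block_number(), last_block())` gives both numbers in one go; a 20-block lag works out to 60 seconds.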

I have fallen a bit behind a few times and as far as I can tell it's the public node we're connecting to that's the issue. Even so, stopping and restarting steem_sbds should get you back on track eventually if you see you're falling behind.
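If you'd rather not watch it by hand, a small watchdog could do the stop/start for you. Everything here is hypothetical: the 200-block threshold (about 10 minutes at 3 seconds per block) and the 10-minute poll interval are arbitrary, and `get_lag` is whatever returns blocks behind, for example head_block_number minus the MAX(block_num) query above.

```python
import subprocess
import time
from typing import Callable


def restart_sbds(container: str = "steem_sbds") -> None:
    # docker restart stops the container and then starts it again.
    subprocess.run(["docker", "restart", container], check=True)


def watch(get_lag: Callable[[], int], restart: Callable[[], None] = restart_sbds,
          threshold: int = 200, interval_s: float = 600, rounds: int = 1,
          sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll the lag `rounds` times; restart steem_sbds whenever it exceeds `threshold`.

    Returns how many restarts were issued.
    """
    restarts = 0
    for i in range(rounds):
        if get_lag() > threshold:
            restart()
            restarts += 1
        if i < rounds - 1:
            sleep(interval_s)  # give SBDS time to catch up before checking again
    return restarts
```

Keeping the poll interval long matters: restarting again before SBDS has had a chance to catch up would just keep it behind.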

Yes, I've been checking the max block_num against the head_block_number throughout the day, and for the past 24 hours it seems to be keeping up well with about a 20 block lag which seems reasonable.

As a test I reverted to the high memory settings (line 38 of your high memory script) to see whether steem_mysql would keep stopping like it did before. So far it has been working well for the past hour, and I plan to keep testing it tomorrow to see whether I need to do any restarts. I'm thinking the high memory settings may only run into issues when it is way, way behind on block_num, maybe because the high memory MySQL settings cause the system to try to catch up too fast. It will be interesting to see whether steem_mysql has any stops tomorrow.

I've pretty consistently stayed within 20 blocks as well, which is one minute at Steem's 3-second block time and more than reasonable.

You're right, the settings you changed are exactly the ones to tweak. I've been working on a new 'tutorial' post for the past couple of days after gleaning a bunch of insights from our discussion.

I'm gonna put something together that goes through each piece of my scripts in more detail, including these mysql settings.

The high memory settings stopped working around noon today, so it made it through last night and this morning and then failed. I started things up again tonight and noticed the server load climb to 18 within 10 or 15 minutes, with kswapd0 as the top process, which I think is related to virtual memory.

I briefly looked for a way to change the MySQL settings in Docker without having to do a docker stop, rm, run, but I couldn't find one.

At some point tomorrow I plan to go back to the low memory settings to see if they continue to work as well as they did yesterday.

This is good to hear. I only ran the high memory instance for a couple of days. I thought I had proved it enough, but obviously there's more to learn there.

As far as changing mysql settings in docker, I can see the settings with docker inspect but cannot find a way to set them.
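If the settings were passed as mysqld flags after the image name in docker run (an assumption about how the script starts the container), they show up under Config.Cmd in the docker inspect output, and this sketch pulls them out as a dict. It only reads them; changing them still means the stop, rm, run cycle described above.

```python
import json
import subprocess


def mysqld_flags(inspect_json: str) -> dict[str, str]:
    """Extract --name=value flags from Config.Cmd in `docker inspect` output."""
    container = json.loads(inspect_json)[0]  # docker inspect prints a JSON array
    flags = {}
    for arg in container["Config"].get("Cmd") or []:
        if arg.startswith("--"):
            name, _, value = arg[2:].partition("=")
            flags[name] = value  # bare switches like --skip-name-resolve map to ""
    return flags


def container_flags(container: str = "steem_mysql") -> dict[str, str]:
    out = subprocess.run(["docker", "inspect", container],
                         check=True, capture_output=True, text=True).stdout
    return mysqld_flags(out)
```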