Options Have Meanings, or, How I Made an rsync Seven Times Faster

This is an old post!

This post is over 2 years old. Solutions referenced in this article may no longer be valid. Please consider this when utilizing any information referenced here.

Warning: Doing this is making a clear tradeoff between security and speed. Do not do this on the public Internet or across a network you do not trust.

rsync is one of those tools that is in every computer user’s toolkit. It’s fantastic for moving large amounts of data around and for migrating data from one system to another.

rsync also has a ton of options and, after awhile, you get to where muscle memory means you just type the same few options over and over again. With me, that was -avz, archive, verbose, compression.

Recently, I was migrating several terabytes of data from a NAS to a computer. As is often the case, I fired up an rsync job and watched it.

It maxed out at about 35 megabit. Across a gigabit switched internal network.

I grumbled about how the NAS is slow and went to do something else, figuring this was going to take a few days to finish. But a couple hours later it occurred to me to circle back and look at this again. It was bugging me.

My first thought was to eliminate compression. This is going across an entirely internal switched network, I really don’t care about reducing bandwidth utilization. So I dropped the -z options.

The result was an instant speed gain of about 5x. I was pushing 200 megabit now and the time to completion had fallen from six days to about 36 hours. Impressed, I decided to tinker a little bit more. My next idea was to eliminate any encryption (again, this is an entirely internal operation, don’t do this across the Internet or on any network you don’t trust). No dice, I couldn’t drop encryption entirely, but I could choose the weakest cypher that both ends would support. In my case, that was aes128-ctr.

With a little bit of trial and error and some Googling, this is what I finally came up with:

rsync --info=progress2 -sav  -e "ssh -T -c aes128-ctr -o Compression=no -x" user@source:/path/to/copy .

No compression, weakest cypher I could find, disabled some stuff on ssh that isn’t needed. With this my transfers are sustaining about 250 megabit and the time to completion has fallen to about 24 hours.

Screenshot of Grafana
Screenshot of Grafana

Comments (0)

Interested in why you can't leave comments on my blog? Read the article about why comments are uniquely terrible and need to die. If you are still interested in commenting on this article, feel free to reach out to me directly and/or share it on social media.

Contact Me
Share It
Linux
I’m a developer, first and foremost. I like writing code. To me, maintaining servers, configuring things, troubleshooting network issues and the like -  these are things I do to support my primary interest and job as a developer. I’m not ignorant of these things, but all things considered they’re not my favorite things to do. One thing I will admit I’ve been ignorant over the years is DNS. Oh sure, I know at a high level how it works. I even know a bit about the different record types. I knew enough to have my own domain name, configured using Godaddy’s DNS servers to point to my server. But actually running my own name server? Something I’ve never done and, for some reason, had this unnatural fear of. Well, no more. I’m now running my very own shiny new name server and, actually, it wasn’t really as difficult as I thought. And because this was a learning experience for me, I figured I’d walk you through what I did as well. Picking  a Server There are two big players in the “DNS Server software” space: BIND and djbdns. BIND is the 900 pound gorilla that has been around forever and ever, and is insanely difficult to configure. djbdns is from the same guy who wrote qmail - I’ll let you be the judge of that. But after researching and actually attempting to install both of these, I eventually gave up. Both just came across as being too complex for a simple name server handling a couple of domains, and the documentation for both was equally complex. That’s when someone on Twitter pointed me to MaraDNS. I looked it over and was surprised to find good, readable and simple documentation that made it look easy to install. So I decided to give it a whirl. Here’s what I did. Note that this install is for a Gentoo system. Yours will be different if you’re using something else. Installing and Configuring MaraDNS First step is to install it. emerge maradns And let Portage do its thing. Once it’s installed, you really only have to worry about a few files. In /etc/mararc, you need to check to be sure you’re binding to the right interfaces. In my config, I bound it to the loopback and to the main interface: ipv4_bind_addresses = "x.x.x.x, 127.0.0." After that, you tell it to be authoritative, and what domains you are wanting to serve records for. csv2 = {} csv2["rebeccapeck.org."] = "zones/rebeccapeck.org" Note the period at the end of the domain name - it’s important. Each entry in the csv2 array should map to a zone file. I put mine in the “zones” subdirectory (which, in Gentoo, lives under /etc/maradns). mkdir -p /etc/maradns/zones Then, with your favorite editor (which should be vi :P), you create your zone file. The one for rebeccapeck.org (partially) looks like this: rebeccapeck.org. NS ns1.epsilonthree.com. rebeccapeck.org. NS ns2.epsilonthree.com. rebeccapeck.org. +3600 A x.x.x.x rebeccapeck.org. +3600 MX 0 rebeccapeck.org. www.rebeccapeck.org. +3600 CNAME rebeccapeck.org. So what are we doing here? Well, here it helps to know something about the different types of DNS records. I’m not going to cover all the different types of records - this is a good list of common ones and Wikipedia has a full list. The important ones you need to know are NS (Name Server), A (the main record), MX (mail server records), and CNAME (alias). The “+3600” is setting a timeout on the records to one hour (3,600 seconds). By default, the server will send one day (86,400 seconds). Here, I’m telling the server what the name servers are (strictly speaking, this isn’t required, but I added it all the same) and that the main address for people requesting “rebeccapeck.org” is this IP address. I’m also saying that people who request “www.rebeccapeck.org” should get the IP address for “rebeccapeck.org.” I also add an MX record that points to rebeccapeck.org with 0 as the priority (the first (and only) server). That’s it! Restart MaraDNS: /etc/init.d/maradns restart And you can test it out. dig @localhost rebeccapeck.org A You should get a big long printout, but what you want to see is these two lines: QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 0 rebeccapeck.org. 3600 IN A x.x.x.x Assuming the above is the correct address, congratulations, your DNS server is now resolving properly locally. Delegating your Domain The next step is delegating your domain to your own server. I’m not going to cover this in too much detail because how it happens depends on the registrar. In general, this is a two step process: Register your name server’s IP address to a name. At NameCheap, when you’re in the domain screen this is done under Advanced Options > Nameserver Registration. Under GoDaddy, this is under the “Hosts” section of the domain information screen. You need to add at least two “nsX.domain.com” entries, but they can both point to the same IP. Delegate your domain to the names you just created. At NameCheap, you would go General > Domain Name Server setup, and Specify Custom DNS Servers. Then, enter the two (or more) names you just created “nsX.domain.com”. I can’t remember how I did this in GoDaddy, but I remember it was pretty apparent. That’s it! They say it takes 24-48 hours, but I started seeing requests hit the new name server within about an hour. Of course, since I wasn’t actually changing IP addresses, there was no real downtime. As of now, all my domains are being served off my own nameserver. It’s kind of a neat feeling of accomplishment, knowing you’re not relying on someone else’s DNS setup - they’re just providing you a name. This makes domain transferring much easier and adding new records much easier. And seeing as how I’m currently in the process of transferring all my domains away from GoDaddy, this will ease the transition.
Read More
Apache
The goal of this project were twofold: To completely eliminate the need for me to touch the phone to provision it. I want to be able to create a profile for it in the database, then simply plug the phone in and let it do the rest. And… To eliminate per-phone physical configuration files stored on the server. The configuration files should be generated on the fly when the phone requests them. So the flow of what happens is this: I create a profile for the phone in the database, then plug the phone in. Phone boots initially, receives server from DHCP option 66. Script on the server hands out the correct provisioning path for that model of phone. Reboots with new provisioning information. Phone boots with new provisioning information, begins downloading update SIP application and BootROM. Reboots. Phone boots again, connects to Asterisk. At this rate, provisioning a phone for a new employee is simply me entering the new extension and MAC address into an admin screen, and giving them the phone. It’s pretty neat. **Note: **there are some areas where this is intentionally vague, as I’ve tried to avoid revealing too much about our private corporate administrative structure. If something here doesn’t make sense or you’re curious, post a comment. I’ll answer as best I can. Creating the initial configs I used the standard download of firmware and configs from Polycom to seed a base directory. This directory, on my server, is /www/asterisk/prov/polycom_ipXXX, where XXX in the phone model. Right now we deploy the IP-330, IP-331 and IP-4000. While right now the IP-330 and IP-331 can use the same firmware and configs, since the IP-330 has been discontinued they will probably diverge sometime in the not too near future. With the base configs in place, this is where mod_rewrite comes into play. I added the following rewrite rules to the Apache configs: RewriteEngine on RewriteRule ^/000000000000\.cfg /index.php RewriteRule /prov/[^/]+/([^/]+)-phone\.cfg /provision.php?mac=$1 [L] RewriteRule /prov/polycom_[^/]+/[^/]+-directory\.xml /prov/polycom_directory.php` RewriteCond %{THE_REQUEST} ^PUT* RewriteRule /prov/[^/]+/([^/]+)\.log /prov/polycom_log.php?file=$1` To understand what these do, you will need to take apart the anatomy of a Polycom boot request. It requests the following files in this order: whichever bootrom.ld image it’s using, [mac-address].cfg if it exists or 000000000000.cfg otherwise, the sip.ld image, [mac-address]-phone.cfg, [mac-address]-web.cfg, and [mac-address]-directory.xml. So, we’re going to rewrite some of these requests to our scripts instead. Generating configs on the fly We’re going to skip the first rewrite rule (we’ll talk about that one in a little bit since it has to do with plug-in auto provisioning). The one we’re concerned with is the next one, which rewrites [mac-address]-phone.cfg requests to our provisioning script. So each request to that file is actually rewritten to provision.php?mac=[mac-address]. Now, in the database, we’re keeping track of what kind of phone it is (an IP-330, IP-331 or IP-4000), so when a request hits the script, we look up in the database what kind of phone we’re dealing with based on the MAC address, and use the variables from the database to fill in a template file containing exactly what that phone needs to configure itself. For example, the base template file for the IP-330 looks something like this: <sip> <userinfo> <server <?php foreach($phone as $key => $p) { ?> voIpProt.server.<?php echo $key+1 ?>.address="<?php echo $p["host"] ?>" voIpProt.server.<?php echo $key+1 ?>.expires="3600" voIpProt.server.<?php echo $key+1 ?>.transport="UDPOnly" <?php } ?> /> <reg <?php foreach($phone as $key => $p) { ?> reg.<?php echo $key+1 ?>.displayName="<?php echo $p["first_name"] ?> <?php echo $p["last_name"] ?>" reg.<?php echo $key+1 ?>.address="<?php echo $p["name"] ?>" reg.<?php echo $key+1 ?>.type="private" reg.<?php echo $key+1 ?>.auth.password="<?php echo $p["secret"] ?>" reg.<?php echo $key+1 ?>.auth.userId="<?php echo $p["name"] ?>" reg.<?php echo $key+1 ?>.label="<?php echo $p["first_name"] ?> <?php echo $p["last_name"] ?>" reg.<?php echo $key+1 ?>.server.1.register="1" reg.<?php echo $key+1 ?>.server.1.address="<?php echo $p["host"] ?>" reg.<?php echo $key+1 ?>.server.1.port="5060" reg.<?php echo $key+1 ?>.server.1.expires="3600" reg.<?php echo $key+1 ?>.server.1.transport="UDPOnly" <?php } ?> /> </userinfo> <tcpIpApp> <sntp tcpIpApp.sntp.address="pool.ntp.org" tcpIpApp.sntp.gmtOffset="<?php echo $tz ?>" /> </tcpIpApp> </sip> The script outputs this when the phone requests it. Voila. Magic configuration from the database. There’s a little bit more to it than this. A lot of the settings custom to the company and shared among the various phones are in a master dealnews.cfg file, and included with each phone (it was added to the 000000000000.cfg file). Now, on to the next rule. Generating the company directory Polycom phones support directories. There’s a way to get this to work with LDAP, but I haven’t tackled that yet. So, for now, we generate those dynamically as well when the phone requests any of its *-directory.xml files. This one’s pretty easy since 1) we don’t allow the endpoints to customize their directories (yet), and 2) because every phone has the same directory. So all of those requests go to a script that outputs the XML structure for the directory: <directory> <item_list> <?php if(!empty($extensions)) { foreach($extensions as $key => $ext) { ?> <item> <fn><?php echo $ext["first_name"]?></fn> <ln><?php echo $ext["last_name"]?></ln> <ct><?php echo $ext["mailbox"]?></ct> </item> <?php } ?> <? } ?> </item_list> </directory> We do this for both the 000000000000-directory.xml and the [mac-address]-directory.xml file because one is requested at initial boot (the 000000000000-directory.xml file is intended to be a “seed” directory), whereas subsequent requests are for the MAC address specific file. Getting the log files Polycoms log, and occasionally the logs are useful for debug purposes. The phones, by default, will try to upload these logs (using PUT requests if you’re provisioning via HTTP like we are). But having the phone fill up a directory full of logs is ungainly. Wouldn’t it be better to parse that into the database, where it can be easily queried? And because the log files have standardized names ([mac-address]-boot/app/flash.log), we know what phone they came from.Well, that’s what the last two rewrite lines do. We rewrite those PUT requests to a PHP script and parse the data off stdin, adding it to the database. A little warning about this. Even at low settings Polycom phones are chatty with their logs. You may want to have some kind of cleaning script to remove log entries over X days old. Passing the initial config via DHCP At this point, we have a working magic configuration. Phones, once configured, fetch dynamically-generated configuration files that are guaranteed to be as up-to-date as possible. Their directories are generated out of the same database, and log files are added back to the same database. It all works well! … except that it still requires me to touch the phone. I’m still required to punch into the keypad the provisioning directory to get it going. That sucks. But there’s a way around that too! By default, Polycom phones out of the box look for a provisioning server on DHCP option 66. If they don’t find this, they will proceed to boot the default profile thats ships with the phone. It’s worth noting that, if you don’t pass it in the form of a fully-qualified URL, it will default to TFTP. But you can pass any format you can add to the phone. if substring(hardware, 1, 3) = 00:04:f2 { option tftp-server-name "http://server.com"; } In this case, what we’ve done is look for a MAC address in Polycom’s space (00:04:f2) and pass it option 66 with our boot server. But, we’re passing the same thing no matter what kind of phone it is! How can we tell them apart, especially since, at this point, we don’t know the MAC address. The first rewrite rule handles part of this for us. When the phone receives the server from option 66 and requests 000000000000.cfg from the root directory, we instead forward it on to our index.php file, which handles the initial configuration. Our script looks at the HTTP_USER_AGENT, which tells us what kind of phone we’re dealing with (they’ll contain strings such as “SPIP_330”, “SPIP_331” or “SSIP_4000”). Using that, we selectively give it an initial configuration that tells it the RIGHT place to look. <?php ob_start(); if(stristr($_SERVER['HTTP_USER_AGENT'], "SPIP_330")) { include "devices/polycom_ip330_initial.php"; } if(stristr($_SERVER['HTTP_USER_AGENT'], "SPIP_331")) { include "devices/polycom_ip331_initial.php"; } if(stristr($_SERVER['HTTP_USER_AGENT'], "SSIP_4000")) { include "devices/polycom_ip4000_initial.php"; } $contents = ob_get_contents(); ob_end_clean(); echo $contents; ?> These files all contain a variation of my previous auto-provisioning configuration config, which tells it the proper directory to look in for phone-specific configuration. Now, all you do is plug the phone in, and everything else just happens. A phone admin’s dream. Keeping things up to date By default, the phones won’t check to see if there’s new config or updated firmware until you tell them to. But his also means that some things, especially directory changes, won’t get picked up with any regularity. A quick change to the configs makes it possible to schedule the phones to look for changes at a certain time: <provisioning prov.polling.enabled="1" prov.polling.mode="abs" prov.polling.period="86400" prov.polling.time="01:00" /> This causes the phones to look for new configs at 1AM each morning and do whatever they have to with them. Conclusions The reason all this is possible is because Polycom’s files are 1) easily manipulatable XML, as opposed to the binary configurations used by other manufacturers, and 2) distributed, so that you only need to actually send what you need set, and the phone can get the rest from the defaults. In practice this all works very well, and cut the time it used to take me to configure a phone from 5-10 minutes to about 30 seconds. Basically, as long as it takes me to get the phone off the shelf and punch the MAC address into the admin GUI I wrote. I don’t even need to take it out of the box!
Read More
Apache
I’ve been using Google Chrome as my primary browser for the last few months. Sorry, Firefox, but with all the stuff I need to work installed, you’re so slow as to be unusable. Up to and including having to force-quit at the end of the day. Chrome starts and stops quickly But that’s not the purpose of this entry. The purpose is how to live with self-signed SSL certificates and Google Chrome. Let’s say you have a server with a self-signed HTTP SSL certificate. Every time you hit a page, you get a nasty error message. You ignore it once and it’s fine for that browsing session. But when you restart, it’s back. Unlike Firefox, there’s no easy way to say “yes, I know what I’m doing, ignore this.” This is an oversight I wish Chromium would correct, but until they do, we have to hack our way around it. Caveat: these instructions are written for Mac OS X. PC instructions will be slightly different at PCs don’t have a keychain, and Google Chrome (unlike Firefox) uses the system keychain. So here’s how to get Google Chrome to play nicely with your self-signed SSL certificate: On your web server, copy the crt file (in my case, server.crt) over to your Macintosh. I scp'd it to my Desktop for ease of work. ** These directions has been updated. Thanks to Josh below for pointing out a slightly easier way.** In the address bar, click the little lock with the X. This will bring up a small information screen. Click the button that says “Certificate Information.” Click and drag the image to your desktop. It looks like a little certificate. Double-click it. This will bring up the Keychain Access utility. Enter your password to unlock it. Be sure you add the certificate to the System keychain, not the login keychain. Click “Always Trust,” even though this doesn’t seem to do anything. After it has been added, double-click it. You may have to authenticate again. Expand the “Trust” section. “When using this certificate,” set to “Always Trust” That’s it! Close Keychain Access and restart Chrome, and your self-signed certificate should be recognized now by the browser. This is one thing I hope Google/Chromium fixes soon as it should not be this difficult. Self-signed SSL certificates are used **a lot **in the business world, and there should be an easier way for someone who knows what they are doing to be able to ignore this error than copying certificates around and manually adding them to the system keychain.
Read More