Data transfer with Netcat

The other day my brother, who works as a system administrator, inquired about a puzzling behavior of GNU Netcat, the popular nc utility. Sometimes described as the TCP/IP Swiss army knife, it can come in handy as an ad hoc file transfer tool, capable of transferring large amounts of data at the speed of disk reads/writes.

Data transfer

Typical usage looks like this:

nc -lp60000 | tar x                 # receiver
tar c dir... | nc otherhost 60000   # sender

It may look strange at first, but it’s easy to type and, once understood, almost impossible to forget. The commands work everywhere and require no specialized server software, only a working network and nc itself. The first command listens on port 60000 and pipes the received data to tar x. The second command provides the data by piping output of tar c to the other machine’s port 60000. Dead simple.

Note that transferring files with Netcat offers no encryption, so it should only be used inside a VPN, and even then not for sensitive data.

Data loss

One surprising behavior of this mode of transfer is that both commands remain hanging after the file transfer is done. This is because neither nc is willing to close the connection, as the other side might still want to say something. As long as one is positive that the transfer is finished (typically confirmed by disk and network activity having ceased), they can be safely interrupted with ^C.

The next step is adding compression into the mix, in order to speed up transfer of huge but easily compressible database dumps.

nc -lp60000 | gunzip -c | tar x               # receiver
tar c dir... | gzip -c | nc otherhost 60000   # sender

At first glance, there should be no difference between this pipeline and the one above, except that this one compresses the content sent over the wire and decompresses received content. However, much to my surprise, the latter command consistently failed to correctly transfer the last file in the tar stream, which would end up truncated. And this is not a case of pressing ^C too soon — truncation occurs no matter how long you wait for the transfer to finish. How is this possible?

It took some strace-ing to diagnose the problem. When the sender nc receives EOF on its standard input, it makes no effort to broadcast the EOF condition over the socket. Some Netcat implementations close (“shut down”) the write end of the socket after receiving local EOF, but GNU Netcat doesn’t. Failure to shut down the socket causes the receiving nc to never “see” the end of file, so it in turn never signals EOF to gunzip. This leaves gunzip hanging, waiting for the next 32K chunk to complete, or for EOF to arrive, neither of which ever happens.

Preventing Netcat data loss

Googling this issue immerses one into a twisted maze of incompatible Netcat variants. Most implementations shut down the socket on EOF by default, but GNU Netcat not only doesn’t do so, it doesn’t appear to have an option to do so! Needless to say, the huge environment where my brother works would never tolerate swapping the Netcat implementation on dozens of live servers, possibly breaking other scripts. A solution needed to be devised that would work with GNU Netcat.

At this point, many people punt and use the -w option to resolve the problem. -w SECONDS instructs nc to exit after the specified number of seconds of network inactivity. In the above example, changing nc -lp60000 to nc -lp60000 -w1 on the receiving end causes nc to exit one second after the data stops arriving. nc exiting causes gunzip to receive EOF on standard input, which prompts it to flush the remaining uncompressed data to tar.

The only problem with the above solution is that there is no way to be sure that the one-second timeout occurred because the data stopped arriving. It could as well be the result of a temporary IO or network glitch. One could increase the timeout to decrease the probability of a prematurely terminated transfer, but this kind of gamble is not a good idea in production.

Fortunately, there is a way around the issue without resorting to -w. GNU Netcat has a --exec option that spawns a command whose standard input and standard output point to the actual network socket. This allows the subcommand to manipulate the socket in any way, and fortuitously results in the socket getting closed after the command exits. With the writing end closing the socket, neither nc is left hanging, and the transfer completes:

nc -lp60000 | gunzip -c | tar x                  # receiver
nc -e 'tar c dir... | gzip -c' otherhost 60000   # sender
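However the connection ends up being closed, a truncated transfer can be caught after the fact by comparing a checksum of the tree on both machines. The following is only a sketch under stated assumptions: tree_digest is a hypothetical helper (not part of nc or tar), the directory and file names are made up, and a local pipe stands in for the network so the example runs on a single machine.

```shell
#!/bin/sh
# Hypothetical helper: print one checksum summarizing the names and
# contents of all regular files under the directory given as $1.
tree_digest() {
    ( cd "$1" && find . -type f -exec cksum {} + | LC_ALL=C sort -k 3 | cksum )
}

# Made-up sample data in a scratch directory.
work=$(mktemp -d)
mkdir -p "$work/src/dir" "$work/dst"
printf 'first file\n' > "$work/src/dir/a.txt"
printf 'last file in the stream\n' > "$work/src/dir/b.txt"

# Local stand-in for the transfer: the same pipeline, minus nc in the middle.
( cd "$work/src" && tar cf - dir | gzip -c ) |
    ( cd "$work/dst" && gunzip -c | tar xf - )

sent=$(tree_digest "$work/src/dir")       # would run on the sending machine
received=$(tree_digest "$work/dst/dir")   # would run on the receiving machine

if [ "$sent" = "$received" ]; then
    echo "transfer verified"
else
    echo "MISMATCH: transfer is incomplete or corrupted"
fi
```

In a real transfer the two tree_digest calls would run on different hosts and the two values would be compared by eye or over ssh. cksum is used here because it is available nearly everywhere; a stronger hash could be swapped in.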

Self-delimiting streams

There is just one little thing that needs explaining: why did the transfer consistently work with tar, and consistently fail with the combination of tar and gzip?

The answer is in the nature of the stream produced by tar and gzip. Data formats generally come in two flavors with respect to streaming:

  1. Self-delimiting: formats whose payload carries information about its termination. An example of a self-delimiting stream is an HTTP response with the Content-Length header: a client can read the whole response without relying on an out-of-band “end of file” flag. (HTTP clients use this very feature, along with some more advanced ones, to reuse the same network socket for multiple requests to the server.) A well-formed XML document without trailing whitespace is another example of a self-delimiting stream.

  2. Non-self-delimiting: data formats that do not contain intrinsic information about their end. A text file or an HTML document are examples of those.
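The difference can be sketched with a toy length-prefixed format, loosely in the spirit of Content-Length. The format (a decimal length line followed by that many bytes) and the variable names are invented for this illustration. The writer deliberately keeps the pipe open for two seconds after sending the payload, yet the reader is done as soon as the announced number of bytes has arrived.

```shell
#!/bin/sh
start=$(date +%s)
result=$(
    # Writer: length line, then exactly that many payload bytes, then
    # two more seconds with the pipe still open (no EOF yet).
    { printf '5\nhello'; sleep 2; } |
    {
        IFS= read -r len            # header: payload length in bytes
        payload=$(head -c "$len")   # body: exactly $len bytes, no EOF needed
        echo "$payload after $(( $(date +%s) - start ))s"
    }
)
echo "$result"
```

The elapsed time printed next to the payload shows when the reader had the complete message in hand, which does not depend on the writer closing its end.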

While a tar archive as a whole is not self-delimiting (nor can it be, since tar allows appending additional members at the end of the archive), its individual pieces are. Each file in the archive is preceded by a header announcing the size of the file. This allows the receiving tar to read the correct number of bytes from the pipe without needing additional end-of-file information. Although tar will happily hang forever waiting for more files to arrive on standard input, every individual file will be read to completion.
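The size announcement is easy to see by peeking into the header of a freshly created archive. This is a minimal sketch, assuming a tar that accepts --format=ustar (GNU tar and bsdtar do); the file name and the 1234-byte size are arbitrary choices for the demonstration.

```shell
#!/bin/sh
work=$(mktemp -d)

# A 1234-byte member to archive (name and size are arbitrary).
dd if=/dev/zero of="$work/payload.bin" bs=1234 count=1 2>/dev/null

# In a ustar header the member size is stored as octal digits in the
# 12-byte field that starts at byte offset 124; take its first 11 bytes.
size_octal=$(tar --format=ustar -cf - -C "$work" payload.bin |
    dd bs=1 skip=124 count=11 2>/dev/null)

# printf interprets a number with a leading 0 as octal.
size_bytes=$(printf '%d' "$size_octal")
echo "header field: $size_octal (octal), i.e. $size_bytes bytes"
```

A receiving tar reads this field first and therefore knows exactly how many bytes of file content follow, which is why each member is read to completion even when EOF never arrives.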

On the other hand, gzip does not produce a self-delimiting stream. gunzip reads data in 32-kilobyte chunks for efficient processing. When it receives a shorter chunk, say 10K of data, it doesn’t assume that this is because EOF has occurred; it just waits for the remaining 22K of data to fill its buffer. gunzip relies on the EOF condition to tell it when and if the stream has ended, after which it does flush the “short” buffer it last read. This incomplete buffer is what caused the data loss when compression was added.
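This waiting can be observed without any network at all. In the sketch below, the complete compressed stream is written to a pipe right away, but the pipe is held open for three more seconds before EOF is delivered. With GNU gzip, the decompressed line is expected to appear only once the pipe closes; other gunzip implementations may behave differently, so treat the timing as an observation to verify rather than a guarantee.

```shell
#!/bin/sh
start=$(date +%s)
result=$(
    # The whole gzip stream is written immediately, but the writing side
    # keeps the pipe open for three more seconds before EOF arrives.
    { printf 'hello\n' | gzip -c; sleep 3; } |
    gunzip -c |
    {
        IFS= read -r line
        echo "$line after $(( $(date +%s) - start ))s"
    }
)
echo "$result"
```

The number of seconds printed after the line shows how long the downstream reader had to wait for data that was already sitting in gunzip's input buffer.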

This is why the -e solution works so nicely: it not only makes the socket close, it ensures that EOF is signaled across the chain of piped commands, including the receiving gunzip.

Creating panoramic photos

You might have seen some nice pictures around the web that were taken with a simple compact camera, yet have an astonishing amount of detail. You may wonder how they get such a nice, detailed picture. They simply take several overlapping photos and stitch them together. How, you might ask? Do I need to shell out hundreds of currency in order to obtain the latest from Adobe and the like? Nope, once again free software to the rescue, and it’s incredibly easy!

Step 1

Take the pictures. Bear in mind that they need to overlap, which should really be obvious. A good rule of thumb is to have at least 50% of each picture overlap with the previous one. Remember, no one says you can’t take the photos in portrait mode. It is a good idea to lock the white balance to a reasonable preset, so the camera doesn’t decide that the scene has gone from “cloudy” to “sunny”, although this is not strictly necessary, as Hugin has very advanced features to compensate. Also, you’ll want to lock the exposure so it doesn’t vary between shots. Once again, this isn’t a problem for Hugin, but it might improve your panorama. You can stack an arbitrary grid of pictures, for example 2×3, 3×3, 4×2, etc. For the sake of this article, I used the almost automatic mode on my EOS 100D with a 40mm pancake lens.

I used the portrait orientation for taking the pictures. I just snapped them and uploaded to my computer.

Step 2

Install Hugin, which undoubtedly comes with your favorite distro, or if you’re a Windows user, simply download it from their website. Now, I should point out at this time that Hugin is a very feature-rich and complex piece of software. The more advanced features are beyond the scope of this article, and quite frankly they somewhat elude me. Anyway, before I get too side-tracked, fire up Hugin, click on Load images, then on Align, and finally on Create panorama, and choose where you want the stitched photo to end up. There is no step 3:

Beautiful view of Zagreb

Hugin took care of the exposure and the white balance. You should really use the tips from above, though.

Conclusion

You’ll tell me, but MrKitty, there is wonderful software out there that is waaay better than Hugin, or Hugin is a very advanced tool that you have no idea how to use. Very much true, but the point of this 2-step tutorial is to show people that Linux and the associated software CAN be user friendly, and sometimes even more powerful than their proprietary counterparts. I’ve been using Linux for a while now and I sometimes get the question, but why are you using Linux instead of Windows? There is no easy answer. For starters, I work as a Linux sysadmin for a living, so that’s one, though I don’t really need anything more than PuTTY. It’s the little things, stuff like Hugin, it’s the plethora of programs that are available with your friendly package manager, the ability to write simple code without the need for big frameworks and the like. Try looping through a couple of files and doing something on Windows. You need specialized software for every little thing you want to do.

But MrKitty, you’re a power user, you sometimes code, you’re a professional in the field, of course you like Linux better! Well, I don’t really have anything against Windows, or Mac, or whatever. But I think everyone is forgetting just how much Windows can be a pain in the ass. I won’t even go for the low shots like BSOD.

Billions of dollars have gone into making this as user friendly as possible

OK, forget BSOD, there is other stuff that Windows lovers might forget. I’m sure everyone cherishes those sweet moments when you’re battling with drivers. I used to fix computers for money. You wouldn’t believe the stuff I would see. The latest one: a colleague of mine asked me to help him out with a mobile USB dongle. The laptop was running Windows 8, I think. Wow, I really lost touch with the new Windows, in my mind Windows XP is the latest and greatest. Took me a while to actually find the control panel. OK, the drivers were somehow screwed up, even though Windows 8 was supposed to be supported. There was enough signal, the connection was active. Nothing was loading. Pinging 8.8.8.8 seemed to work, but resolving anything did not, even though the DNS settings were correct. A couple of hours of headbanging and googling revealed a nice forum in Polish with people with the exact same problem, and to my surprise there was a solution at hand. A new and improved driver downloaded from somewhere, creeping along at a nice 3 to 10 kilobytes per second, and it worked, after tweaking the endless carrier-specific options. So yeah, Windows is really user friendly. I have no idea if it would work on Linux.

Anyway, my mother, age 69, is using Linux and loves it. My wife says she can’t imagine ever using Windows again. :)