All posts by mrkitty

Handling large sets of photographs and videos

In 2003 my father bought a digital camera for the family. It was an Olympus C-350 Zoom: 3.2 Mpix, 3x optical zoom, a 1.8″ LCD display. At that time, at least here in Croatia, having a digital camera was fairly rare. I'm not saying I was the first in my city to have one, but it wasn't as commonplace as today. It was such a leap from anything we had owned. You could actually take a picture and upload it to the computer. And the image was usually great, if the light conditions were optimal, of course. Indoors and in low light, the images were terrible.

Šibenik circa 2003/07 on a good day
Sorry dude, it's 7:59:34 PM on August the 18th. There's less sunlight than you think at this time of the year, so better keep the camera perfectly steady for one fifth of a second.

This camera wasn't cheap. It cost a little less than $500, which by Croatian standards of the time was a fair amount of money. It still is, actually, but that's roughly the minimum you have to spend for a decent camera; it was like that then, and it's still like that now.

I was taking pictures of the town, taking the camera on trips, documenting everything. Since I was always a computer enthusiast, I started to worry: what if the hard disk failed? I'd lose all of the photographs I had acquired. There are people who seem to underestimate the importance of photos. You take the photos, they're nice, but they're not that valuable right at that moment. Looking back 10 years or more, the pictures suddenly become somehow irreplaceable. They're a direct window into your past, not the blurry vision of the past that most of us have, but something concrete and immutable. I think this especially applies when you have kids; you'll want the pictures safe and sound, at least for a little while. Everything gets lost in the end, but why be the guy who loses something that could be classified as a family heirloom?

How not to lose the pictures and how to organize them

Here’s a high-level list of what I’ve found to be good practices, to keep it organized and safe:

  • A clear structure for stored photographs/videos. I've found that a single root directory with simple YYYY-MM subdirectories does the trick. I dislike categorizing pictures with directory names like summer-vacation-2003, party-at-friends-house-whose-name-I-cant-remember or something to that effect. Over time the good times get muddled together, and you'll appreciate a simple year-month format when you want to find something or remember an occasion. It's like a time machine: let's see what I was doing in the spring of 2004, and you find fun pictures along the way.
  • This goes without saying: backups. Buy an external disk, they go cheap, and you can store a lot of photos there. Your disk can die suddenly and without notice, and all your pictures can simply vanish, never to be seen again. Sure, son, I'd love to show you pictures of when I was young, but unfortunately I couldn't be bothered to keep a backup and all the pictures are gone.
  • Disaster recovery – imagine your whole building/house burns to the ground. You get nothing but rubble, and although you were meticulously synchronizing to an external HDD every night, everything is gone. Or, more realistically, your house gets broken into and they steal your electronics, which contain data that is basically irreplaceable. Create a tarball of all your photographs/videos, encrypt it with a GPG key or passphrase, or with simple SSL encryption, and upload it to the cloud of your choice (a sketch of these steps follows after the picture below). Even with a snapshot only about once every two months, you'll still be able to recover most of the data when you rebuild your house or buy a new computer.
  • Print out a yearly compilation of pictures you like at your local photo lab. Just pick around 40 of the best, with whatever criteria you deem fit. Who knows if the JPEG standard will still be readable in 30 years' time, but you can always look at a physical picture you can take with you.
I just called in a burglary at my house. Now it burned down while I was out getting beers from the store? If only I had a disaster recovery plan for my valuable photos, instead of keeping them only on the desktop computer and a portable HDD.
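For the shell-inclined, here is a rough sketch of the three ideas from the list above. The paths (~/photos for the archive, /mnt/backup for the external disk) are just placeholders for your own setup, and the sketch assumes exiftool, rsync and gpg are installed:

# sort photos into YYYY-MM directories based on the EXIF date-taken field
$ exiftool '-Directory<DateTimeOriginal' -d '%Y-%m' -r ~/photos

# mirror the archive to the external disk (mount point is an assumption)
$ rsync -av ~/photos/ /mnt/backup/photos/

# disaster recovery snapshot: tarball, symmetric GPG encryption, then upload the .gpg file to the cloud of your choice
$ tar -czf - ~/photos | gpg -c -o photos-snapshot.tar.gz.gpg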

Photos

Most digital cameras, be it for video or still frames, have pretty lavish defaults for image quality. This is a very good thing. I like to keep a source file that is as close as possible to what the device originally serialized. Still, if you take a lot of pictures, you'll quickly notice that it piles up. The first thing to do is delete the technically failed ones. Do not delete pictures just because you think someone looks bad in them; they may turn out great in a certain set of circumstances. You never know.

These days even the shittiest cameras boast huge pixel counts, like 10 or 15 megapixels or more, with a tiny crappy lens and who knows what kind of sensor. Feel free to downsize to 5-8 megapixels, with a JPEG quality of 75-80. You'll quickly see that your images consume a lot less space on the HDD; I'm talking about 30% of the original photo, sometimes even less. I spent a lot of time trying to find out exactly how the image is degraded. Some slight aberrations can be seen if you go pixel peeping, but screw that; the photos have sentimental value as a whole, and you've saved a lot of the hard drive space you realistically have available. I recommend the ImageMagick suite for all your resizing needs. Create a directory where you want the recoded images, like lowres:

$ mogrify -path lowres -auto-orient -quality 80 -resize 8640000@ *.jpg

You set the total number of pixels; in this example it's 8.64 Mpix. Choose a resolution and go with it. I generally use 3600×2400, which is 8640000 pixels. Mogrify is great for this task: it can encode the images in parallel, so if you have a multi-core computer it really shines, since the operations involved are very CPU expensive. You can omit the -path switch and the files will be processed and written in place of the originals, but be careful, as this WILL overwrite the original file(s). Don't experiment on your only copy of a file. You can also use the generally safer convert, which takes the same arguments with a slight difference: it needs INFILE and OUTFILE arguments:

$ convert -auto-orient -quality 80 -resize 8640000@ mypicture.jpg mypicture-output.jpg

or

$ for JPEGS in *.jpg ; do convert -auto-orient -quality 80 -resize 8640000@ "$JPEGS" "${JPEGS}-out"; done

The problem with this is that you'll end up with a bunch of IMG_xxxx.jpg-out files. This is the longer way around, but once you're satisfied with the result, delete the original JPEG files and rename the new ones with a mass-renaming program, or with a Perl script called ren that my brother and a buddy of his wrote a long time ago; it still works great in a number of circumstances:

$ ren -c -e 's/\-out//'

This renames all the files containing -out, replacing it with an empty string and essentially deleting it from the filename. But this is the long way around; I suggest using mogrify. Mogrify did have a very, very nasty bug, though. At one point they decided it would be cool that, if you have an Nvidia card and the proprietary drivers installed, it would use the GPU for all your encoding needs. That sounds great in theory, but I actually had an Nvidia graphics card with the drivers properly installed. How do I know that? Complex 3D video games worked without issues. And guess what else? It didn't fucking work. It simply hung there and didn't do anything; it would never finish a single image. Did I mention that you can't fall back on the CPU so easily, I mean at all? I googled around, and multiple bugs had been filed. I just tried mogrify now, while writing this post, and it seems they have finally fixed it, so I may go back to using it instead of an unnecessarily complex Python script that launched concurrent convert processes based on the number of physical cores.
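If mogrify ever starts acting up again, the same parallel encoding can be approximated with plain convert and GNU xargs. This is just a sketch of that idea, assuming GNU xargs and coreutils (for nproc) and the lowres directory from above:

$ mkdir -p lowres
# run one convert per CPU core, writing the resized copies into lowres/
$ printf '%s\0' *.jpg | xargs -0 -P "$(nproc)" -I {} convert -auto-orient -quality 80 -resize 8640000@ {} lowres/{}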

Video

A nice feature of modern cameras is their ability to record decent video and audio. The cameras mostly use a very good quality preset for the recordings; on my current SLR camera I get 5-6 megabytes per second of video. Not only are the files monstrously huge, they also sometimes come in non-standard containers, with weird video and/or audio encodings. You should really convert them to something decent:

$ ffmpeg -i hugefile.mov -c:v libx264 -preset slow -crf 25 -x264opts keyint=123:min-keyint=20 -c:a libmp3lame -q:a 6 output.mkv

This produces a pretty good quality video. I am strongly against rescaling the video in any way. Use the original resolution; displays are advancing at a steady pace, and you don't want to unnecessarily scale down the resolution. You can change the quality with -crf; values from 18 to 29 are reasonable options, and I discussed this in another post in more detail. It also decreases the file size by a factor of 15 or more, virtually without perceptible visual loss. As an added bonus you mux it into an open source container, with the H.264 family of encoders for video and the venerable MP3 format for audio. That should work by default on most computer players as well as standalone players hooked up to a TV.
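If you have a whole directory of clips, a simple loop handles the batch. A sketch, assuming the camera names its files *.MOV; adjust the pattern to whatever your camera actually produces:

$ for CLIP in *.MOV ; do ffmpeg -i "$CLIP" -c:v libx264 -preset slow -crf 25 -x264opts keyint=123:min-keyint=20 -c:a libmp3lame -q:a 6 "${CLIP%.MOV}.mkv"; done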

I started this post as more of an in-depth technical overview of how to store, encode, and back up your multimedia. Instead, I chose to give a high-level overview of what has worked for me over the years. Make backups regularly, have a disaster recovery option if at all possible, and print out some yearly photos. It's fun to look over the physical pictures, and it can be good fun showing them to visiting friends and family. When deciding how much to shrink the files, keep in mind that you should compress them as much as possible while keeping the subjective perception of the quality as close as possible to the original. What I mean to say is, don't overdo the quality settings. What matters is how much space your archive consumes right now, and whether you can cope with that amount of data.

Data loss is commonplace. Hard drives fail; do not lose 10+ years of photographs because you didn't have a decent backup. It's not so hard. Do it now. Don't lose a part of your personal history, it's priceless, and it cannot be downloaded from the internet again. Always encrypt your stuff before uploading it to the ethereal cloud. Maybe you have sensitive pictures that you wouldn't want anyone else casually looking over just because they happen to be the sysadmin. You wouldn't accept the same kind of privacy breach in other parts of your life, would you?

Match file timestamp with EXIF data

Over the years I've collected a lot of pictures, coming close to 20000. Most of them have EXIF metadata embedded in the JPEG files. Alas, I was careless with some of the photographs, and when copying them from filesystem to filesystem, creating backups, etc., the timestamps got overwritten. So I ended up with loads of files carrying a timestamp of, say, 22nd January 2010. They were most definitely not taken on that date; rather, they were copied then, and no preserve-timestamps flag was passed to the cp that did it. I googled for a quick solution to my problem, but I could not find anything that was simple to use and not clogged up with bullshit. So I cracked my knuckles and delved into the world of Python:

#!/usr/bin/python2

# You need the exifread module installed
import exifread, datetime, os, sys

def collect_data(file):
    # Read the EXIF tags and return the "date taken" field as a string
    tags = exifread.process_file(open(file, 'rb'))
    for tag in tags.keys():
        if tag == 'EXIF DateTimeOriginal':
            return "%s" % (tags[tag])

for file in sys.argv[1:]:
    try:
        # Parse the EXIF date and turn it into a unix timestamp
        phototaken = int(datetime.datetime.strptime(collect_data(file), '%Y:%m:%d %H:%M:%S').strftime("%s"))
        # Set both the access and modification times to the moment the photo was taken
        os.utime(file, (phototaken, phototaken))
    except Exception, e:
        sys.stderr.write(str(e))
        sys.stderr.write('Failed on ' + str(file) + '\n')

Basically it takes each file, reads the EXIF date-taken field, and invokes os.utime() to set the file's timestamp to that date. You'll need the exifread module for Python; it's the simplest one I could find that does what I needed. I hope someone will find this script useful. You can start it simply with $ exify *.JPG. You can download it here.
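If the photos are spread across subdirectories, find can feed them to the script in batches. A small sketch, assuming you saved the script as exify, made it executable and put it somewhere in your PATH, and that your archive lives in ~/photos:

$ find ~/photos -iname '*.jpg' -exec exify {} +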

Recording a game video with Linux

I'm sure a lot of people have thought, wow, I'd like to record a video of this to have it around! On Linux! Well, it's incredibly easy! OK, not really that easy, you'll have to handle a few hurdles along the way, but it's nothing terrible. As an example I'll be using prboom, an engine that runs Doom 1/2 with the original WADs you have obtained legally, paid for fair and square, etc. It uses 3D hardware acceleration, no jumping, crouching and shit like that. It's great to see Doom 1/2 in high resolution; it looks pretty good, stays very true to the original, and makes the game more than playable.

Requirements

There is a beautiful program called glc. Basically it hooks into the video & audio of the system and dumps a shitload of quickly compressed PNG files, one per frame. Depending on the resolution and the framerate you use for capturing, expect a very hardcore write rate to your HDD, somewhere around 50 megabytes per second for a full HD experience, and that's with the quicklz compression method for glc-capture.

I won't go into too much detail about how to install glc or prboom; I'm sure it's simple on your favorite Linux distribution, and it was a simple aurget -S on Arch. Now, let's head on to actually capturing some gameplay. The syntax is very simple: glc-capture [options] [program] [program's arguments]:

The initial video capturing

$ glc-capture -j -s -f 60 -n -z none prboom-plus -width 1920 -height 1080 -vidmode gl -iwad dosgames/DOOM2/DOOM2.WAD -warp 13 -skill 5

This was the tricky part; I had to play around with the options to get it glitch-free inside the game. I recorded a video with glc three years ago and can't remember using some of these options:

  • -j – force-sdl-alsa-drv; got better performance, but maybe unneeded, play around with it
  • -s – start recording right away
  • -f – sets the framerate
  • -n – locks FPS; didn't need this before, but you get a glitch-free recording
  • -z none – no PNG compression; I've had better performance without it

The prboom-plus options should be self-explanatory. I used the 1920×1080 resolution so it's YouTube friendly, -warp 13 warps to level 13, and -skill 5 is for nightmare. The output file is named $PROGRAM-$PID-0.glc by default.

OK, the easy part is done, apart from the tricky part. Now you have a huge-ass .glc file on your hard drive that is completely unplayable by any video player known to man. And when I say huge-ass, I mean huge-ass. A 54-second video comes out to 1.79GB, which is 34MB per second in 720p, and for 1080p I had up to 42MB per second! The default PNG compression used by glc-capture is quicklz. For 1080p I had a better experience using -z none, which simply dumps the PNGs into the file as-is. As you might figure, this also increases the resulting file size, but it could be well worth the disk space if you don't have a fast CPU; expect close to 100 MB/s for a 1080p stream. Use the default compression if in doubt. Experiment.

What do we do with an unplayable, unusable, unuploadable, gigantic glc dump on our hard drive? I strongly suggest you encode it into something else. I used to use mencoder for all my encoding needs, but due to the way it's maintained, or the lack thereof, I switched to ffmpeg, which is actively developed and used a lot in the backends of various video tube sites around the internet. OK, let's go, step by step:

Extract the audio track

$ glc-play prboom-plus-12745-0.glc -a 1 -o 1080p.wav

This line dumps the audio track from the glc file; of course, it's a completely uncompressed WAV file. -a 1 is for track #1, and -o is for output, naturally.

Pipe the uncompressed video to ffmpeg and encode to a reasonable file format

$ glc-play prboom-plus-12745-0.glc -o - -y 1 | ffmpeg -i - -i 1080p.wav -c:v libx264 -preset slow -crf 25 -x264opts keyint=123:min-keyint=20 -c:a libmp3lame -q:a 6 doom-final-file.mkv

-o - dumps it to STDOUT, and -y 1 selects video track 1. Now we have used the almighty Unix pipe. I love pipes. In this case ffmpeg uses two inputs: one is STDIN, the hardcore raw video stream, no PNGs, just raw video; the other is the audio track we dumped earlier. This could be streamlined with a FIFO, but that's overcomplicating things. The rest of the ffmpeg options are beyond the scope of this article, but they're a reasonable default. The final argument of ffmpeg is the output file. The container type is determined by the file extension, so you can use mp4, mkv, or whatever you want. After this, the video is finally playable, uploadable, usable. Congrats, you have just recorded your video the Linux way!
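For the curious, the FIFO variant mentioned above would look roughly like this. Consider it an untested sketch; it assumes two glc-play processes can happily read the same .glc file at the same time:

$ mkfifo audio.fifo
$ glc-play prboom-plus-12745-0.glc -a 1 -o audio.fifo &
$ glc-play prboom-plus-12745-0.glc -o - -y 1 | ffmpeg -i - -i audio.fifo -c:v libx264 -preset slow -crf 25 -x264opts keyint=123:min-keyint=20 -c:a libmp3lame -q:a 6 doom-final-file.mkv
$ rm audio.fifo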

If you do want to customize the final video quality, take a look at the ffmpeg documentation to see what these options mean. The only ones of real interest are -preset and -crf. The crf is the "quality" of the video. I was astounded that 2-pass encoding is a thing of the past, and it's all about the crf now. It goes from 0 to 51, and only a small part of that integer range is actually usable. I simply cannot relay the beautiful wording from the docs, so I'll just paste it here:

The range of the quantizer scale is 0-51: where 0 is lossless, 23 is default, and 51 is worst possible. A lower value is a higher quality and a subjectively sane range is 18-28. Consider 18 to be visually lossless or nearly so: it should look the same or nearly the same as the input but it isn’t technically lossless.

Details like these can really brighten a person's day. 18 is visually lossless (and no doubt uses a billion bits per second), but technically only 0 is lossless. So you have a full range from 0 to 18 that is basically useless. Of course, it goes the other way around too: after -crf 29 the quality really goes downhill.

The resulting video can be found here or you can see it on YouTube. Excuse my cheating and my dying so fast, this is for demonstration purposes.

Conclusion

I realize there are probably better ways of accomplishing this; you can google around for better solutions. glc-capture supposedly works with Wine too, with some tweaks. I haven't really tried it, but feel free to leave a comment if you have any experience with it. This is a simple way to make a recording, and you can edit it later once you encode the file to something normal. Glc also supports recording multiple audio tracks, so you could also record your voice with a microphone and mash it all together. Good luck with that. :)

A different view

Some years ago my sister-in-law asked me to write an essay for her Croatian homework. I said I suck at writing essays, but if it's an interesting topic, why the heck not. After all, I'm sometimes known for my ability to be a smartass, which can come in handy. The topic of the essay was "The universe and my place within it". Wow, now that's something I can do. I wrote it all down, and her teacher was so genuinely impressed with it that she pinned it in the school lobby as the best essay written. Supposedly the essay is still pinned there, 4 years later, but that's speculation on her part.

However, we were caught. Our genius plan was foiled by her teacher, who immediately knew she hadn't actually written a word of it and asked her a simple question: what exactly is a galaxy? The sister-in-law, then 15, replied with a blank stare. The teacher made a proposition: "Would you like a passing grade, or to write your own?" She chose the barely passing grade. Anyway, looking back at the piece I wrote, it's kinda cute, so I decided to share it here, translated to English. Since some of this is lost in translation, keep in mind that the protagonist is the sister-in-law and it is written from her point of view after returning home from school.


During a windy, cold, starry winter night, while coming home from school, I saw the Milky Way in the sky. This wasn't the first time I had laid my eyes on it, but it was the first time I appreciated the implications of such a sight. It's a view toward the center of our galaxy, in which we are nothing but an insignificant planet in our galactic neighborhood, just one of many. I felt I was standing on one of those tiny rocks, the gravitational pull of the entire planet pulling me down toward the center of a rock that is enormous to us, but a speck of dust compared to other astronomical bodies. The seemingly infinite number of stars in our galaxy alone, all with their planets moving of their own accord, can leave a person breathless.

You can ask yourself a lot of questions. The first question that pops into one's mind is: are we alone in the universe? The answer is, of course we're not alone in the universe! Out of all the countless planets woven throughout the universe, it's impossible that the Earth is the only one blessed with the prerequisites for life. Where are the little green men, then? No one said they were anywhere close. Life, at least the way we know it, could be extremely rare. It's possible that a planet like Earth is present in only one out of a billion galaxies. Even with such a grim approximation, it would mean there are at least 200 to 300 alien civilizations in the visible universe asking themselves the very same questions I'm asking while gazing into the beautiful night sky.

The added problem with our alien brethren is time and the vast distances involved. An alien civilization could have been at arm's reach a mere 3 billion years ago, long dead by now, their sun having died a violent death that can now be seen as the remains of a long-gone alien solar system through telescopes and pretty pictures on the internet. The roles could be reversed; in 10 billion years we'll be the lovely false-color pictures. The distances and times involved are so colossal that it begs the question of whether technology within one system, our universe, can ever reach outside the bounds of said system, so that we could conceivably call our galactic neighbors to lunch.

The commute is terrible, and this thing is totally nearby

You can't rule out the possibility of a consciousness and intelligence so alien to us that even with our most intense efforts we could never hope to detect them, so different that detection isn't even possible in a sense. They might not be living on a planet somewhere and asking themselves dumb questions like us. The same kind of analogy can be applied here, to Earth. You're going somewhere, minding your own business, when you encounter a "lower" life-form, a worm or something similar. We can help it in some way, step on it by accident or with intent, or simply not perceive it in any way, shape or form and go about our merry way. Can a worm discern us in any sensible way, inside its own limited perception of the real world, whatever that may be? It could very well be that we're the worm in a scenario where a "higher" life-form is asking similar questions about us. Is this what they call a deity?

The Milky Way is abruptly replaced by the concrete and shingles of my own home. Oh, I've arrived. The level of inspiration gathered by a simple gaze toward the center of our galaxy is unbeatable, although it almost completely vanishes as soon as you step through the doorway. Watching other suns, other worlds, even if only in your mind, is now replaced by simpler and, in a way, harder questions and problems. In any case, I can't wait for my next walk through the cosmos!

Creating panoramic photos

You might have seen some nice pictures around the web that were taken with a simple compact camera, yet have an astonishing amount of detail. You may wonder, how do they get such a nice, detailed picture? They simply stitch several pictures together. How, you might ask? Do I need to shell out hundreds of currency units for the latest from Adobe and the like? Nope, once again free software comes to the rescue, and it's incredibly easy!

Step 1

Take the pictures. Bear in mind that they need to overlap, which should really be obvious. A good rule of thumb is to have at least 50% of each picture overlap with the previous one. Remember, no one says you can't take the photos in portrait orientation. It's a good idea to lock the white balance to a reasonable preset, so the camera doesn't decide that the scene has gone from "cloudy" to "sunny". It's not strictly necessary, as Hugin has very advanced features to compensate. You'll also want to lock the exposure so it doesn't vary between shots. Once again, this isn't a problem for Hugin, but it might improve your panorama. You can stitch an arbitrary grid of pictures, for example 2×3, 3×3, 4×2, etc. For the sake of this article, I used the almost-automatic mode on my EOS 100D with a 40mm pancake lens:

I used portrait orientation for the pictures. I just snapped them and uploaded them to my computer.

Step 2

Install Hugin, which undoubtedly comes with your favorite distro, or if you're a Windows user, simply download it from their website. Now, I should point out that Hugin is a very feature-full and complex piece of software. The more advanced features are beyond the scope of this article, and quite frankly they somewhat elude me. Anyway, before I get too sidetracked: fire up Hugin, click Load images, then Align, and finally Create panorama, then choose where you want the stitched photo to end up. There is no step 3:

Beautiful view of Zagreb

Hugin took care of the exposure and the white balance. You should really use the tips from above, though.
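For the terminal-inclined, Hugin also ships command-line tools that can reproduce those three clicks. A rough sketch, assuming the defaults are good enough for your shots and that nona and enblend are installed alongside Hugin:

$ pto_gen -o pano.pto *.JPG
$ cpfind --multirow -o pano.pto pano.pto
$ autooptimiser -a -m -l -s -o pano.pto pano.pto
$ pano_modify --canvas=AUTO --crop=AUTO -o pano.pto pano.pto
$ nona -m TIFF_m -o pano_ pano.pto
$ enblend -o panorama.tif pano_*.tif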

Conclusion

You'll tell me: but MrKitty, there is wonderful software out there that is waaay better than Hugin, or: Hugin is a very advanced tool that you have no idea how to use. Very much true, but the point of this 2-step tutorial is to show people that Linux and its software CAN be user friendly, and sometimes even more powerful than their proprietary counterparts. I've been using Linux for a while now and I sometimes get the question: why are you using Linux instead of Windows? There is no easy answer. For starters, I work as a Linux sysadmin for a living, so that's one, though for that I don't really need anything more than PuTTY. It's the little things, stuff like Hugin; it's the plethora of programs available with your friendly package manager, the ability to write simple code without the need for big frameworks and the like. Try looping through a couple of files and doing something with them on Windows. You need specialized software for every little thing you want to do.

But MrKitty, you’re a power user, you sometimes code, you’re a professional in the field, of course you like Linux better! Well, I don’t really have anything against Windows, or Mac, or whatever. But I think everyone is forgetting just how much Windows can be a pain in the ass. I won’t even go for the low shots like BSOD.

Billions of dollars have gone into making this as user friendly as possible

OK, forget the BSOD, there is other stuff that Windows lovers might forget. I'm sure everyone cherishes those sweet moments when you're battling with drivers. I used to fix computers for money; you wouldn't believe the stuff I would see. The latest one: a colleague of mine asked me to help him out with a mobile USB dongle. The laptop was running Windows 8, I think. Wow, I have really lost touch with the newer Windows versions; in my mind Windows XP is the latest and greatest. It took me a while just to find the control panel. OK, the drivers were somehow screwed up, even though Windows 8 was supposedly supported. There was enough signal and the connection was active, but nothing was loading. Pinging 8.8.8.8 seemed to work, but resolving anything did not, even though the DNS settings were correct. A couple of hours of headbanging and googling revealed a nice forum in Polish with people having the exact same problem, and to my surprise there was a solution at hand: a new and improved driver, downloaded from somewhere at a nice 3-10 kilobytes per second, and it worked, after tweaking the endless carrier-specific options. So yeah, Windows is really user friendly. I have no idea whether it would have worked on Linux.

Anyway, my mother, age 69, is using Linux and loves it. My wife says she can't imagine ever using Windows again. :)