2008-03-21

Working around stupid picture restrictions

Google has managed to be evil again. :-P

Blogger does not support file uploads. Alone, this would not be a problem -- in the context of a blog engine, it could be a mere oversight, or 'unneeded complexity'. Since Blogger supports image uploads, one would presume that appending a zip file to a random image would do the trick. (It needs to be a zip file, because -- unlike most other compression formats -- zips are end-headered, allowing any zip tool to recognise a zip stream at the end of another file. It's useful, for example, in creating and processing self-extracting zips.)
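To make the 'end-headered' bit concrete, here's a minimal sketch in core Perl of how a tool can spot a zip stream hanging off the end of some other file: it looks for the signature of the zip 'end of central directory' record near the tail. The function name find_zip_tail and the stand-in data are made up for illustration, and a real zip tool does considerably more validation than this.

```perl
use strict;
use warnings;

# Signature of the zip "end of central directory" (EOCD) record.
my $EOCD_SIG = "PK\x05\x06";

# Return the offset of the last EOCD record in $bytes, or -1 if none.
# The record is 22 bytes plus a comment of at most 65535 bytes, so only
# that much of the tail is accepted as a valid position.
sub find_zip_tail {
    my ($bytes) = @_;
    my $start = length($bytes) - (22 + 65535);
    $start = 0 if $start < 0;
    my $pos = rindex($bytes, $EOCD_SIG);
    return ($pos >= $start) ? $pos : -1;
}

# Stand-in data: some fake image bytes followed by an empty zip archive,
# which consists of nothing but a 22-byte EOCD record.
my $fake_image = "\x89PNG\r\n\x1a\n" . ("\0" x 100);
my $empty_zip  = $EOCD_SIG . ("\0" x 18);
my $combined   = $fake_image . $empty_zip;

printf "zip tail found at offset %d\n", find_zip_tail($combined);
```

Everything before the reported offset is the image as far as an image viewer is concerned, which is exactly why the trick works until somebody re-encodes the image.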

And one would be wrong. It turns out Google actively reprocesses uploaded images, removing any such tails.

Anyway, the next workaround is not quite as simple, but still reasonably obvious: encode the file data into an image.
Here's a pair of complementary Perl scripts -- using the Perl GD bindings -- that perform such encoding and decoding. First, an image containing both inside a zip:


And here's the Perl code to perform decoding:

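#! /usr/bin/perl -w

use strict;
use GD;

# Slurp the whole PNG from the files named on the command line (or stdin).
my $png = '';
undef $/;
$png .= $_ while <>;
my $image = GD::Image->newFromPngData($png)
    or die "Not a PNG image";
my ($width, $height) = $image->getBounds;

# Walk the image by rows; each pixel's palette index is one byte of data.
my $data = '';
for my $offset (0 .. $width * $height - 1) {
    my $byte = $image->getPixel($offset % $width,
                                int($offset / $width));
    die "Too many colours" if $byte >= 256;
    vec($data, $offset, 8) = $byte;
}

# The first 8 bytes are the header: 4-byte magic, 32-bit big-endian length.
die "Too little data (image incomplete?)" if length($data) < 8;
my ($magic, $length) = unpack 'A4N', $data;
die "Unknown magic" unless $magic eq 'FiGP';
die "Too little data (image incomplete?)"
    unless $length <= length($data) - 8;
binmode STDOUT;
print substr $data, 8, $length;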

Technical details


Blogger accepts images in three common formats: JPEG, GIF, and PNG. JPEG is generally a lossy format, so it's unsuitable for this usage; this leaves GIF and PNG. A little experimentation shows that internally, Blogger converts GIFs into PNGs, which implies that PNGs have a somewhat lesser chance of getting hosed in the process. Thus, we use PNG as the container format.

A PNG file can be viewed as a matrix of pixels, picked from a palette specified in the file. (Actually, more complex structures are possible, but this is good enough for our purpose.) Thus, an obvious choice is to assign each possible plaintext byte one colour in the palette, and then to convert each plaintext byte into a pixel. For simplicity, we don't even look for the minimal set of byte values that actually occur in the plaintext, but always use the full set of 0-255. Since we don't really need the colours, only their indices -- which means that pretty much the only restriction is that each byte gets a distinct colour -- we construct a palette in which the Nth entry has the red-green-blue values of (N, N, N). Note that GD's PNG and GIF construction subsystems hand out colour indices in the order the colours are allocated, and that Perl's map works in the order of the given list.
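The palette construction can be sketched roughly like this. The helper names grey_palette and allocate_grey_palette are made up for this example; colorAllocate is the GD call that adds a colour and returns its index. The sketch assumes it is handed a freshly created palette-based GD::Image with nothing allocated yet, since otherwise entry N would not land on index N.

```perl
use strict;
use warnings;

# Grey ramp: palette entry N has the RGB triple (N, N, N).
sub grey_palette {
    return map { [ $_, $_, $_ ] } 0 .. 255;
}

# Allocate the ramp on a palette-based GD image and return the indices GD
# handed out. $image is assumed to be a fresh GD::Image with no colours
# allocated yet, so that entry N ends up at index N.
sub allocate_grey_palette {
    my ($image) = @_;
    return map { $image->colorAllocate(@$_) } grey_palette();
}

my @palette = grey_palette();
printf "%d entries, entry 65 is (%s)\n",
    scalar @palette, join ',', @{ $palette[65] };
```

With the palette in place, writing byte value B at some position is just a matter of setting that pixel to index B.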

Next, there's the issue of PNG being a two-dimensional matrix, and a file being a one-dimensional stream. We could resolve this by constructing a PNG that would be one pixel high or wide, but such a PNG would be inconvenient to handle. Thus, we use larger widths -- by default, 256 pixels -- and map the plaintext to the image by rows. However, this raises the issue of padding, at the very least when the plaintext's original length is a prime, or does not have convenient factors. This could be resolved in several ways; for example, we could use a special, 257th, palette entry to denote unused pixels. However, the way I chose is to add a special 8-pixel (or 8-byte) header to the image: a 4-byte magic prefix that allows 'format identification', followed by the useful length of the data as a 32-bit big-endian integer. This format's magic prefix is 'FiGP', for 'file in grayscale picture'.
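Here's a minimal sketch of the framing step in core Perl, matching the 'A4N' layout the decoder unpacks. The function names frame_payload and unframe_payload are invented for this example, and padding with zero bytes is an assumption of the sketch; the decoder ignores whatever follows the stated length, so any filler would do.

```perl
use strict;
use warnings;

# Prefix the payload with the 8-byte header (4-byte magic plus a 32-bit
# big-endian length) and zero-pad the result to fill whole rows of $width.
sub frame_payload {
    my ($payload, $width) = @_;
    my $framed = pack('A4N', 'FiGP', length $payload) . $payload;
    my $rest = length($framed) % $width;
    $framed .= "\0" x ($width - $rest) if $rest;
    return $framed;
}

# Inverse: check the magic, then cut the payload back to its stated length.
sub unframe_payload {
    my ($framed) = @_;
    die "Too little data" if length($framed) < 8;
    my ($magic, $length) = unpack 'A4N', $framed;
    die "Unknown magic" unless $magic eq 'FiGP';
    die "Too little data" unless $length <= length($framed) - 8;
    return substr $framed, 8, $length;
}

my $framed = frame_payload("hello, world", 16);
printf "%d bytes framed, payload back: %s\n",
    length $framed, unframe_payload($framed);
```

Byte number K of the framed string then becomes the pixel at column K % width, row int(K / width), which is the same walk the decoder does in reverse.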

Finally, if we used a fixed width and calculated the height based on actual need, small files would produce wide and short images that, again, would be hard to handle. A workaround would be using a minimum height, but small files padded to this minimum height would look ugly, and random-padding to combat ugliness would certainly be overkill. Thus, our solution is to use progressively smaller widths if the height would otherwise be too small, and if that fails, apply the minimum height of 32 pixels.
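The size selection might look something like the sketch below. Only the default width of 256 and the minimum height of 32 come from the description above; halving the width at each step and stopping at a width of 32 are assumptions made for this example, as is the name choose_dimensions.

```perl
use strict;
use warnings;

# Pick image dimensions for $bytes bytes of framed data (header included).
# Assumptions of this sketch: the width is halved on each step and never
# drops below 32; only the 256 default and the minimum height of 32 are
# taken from the description.
sub choose_dimensions {
    my ($bytes) = @_;
    my ($width, $min_width, $min_height) = (256, 32, 32);
    my $rows = sub { int(($bytes + $_[0] - 1) / $_[0]) };   # ceiling division
    $width /= 2 while $width > $min_width && $rows->($width) < $min_height;
    my $height = $rows->($width);
    $height = $min_height if $height < $min_height;
    return ($width, $height);
}

printf "%7d bytes -> %dx%d\n", $_, choose_dimensions($_)
    for 100, 3000, 100_000;
```

Under these assumptions, tiny files end up as 32x32 squares, mid-sized ones get a narrower image, and anything large enough keeps the full 256-pixel width.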

Note that the decoder doesn't care what the image size is; it just walks it by rows. It also doesn't care what the image's RGB values are, only using their indices. It does check the magic header, however.
