WORK IN PROGRESS

This guide is not yet finished. At the moment it's just a rough outline.

Introduction

If you own a digital camera and take lots of pictures, you need a way to archive them. Storing them is not really the problem. The difficulty is finding a single picture in a collection of thousands.

So if you put some effort into the archiving process, you have a better chance of finding the picture you want. Of course this extra effort creates overhead that's only worth it if it saves you enough time searching.

Besides this trade-off there are some pitfalls you may run into. I'll explain my archiving concept to you by presenting my workflow and adding some thoughts to it.

For feedback or anything to add or improve, you can contact me.

Concepts

There are different concepts that need to be combined in order to get the job done.

How to sort

There are two ways to sort your pictures that make it possible to infer the location of a picture on disk: chronological and spatial.

Photos that are connected to each other usually describe the same event, which is at a given time and place.

Chronological

Using a timeline is the simplest way to archive your photo collection. Most people do something like this out of common sense. For example, you may have the following directory structure:

+- 2008
|  +- ...
|  +- ...
|  ...
+- 2009
   ...
   +- 2009-07-23 Summer Party
   +- 2009-12-24 Christmas

This way you group events (Summer Party, Christmas) by date and additionally group the events of a year together. Writing the date in ISO 8601 format (YYYY-MM-DD) has the advantage that event directories are sorted chronologically in any filesystem view, because alphabetical order and date order are the same.
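The layout above can be generated mechanically. The following is only a minimal sketch: the function name `event_dir` is made up for illustration, and it only builds the path without touching the disk.

```python
from datetime import date
from pathlib import PurePosixPath


def event_dir(day: date, event: str) -> PurePosixPath:
    """Build '<year>/<ISO date> <event>' (hypothetical helper, not part of any tool)."""
    return PurePosixPath(str(day.year)) / f"{day.isoformat()} {event}"


print(event_dir(date(2009, 7, 23), "Summer Party"))  # 2009/2009-07-23 Summer Party
print(event_dir(date(2009, 12, 24), "Christmas"))    # 2009/2009-12-24 Christmas
```

Because the ISO date comes first in the directory name, a plain alphabetical listing is already in chronological order.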

Fortunately, the camera stores the date together with the photo.

Spatial

If you shoot pictures of more than one event at the same time, the approach above is quite limited. But it is possible to geocode your photos and access them using a map view.

However, my camera doesn't support that and I don't have an external GPS device either, so there's no such thing as automated geocoding for me ;-(

Filesystem limitations

Current filesystems are hierarchy-based and thus quite limited when it comes to storing and sorting files by multiple criteria.

While waiting for better filesystems, we have to settle on one criterion for now. As mentioned above, chronological is more convenient than spatial.

Filename Conventions

When storing a picture on disk you need a filename. Since the camera has to store the picture first, it has to pick a name. In my opinion names like dcf0981.jpg are fine. They're short and meaningless, but they act like an ID of your picture.

Unfortunately the four-digit counter will overflow after 10000 pictures, so after some time you'll have multiple photos with the same ID and filename. This is fine as long as you store them grouped by date and assume that you don't shoot 10000 pics a day. But if you put together a collection of pictures from the last 10 years, the names may collide.

A way to circumvent this is to prefix the filename with the date of the shot, with the advantage that photos are now sorted by date in the filesystem, like before. The disadvantage is that filenames get quite long, e.g. 20091121184521_dcf0981.jpg (YYYYMMDDHHMMSS = 14 chars) or 1258825521_dcf0981.jpg (Unix timestamp = 10 chars), both at a resolution of 1 second.
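Both prefix styles can be derived from the same moment in time. This is a small sketch, not a finished tool: `prefixed_names` is a made-up helper, and it assumes the camera clock was set to UTC+1, which is what makes the two example names above describe the same second.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the camera clock ran on UTC+1 when the example picture was taken.
CAMERA_TZ = timezone(timedelta(hours=1))


def prefixed_names(unix_ts: int, original: str) -> tuple[str, str]:
    """Return (YYYYMMDDHHMMSS-prefixed name, Unix-timestamp-prefixed name)."""
    local = datetime.fromtimestamp(unix_ts, tz=CAMERA_TZ)
    return (f"{local:%Y%m%d%H%M%S}_{original}", f"{unix_ts}_{original}")


long_name, unix_name = prefixed_names(1258825521, "dcf0981.jpg")
print(long_name)  # 20091121184521_dcf0981.jpg
print(unix_name)  # 1258825521_dcf0981.jpg
```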

Base36

I'm using a base36 representation of the Unix timestamp. Basically you just encode the huge number with an alphabet of 36 characters (0-9, A-Z) instead of 10 digits (0-9). This makes your "number" shorter: KTGZZL_dcf0981.jpg (6 chars), which is easier to remember and pronounce.
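The encoding itself is just repeated division by 36. A minimal sketch follows (the function name `to_base36` is chosen for illustration and is not the author's rename script):

```python
import string

# Digit values 0-9 map to "0"-"9", values 10-35 map to "A"-"Z".
ALPHABET = string.digits + string.ascii_uppercase


def to_base36(number: int) -> str:
    """Encode a non-negative integer with the 36-character alphabet above."""
    if number < 0:
        raise ValueError("only non-negative numbers are supported")
    if number == 0:
        return "0"
    digits = []
    while number:
        number, remainder = divmod(number, 36)
        digits.append(ALPHABET[remainder])
    # The least significant digit was produced first, so reverse the list.
    return "".join(reversed(digits))


print(to_base36(1258825521))  # KTGZZL
```

Since every name has the same length and the digits sort before the letters, alphabetical order of the encoded names is still chronological order.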

You may discard the original filename suffix if you never take more than one picture per second. If you do, you may keep the least significant digit of the ID, which allows 10 pictures per second: KTGZZL_1.jpg (if you drop the underscore, it's as long as the original filename).
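Keeping only the last digit of the camera's counter can be sketched like this. `short_name` is a hypothetical helper; it takes the base36 prefix as an already encoded string.

```python
from pathlib import PurePosixPath


def short_name(prefix: str, original: str) -> str:
    """Combine a base36 prefix with the last digit of the camera's counter."""
    path = PurePosixPath(original)
    last_digit = path.stem[-1]
    if not last_digit.isdigit():
        raise ValueError(f"{original!r} does not end in a counter digit")
    return f"{prefix}_{last_digit}{path.suffix}"


print(short_name("KTGZZL", "dcf0981.jpg"))  # KTGZZL_1.jpg
```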

A possible disadvantage of names based on the Unix timestamp is the year 2038 problem (Y2K38), when a signed 32-bit timestamp overflows. In the worst case we just lose strict sorting: the numbers stay unique for roughly another 68 years (2^31 seconds), but they are wrong ;-) I believe we'll have better filesystems before this really becomes a problem. If I'm wrong, it's still possible to convert the names, since changing bases is reversible.
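To illustrate the reversibility: Python's built-in `int()` accepts a base argument, so decoding needs no custom code. The wrapper name `from_base36` is only for illustration.

```python
from datetime import datetime, timezone


def from_base36(text: str) -> int:
    """Decode a base36 string back into the integer it represents."""
    return int(text, 36)


ts = from_base36("KTGZZL")
print(ts)  # 1258825521
print(datetime.fromtimestamp(ts, tz=timezone.utc).isoformat())  # 2009-11-21T17:45:21+00:00

# Six base36 characters are enough for every timestamp below 2**31.
print(36**6 > 2**31)  # True
```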

Another way to shorten the filename is to decrease the resolution of your timestamp, for example by counting minutes instead of seconds.
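How much a coarser resolution saves can be checked with a few lines. This is a rough sketch using the example timestamp from above; `base36_length` is a made-up helper that only counts digits.

```python
def base36_length(number: int) -> int:
    """Number of base36 digits needed to write a non-negative integer."""
    length = 1
    while number >= 36:
        number //= 36
        length += 1
    return length


ts = 1258825521  # the example timestamp used in this guide
for label, seconds in [("second", 1), ("minute", 60), ("hour", 3600), ("day", 86400)]:
    print(label, base36_length(ts // seconds))
# second 6
# minute 5
# hour 4
# day 3
```

Each step saves one character, but the coarser the resolution, the more pictures share a prefix and the more of the original counter you have to keep.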

Tags

To circumvent the limits of filesystems you can use tags. They are quite popular these days, and you can think of them as multiple filenames for one file, since a file can carry more than one tag.

I use them to describe the location (for spatial sorting), event and content. The more data you add the finer your searches may be. If you never plan to search for a picture by content this is a perfect waste of time.

Since tags don't have a hierarchy, I embed one in them: all my tags for locations start with "Location.". All my tags for events start with "Event.". And guess what: all my tags for contents start with "Content."...
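The prefix convention makes it easy to recover the hierarchy from a flat tag list. This is a small sketch; the tag values below are invented examples, not tags from a real picture.

```python
from collections import defaultdict


def group_by_category(tags):
    """Split 'Category.Rest' tags into {category: [rest, ...]}."""
    groups = defaultdict(list)
    for tag in tags:
        category, _, rest = tag.partition(".")
        groups[category].append(rest)
    return dict(groups)


# Hypothetical tags, made up for illustration.
tags = ["Location.Berlin", "Event.Christmas", "Content.Toy"]
print(group_by_category(tags))
# {'Location': ['Berlin'], 'Event': ['Christmas'], 'Content': ['Toy']}
```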

As an example, I would tag this picture (KL89PP_dsc7857.jpg):

[image: example for tagging: toy]

Usually I even tag people, which can be quite tedious, but it makes searches possible like: "show me all pictures of me and my friend without a dog". For people I use tags like "People.Surname.Firstname". My hope for the future is that this can be automated by facial recognition.
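A search like the one quoted above boils down to set operations on the tags. The following is only a sketch with invented file names and tags. It does not show how jbrout implements searching.

```python
def find(photos, all_of=(), none_of=()):
    """Return names of photos that carry every tag in all_of and none in none_of."""
    wanted, unwanted = set(all_of), set(none_of)
    return sorted(
        name
        for name, tags in photos.items()
        if wanted <= tags and not (unwanted & tags)
    )


# Hypothetical collection: file name -> set of tags.
photos = {
    "a.jpg": {"People.Doe.John", "People.Roe.Jane", "Content.Dog"},
    "b.jpg": {"People.Doe.John", "People.Roe.Jane", "Location.Park"},
    "c.jpg": {"People.Doe.John", "Content.Toy"},
}

# "All pictures of me and my friend without a dog":
print(find(photos,
           all_of=["People.Doe.John", "People.Roe.Jane"],
           none_of=["Content.Dog"]))  # ['b.jpg']
```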

Meta-Data is MY data

All the mentioned properties of a picture, like time and tags, are data about your data (the picture). There are standardized meta-data formats such as EXIF and IPTC that are embedded in the picture file itself. This way the meta-data gets copied along with the file, which is the safest option as long as filesystems can't handle meta-data reliably. Wherever you move or rename your picture file, the meta-data stays with it, and every program you use to work with your pictures can access it.

The date of the photo and, if available, the GPS coordinates are already part of these embedded formats. Your camera already stores additional things such as shooting parameters in them. So the best thing you can do is store your tags in them, too.

Unfortunately, there are many photo collection programs that don't respect your meta-data. Once they import a picture, they add everything to a local database for searching. Tags and all additional information you enter get stored only in that database. You can still do all the usual tasks like tagging and searching, but it's hard to move to another collection product or to share pictures with friends, as you'll lose some of the meta-data (your work!).

So watch what your software does. Jbrout respects your meta-data and stores all of it in the file. To speed up searching it caches the meta-data in a local database, but changes go to the file. I can even access my tags without this software, either with EXIF/IPTC decoders or raw with a hex editor ;-)

Workflow

1. mount camera
2. move images to temporary import directory
3. umount camera
4. base36-rename pictures & videos:
     ls -1 *.avi | xargs -I {} rename-video-date36 {}
     ls -1 *.jpg | xargs -I {} rename-EXIF-date36 {}
5. start jbrout
6. refresh import directory in jbrout
7. browse through all pictures, delete bad pics, rotate pics, group events in subdirectories for storing
8. assign tags
9. move pictures to permanent storage
10. quit jbrout

The early tagger remembers the most.

TODO multi-cam
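The rename scripts used in the workflow (rename-EXIF-date36 and rename-video-date36) are not shown in this guide. As a rough stand-in, the renaming step could look like the sketch below. It is an assumption-laden illustration: it uses the file's modification time instead of the EXIF date (reading EXIF would need an extra library), and the names `to_base36` and `rename_by_mtime` are made up.

```python
import string
from pathlib import Path

ALPHABET = string.digits + string.ascii_uppercase  # 0-9, then A-Z


def to_base36(number: int) -> str:
    """Encode a non-negative integer in base36."""
    digits = []
    while True:
        number, remainder = divmod(number, 36)
        digits.append(ALPHABET[remainder])
        if number == 0:
            return "".join(reversed(digits))


def rename_by_mtime(path: Path) -> Path:
    """Prefix a file name with its modification time in base36.

    Assumption: the modification time equals the time of the shot,
    which only holds if the file was not edited after it left the camera.
    """
    stamp = int(path.stat().st_mtime)
    target = path.with_name(f"{to_base36(stamp)}_{path.name}")
    if target.exists():
        raise FileExistsError(target)  # never overwrite another picture
    return path.rename(target)
```

It could be applied to an import directory with something like `for p in sorted(Path("import").glob("*.jpg")): rename_by_mtime(p)`, where `import` is a placeholder directory name.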