Magnum Photos' tagging game


What’s in a name? Magnum has teamed up with New York start-up Tagasauris to develop a web-based media tagging tool.

Magnum Photos has opened up its enormous archive of images to a wider client base using the very latest keywording technology. Philip Wolmuth speaks to the agency.

Author: Philip Wolmuth

Over the past few months Magnum Photos has re-keyworded almost all the 500,000 photos in its digital archive, using a new system based on crowd-sourcing. The prestigious photo agency has teamed up with New York start-up Tagasauris to develop a web-based media tagging tool that enables the time-consuming task of metadata entry to be farmed out to the enormous pool of online labour through Amazon's Mechanical Turk service.

Mechanical Turk is a "marketplace for work", which gives businesses and developers "access to an on-demand, scalable workforce". The web service, which launched in 2005, allows its users to post Human Intelligence Tasks (HITs) - as the name suggests, tasks that require human input and cannot be easily accomplished by machine. Reading a photo, understanding the significance of its visual content and translating it into words, for example.

Responding to these posts are workers around the world. While some tasks can be done by anyone who registers, others require completion of a qualification test, as with Magnum's new keywording labour force, which is managed and monitored by Tagasauris.

According to Meagan Young, Magnum's Web Content Manager, an image sent out through the new system will come back, having been keyworded by up to eight people, in less than a minute. After small pilot trials last summer, images are now being sent out in batches of 20,000. "It's really exciting," she says. "You can keyword an entire archive within weeks."

Sheer scale
Work on the existing archive was scheduled to be completed in December, after which Magnum plans to add more images from member photographers' archives. New work will be keyworded as soon as it comes in. Previously, Magnum had five or six staff in its four offices (Paris, London, New York and Tokyo) responsible for metadata entry - and 200,000 images in its archive with no caption or keyword information.

The pilots showed that between four and eight keyworders per image are optimal - any more results in duplication. Most of the images come back with roughly the same number of keywords as they went out with (the average is eight), but they are more accurate and relevant. This stems from another innovation: the new keywords are linked to a semantic database. The meaning of each word is recognised by both the keyworder and the archive management system.
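The idea of pooling several keyworders' answers for one image can be illustrated with a minimal Python sketch. It assumes a simple agreement rule (keep a keyword once at least two workers propose it); the function name, the threshold and the sample data are hypothetical and are not taken from the Tagasauris system.

```python
from collections import Counter


def merge_worker_keywords(worker_keywords, min_votes=2):
    """Keep keywords proposed by at least `min_votes` workers, most agreed first.

    Assumption: simple vote counting; the real system's rule is not described.
    """
    votes = Counter()
    for keywords in worker_keywords:
        # Normalise case and whitespace, and count each worker once per keyword.
        votes.update({kw.strip().lower() for kw in keywords if kw.strip()})
    kept = [kw for kw, n in votes.items() if n >= min_votes]
    # Sort by vote count (descending), then alphabetically for a stable order.
    return sorted(kept, key=lambda kw: (-votes[kw], kw))


# Hypothetical submissions from four keyworders for a single image.
submissions = [
    ["Paris", "street", "cafe"],
    ["paris", "cafe", "rain"],
    ["Paris", "street", "umbrella"],
    ["cafe", "paris "],
]
merged = merge_worker_keywords(submissions)
print(merged)
```

Under a rule like this, each extra worker mostly repeats keywords that are already agreed, which is consistent with the pilots' finding that more than eight keyworders adds duplication rather than new information.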

As Young explains it, "The keywords in our old system were simply text. The new keywords have semantic data attached to them. Let's say you were to type 'Jaguar' into our system right now [before the new system is fully operational]. You would get the car, the animal, all sorts. The new system will ask you - did you mean the car, or did you mean the animal? The only way the system would know that is if we had attached that semantic data to that keyword. The keyworder will define the Jaguar, and when it comes back into the system, the system will immediately recognise what you are looking for."

The semantic database is derived from another crowd-sourced resource: Wikipedia - or, to be accurate, DBpedia. DBpedia is described as "a project aiming to extract structured information from the information created as part of the Wikipedia project". Without dropping too far down a slippery technical slope, what this means is that keywords can be linked to a data classification system, derived from Wikipedia, that connects words to each other based on meanings. What's more, this system does not belong to any one photo library, and is universally available (DBpedia data is published with a Creative Commons licence).
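Young's Jaguar example can be sketched in a few lines of Python. This is an illustration only: it assumes each keyword is stored as a label plus a DBpedia-style identifier for one specific meaning, and the class, the function names, the image IDs and the choice of identifiers are hypothetical rather than a description of Magnum's archive system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SemanticTag:
    label: str  # the word a searcher types
    uri: str    # identifier for one specific meaning (DBpedia-style, illustrative)
    gloss: str  # short description used to ask "did you mean...?"


# Two meanings that share the same plain-text keyword.
TAGS = [
    SemanticTag("jaguar", "http://dbpedia.org/resource/Jaguar", "the animal"),
    SemanticTag("jaguar", "http://dbpedia.org/resource/Jaguar_Cars", "the car"),
]

# Hypothetical images, tagged by meaning rather than by bare text.
IMAGE_TAGS = {
    "img-001": {"http://dbpedia.org/resource/Jaguar"},
    "img-002": {"http://dbpedia.org/resource/Jaguar_Cars"},
}


def senses(query):
    """Return every meaning whose label matches the typed word."""
    q = query.strip().lower()
    return [tag for tag in TAGS if tag.label == q]


def search(uri):
    """Return the images tagged with one specific meaning."""
    return sorted(img for img, uris in IMAGE_TAGS.items() if uri in uris)


options = senses("Jaguar")  # two candidates, so the user is asked to choose
animal = next(tag for tag in options if tag.gloss == "the animal")
results = search(animal.uri)
print([tag.gloss for tag in options], results)
```

A text-only keyword would match both images here; storing the identifier alongside the word is what lets the search return only the meaning the user picked.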

So, in theory, everyone can use the same names for tags, drawn from what Tagasauris calls "the world's largest collection of knowledge". Tagging photographs in this way allows users to burrow into the photo archive in the same way that surfers can follow links through Google. And the metadata trail the user follows will not be derived from a system devised by a picture librarian, but based on a crowd-sourced set of meanings.

Pros and cons
Why is all this happening? For Magnum managing director, Mark Lubell, there is an obvious bottom-line benefit. "Our images are going to be more findable, they are going to be more linked, and there is going to be a greater engagement from the audience. And all this will result in increased sales - and it's costing us a lot less to do."

But this is only part of the picture. The keywording initiative is a small, albeit essential, element in a major restructuring of the way in which Magnum relates to the evolving digital marketplace. When Henri Cartier-Bresson, Robert Capa, Chim Seymour and George Rodger set up their co-operative in 1947, and for many years afterwards, photographers were able to travel the world on assignment for newspapers and illustrated magazines, bringing back work that appeared in multiple-page spreads, before being filed away for future uses in the archive.

Now much of the advertising that financed those assignments has migrated to the web, and print journalism is struggling to adapt. At the same time, the way viewers and users of photography access images has changed dramatically. Google is a more likely first port of call than a polite telephone enquiry to an all-knowing archivist. But according to Meagan Young, "At the moment our searches from Google are about two to three percent of our traffic, and really it should be the inverse of that. Most websites will tell you that most of their traffic comes from organic searches."

Search engine optimisation is the name of the game. "Up until now, our entire archive has been behind a wall and not searchable by Google. We are in the process, with our DAM (digital asset management) vendor, of creating a static version of our entire archive, from image detail level to story level, and we'll open it up to Google."

The member photographers took some convincing, fearing such openness would leave their work vulnerable to unauthorised downloads and copyright theft. They were persuaded when shown the results of a Google search for Cartier-Bresson, their hugely influential founder member: not one of them linked back to the Magnum archive. "Making Magnum relevant on the web has a lot to do with how we structure the database and make sure that the metadata of our digital asset is rich," says Young. "You can have massive amounts of digital assets, but you have to be able to find them. And it's not just finding them, it's finding them in a rich way. It will lead to the possibility of creating interesting database narratives and looking at our archive in different ways. All digital archives are eventually going to have to do this."

The crowd-sourced keywording initiative is part of this shift, which Lubell calls "a great leap forward". "The real revelation is that, to survive, you have to move to a semantic database," he says. "Imagery will have much richer data attached to it, better search results will result in better sales, and people will be more engaged with this content because they will find the images in the right context."

It won't be long before Magnum can begin to assess whether these changes will give it a web presence that can continue to support the production of new work - and whether there truly is wisdom in crowds.

Visit www.magnumphotos.com and www.tagasauris.com.


Comments

SEO Exclusives

Does Magnum Photos intend to only allow Google to have exclusive access to their collection?
Shouldn't they also allow Bing (the #2 search engine) and others the same opportunity?

Posted by: Michael Rose on 06 Jul 2011 at 00:18
