Epic Default Productions

Metadata, Or “I already knew that!”

by HotSake on Dec.10, 2008, under Braindump

In Which HotSake Alienates Half the Audience:

I’ve been working on some personal projects lately that have got me thinking about metadata. I thought I’d lay down a brief introduction here in case anyone’s interested. Of course, the people who would geek out about metadata are the people who probably already know all about it, hence the alternate title of this post.

So, what is metadata, and why should you care? Metadata is data about data (that’s what “meta” means), and I guarantee you work with it every day. Think of a file on your computer as the data. Metadata would be things like the file’s name, its size, type, location, etc. Notice how all that information is useful, and most OSes present it quickly and easily? That’s the point of metadata. It helps you organize and understand the actual data. Did you know digital cameras can include information in each picture file including what kind of camera took the picture, the image settings used, and even GPS coordinates? Try looking up EXIF some time.
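
For the curious, here’s a minimal sketch of what that looks like from a program’s point of view, using Python’s standard library. The function name `describe_file` and the throwaway sample file are my own inventions for illustration; the point is just that all of this information comes back without ever reading the file’s contents.

```python
import os
import tempfile
from datetime import datetime, timezone


def describe_file(path):
    """Return a few pieces of metadata about a file without reading its contents."""
    info = os.stat(path)  # asks the OS for the file's metadata
    return {
        "name": os.path.basename(path),
        "location": os.path.dirname(os.path.abspath(path)),
        "type": os.path.splitext(path)[1] or "(no extension)",
        "size_bytes": info.st_size,
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
    }


if __name__ == "__main__":
    # Create a throwaway file so the example runs anywhere.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
        handle.write("the actual data")
        sample_path = handle.name
    try:
        for key, value in describe_file(sample_path).items():
            print(f"{key}: {value}")
    finally:
        os.remove(sample_path)
```

Here the “type” is simply guessed from the file extension, which is roughly what a file browser does too.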

The specific application of metadata I have in mind relates to the internet. Here, you could say metadata is how the web knows itself. There’s simply too much stuff out there! The challenge today is not to generate interesting content, as it was in the early days of the web. The challenge is to find interesting content. So much of the web is “user-generated” (swiftly becoming a meaningless term) that the signal-to-noise ratio has dropped like a stone. It’s probably a conservative estimate to say that 90% of everything is crap. Metadata can help.

Ah, but how to collect it? Some types of metadata collection can be automated: hit counters track how often a site is visited, link-based page ranks track how often it is linked to, and both allow simple comparisons. Unfortunately, they don’t tell you anything about the content of the site. Computers don’t understand what a page is about; closing that gap is the idea behind the Semantic Web. Humans, on the other hand, do understand, and computers can piggyback on that understanding whether users are trying to help or not. Intentional piggybacking happens when people consciously create metadata. Adding tags to images on Flickr is a good example; <meta> tags in HTML are another. A computer can’t recognize objects in most photographs (yet), or understand the meaning of a web page, but people can, allowing automated systems to create relationships between data based on human understanding of that data.
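
To make the <meta> tag case concrete, here’s a rough sketch that pulls the human-written metadata out of a page using Python’s built-in `html.parser`. The class name `MetaTagCollector`, the helper `extract_meta`, and the sample page are all made up for this example; real pages are messier and a real crawler would need to be far more forgiving.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect name/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        # Skip tags like <meta charset="utf-8"> that carry no name/content pair.
        if name and content is not None:
            self.metadata[name.lower()] = content


def extract_meta(html):
    """Return a dict of the name/content metadata declared in an HTML string."""
    collector = MetaTagCollector()
    collector.feed(html)
    return collector.metadata


# A hypothetical page, written by a human who knows what it is about.
SAMPLE_PAGE = """
<html><head>
<title>Metadata</title>
<meta name="description" content="A brief introduction to metadata">
<meta name="keywords" content="metadata, tags, semantic web">
</head><body><p>Page text.</p></body></html>
"""

if __name__ == "__main__":
    for name, content in extract_meta(SAMPLE_PAGE).items():
        print(f"{name}: {content}")
```

The program never understands the page. It just collects what the author already said about it.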

Unintentional piggybacking is the unwitting creation of computer-usable metadata. Certain human behaviors can be tracked by computer, and those behaviors are grounded in an understanding of web content. This is one of the principles behind Google’s spiders. People link to web content using appropriate words because they understand the target of the link. They’re not trying to make Google’s job easier, but Google’s spiders are designed under the assumption that people understand what they link to, so link text will describe the target of the link. Thus, spiders crawl the web and make note of what words link to what pages. Everyone who writes links contributes metadata in machine-readable format, whether they know it or not. Once you’re aware of the process, you can do fun things like Google bombing, but it could be argued that you’re not subverting the process at all. You’re simply providing an alternate interpretation for some bit of data. After all, George W. Bush and “miserable failure” is a meaningful connection to quite a few people.
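
The link-text idea can be sketched in a few lines. This is a toy model under my own assumptions, not a description of how Google actually works: the names `AnchorTextCollector` and `build_anchor_index` are invented, the URLs are placeholders, and a real search engine weighs far more than word counts.

```python
from collections import defaultdict
from html.parser import HTMLParser


class AnchorTextCollector(HTMLParser):
    """Record (href, link text) pairs for every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # target of the link currently being read, if any
        self._text = []     # pieces of text seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def build_anchor_index(pages):
    """Map each link target to a count of the words people used to link to it."""
    index = defaultdict(lambda: defaultdict(int))
    for html in pages:
        collector = AnchorTextCollector()
        collector.feed(html)
        for href, text in collector.links:
            for word in text.lower().split():
                index[href][word] += 1
    return index


if __name__ == "__main__":
    # Hypothetical pages written by different people linking to the same target.
    pages = [
        '<p>Read this <a href="http://example.com/intro">metadata introduction</a>.</p>',
        '<p>A decent <a href="http://example.com/intro">introduction</a> to the topic.</p>',
    ]
    for target, words in build_anchor_index(pages).items():
        print(target, dict(words))
```

In this model, a Google bomb is just a lot of pages using the same link text for the same target, which pushes that word’s count up.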

Flex your e-peen by telling HotSake how he butchered the description of metadata in the comments!

9 comments for this entry:
  1. Oz K. Fodrotski

    First on your own post is some kind of meta…

    …data AHAHAHAHA YOU SEE WHAT I DID THERE?

  2. HotSake

    I just wanted to prevent anyone from making a stupid “first” post. At least it’s somewhat amusing if I do it? Yes, let’s say that’s true.

  3. Jonny Nero

    CHILDREN! Don’t make me turn this website around!

  4. Snifit

    I don’t want ice cream, I want pancakes D’:<

  5. eye-shuh

    Very informative! :p

    In one of the CMU courses at the UW they all create blogs and Google bomb connecting a completely random website the prof picks with random keywords. I’ve always wanted to take it but I think it’s majors only. :\

    AND LOOK. AN ACTUAL READER COMMENT. Therefore, I call FIRST.

  6. Broken Angel

    I refuse to comment now if I can’t be first.

    …waaaait….

  7. x3nocide

    Already know about this, but yes, it’s super cool.
