Here’s a quick text on the face value of art that I put together for our Indiegogo campaign. What do you think? Is the analogy with the face value of fiat currencies sound, or does it need work?
What’s the relation between creative works and money? A bank note is generally valuable, but not because the paper it’s printed on is valuable (it’s not), nor because it’s unique. It’s valuable because it is backed by a government that guarantees your ability to pay taxes and other debts with its face value. If there’s some level of trust in the government and its central bank, people will generally accept that the value of a bank note or coin is its face value. Internationally, the reputation of a government and its people affects the government’s ability to borrow money. When there’s a lack of trust in either direction, things tend to go from bad to worse.
Similarly, a work of art doesn’t have an intrinsic value in the digital world: the bits it’s composed of are not expensive, and every work can be replicated in the blink of an eye at practically no cost. Yet a work of art has a face value, a value that’s separate from its physical form. It has a value to its creator because the work builds the creator’s reputation. It can lead to more people discovering the creator’s art, and to commissions, donations, grants, and other advantages.
That’s why attribution is important: it adds to the face value of a work of art and contributes to the global artistic reputation system. Thank you for your contribution to our activities encouraging attribution!
One of the advantages of being able to uniquely identify a work and its creator is that it makes a number of mechanisms available that can be used to further support individual creators. One of those mechanisms is micropayments, along with other ways of giving monetary rewards to the creators you like and whose work you use. An agreed-upon standard for metadata formats and unique identifiers would make it significantly easier to use those formats to also convey information about payment options, and would enable the tools we use to create a virtual tipping jar that we can flip a coin into when we use digital works.
Together with @gnugirl, I’ve done some work on the age-old question: what percentage of images used online are used without giving proper credit to their creator?
From a research point of view, this question is both interesting and difficult. One of the difficulties is creating a list of images and their web pages that represents a uniform (or at least near-uniform) sample of the web. Baykan et al. (2009) highlight that “no approach has been shown to sample the web pages in an unbiased way.”
In our initial work, we’ve used a modification of the Bharat-Broder technique, in which random words from a lexicon are concatenated into a query, the query is submitted to a search engine, and a URL is then selected at random from the result set returned.
There are obvious risks for bias in this, including but not limited to:
- Query bias towards large, content-rich pages
- Search engine bias
- Ranking bias, depending on the search engine used
Bar-Yossef and Gurevich (2006) have built upon this, introducing methods that apply stochastic simulation techniques to biased samples, such as those generated with the Bharat-Broder technique, to produce near-uniform samples.
We further complicate the situation by introducing researcher bias, since each image needs to be evaluated in its context by a human to determine whether it has been correctly credited. There is no standard way of crediting an image; it depends on the implementation and style of each user, which makes automatic checking difficult without introducing further bias.
Our first set is based entirely on blogs from Blogger, which is not a representative sample of the web at large but provides indications and a testing ground for our further work. To generate the set, we used an English lexicon (introducing a bias towards English-language blogs) from which we randomly picked 2-5 words, which were then searched for in Google’s image search. An image and its context were then selected at random from the first ten results returned.
Each set of context and image was studied individually, with the researcher locating the image within the context and looking at the surrounding information to identify credits. In addition, we did a reverse image search on the image (again using Google Images) to determine whether it was obvious that the image had been retrieved from another source rather than being an original work of the blog owner.
We excluded from the results any work that was determined to be the original work of the blog owner (beyond the scope of our research), as well as results where we found no indication that the image was actually used on the page returned (likely due to dynamically generated content that had changed since Google indexed the page).
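To make the idea of correcting a biased sample a little more concrete, here is a minimal sketch of rejection sampling, one of the stochastic simulation techniques Bar-Yossef and Gurevich build on. It assumes, hypothetically, that we already know how strongly each page is over-represented (its `weight`); in practice that weight has to be estimated. Accepting each candidate with probability `min_weight / weight` cancels the bias, so accepted pages come out roughly uniform. The page names and weights are made up for illustration.

```python
import random
from collections import Counter
from typing import Callable, TypeVar

T = TypeVar("T")


def rejection_sample(
    biased_sample: Callable[[], T],
    weight: Callable[[T], float],
    min_weight: float,
    rng: random.Random,
) -> T:
    """Turn draws from a biased sampler into uniformly distributed draws.

    Assumes biased_sample() returns x with probability proportional to
    weight(x) and that 0 < min_weight <= weight(x) for every x. Accepting
    x with probability min_weight / weight(x) makes every item equally
    likely to be returned.
    """
    while True:
        candidate = biased_sample()
        if rng.random() < min_weight / weight(candidate):
            return candidate


def demo(n: int, seed: int = 0) -> Counter:
    """Toy web: one large, content-rich page that random queries hit four
    times as often as each of three small pages (hypothetical weights)."""
    rng = random.Random(seed)
    pages = ["large-page", "small-a", "small-b", "small-c"]
    bias = {"large-page": 4.0, "small-a": 1.0, "small-b": 1.0, "small-c": 1.0}

    def biased_sample() -> str:
        return rng.choices(pages, weights=[bias[p] for p in pages])[0]

    return Counter(
        rejection_sample(biased_sample, lambda p: bias[p], 1.0, rng)
        for _ in range(n)
    )


# Without the correction the large page would make up about 4/7 of the
# sample; with it, each page should land near one quarter.
print(demo(20_000, seed=42))
```

The price of the correction is wasted work: here roughly three out of every seven draws are rejected.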
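The sampling procedure above can be sketched roughly as follows. The `search` callable is a hypothetical stand-in for whatever image search is available (the code makes no calls to Google), and retrying when a query returns nothing is an assumption rather than part of the procedure as described.

```python
import random
from typing import Callable, Optional, Sequence, Tuple

# A search result as (image_url, page_url); the search function itself is
# a hypothetical placeholder supplied by the caller.
Result = Tuple[str, str]
SearchFn = Callable[[str], Sequence[Result]]


def random_query(lexicon: Sequence[str], rng: random.Random,
                 min_words: int = 2, max_words: int = 5) -> str:
    """Concatenate 2-5 distinct words picked at random from the lexicon."""
    k = rng.randint(min_words, max_words)  # inclusive on both ends
    return " ".join(rng.sample(lexicon, k))


def sample_image(lexicon: Sequence[str], search: SearchFn, rng: random.Random,
                 top_n: int = 10, max_attempts: int = 100) -> Optional[Result]:
    """Pick a random (image, page) pair from the first top_n results of a
    random query; queries returning nothing are retried with a new query."""
    for _ in range(max_attempts):
        results = list(search(random_query(lexicon, rng)))[:top_n]
        if results:
            return rng.choice(results)
    return None
```

Every sampled pair still needs the manual review described next; the code only automates the drawing.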
The results were then categorised into three different categories:
- Credit is given
- No credit is given
- Credit is given, but based on a reverse image search, it’s obviously incorrect or falsified.
In our initial sample, which includes only a small set of pages, the distribution is as follows:
- 33% - Credit is given
- 65% - No credit is given
- 2% - Credit is given, but based on a reverse image search, it’s obviously incorrect or falsified.
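Because of the exclusions described above, these percentages are shares of the remaining samples, not of every page drawn. A minimal sketch of that calculation; the label names for the reviewer’s decisions are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

# Labels a reviewer might assign to each sampled image/context pair.
CATEGORIES = ("credited", "not_credited", "false_credit")
# Cases dropped before computing the distribution.
EXCLUDED = ("original_work", "image_not_found")


def credit_distribution(labels: Iterable[str]) -> Dict[str, float]:
    """Share of each category among the samples left after exclusions."""
    counts = Counter(labels)
    unknown = set(counts) - set(CATEGORIES) - set(EXCLUDED)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    total = sum(counts[c] for c in CATEGORIES)
    if total == 0:
        raise ValueError("no samples left after exclusions")
    return {c: counts[c] / total for c in CATEGORIES}


# Example with made-up labels (not our data):
print(credit_distribution(["credited", "not_credited", "not_credited", "original_work"]))
```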
The next steps will be to further reduce the bias in our sample set, to increase its size (which, unsurprisingly, also affects bias; Brajnik et al., 2007), and to run this analysis on the web at large.
Bar-Yossef, Z., & Gurevich, M. (2006) Random sampling from a search engine’s index. In Proceedings of the 15th International Conference on World Wide Web (Edinburgh, Scotland). ACM Press, New York, NY, 367-376.
Baykan, E., Henzinger, M., Keller, S.F., De Casteleberg, S., & Kinzler, W. (2009) A comparison of techniques for sampling web pages. In Proceedings of the 26th International Symposium on Theoretical Aspects of Computer Science (STACS 2009), 13-30.
Bharat, K., & Broder, A. (1998) A technique for measuring the relative size and overlap of public Web search engines. In Proceedings of the 7th International Conference on World Wide Web (Brisbane, Australia). Elsevier Press, 379-388.
Brajnik, G., Mulas, A., & Pitton, C. (2007) Effects of sampling methods on web accessibility evaluations. In Proceedings of the 9th international ACM SIGACCESS conference on Computers and accessibility. ACM Press, New York, NY. 59-66.
Thank you for this most interesting question! Aside from the obvious remarks that one could make in jest, Jonathan Corum has dug deep into this question. I suggest you check it out!
We all have names or aliases that identify us to the people around us. Most of the time, a name or alias is not unique, and it doesn’t matter whether it is. In some cases, though, you might want an easy way for others to identify you even if you change affiliation, or even name. ORCID is an example of an identifier system used in the research community: a researcher registers with ORCID and is assigned a unique identifier, which is then included in their research publications. Anyone can then look up that identifier in ORCID’s registry to find the person.
People are not the only ones needing identifiers: our computers have unique identifiers to identify them on a network, our cars have vehicle identification numbers, books have ISBNs, and so on. Unique identifiers for assets, people and organisations are a key component of metadata for digital works: they provide a way to identify a given work and its creator. Such an identifier can then be used, for example, to look up information about the work in a registry.
The way it works today is roughly that:
- You register your work in a registry (or, in the case of ISBNs, apply to your national library or a similar institution for an ISBN)
- You receive a unique identifier
- You put that identifier on your work, labelling it as a particular kind of identifier (e.g., “the ISBN of this book is X”).
- People use that identifier in catalogues, databases, web sites, etc.
The identifiers received from a registry are guaranteed to be unique within that registry. That’s one of the reasons you can’t invent identifiers at random: they wouldn’t be guaranteed to be unique.
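A toy sketch of why registry-issued identifiers are unique: a single issuer hands out numbers in sequence and remembers what it issued. The `Registry` class and the `EX` prefix are hypothetical; real registries such as ISBN agencies or ORCID have their own formats and rules.

```python
import itertools
from typing import Dict, Optional


class Registry:
    """Toy registry: identifiers are unique because one party issues them all.

    The prefix stands in for the registry's own namespace, which keeps its
    numbers apart from those issued by other registries.
    """

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix
        self._next_number = itertools.count(1)
        self._records: Dict[str, dict] = {}

    def register(self, metadata: dict) -> str:
        """Store the metadata and hand back a fresh identifier."""
        identifier = f"{self.prefix}-{next(self._next_number):08d}"
        self._records[identifier] = dict(metadata)
        return identifier

    def lookup(self, identifier: str) -> Optional[dict]:
        """Return what was registered under the identifier, if anything."""
        return self._records.get(identifier)


registry = Registry("EX")
book_id = registry.register({"title": "A hypothetical book", "creator": "A. Author"})
print(book_id)  # EX-00000001
print(registry.lookup(book_id))
```

The catch is the one discussed next: every new identifier requires talking to the registry.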
But how strict do we need to be? Is it enough if there’s only a 0.5% chance of someone picking the same identifier? What about 0.005%? If there were a way to generate a unique identifier without communicating with any other device, everyone could generate as many as they needed for their works or themselves, and they would only have to register an identifier in a registry if they wanted to.
A UUID is one way of generating a practically unique identifier. It’s a 128-bit, self-generated identifier; in randomly generated (version 4) UUIDs, 122 of those bits are random, so even if about 70 trillion of them were generated worldwide, the probability of a collision would be roughly 0.00000005% (about one in two billion). A UUID can be generated without communicating with any other device or service, which means it could be generated in a camera, a phone, or any other recording device at the time of recording.
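The collision figure can be checked with the standard birthday-problem approximation, p ≈ 1 − exp(−n²/2N), where n is the number of identifiers and N the number of possible values. This sketch assumes random (version 4) UUIDs with 122 random bits and uses Python’s standard `uuid` module to generate one.

```python
import math
import uuid


def collision_probability(n: float, random_bits: int) -> float:
    """Approximate chance that at least two of n random identifiers collide.

    Birthday-problem approximation p ~= 1 - exp(-n^2 / (2N)), where
    N = 2**random_bits; expm1 keeps precision when p is tiny.
    """
    space = 2.0 ** random_bits
    return -math.expm1(-(n * n) / (2.0 * space))


# A version 4 UUID has 122 random bits; the other 6 of its 128 bits
# encode the version and variant.
p = collision_probability(70e12, 122)
print(f"{p:.2e}")  # about 4.6e-10

# Generating one needs no network access or registry.
work_id = uuid.uuid4()
print(work_id.version, work_id)
```

For 70 trillion UUIDs this gives about 4.6 × 10⁻¹⁰. The same function can be pointed at other identifier sizes to see where the 0.5% or 0.005% thresholds would be crossed.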
If we agree that a UUID is unique enough that it’s unlikely that two people will randomly generate the same identifier, we could simplify the process of generating identifiers significantly:
- You generate a UUID as an identifier and put it (in some cases automatically) into the work
- Optionally, if you want your identifier to be trusted, register it in a registry.
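The simplified process might look something like this. `tag_work` and `register` are hypothetical helpers, and the registry is just an in-memory dictionary. The point is that the first step needs no network access at all and the second is optional.

```python
import uuid
from typing import Dict, Optional


def tag_work(metadata: dict) -> dict:
    """Return a copy of the work's metadata with a self-generated identifier.

    Runs entirely offline, e.g. in a camera at the moment of recording.
    An identifier that is already present is kept, so re-tagging is harmless.
    """
    tagged = dict(metadata)
    tagged.setdefault("identifier", uuid.uuid4().urn)  # 'urn:uuid:...'
    return tagged


def register(tagged: dict, registry: Dict[str, dict]) -> None:
    """Optional step: record the work in a (here, in-memory) registry."""
    identifier = tagged["identifier"]
    existing: Optional[dict] = registry.get(identifier)
    if existing is not None and existing != tagged:
        raise ValueError(f"{identifier} is already registered for another work")
    registry[identifier] = dict(tagged)


registry: Dict[str, dict] = {}
photo = tag_work({"title": "Hypothetical photo", "creator": "Jane Doe"})
register(photo, registry)  # only if you want the identifier to be trusted
print(photo["identifier"])
```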
What are your thoughts about using UUIDs as unique identifiers?
I’m interested in researching how often images are correctly credited when used online. I have some ideas about how to go about this, using random samples of images from various social media platforms as well as from the internet generally. What I hope to gain from this research is an indication of how often images are used without attribution, how often they are used with obviously fraudulent attribution, and, of course, how often they are used with correct attribution.
If you’re interested in this type of research and would like to contribute your thoughts, please get in touch!
I recently found myself needing to write a small web application. Nothing terribly complicated: something that would allow you to log in to the application and send changes to, or retrieve lists from, a remote server. Using a full-fledged web development framework seemed excessive, yet hand-coding everything also seemed excessive.
Here’s an image of yours truly, taken by Creative Commons’ own David Kindler. It’s an example of a watermark in an image, in this case announcing that the picture is from the Global Summit 2011, but the same technique could equally be used to state that the picture was taken by David Kindler, as a way of ensuring that it was correctly attributed when re-used, as I do in this post.
“CC Global Summit-Jonas Oberg.jpg” by DTKindler Photo on Flickr, licensed under CC BY 2.0 (http://creativecommons.org/licenses/by/2.0/deed.en)
The problem with this kind of watermarking is partly that it takes something away from the image: it’s an invasive procedure that modifies the content of the image. Even if we were to resize the canvas of the picture so that the watermark fits outside of the actual picture, or find a way to make minimal changes to individual pixels to embed the information within the picture, it only partially works.
If I were to rotate the picture 90 degrees for publishing, the attribution would end up on the side. Since the picture is licensed under a Creative Commons Attribution license, I might also choose to use just part of the image, ignoring the watermark altogether. And if I resize the picture, the attribution might become so small that it simply can’t be read.
It’s a solution, but not an ideal one. That’s why we’re working with the concept of metadata: information that is recorded as part of the image, but not in its visual representation. EXIF is the most common form of image metadata: it allows a camera to record the aperture used when shooting the image, whether the flash was used, sometimes where the image was taken, and a lot of other information. This is data that’s useful to have, but that should not be part of the image itself.
Adding information about licensing and the creator to this metadata would allow us to create tools that read this metadata (or a link to it) and understand what to do with it: for example, automatically crediting the creator when we use the image or, if we try to use the image in a way the author doesn’t want, giving us a helpful hint that we might want to look into our use of the image. It shouldn’t prevent us, but it should give us notice.
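Hiding the attribution in the pixels themselves runs into similar problems. Least-significant-bit (LSB) embedding is one well-known way of doing this; the post doesn’t commit to any particular technique, so treat this as an illustrative sketch. It assumes a flat list of 8-bit grayscale values and shows that the hidden attribution survives an exact copy but not even a crude resize.

```python
from typing import List

HEADER_BITS = 32  # message length in bytes, stored before the message


def _to_bits(value: int, width: int) -> List[int]:
    return [(value >> shift) & 1 for shift in range(width - 1, -1, -1)]


def embed_lsb(pixels: List[int], message: str) -> List[int]:
    """Hide message in the least significant bit of each 8-bit pixel value."""
    data = message.encode("utf-8")
    bits = _to_bits(len(data), HEADER_BITS)
    for byte in data:
        bits.extend(_to_bits(byte, 8))
    if len(bits) > len(pixels):
        raise ValueError("image too small for this message")
    out = list(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & ~1) | bit  # changes each value by at most 1
    return out


def extract_lsb(pixels: List[int]) -> str:
    """Read back a message written by embed_lsb; raises ValueError if none fits."""
    def read(start: int, width: int) -> int:
        value = 0
        for p in pixels[start:start + width]:
            value = (value << 1) | (p & 1)
        return value

    length = read(0, HEADER_BITS)
    if HEADER_BITS + 8 * length > len(pixels):
        raise ValueError("no readable message")
    data = bytes(read(HEADER_BITS + 8 * i, 8) for i in range(length))
    return data.decode("utf-8")  # UnicodeDecodeError is a ValueError too


def halve_width(pixels: List[int]) -> List[int]:
    """Crude downscale: average neighbouring pairs of pixel values."""
    return [(a + b) // 2 for a, b in zip(pixels[0::2], pixels[1::2])]


image = [128] * 4000  # hypothetical flat grayscale image
marked = embed_lsb(image, "Photo: David Kindler, CC BY 2.0")
print(extract_lsb(marked))  # survives an exact copy
try:
    print(extract_lsb(halve_width(marked)))
except ValueError:
    print("attribution lost after resizing")
```

Cropping or lossy re-compression would typically disturb the hidden bits in a similar way.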
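A minimal sketch of such a tool, assuming the license information is stored as a simplified ccREL RDF record using the `cc:attributionName` and `cc:license` properties (real XMP packets wrap this in more structure). The function names are hypothetical. It builds a credit line and returns notices rather than refusing the use.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional
from urllib.parse import urlparse

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NS = {"rdf": RDF, "cc": "http://creativecommons.org/ns#"}


def read_license_metadata(rdf_xml: str) -> Dict[str, Optional[str]]:
    """Extract creator name and license URL from a simplified ccREL record."""
    meta: Dict[str, Optional[str]] = {"attribution_name": None, "license": None}
    desc = ET.fromstring(rdf_xml).find("rdf:Description", NS)
    if desc is None:
        return meta
    name = desc.find("cc:attributionName", NS)
    if name is not None and name.text:
        meta["attribution_name"] = name.text.strip()
    lic = desc.find("cc:license", NS)
    if lic is not None:
        meta["license"] = lic.get(f"{{{RDF}}}resource")  # rdf:resource attribute
    return meta


def _license_path(license_url: str) -> List[str]:
    # 'http://creativecommons.org/licenses/by-nc/2.0/' -> ['licenses', 'by-nc', '2.0']
    return urlparse(license_url).path.strip("/").split("/")


def license_terms(license_url: str) -> List[str]:
    """License elements such as ['by', 'nc'] for a Creative Commons license URL."""
    parts = _license_path(license_url)
    return parts[1].split("-") if len(parts) > 1 and parts[0] == "licenses" else []


def credit_line(name: str, license_url: str) -> str:
    """Short credit such as 'Jane Doe / CC BY-NC 2.0'."""
    terms = license_terms(license_url)
    if not terms:
        return f"{name} / {license_url}"
    parts = _license_path(license_url)
    version = parts[2] if len(parts) > 2 else ""
    return f"{name} / " + " ".join(p for p in ("CC", "-".join(terms).upper(), version) if p)


def notices(license_url: str, commercial: bool, modified: bool) -> List[str]:
    """Hints, not blocks: flag uses the license terms may not allow."""
    terms = license_terms(license_url)
    hints = []
    if commercial and "nc" in terms:
        hints.append("This license only permits non-commercial use.")
    if modified and "nd" in terms:
        hints.append("This license does not permit sharing modified versions.")
    return hints


SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:cc="http://creativecommons.org/ns#">
  <rdf:Description rdf:about="">
    <cc:attributionName>DTKindler Photo</cc:attributionName>
    <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/"/>
  </rdf:Description>
</rdf:RDF>"""

meta = read_license_metadata(SAMPLE)
print(credit_line(meta["attribution_name"], meta["license"]))  # DTKindler Photo / CC BY 2.0
print(notices(meta["license"], commercial=True, modified=True))  # []
```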
One of the advantages of being a Fellow of the Shuttleworth Foundation is that they encourage you to be open: to be open about your ideas, so that others can complement, criticize, add to, change, revise, and challenge them. As a former teacher, I know that the only time one’s own knowledge advances is when faced with opposing views: when someone hears you talk about your work and then goes “but what if...” or “I don’t think so...”
Suddenly, you’re faced with a cognitive conflict that needs to be resolved, and through resolving that cognitive conflict you change your understanding of the topic and advance in your thinking.
Today’s thumbs-up goes to David Wiley (@opencontent), whose innocent remark some days ago (“Stupid question, but...”) sent my mind on a tour from which it returned today with a bunch of fresh ideas that will provide material for blog posts to come.
Over on Twitter, patlockley asks the critical question: how is the work of Commons Machinery different from Open Attribute? The difference is in the ease of use and in the distribution. With Open Attribute, it’s easy to manually copy attribution from a web page to your document, but when you switch images around, you need to remember to switch the attribution texts separately, and if someone copies just the image file from you, all information about attribution is lost.
That’s what we aim to address: by making the attribution and license information (“metadata”) an integral part of the image files (“embedded”), the information will be persistent and follow the image wherever it’s copied.
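One way embedding can work for PNG files: the PNG format allows text chunks alongside the pixel data, and XMP metadata is conventionally stored in an `iTXt` chunk with the keyword `XML:com.adobe.xmp`. This is an illustrative sketch using only the Python standard library, not a description of Commons Machinery’s own tools; the helper names are hypothetical.

```python
import struct
import zlib
from typing import Dict, Iterator, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
XMP_KEYWORD = "XML:com.adobe.xmp"  # keyword conventionally used for XMP in PNG


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Length, type, data, then a CRC-32 over type + data (all big-endian)."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def iter_chunks(png: bytes) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (chunk_type, data) for every chunk in the file."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        yield png[pos + 4:pos + 8], png[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC


def embed_itxt(png: bytes, keyword: str, text: str) -> bytes:
    """Return a copy of the PNG with an uncompressed iTXt chunk before IEND."""
    # keyword, NUL, compression flag 0, method 0, empty language tag + NUL,
    # empty translated keyword + NUL, then the UTF-8 text.
    data = keyword.encode("latin-1") + b"\x00\x00\x00\x00\x00" + text.encode("utf-8")
    out = [PNG_SIGNATURE]
    for chunk_type, chunk_data in iter_chunks(png):
        if chunk_type == b"IEND":
            out.append(make_chunk(b"iTXt", data))
        out.append(make_chunk(chunk_type, chunk_data))
    return b"".join(out)


def read_itxt(png: bytes) -> Dict[str, str]:
    """Collect uncompressed iTXt chunks as {keyword: text}."""
    found = {}
    for chunk_type, data in iter_chunks(png):
        if chunk_type != b"iTXt":
            continue
        keyword, rest = data.split(b"\x00", 1)
        _language, _translated, text = rest[2:].split(b"\x00", 2)
        if rest[0] == 0:  # compression flag
            found[keyword.decode("latin-1")] = text.decode("utf-8")
    return found


def tiny_png() -> bytes:
    """A valid 1x1 grayscale PNG, built from scratch for the example."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\x80")  # filter byte 0, then one grey pixel
    return (PNG_SIGNATURE + make_chunk(b"IHDR", ihdr)
            + make_chunk(b"IDAT", idat) + make_chunk(b"IEND", b""))


tagged = embed_itxt(tiny_png(), XMP_KEYWORD, '<x:xmpmeta xmlns:x="adobe:ns:meta/"/>')
copy = bytes(tagged)  # copying the file copies the chunk along with the pixels
print(read_itxt(copy))
```

Because the chunk lives inside the file, a plain copy carries the metadata with it, although some editors and web services may strip such chunks when re-saving an image.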