Feite Kraay, Author | 9 min read

My career as a musician was brief and unremarkable, limited to a few years playing bass trombone in my high school band. I possess no musical talent whatsoever; it’s likely that the music teacher took pity on me because my more talented brothers were playing, and I was slotted into a role where my monotone efforts would be least disruptive. I mention this only because band class was also my first introduction to the intricacies of copyright law. One evening, I was missing a piece of sheet music at practice and asked to photocopy the missing part from a friend’s copy. I was told I could not do so under any circumstances, as the school had paid for only a certain number of sheets and would be in violation of the publisher’s and composer’s copyright terms if more copies were made.

Now, did you notice the title of this post? It’s one of the best-known lines written by William Shakespeare, from Romeo and Juliet. Fortunately for me, Shakespeare’s work is in the public domain—so I can’t be accused of violating copyright law. And, if you bear with me and read on, you might agree that I’ll be using that line (and others) in new and hopefully original ways—so plagiarism shouldn’t be a problem, either.

Times of woe afford no time to woo
Copyright is the basic principle that an author or creator owns the work, or intellectual property (IP), that they produce. That IP can be sold or traded but only with the explicit consent of the author. For example, because I have chosen to be employed by KPMG in Canada, I have transferred copyright and ownership of this and my other posts to the firm. Plagiarism is passing off someone else’s original work as one’s own, without attribution or modification. A handful of recent news items, as well as the ongoing (at time of writing) strikes by Hollywood writers and actors, suggests to me that there’s something going on with how generative AI is built, trained and used that could have serious legal implications in the areas of copyright protection as well as plagiarism.

One of the key issues in the writers’ strike is ownership of, and compensation for, the writers’ work. Writers—and, indeed, any content creators, including visual artists—have a legitimate concern that AI, when trained on a sufficiently large set of pre-existing data, can generate content that replicates a specific artist’s original voice or visual style. Producers, it is feared, could therefore use generative AI to create new scripts based on writers’ past work. Could this be construed as plagiarism? How do copyright protections on the pre-existing work apply in such a case? These questions haven’t been tested yet in court, but writers believe their right to fair compensation for their work hangs in the balance. And the issue goes far beyond Hollywood—any author, musician or artist who publishes even excerpts of their work online could become unwitting feed for commercial generative AI systems that could then go on to mimic their style.

The actors’ strike followed quickly on the heels of the writers’ strike and—aside from solidarity with the writers—their concerns are related but slightly different. Producers may be able to take recordings of an actor’s performances in a number of scenes and, with the help of AI, create whole new scenes based on the actor’s likeness, style and mannerisms. This may be not so much a question of perceived plagiarism as one of control over intellectual property. How much control should an actor expect to have over derivative work like this, and what sort of compensation should they expect?

They stumble that run fast
This all goes deeper than superstars possibly losing a zero on their mega-paycheques. The livelihoods of actors and other entertainment professionals, as well as of writers, artists and indeed anyone who produces any kind of original content, are at risk. If it were only about potential job loss, about AI simply doing a creative professional’s job more cost-effectively, I’d be less sympathetic. As I’ve described before, all industries including high tech itself have had to grapple with the impact of technology on employment—and the creative industry should be no different in that regard. No, the real problem is ownership and control of one’s creative product and identity, as well as fair compensation for prior work done, and this is a deeper issue worthy of more scrutiny.

Remember, as I’ve written before, generative AI does not produce original content—and was not, in fact, used to produce this content. At its core is a large language model (LLM) designed to mimic human conversation in natural language. To do a good job in conversation, it needs to predict the best response to any given prompt from the end user. To do that, it needs to be trained. Training generative AI means feeding it as much pre-existing content as possible from as many sources as possible. The model then learns the statistical patterns in that content, which is what allows it to quickly assemble a plausible response to any given question it is asked. The more general-purpose the AI system, the broader its training data needs to be—until it includes pretty much everything ever published on the internet. An earlier version of one popular generative AI system at least limited this to everything published through the year 2021, but even this limit is no longer in effect.
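To make that training step concrete, here is a minimal sketch of the underlying idea, next-word prediction, using a toy model that simply counts which word follows which in its training text. The training text and the bigram approach are my own invented illustration; real LLMs learn the same next-token objective with neural networks over billions of documents, but the dependence on pre-existing source material is the same.

```python
from collections import Counter, defaultdict
import random

# Toy illustration of "training" a language model: tally which word
# tends to follow which in the training text, then generate replies by
# repeatedly predicting a likely next word.

training_text = (
    "a rose by any other name would smell as sweet "
    "a rose is a rose is a rose"
)

# "Training": count next-word frequencies for every word seen.
follows = defaultdict(Counter)
words = training_text.split()
for current, nxt in zip(words, words[1:]):
    follows[current][nxt] += 1

def generate(start: str, length: int = 8) -> str:
    """Generate text by sampling a plausible next word at each step."""
    out = [start]
    for _ in range(length):
        candidates = follows.get(out[-1])
        if not candidates:
            break  # no continuation was ever observed for this word
        nxt_words, counts = zip(*candidates.items())
        out.append(random.choices(nxt_words, weights=counts)[0])
    return " ".join(out)

print(generate("a"))
# Every word the model can ever produce comes straight from its
# training data: it recombines what it was fed, nothing more.
```

Notice that every word the toy model can ever produce comes straight from its training data. Scale that up to the whole internet and you have the copyright question in miniature.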

What does this have to do with copyright and intellectual property? Well, it could be argued—in fact, it already has been—that in producing its responses, generative AI is using other people’s original content without permission or attribution. Plus, the rules around how material is selected and combined are often so complex and obscure that it’s not clear how the system came up with the result it did, nor whether it’s even possible to attribute its output back to anyone in particular. At time of writing, a prominent American newspaper is considering suing a well-known generative AI developer on exactly this issue. The newspaper claims that it did not explicitly give consent for its reporting and editorial content, both publicly available and paid subscription, to be reused by generative AI—and that beyond consent, it deserves to be compensated for such use.

I’m not a lawyer, but I don’t think generative AI has good odds of a clear win were such a case to proceed in court. The potential lawsuit covers the same issues as the entertainment strikes—ownership, use of and compensation for copyrighted material. In my opinion, these are very legitimate concerns. I would even add the following consideration: not all books and newspapers are the same, and there doesn’t seem to be much if any control over the quality of the material fed into generative AI, which may exacerbate the real problems of inaccuracy and bias. So, if generative AI is illegitimately using other people’s source material, and its answers turn out to not even be reliable, why should we bother using it at all?

A madness most discreet
I’ll come back to that question—because there may be hopeful solutions. But first I want to draw your attention to some surprising findings in a recent KPMG survey on the use of generative AI in Canada. According to the survey:

  • 52 per cent of Canadian students aged 18+ admit to using generative AI to help with their schoolwork, although 60 per cent feel that doing so constitutes cheating. (There’s an interesting overlap: a certain proportion, at least 12 per cent and maybe more, use generative AI despite their misgivings; see the arithmetic after this list.)
  • 69 per cent of students always or sometimes claimed generative AI output as their own work.
  • Only 37 per cent of students always fact-checked the results from generative AI before using them.
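Where does “at least 12 per cent” come from? It is simple arithmetic: if 60 per cent of students consider using generative AI to be cheating, then at most 40 per cent do not. Even if every one of that 40 per cent is among the 52 per cent who use it, that still leaves 52 − 40 = 12 per cent who use it while believing it to be cheating, the smallest possible overlap between the two groups.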

I worry about statistics like this, because I would hate to think that AI might be contributing to a culture of intellectual laziness in our academic institutions. Don’t get me wrong, academic plagiarism is nothing new, but generative AI is making it far too easy to do and harder to detect. I’ve spoken to friends and relatives who are educators, and they all share the same concerns. AI is transforming education at a breakneck pace, and it will be a long time before solutions catch up to the problems.

Back to the title of this post: when Shakespeare’s Juliet says that “a rose by any other name would smell as sweet,” she is suggesting that if Romeo were a plain ordinary Smith or Jones instead of a Montague, she would love him the same—and absent all the family intrigue, their love would be much easier to consummate. What about plagiarism by any other name? Because any one author’s work gets combined with work from many other sources, generative AI may not exactly be lifting that work word for word. And yet it is using work from many sources, without attribution, and presenting the result as its own. Maybe we need a new word for it—but I do believe that writers and other content creators are right to be concerned about how their work is being used. And if a student or anyone else presents the output of generative AI as their own work, some might argue that you can’t plagiarize from a machine, but it is intellectually dishonest nonetheless. Plagiarism, by any other name, still applies.

The light through yonder window
So, what do we do? The broader the scope of the AI, the harder it is to pin down. The narrower the scope, as in the case of screenwriters or actors, the easier it is to see—and deal with—the problem. Part of the solution, especially at the broader scope, will have to be regulatory. KPMG, like many other professional services firms, has already issued clear rules governing the use of generative AI at work. Although it is OK to use generative AI as a source of ideas, its output must never be used directly in any client-deliverable work—and practitioners must continually check for inaccuracy and bias. Granted, it’s still an honour system because you can’t always detect AI-generated content, but it’s a step in the right direction, and schools will have to follow suit.

For the general public using commercial generative AI, all I can say for now is caveat emptor—and hope that rising awareness of the problems will instill a sense of care and responsibility in how AI is used. But if that lawsuit proceeds and goes in favour of the newspaper, the implications could be drastic. If the owners of original content fed to generative AI must be identified, compensated and even given the opportunity to opt out, then the entire business model of commercial generative AI topples and the technology itself may need to be completely re-architected.

The more we narrow the scope, on the other hand, the more we can control how generative AI is trained and used. What if we were to focus on small-scale AI systems for specific knowledge domains? Oncologists, perhaps, could agree on a limited set of medical literature necessary to train an AI system that could help researchers and diagnosticians to identify and treat various types of cancers. Engineers could define a similar knowledge domain for particular types of construction work, and the same could hold for scientists collaborating on a research project. Closed generative AI systems are being built and tested for the legal profession, with additional considerations over management of confidential or privileged data.
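To see why attribution becomes tractable at this scale, here is a minimal sketch of what a closed, domain-limited system might look like. Everything in it is hypothetical: the corpus, the Document fields and the keyword matching are my own invented illustration, and a real system would pair curated retrieval like this with a language model. The point is only that when the corpus is small and every contributor has opted in, attribution is bookkeeping rather than an open legal question.

```python
from dataclasses import dataclass

# Hypothetical closed-domain setup: a small, licensed corpus where
# every document records its author, so output can always be attributed
# and contributors can be compensated or can opt out.

@dataclass
class Document:
    author: str      # whose work this is, for attribution
    licensed: bool   # did the author opt in to this system?
    text: str

corpus = [
    Document("Dr. A", True, "tumour markers in early-stage screening"),
    Document("Dr. B", True, "imaging protocols for tumour staging"),
    Document("Dr. C", False, "unlicensed commentary on tumour staging"),
]

def answer(query: str) -> str:
    """Retrieve only from licensed documents and cite every source used."""
    terms = set(query.lower().split())
    hits = [
        doc for doc in corpus
        if doc.licensed and terms & set(doc.text.lower().split())
    ]
    if not hits:
        return "No licensed source covers this query."
    cited = "; ".join(f"{doc.text} [{doc.author}]" for doc in hits)
    return f"Sources consulted: {cited}"

print(answer("tumour staging guidelines"))
# Dr. C's document is never used: unlicensed material stays out of the
# system entirely, and every fragment returned carries its attribution.
```

Keyword overlap stands in for real retrieval here; the design point is that a closed corpus makes consent, attribution and opt-out enforceable by construction.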

If there’s general agreement on limited knowledge domains for AI systems with specific, limited purposes, then the questions of copyright, attribution and compensation can more easily be managed within the community of users of those systems. Ownership of the product or deliverables from such systems would still be an open issue, with clients, software vendors and practitioners all laying claims that will have to be tested. Nevertheless, this is a good place for generative AI to start proving its utility and fairness—and maybe it can grow from there.
