"Generative AI" or "generative artificial intelligence" is a term for the "large language model" (LLM) style of predictive content generation. They started out as predictive text but are now used for images, audio, video, etc. You can read more about it on the
Wikipedia page for genAI, but here's my best, simple explanation based on having followed LLM news as a non-expert since around 2020:
The explainers I've read used text as the basis for their explanations, so I'll use text here too, since that's the part I understand best.
Generative AI/LLM systems "generate" (i.e. "guess") what would be said next based on their "training data". Training data is the text, images, video, etc. which are put into the dataset that people use to train the LLM.
Instead of building their guesses out of words or phrases (the way a human person would), they work in small chunks of characters called "tokens", usually around four characters at a time. That's just enough information to be useful in guessing the next thing, while not being so much that the system becomes inflexible and unable to deal with new information.
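If it helps to see that idea as code, here's a deliberately tiny toy sketch of the guess-the-next-chunk process. Everything in it is a stand-in: fixed four-character chunks stand in for tokens, a made-up sentence stands in for the training data, and a frequency table stands in for the neural network. Real LLMs are vastly more complicated, but the "guess what usually comes next" core is the same.

```python
from collections import Counter, defaultdict

# Toy sketch only: real LLMs learn variable-length "tokens" (averaging roughly
# four characters of English) and use huge neural networks, not a lookup table.
# The "training text" here is invented for illustration.
training_text = "the cat sat on the mat. the cat sat on the rug."

def chunk(text, size=4):
    """Split text into fixed-size chunks, standing in for tokenization."""
    return [text[i:i + size] for i in range(0, len(text), size)]

# Count which chunk tends to follow which chunk in the training data.
follows = defaultdict(Counter)
chunks = chunk(training_text)
for current, nxt in zip(chunks, chunks[1:]):
    follows[current][nxt] += 1

def guess_next(prompt_chunk):
    """'Generate' by picking whichever chunk most often followed this one."""
    if prompt_chunk not in follows:
        return "???"  # nothing like this appeared in the training data
    return follows[prompt_chunk].most_common(1)[0][0]

print(guess_next("the "))  # prints "cat " -- the chunk that most often followed "the "
```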
At its core, generative AI works by looking at a whole bunch of stuff that already exists, and then, when given a prompt, guessing what would fit based on the stuff it already knows about. This leads to several problems:
- Any biases present in the training data will be replicated in the output of the LLM. Racism, sexism, ableism... it's all magnified by the data (there's a small code sketch of how that magnification happens after this list).
- The dataset contains lots of images of men, far fewer of women, and none tagged as nonbinary? Now the LLM thinks most humans are men.
- The dataset is full of clothed men and scantily clad women in sexually suggestive poses? Now the LLM thinks female nudity is inherently sexual, and is more likely to categorize any image of a femme person showing skin as sexually suggestive.
- The dataset is mostly full of white faces, and most of the darker images (in color and hue) are of animals? Now the LLM is likely to think that Black people are not human (BBC article, Nature article, Radford, A. et al.).
- Generative AI needs a lot of data in order to work. A lot of the early LLMs used Reddit for the data, which meant what the LLM would generate was constrained by whatever moderation policies kept the worst overtly racist, sexist, queerphobic/queermisic, etc. content out of the subreddits. This let early LLMs seem kind of wholesome if you stuck to their text outputs.
- As LLM makers have needed more and more data, they've started "scraping" (grabbing content from) anywhere they can possibly get it. If you've spent much time online at all, either you're very aware that there's a lot of really terrible stuff said online, or you've been lucky enough to exist in well-moderated spaces where the truly vile shit is kept away by admins and moderators.
- LLMs don't understand context or jokes. They're likely to misread the same word or phrase used in different contexts (e.g. "clean" as in removing dirt or debris vs. "clean" as in preparing food for cooking), and to miss that different words can refer to the same underlying thing (e.g. "terrorist" and "freedom fighter" as terms for the same group, depending on whether the speaker is sympathetic to the group's actions and/or goals). This issue is especially pronounced when LLMs are used to summarize information on the same topic from a variety of sources, mixing together different meanings or failing to collect otherwise relevant information.
- People who may have agreed to something being shared in one context (e.g. posting a story online for people to read for free) have not consented to it being used in another context (e.g. building an LLM that claims to generate text mimicking their writing style). This is even more egregious when the work is copyright protected, because then the LLM-builders are specifically stealing other people's work to get around paying them for training data.
- Additional problems accrue at the intersections of these issues and others which are inherent to the nature of LLMs as programs which guess what an average respondent is likely to say based on the prompt.
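To make the "magnified by the data" point concrete, here's the same frequency-counting toy from earlier, applied to made-up image labels. The labels and the 90/10 split are invented for illustration; the point is just that a system which guesses the most common answer will always guess the majority, and can never guess a category that isn't in its data at all.

```python
from collections import Counter

# Invented, deliberately skewed "training labels": 90 images tagged "man",
# 10 tagged "woman", and none tagged "nonbinary".
training_labels = ["man"] * 90 + ["woman"] * 10

label_counts = Counter(training_labels)

def guess_label():
    """Guess the most likely label, i.e. whatever dominates the training data."""
    return label_counts.most_common(1)[0][0]

print(guess_label())              # "man", every single time
print(label_counts["nonbinary"])  # 0 -- a label absent from the data can never be guessed
```

A 90/10 imbalance in the data becomes a 100/0 imbalance in the guesses, and an absent category never appears at all. That's what "magnified" means here.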
TL;DR: The best-case scenario is that generative AI takes information people freely chose to provide, and then guesses, based on that information, what someone might say in the situation outlined by the prompt. In practice, the information is usually stolen, the outputs are frequently incorrect and full of bigotries, and the whole thing is a massive (capitalist) project to avoid paying people for their work.