Sister blog of Physicists of the Caribbean in which I babble about non-astronomy stuff, because everyone needs a hobby

Wednesday 1 March 2023

What is it like to be an essay ?

Not content with such humdrum concerns as "what is it like to be a bat ?"*, linguists are now moving on to trying to figure out if text can understand itself. Well, that's how I'm choosing to spin it, anyway.

* The answer turns out to be "fucking awesome, but not as awesome as being a dolphin".

Recently I went on a short rant about how I don't get why a few people are claiming ChatGPT isn't useful. I will likely return to this in due course, but that's not the point of this post. Rather, in the discussion I raised the question :

How far can a chatbot progress with this pure linguistic intelligence ? Will it reach some hard wall beyond which further understanding is impossible ? Or, if we give it enough information, can its understanding rival our own ?

This is actually a very old theme of mine which I first started wondering about back in high school, if you can believe it. More recent examples : in this post I explore what we mean by understanding (in another discussion, I'm gratified to find that nobody else seems to have much of a better definition than my own "knowledge of context and connections", also explored in a different but related sense here); in this post I describe my much earlier thinking which stemmed from an incredibly crude chatbot by modern standards. 

In brief, I wondered how a network composed entirely of if-then units, albeit one of incredible complexity, might eventually constitute some form of intelligent (though not conscious) process. My hope was that we could feed it information and have it evaluated in a gloriously unbiased way, a "truth engine" of sorts.

Again, this post needs to have the reins kept pretty tight or else I'll likely go off in the wrong direction. The thing I want to concentrate on here is the response :

Apparently there has been a lot of research on your question “How far can a chatbot progress with this pure linguistic intelligence ?” It’s called distributional semantics. I should have known about this, but somehow completely missed this research.

https://arxiv.org/abs/1905.01896 

It took me a while to find the time to read it, but I'm glad I did. I must begin, though, with a tremendous irony. For a very long time I thought I'd correctly learned the meaning of the word "semantics" by hearing it used, which in my circles is pretty much invariably in the phrase, "arguing over semantics". Naturally this led me to assume it meant "petty irrelevant details of language" - like minor details of word order or an obvious typo in the placement of a comma. Imagine my shock to discover it meant "the meanings of words", as though this weren't the most important aspect of any dialogue at all ! I mean, how can you even have a coherent conversation if you don't agree about what words mean ? 

A bizarre and stupid phrase, I think. Even worse than "rhetorical question", but I digress.

Anyway, the irony is that this paper looks at distributional semantics, the idea that the meaning of a word can be inferred from its relation to other words - something which I clearly failed to do. Now you can probably see how this relates to chatbots, philosophy and ideas about consciousness and whatnot. It's always seemed to me that our understanding is based not merely on words themselves, but on the sensory and mental experience conjured by those words : qualia, concepts, and all manner of purely mental notions (see also this post on the general attributes of consciousness, which I think is deeply flawed but still useful). 

But the paper takes my high school speculation and runs amok with it, albeit in an altogether more sophisticated form. It explores the "Distributional Hypothesis", which states that "similarity in meaning results in similarity of linguistic distribution". 

The basic idea seems to be to create a "semantic space". A one-dimensional example of this might be words ranging from black to grey to white, which you could populate with all the shades in between ad nauseam. By looking at how frequently a word (or phrase, e.g. "very dark grey") is used in association with others across large amounts of text, you could infer similarity. This example would be useless, though, because words have many different sorts of meaning : you could infer that black relates to white, but have no idea about yellow or ennui.
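If it helps to make that concrete, here's a little toy sketch of the basic idea - my own crude version, emphatically not the paper's actual machinery. Count which words turn up near which other words in a tiny corpus, then compare those counts : words that keep similar company come out as similar.

    # Toy illustration of distributional similarity (my own sketch, not the
    # paper's method) : count co-occurrences within a small window, then
    # compare words by the cosine similarity of their context-count vectors.
    from collections import defaultdict
    from math import sqrt

    corpus = [
        "the black cat sat on the dark mat",
        "the white cat sat on the pale mat",
        "the grey dog lay on the dark rug",
    ]

    window = 2  # how many neighbouring words count as "context"
    vectors = defaultdict(lambda: defaultdict(int))

    for sentence in corpus:
        words = sentence.split()
        for i, w in enumerate(words):
            for j in range(max(0, i - window), min(len(words), i + window + 1)):
                if i != j:
                    vectors[w][words[j]] += 1

    def cosine(a, b):
        # 1.0 means identical contexts, 0.0 means no shared contexts at all.
        shared = set(a) & set(b)
        dot = sum(a[k] * b[k] for k in shared)
        norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
        return dot / norm if norm else 0.0

    print(cosine(vectors["black"], vectors["white"]))  # high : they keep similar company
    print(cosine(vectors["black"], vectors["dog"]))    # lower : rather less so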

Instead the process involves describing words along typically hundreds of different dimensions, which enables a very great deal of nuance. Words can be very similar in some senses but completely different in others, e.g. "dragon" and "daffodil" are both nouns, both Welsh*, both begin with d, both denote "living" things, but still refer to completely different things.

* It's St David's day today, as it happens.

With hundreds of dimensions, this "radically empirical" approach allows for far more nuance. The premise seems to be that meaning is entirely relative : context is everything, like a thesaurus on steroids. It can show both how word meaning depends on its current context and how that context changes over time, e.g. "dog" used to mean a particular breed but now means a whole group of animals; "awful" used to mean, more literally, "full of awe" but now means "terrible". And they can detect how word use broadens and narrows over time, being used in more or fewer contexts from one decade to the next. "They thus infer a change in meaning when they observe a change in the context of use."
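Again purely as my own back-of-the-envelope illustration (the counts below are entirely made up, and the real methods are far more sophisticated), spotting that kind of drift can be as simple as comparing the company a word keeps in one era with the company it keeps in another :

    # Hypothetical sketch of detecting "a change in the context of use" :
    # compare the top context words for the same target word in two periods.
    def top_contexts(counts, n=5):
        # The n most frequent context words for the target word.
        return {w for w, _ in sorted(counts.items(), key=lambda kv: -kv[1])[:n]}

    # Invented co-occurrence counts for "awful", old and modern.
    awful_1850 = {"majesty": 4, "solemn": 3, "grandeur": 3, "dread": 2, "god": 2}
    awful_2020 = {"really": 5, "weather": 4, "terrible": 3, "smell": 2, "day": 2}

    old, new = top_contexts(awful_1850), top_contexts(awful_2020)
    overlap = len(old & new) / len(old | new)  # Jaccard overlap of the two context sets
    print(overlap)  # near zero here, suggesting the word's typical use has shifted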

The connection between individual words can show very subtle differences in meaning indeed, e.g. "baking a potato", which is applying heat to a potato and nothing more, is actually quite a different process from "baking a cake", in that the latter is designed to transform raw ingredients into an end product. 

This process can be applied to individual words, phrases of multiple words, or even to the parts of words themselves, e.g. "ist" and "er" as suffixes. Matching the senses in which words are used within phrases can reinforce the meaning of those phrases. And it's possible for such an approach to figure out not only which associations are technically correct or incorrect, but also which are correct but unfavourable. The author does note that this is not without limitations, such as being good at finding general meanings but not at individual ones : if I decided to spontaneously swap "dragon" and "daffodil", you could probably figure out what I was doing, but distributional semantics would be very confused because it wouldn't have "St George and the Daffodil" in its database.
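For what it's worth, here's a guess at how the below-the-word level might be handled - this is my own assumption, in the spirit of subword models like fastText rather than anything I can attribute to the paper itself. Chop each word into overlapping character chunks, so that endings like "ist" and "er" become reusable pieces shared across many words :

    # Rough sketch of working below the word level (an assumption on my part,
    # not necessarily the paper's approach) : represent a word by its
    # character n-grams so that suffixes like "ist" and "er" show up as
    # pieces shared between words.
    def char_ngrams(word, n=3):
        padded = f"<{word}>"  # boundary markers so prefixes/suffixes are distinct
        return [padded[i:i + n] for i in range(len(padded) - n + 1)]

    print(char_ngrams("violinist"))  # includes 'ist' among its chunks
    print(char_ngrams("baker"))      # includes 'er>', marking the suffix at the word's end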


Still, it's highly provocative stuff - in the sense that it provokes thought, not that it winds me up. To what extent does language influence our thoughts ? I lean strongly towards the view that it's mainly our thought processes which drive language rather than the other way around, with the real hard thinking being done at much deeper levels than anything we consciously access. But you have to wonder from all this whether we don't do at least some of this sort of pure linguistic reasoning ourselves, whether we think based on word properties (or at least use them as a heuristic) rather than running a sort of simulation when we want to imagine a new scenario. 

For example, recently the example came up of George Washington fighting a Sasquatch. Not very sophisticated, but perhaps the brain files these terms away in categories not that dissimilar to nouns and verbs and suchlike, so that I can easily imagine Washington fighting a yeti but am not very likely to imagine him fighting, say, custard, and have even more trouble imagining him in a battle against sessions or density. What the hell would that even mean ?

Clearly sensory data is useful though, and no large language model will ever have the same understanding of these things as I do until it also has the capacity for direct experience of the world (and quite probably not even then). But at least some degree of meaning, even if of a different sort, could perhaps be said to be encoded in text itself, with sensory data providing an absolute anchor for a reasoning process which is otherwise entirely relative. 

The answer to my original question, then, is probably, "Quite far, but not without limit, and it will never become truly human-like". It probably does reach a hard wall, but that doesn't mean it can't provide useful output even without the same sort of understanding that we have. My dream of a truth engine will probably never be fulfilled by this method, but the radically different perspective from this purely linguistic approach seems to be powerful enough all the same.
