David,

Some great points there. I especially like this one:

>>it is *ALL* made up.<<

That helps me to dimly understand that everything the chat says is simply plausible, no more than that.

Maybe we should think of it as no more authoritative than the cocktail party chatter of someone who reads indiscriminately and can't shut up until they've spewed five paragraphs.

Marcia

On Friday, March 10, 2023 at 12:34:46 PM HST, R. David Murray via Hidden-discuss <hidden-discuss at lists.hidden-tech.net> wrote:

From what I understand (admittedly from only a *basic* understanding of machine learning), it is not so much that ChatGPT is "making errors", but rather that it is "making stuff up", and does not admit that it is making stuff up.

I'm going to brain dump what I think here, but I'm not an expert in this by any stretch, so don't take me as an authority. Perhaps this can help you reason about ChatGPT until you find a better expert to consult ;)

One thing to understand is that this is a *trained* model. That means that it was given a set of questions and answers and told "these are good, these are bad", probably with a rating of *how* good or bad. Then it was given a lot of other data (and how exactly this gets turned into questions and answers is *way* beyond my knowledge level). Then a team of model trainers started asking questions. The trainers would look at the answers it came up with and rate them, thus adding to the "trained" data set. When you tell ChatGPT that its answer was good or bad, you are also potentially adding to that training data, by the way.

I'm guessing that, the way the system works, there is actually no way for it to "know" that it has made something up. The output that it produces is generated based on what you can think of as a very advanced version of statistical language modelling: given a certain input, what are the most likely kinds of things that would follow as a response?
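As a rough illustration of that "most likely thing to follow" idea, here is a toy sketch in Python. It is an assumption-laden stand-in, not how ChatGPT is actually built: the three-sentence corpus, the author names, and the `train_bigrams` and `generate` helpers are all invented for this example. The only point is that the generator always emits *something*, and nothing in its output is marked as real or invented.

```python
import random
from collections import defaultdict

# Hypothetical toy corpus, invented purely for this sketch.
CORPUS = (
    "smith wrote the article on sleep . "
    "jones wrote the article on memory . "
    "smith wrote the book on memory ."
).split()


def train_bigrams(words):
    """Map each word to the list of words that followed it in the corpus."""
    followers = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        followers[current].append(nxt)
    return followers


def generate(followers, start, length, seed=0):
    """Emit `length` words by repeatedly sampling an observed next word."""
    rng = random.Random(seed)
    vocabulary = sorted(followers)
    out = [start]
    for _ in range(length - 1):
        options = followers.get(out[-1])
        # Unseen context: the model still answers, just from weaker evidence.
        out.append(rng.choice(options if options else vocabulary))
    return out


model = train_bigrams(CORPUS)
print(" ".join(generate(model, "jones", 7)))
print(" ".join(generate(model, "zebra", 7)))  # "zebra" never appears in the corpus
```

Because each step only samples from what followed the previous word, the output can pair an author with a book or topic that the corpus never attached to them, and a starting word the corpus has never seen still gets a fluent-looking continuation.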
And like any statistical model, when you get enough standard deviations out, things get weird. At no point in the model output are things tagged as "made up" or "not made up": it is *ALL* made up. In the middle of the bell curve the made up things are *much* more likely to be "correct" than out at the edges of the bell curve. But oh those edges...

It is of course more sophisticated than a statistical model, but the same principle applies: if there are few examples of *exactly* the kind of data your input contains, then it is going to draw from stuff that is a lot less closely related to your input for its response. But, and here is the important part, it is going to make up *something* to answer with.

If a source is mentioned multiple times in the context of your input, it will use it. If there are no sources mentioned in the context of your input, it will generate an output that looks like the *kind of thing* that would be a response to that *kind of input*. In this case that included a list of articles. It generated at least one of them from an author whose name was probably mentioned in the context of your input, but never with an actual article name attached. Or maybe that author was mentioned in the context of conversations containing a subset of the *words* in your input (rather than logically formed sentences), depending on just how fuzzy the match was. Then it effectively made up a plausible sounding article name to go with the author name, because that's what responses to other similar questions in its training data looked like (not similar in content, but similar in *form*).

So while I agree that making up all the sources seems like an extreme example of this, ChatGPT is what Science Fiction calls an "Artificial Stupid" (something that can't actually *reason*), and thus I think my explanation is plausible. It just depends on how fuzzy the match was that it made on the input.
If the match was very fuzzy, then it would have come back with material from its data that generally followed at least some of your input, and then since responses the trainers considered "good" to questions like that usually included some sources, it made some up based on how the answers to other, less related, questions looked.

Anyone want to bet that four sources was the average number that was accepted as "a good answer" by the people who did the training? I know I've seen "four things" in a couple of ChatGPT answers, and I haven't asked it very many questions :)

Given all this, there are only two things you can do, one of which is exactly what you did: ask it for the sources. Given *that* input, it should be able to come up with the most likely response being the actual source. If it can't, then it has probably made up the source (note: I have not tested this technique myself, but it follows logically from how I think the system works).

The second thing you can do (which you probably also already did) is to rephrase your input, giving it different amounts and kinds of context, and see how the output changes. If your altered input results in a less fuzzy match, you will get better answers.

The big takeaway, which you clearly already know, is to never trust anything ChatGPT produces. Use it as a rough draft, but verify all the facts. My fear is that there are going to be a lot of people who aren't as diligent, and we'll end up with a lot of made up information out on the web adding to all of the maliciously bad information that is already out there. I have read that the ChatGPT researchers are worried about how to avoid using ChatGPT's output as input to a later ChatGPT model, and I have no idea how they are going to achieve that!

And keep in mind that that maliciously bad information *is part of ChatGPT's data set*.
Some of it the people who did the training will have caught, but I'm willing to bet they missed a lot of it because *they* didn't know it was bad, or it never came up during training.

--David

On Fri, 10 Mar 2023 03:14:21 +0000, Marcia Yudkin via Hidden-discuss <hidden-discuss at lists.hidden-tech.net> wrote:
> Yes, I know that people have been pointing out "ridiculous factual errors" from ChatGPT. However, to make up sources that sound completely plausible but are fake seems like it belongs in a whole other category.
>
> On Thursday, March 9, 2023 at 04:10:43 PM HST, Alan Frank <alan at 8wheels.org> wrote:
>
> ChatGPT is a conversation engine, not a search engine. It is designed
> to provide plausible responses based on similarity of questions and
> answers to existing material on the internet, without attempting to
> correlate its responses with actual facts. Pretty much every social
> media space I follow has had multiple posts from people pointing out
> ridiculous factual errors from ChatGPT.
>
> --Alan
>
> -------- Original Message --------
> Subject: [Hidden-tech] Question about ChatGPT and machine learning
> Date: 2023-03-09 15:29
> From: Marcia Yudkin via Hidden-discuss <hidden-discuss at lists.hidden-tech.net>
> To: "Hidden-discuss at lists.hidden-tech.net" <Hidden-discuss at lists.hidden-tech.net>
>
> This question is for anyone who understands how the machine learning in
> ChatGPT works.
>
> I've been finding ChatGPT useful for summarizing information that is
> widely dispersed around the web, such as questions like "what are the
> most popular objections to X?" However, the other day for a blog post I
> was writing I asked it "What are some sources on the relationship of X
> to Y?" It gave me four sources of information, including the article
> title, where it was published and who wrote it.
>
> This looked great, especially since I recognized two of the author names
> as authorities on X.
> However, when I then did a Google search, I could
> not track down any of the four articles, either by title, author or
> place of publication. I tried both in Google and in Bing. Zilch!
>
> Could ChatGPT have totally made up these sources? If so, how does that
> work?
>
> I am baffled about the explanation of this. One of the publications
> involved was Psychology Today, so we are not talking about obscure
> corners of the Internet or sites that would have disappeared recently.
>
> Thanks for any insights.
>
> Marcia Yudkin
> Introvert UpThink
> Introvert UpThink | Marcia Yudkin | Substack
>
> _______________________________________________
> Hidden-discuss mailing list - home page: http://www.hidden-tech.net
> Hidden-discuss at lists.hidden-tech.net
>
> You are receiving this because you are on the Hidden-Tech Discussion
> list.
> If you would like to change your list preferences, Go to the Members
> page on the Hidden Tech Web site.
> http://www.hidden-tech.net/members