Chain of Thoughts – Ep. 1: And here are the LLMs
Maria Victoria De Santiago, Damian Calderon and José Ignacio Orlando
José Ignacio Orlando (Nacho): Well, welcome to today’s episode of Chain of Thoughts. This is going to be a pilot, let’s say. So let’s see how it goes. Well, you have already heard the intro. You know that the idea of this podcast is to connect our thoughts, connect the thoughts that we have around AI and the thoughts that we have around Product, and think a little bit together about how AI is impacting this world. So it’s not going to be just me on this podcast. It’s going to be the three of us. So we will introduce ourselves. I will start if you’re okay with that. I’m Nacho Orlando, Jose Ignacio Orlando, but everyone calls me Nacho. It’s like the chips. I have to make that joke always. I’m a software engineer. I’m also a Ph.D. in applied mathematics. I am the director of the AI Labs at Arionkoder. I’m also a professor at the university in Argentina. I am a researcher in AI as well, and I’m representing the nerd team, the AI team, in this podcast. And with me we have Vico and Calde. Both of them will be part of this podcast as well. So you are not talking yet. So please introduce yourselves. Vico?
Maria Victoria de Santiago (Vico): Sure. Thank you, Nacho. So, everyone calls me Vico. I actually come from a business background, having worked in consulting for some years and facilitated cross-disciplinary teams. My different lives have taken me from international affairs and investment consulting, through business consulting, into anthropology. That's hopefully going to be my contribution to this podcast too: looking at how we as human beings, communities and societies collaborate, build images of the future, and build innovation together. So that's a little bit about me. I'm looking forward to what we'll be able to discuss in this episode.
Nacho: What about you, Calde? Tell us who you are.
Damian Calderon (Calde): Hi, I'm Calde. I have a background in design and I've been working as a product manager for more than ten years, and lately I have a particular interest in how UX, and user experience in general, will change with this whole series of changes, right? That LLMs and AI in general are bringing to the field of technology.
Nacho: That’s awesome. So it’s going to be the three of us in most of the episodes, but
the plan is to also have some interviews. So it's going to be amazing, and it's just a matter of sharing and, well, all the things that you always say when you are recording a podcast. So before starting with the actual topic of this edition, before moving forward, I'd like to know, guys, what you're expecting from this podcast. So what are your thoughts? What are the thoughts that you have that will be chained in Chain of Thoughts?
Calde: Well, I think it’s really interesting to see how LLMs currently and also all of the things that are happening with AI are kind of reshaping the way we see technology and the way we work with technology. So I really think that it would be like thinking together. It would be more like putting our ideas together, contrasting them and getting to new ideas from that. And I would love to see that happen, right? That we create new ideas here while recording the podcast.
Nacho: That’s great. What are your thoughts, Vico?
Vico: Well, I'm very much on the same line, together with building shared languages. And it's funny, because today we're talking about language models. It's also about exploring the boundaries between what is and what is not, which would help us have much more meaningful conversations. It's said that we understand things through the boundary of what they are and what they are not, so contrasting views help. I think that will help us all to explore and navigate together different possibilities for the future. So it's more about building a shared understanding that will actually open up new opportunities, maybe not only for us, but hopefully for those who are listening to us as well.
Nacho: Yeah, that's great. My thought is that this could be an excellent place for all of us who are recording this episode to learn from each other. That's something that I love to do. And I think that is going to be great, not just for us, but also for the people who are listening to us. I think that by sharing the things that we know, the things that we are learning, and the things that we don't know, we will somehow help our listeners to learn more. And I think that's an important part. At the same time, I really believe that contrasting our ideas, as Calde was saying, is an important part. We are in front of a technology that is largely reshaping the world. And I think that sharing our vision on how the world will evolve thanks to this technology, and how we are expecting the world to evolve, is also interesting from this perspective. So yeah, I'm really looking forward to this being fruitful for our listeners. I'm quite certain that it will be fruitful for us as well. So yeah, let's make it happen!
Vico: So today we're going to be discussing LLMs, going from the basics of what LLMs are to how they work internally. We'll also cover the fundamentals of training and a little bit about fine-tuning. But let's also discuss the ownership of LLMs, which is something we should dive into, and the learning curve; I know many of you out there thinking about Product would like to understand this better, together with some applications of LLMs. And hey, we couldn't be a product team in this podcast without talking about how to balance LLMs and product design, how to include them in our practice, and how to build shared languages to collaborate even with machine learning engineers. And this wouldn't be a podcast with Nacho involved if we didn't talk about ethics and LLMs, of course. Lastly, we're going to be talking about future trends: how will LLMs evolve in the coming years? So make sure to stay tuned all the way to the end to hear about all of this. And now let's start from the very top. Nacho, please, do help us: what are LLMs, in the end?
Nacho: All right, so when I have to teach something starting from a name, what I usually prefer to do is to decompose the name into its pieces. So when we talk about an LLM, we talk about a large language model. Let's decompose it into three parts. First, we have the model. We are talking about a machine learning model, an algorithm that is learned from data, right? We can dive deeper if you want, but let's just think about it as a black box that was learned from a database. The second thing is that it is a language model. This means that it's a model that is able to recognize, translate, predict, and generate language, alright? And for doing that, it has learned from data the properties of the language, let's say the patterns that are hidden in the language, the probability distribution, if we talk in mathematical terms. And the third element of the definition is that it's large, and by large we mean that it's a model that has billions of parameters. The fact that it is big and large is the important concept here. When we talk about an LLM, we are talking about a model that has so many parameters that it's able to model all those complexities of the language, all those properties, all those intrinsic patterns that are hidden in the language, which we as human beings are able to master, but which are much harder for a machine to master.

So in a nutshell, we are talking about a machine learning model with billions of parameters that is used to model language. If we want to go deeper in terms of which kind of model we are talking about, we are referring to a deep learning model, which is a specific type of machine learning model that is based on deep neural networks. And I think it's also important to understand that these large language models are in general generative models, which stand in opposition to discriminative models. If you think about it, a discriminative model is a model that can make a decision out of data: if you feed it a sample, it's able to classify it or to make a decision based on it. For example, you provide a picture of an animal and the model is able to recognize whether it's a cat or a dog; that's a discriminative model. A generative model is different: a generative model is able to generate data. In this case, a large language model is in general able to generate text. So that's the definition, I would say: a large language model is a machine learning model, a deep learning model, that is generative and that is able to model language.
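As an illustration of the distinction Nacho draws between discriminative and generative models, here is a minimal sketch using the Hugging Face transformers library (an assumed dependency, not mentioned in the episode):

```python
from transformers import pipeline

# Discriminative: given an input, make a decision about it (here, classify sentiment).
classifier = pipeline("sentiment-analysis")
print(classifier("I love this podcast"))  # e.g. [{'label': 'POSITIVE', 'score': 0.99}]

# Generative: given an input, produce new data (here, continue the text with GPT-2).
generator = pipeline("text-generation", model="gpt2")
print(generator("A large language model is", max_new_tokens=20))
```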
Calde: And you say it’s like a black box, but how do they work internally?
Nacho: Well, it really depends on the type of large language models that we have in front of us. But I would say that most of the modern architectures of deep neural networks, well, first of all, the machine learning model, the deep learning model is based on a neural network. The neural network has a specific architecture. By architecture, we mean the way in which the different layers of these neural networks are organized. And in the particular case of the large language models that are so popular right now, like GPT or Llama or Mistral models, these are based on the transformer architecture, which has nothing to do with the car that can turn into a robot.
When we talk about the transformer, we are talking about, as I said before, a deep neural network architecture that was published in 2017, so a few years ago, in a paper by Google. The name of the paper is quite funny, it's "Attention is all you need", because they realized that by designing the neural network in a specific way, they were able to bypass some of the problems that the most popular models at that time, the LSTMs, weren't able to overcome. This architecture has a specific component that is able to model the relationship between one word and every other word in the input, and by learning that you get more context, right? Because now, when the model is analyzing a sentence, it's not analyzing just a few words together but the entire input at the same time. And as a consequence of that, these architectures were able to produce LLMs with better outputs. So yeah, in a nutshell, the most important component that makes these models work is this transformer architecture.
And actually, if you decompose the acronym GPT, it stands for Generative Pre-trained Transformer. So the T in GPT is actually a transformer. Another important factor in how these LLMs work internally is this massive parameterization, the fact that they have billions of parameters organized in these layers. So it's not just a matter of the number of parameters that these models have, but the way in which they are organized. For example, when it comes to an LLM, there's always an embedding layer.
This embedding layer is a layer that takes text as an input and produces a vector representation of that input, and in general a representation that contains the semantics of the text that you are introducing, which is amazing. I mean, there are some embedding models in which you can do arithmetic directly on words. For example, if you take the word king, subtract the word man and add the word woman, I mean, the representations, the embeddings for those words, then you get a new embedding that translates to the word queen. And that's amazing. And that was like the first evidence that we had of capturing the real semantics of words within a machine learning model. And then there are other neural layers involved in these models. We are talking about feedforward layers, which have been around for something like 80 years, I believe; recurrent layers, which are the ones that allow the model to handle sequences; and of course this attention mechanism that we talked about, which is able to model the relationship between the different words in the input.
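The word-vector arithmetic Nacho mentions can be reproduced with pre-trained embeddings; here is a minimal sketch using gensim and GloVe vectors (both assumed here for illustration, not tools mentioned in the episode):

```python
import gensim.downloader as api

# Load a small set of pre-trained GloVe word embeddings (downloaded on first use).
vectors = api.load("glove-wiki-gigaword-50")

# Arithmetic directly on the embeddings: king - man + woman should land near "queen".
result = vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)  # typically [('queen', ...)]
```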
Vico: That's very interesting. Why don't we talk a little bit about the basics of training these models, the fundamentals, for those more or less familiar with AI? Can you explain, basically, how LLM training works? Why is it so revolutionary right now, and what is so special about it?
Nacho: Yeah, I think that the special thing is a trick, you know. But before talking about the trick for training these models, let's first define what training a machine learning model is, because I guess that some of our listeners will come from the product area and would like to know a little bit more about this. So when we train a machine learning model, we basically need three elements.
First, we need the training data. When we talk about training data, we are talking about samples that model a task, the task that we want the machine learning model to solve. Then we need a cost function, an objective. By cost function, we mean a mathematical function that allows us to measure the degree of error that the model is committing every time we feed it a training sample. This is helpful for the model to improve itself and to learn how to model the task. And the third component is the optimization process. This optimization process is an iterative process that lets the model give it a try on each training sample, produce an output, and then pay a cost based on the objective function. So every single machine learning model is always trained in the same way, from the logistic regression algorithms that were invented in the sixties and used by statisticians all over the world, to these complex transformers that are in the background, inside the LLMs. But in the specific case of LLMs, I think the magic is behind the trick that is used for training.
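To make those three ingredients concrete, here is a minimal, toy-sized sketch in PyTorch (an assumed choice of framework): training data, a cost function, and an iterative optimization loop, with a tiny linear model standing in for the billions of parameters of an LLM.

```python
import torch
from torch import nn

# 1) Training data: input samples and the outputs we expect (a toy regression task).
x = torch.randn(100, 4)
y = x.sum(dim=1, keepdim=True)

model = nn.Linear(4, 1)                                   # the "model"
loss_fn = nn.MSELoss()                                    # 2) the cost/objective function
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # 3) the optimizer

# The optimization process: iteratively try, measure the error, and adjust the parameters.
for step in range(200):
    prediction = model(x)
    loss = loss_fn(prediction, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(loss.item())  # the error should have gone down after training
```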
So most machine learning models are usually trained in a supervised learning way. This means that in our training data we have the samples and the output that we expect to get from each input sample. And this is fine, but you can guess that the main problem is that collecting a dataset full of samples with a target label involves a human labeling the samples, and that makes the process quite expensive. The alternative is what is called unsupervised learning, in which you just take the inputs and try to learn something out of them.

You know, for those who come from a machine learning background, that's how clustering algorithms are trained. Let's say they are not exactly trained, but you get my point. And the trick here is what is called self-supervised learning. Self-supervised learning is an active field of research. We are talking about models that are able to learn by themselves based on a pretext task. So, for example, we as humans… Let me ask you a question. How do you believe that we as humans learn to walk? It's a tricky one.
Calde: Yeah, I think it's a lot of observing and experiencing the real world, right? It's a combination of observing and experiencing.
Nacho: Exactly. Yeah. It's very similar to that. Of course, there's a big difference, because when we are children and we are learning how to walk, we fall and we get some negative reinforcement every time we hit ourselves, right? In these cases it's a little bit different. But a self-supervised learning model is able to learn automatically from a pretext task. That's the funny thing about these models: they are trained to do a task that is not exactly the target task that we have, but that is helpful to get what we want. In this case, the trick is a task called masked language modeling, which is essentially taking a piece of text, removing some of the words, and asking the model to find the missing word.
And this, if you think about it, is something that you can do with all the text that we can find on the Internet at basically zero cost, because you just need to take as many texts as possible, remove random words from them, and then iteratively ask the model to find the word that is missing. And there are some other tricks. There's another option called next-sentence prediction, in which you basically take a sentence, then take either its actual next sentence or a random sentence from somewhere else, and you ask the model to answer the question: is this the next sentence of the text or not? And based on that, the model is able to learn this relationship by itself.
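As a small illustration of the masked-language-modeling pretext task, here is a sketch using a Hugging Face fill-mask pipeline with BERT (the library and model are assumptions for illustration):

```python
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# We remove a word and ask the model to recover it; no human labeling was needed
# to create this training signal, which is what makes self-supervision so cheap.
for prediction in fill_mask("Large language models are trained on huge amounts of [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```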
Calde: From what you're telling me, it's actually like learning a new language for us humans, right? It's similar to the exercises that we have in books when we learn English or Spanish. You just have to complete a sentence, or figure out what the next sentence would be, or how to answer a question in a proper way, right?
Nacho: Yeah.
Vico: The fill-in-the-blanks exercise.
Nacho: Yeah, exactly. It's just filling in the blanks. And what really amazes me the most is the fact that this is not very complex. If you think about it, the idea is super straightforward. It was just a matter of plugging that into a very big cluster, you know, with lots of GPUs, and being able to train something like that, that humongous model. But yeah, the idea is very, very, very simple.
And so yeah, the way in which these LLMs are trained is that they have billions of parameters because they need to solve this complicated problem of figuring out what the missing word is, and they are using those parameters to somehow learn the complexity of the language and all its hidden patterns. And they are trained in an autoregressive way, which means that they produce an output one step at a time: they are fed with a sentence with the last word missing, they predict the missing word, and the next step is to take this new sentence, with the predicted word appended, as input and try to produce the next one, and so on and so on.
And that's the generative component of the model, because now you get something that can be fed with its own answer and try to prolong the sentence until reaching a stopping point. And you know what that point is: another word. But it's a word that is dedicated explicitly to finishing the sentence or finishing the text. It's just a token, just something that lets the model, or the user of the model, stop the generation. So yeah, that's how these models are trained. Super easy.
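Here is a hedged sketch of that autoregressive loop and the special end-of-text token, using GPT-2 through the Hugging Face transformers library as an illustration:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

tokens = tokenizer("The idea behind large language models is", return_tensors="pt").input_ids
for _ in range(30):
    logits = model(tokens).logits
    next_token = logits[0, -1].argmax()              # greedily pick the most likely next token
    tokens = torch.cat([tokens, next_token.view(1, 1)], dim=1)
    if next_token.item() == tokenizer.eos_token_id:  # the token that stops the generation
        break

print(tokenizer.decode(tokens[0]))
```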
Vico: I was just thinking about that. You said it's a really straightforward approach, basically feeding in super powerful computing power and a huge dataset. How huge are we talking, though? Because we're seeing that this is simple, but how big of a dataset have these models been trained with?
Nacho: Well, this is something that connects with who owns the LLMs and that debate, but the big models that we have available right now are trained on big datasets, and that's about the only information that we have. We don't even have an order of magnitude; we don't know the details. What I can tell you is that the size of the dataset we need for training depends on the number of parameters of the model, because otherwise we can enter a regime that is called overfitting, which is when a model memorizes, so to say (we are not talking about models that have memory abilities, but you get my point): models that are able to solve the task perfectly on the training data, but when they are evaluated on new, unseen samples, they fail. They suck on that new dataset. So in order to avoid the problem of overfitting, you need to train these models on humongous amounts of data so that they know how to generalize to any other context.
Calde: So let's say you want to use an LLM for a particular field, for a custom application, right? Do you have to create a new large language model? How does that happen?
Nacho: Well, it really depends on the target application. An LLM is what is called a foundation model. A foundation model is the model to rule them all, in Lord of the Rings terms. It's a model that has such a huge number of parameters and was trained on such a big dataset that it has already learned important patterns that apply to many other problems. That means that you don't need to train a monster like this every time you want to swap from one application to another; you just do a fine-tuning, and sometimes not even that, on a smaller dataset.
So by fine-tuning, we mean that instead of retraining the whole thing, which is computationally expensive and requires a big dataset, we just use a small, representative dataset of the task at hand and retrain only a few layers of the model. Again, you can say: all right, if you needed, I don't know, 1,000 GPUs for training these models, then how can you do the fine-tuning if you don't have 1,000 GPUs? Well, there are some tricks that the scientific community has introduced. There is one that is quite popular right now called LoRA, which is a female parrot in Spanish, but it stands for Low-Rank Adaptation of Large Language Models, and it was published by Microsoft, I think one or two years ago. In it, you basically end up freezing some parameters of the model, which means that they remain untouched, that you are not doing anything to them, and you just add a new set of only a few trainable parameters that are the ones that need to be adapted.
So the essence of this algorithm is finding a mathematical trick so that, instead of fitting the entire monster into 1,000 GPUs, you just fit a tiny bit and optimize that small part that needs to be corrected. And there are other techniques, like adapters and things like that, that were also invented to allow fine-tuning these LLMs. But in practice, sometimes you don't even need to do that. Sometimes you just need to prompt the model in an efficient way, and by prompting we mean instructing the model, using an input that gives instructions about the task that you want to solve, or just fine-tuning for a while on a smaller dataset and seeing what you can get.
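For those curious what that looks like in code, here is a hedged sketch using the Hugging Face peft library (an assumed dependency; GPT-2 and the hyperparameters are placeholders): the base model's weights are frozen and only small low-rank adapter matrices are trained.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("gpt2")

lora_config = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling factor for the adapters
    target_modules=["c_attn"],  # which layers receive LoRA adapters (GPT-2 attention here)
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only a small fraction of the weights will be trained
```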
The most typical example is when you want to take one of these foundation models to a very niche application. For example, I don't know, medicine: you want to have a large language model that knows about different conditions and, I don't know, which medicines or drugs can be administered to a person who suffers from a specific disease. Of course, maybe the original training set of the foundation model already had some information about that, but it's going to be lost in between information about cats and dogs and politics and the Constitution of the US and things like that, right? So how do you make it focus on that specific part? Well, you can download as many medical papers as you can collect from the Internet, train the model again in this masked language modeling way, and then move forward.
Calde: Great.
Vico: Okay. Thinking about the next point, there are two big threads we could pull from what we've discussed up until now. One is the computational demand of training these models, and you already gave a hint of this topic of ownership of the LLMs at the beginning of the conversation: we're talking about requirements, training, and who the owners of these models are. That's the first door we have there. And then you also left a hint about prompt engineering, but let's take a pause on that before going further into the learning curve.

So, about the ownership of these LLMs in the end: who owns them? Because it seems that only a few companies, and we're already aware of some of them, can actually have the privilege, let's say it that way, of basically training them because of these requirements.
Nacho: Yes. We talked about the two main things that we need when we want to train a large language model, which are the training set, the big training set, a.k.a. the Internet, and the computational demand of training: you need high-performance computing infrastructure with thousands of GPUs to train these big things, and that's prohibitive. That's why we are seeing that all these models are produced by big companies like Microsoft, OpenAI, Meta, Google, and so on and so forth. You need to have access to those training facilities, and that's something that is prohibitive right now. And that's why we are seeing a shift, let's say, in the scientific production around these big models: the big models are released, so to say, by big companies, and all the scientific institutions that are doing research on LLMs are actually playing around with models that were produced by these big tech companies. I believe that's shifting the industry a little bit, and the way in which research is being done as well.
On the other hand, there's an effort, one that I believe we should keep pushing as part of the industry as well, which is the open-sourcing of these algorithms, these models. Meta, for example, or Mistral, are two of the companies that are releasing the models that they are training. Sometimes they are even releasing the code that they used for training. I read a few articles a few weeks ago about companies that are releasing the training datasets that they used as well. And I really believe that's important from many perspectives. One of them is that we need to audit these algorithms. We need to understand how they work and how they were trained, because of the kind of technology we are talking about, and I would love to hear your opinion about this as well.
We are talking about a technology that is going to change the way in which we work, the way we inform ourselves, the way in which we meet new people, and more and more. So learning about the risks and the biases and all the problems that might be hidden in those algorithms is, I believe, very, very important. If you just package that into a product and provide access to users, as OpenAI has done with ChatGPT, then you are hitting the world with a hammer, but it's an unknown hammer. It's something that we don't know how it works internally. So yeah, going back to your question, Vico: unfortunately we are seeing that big tech companies are the owners of the LLMs. We are seeing efforts to open source these algorithms, and those efforts are grounded in this idea of creating safe and sound AI, but the owners are still those big groups. So we have to think about it. But I would like to know, what do you think about that point in particular?
Vico: I have mixed feelings around this, and I'm going to be completely honest and straight about it. There is a demand, an investment that needs to be made, so I understand that the structures and dynamics of our current world are making these big companies take this on; it's also a big bet, right? A bet that at some point flourished. But I do agree a lot that this is a fundamental shift. It's not only a new technology, it's a shift in the way we engage with technology, in how we interface with our daily tasks in general. We've already seen a few gadgets bringing this into our daily lives, and it could transform, not so much in the long term but really in the mid-term, even our own cognitive pathways and where we put the most effort. So it's going to have a huge impact. Seeing efforts around safety, around understanding biases, and around building specific models that are more aware of context and take nuances into consideration is definitely something I would like to see, because I think the impact these technologies can have should not lie in the hands of a few; it should be watched by all of us who are working with this new technology.
Nacho: Yeah. At the same time, there's another factor, which is that if we want to innovate with this technology, it's important to have access to it, because we cannot let these big companies be the only ones with the big good ideas. The good thing is that they are somehow releasing their models; they are providing us with access to their models through APIs. That's the common thing.
We are seeing a new startup appearing every second that is just connecting to OpenAI's API and doing something with that. And that's fine. I mean, that's a very important thing. And as you said before, they made the investment, they have bought all these machines for running the algorithms. But yeah, I think that from an innovation perspective it's also important to release these models, because someone will come up with a great idea for solving a new problem. Calde, I want to hear your opinion as well. Sorry I interrupted you.
Calde: Well, I totally agree with you in this sense. I think it's very important to be able to understand what the models are producing, and for innovation especially, it's important to understand how they work internally, right? At the same time, having that access requires a lot of information and a lot of resources, so it's a difficult thing to do. And I was thinking that it can also be difficult in the sense of what these models can enable, something that I don't think we know yet, right? It's not certain, it's part of the unknowns. And I remember what happened with the team at Google around that: they released the paper about the problem of these things getting too big. I'm talking about the stochastic parrots paper.
Nacho: Yeah.
Calde: So there are a lot of implications, really, that I would like to talk about, maybe in the ethics section, but I don't know. We can go with that.
Nacho: Yeah. Yeah, definitely. Well, that paper you were referring to was focused mostly on something that I try to push every time I have a conversation about LLMs and how they work, which is that they are unable to reason. They are not intelligent at all. They are very good at mimicking something that resembles intelligence, but they are not intelligent by themselves. They are just producing outputs in a format that is pleasant to read and that looks nice to us. They are not smart at all. And that specific paper that you're talking about introduces this idea of a stochastic parrot: something that is able to produce outputs that look nice, but that in the end is just sampling from a probability distribution. That, of course, means that the output you get will look nice, but it doesn't imply that there is reasoning contained in it. And yeah, I think we can talk more about that.
Vico: We could do a whole episode about this.
Nacho: Yeah. Yes.
Vico: We could go for hours, but for the sake of the stream, let's steer back a little bit, and we will come back to this. This is cool, and if you would really like us to discuss it more, do leave a comment to let us know, and we can dive deep in another episode.

Actually, if you're listening to us, do remember that we're steering back to the topics that we still have left, and we have quite a few more; we're not even at the half of the episode right now.
Now, the other door we had opened is the learning curve: for experts across segments who may be familiar with other areas of machine learning but not with LLMs, what aspects of transitioning to LLMs should we be mindful of, and how can they enrich already existing knowledge? Can you walk us through it, Nacho?
Nacho: Yes. Well, I think that the nice part of working with LLMs is that for 99% of the applications, you don't need to know what an LLM is or how they are trained and that kind of information. I think knowing that adds the extra 1% that you're missing. But I think that all of us are able to innovate with AI right now, with LLMs at least, sometimes without knowing even a bit about programming or anything like that; you just need to be a good prompter. So I would say that the learning curve in this case is not steep at all. It's relatively simple. It's just a matter of learning some use cases, some scenarios, and then trying to map the problem that you have in mind onto the specific scenario where you want to try the algorithm.
I would say that there are two flavors, two different ways of working with large language models right now. One of those is using these models as meaning structures, let's say, just to collect semantics from pieces of text, and for that you use an embedding model. This is, I think, the most complicated application, so to say, but it's not difficult to understand at all. It's just a matter of inputting a text, getting a vector, and then trying to find similar representations and doing something with that. That's, for example, the component that we have in RAG systems, and we can talk more about that later on.
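A minimal sketch of that embedding flavor, assuming the sentence-transformers library and a small off-the-shelf model (both illustrative choices, not mentioned in the episode): text goes in, vectors come out, and similar meanings end up close together.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

docs = [
    "Our refund policy allows returns within 30 days.",
    "The office is closed on public holidays.",
]
query = "Can I return a product I bought last week?"

# Each text becomes a vector; cosine similarity tells us which document is closest in meaning.
doc_vectors = model.encode(docs)
query_vector = model.encode(query)
print(util.cos_sim(query_vector, doc_vectors))  # the refund-policy sentence should score highest
```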
But that's the most difficult application. I think the easiest one is just prompting. By just learning how to prompt a large language model, you can have a machine learning algorithm that solves the problem for you. For prompt engineering, you just need to take a course online. It's not a big deal. It's just a matter of learning the patterns of the prompt and how it should be structured. We could even make a podcast out of that as well. You just need to sort your ideas in a specific way, and that specific way is exactly the way in which the LLM can, you know, complete the message for you and produce the outputs that you need. So it's not difficult at all.
Vico: I was thinking, maybe for those a little bit less familiar with this: we constantly hear about prompting, prompting, prompting in so many places. But we have the prompt and then prompt engineering, and there's a slight difference between them for those trying to leverage LLMs. Do you mind expanding on that thought a bit?
Nacho: Yeah, sure. So of course both of them are connected by the word prompt, right? But the prompt engineering part is what is done when, let's say, you want to make a large language model that is able to take a brief of a news article and write a blog post about it. All right, so you have a use case already in your mind and you want to use an LLM for that. What you need to do, and what you need to spend some time on, is engineering the general instruction that tells the algorithm to produce the output that you want. So, for example, you would say something like: "You are a journalist who writes articles for a blog. You are receiving short briefs, or briefings, about a specific thing that happened, and you want to produce a text with three paragraphs and 10,000 words, extending that briefing in a pleasant way", or something like that. This thing that I'm saying is basically the instruction, the prompt, that you will use for the machine learning model to produce the output. And for that you don't need to learn any coding or have any coding skills. You just need to sit down with a few examples and ChatGPT, not even the Plus version, try it there and see what you get. Once you try it with a few briefings and you see that you're getting the blog posts that you want, then it's just a matter of putting that inside a piece of code, and then the input will always be the briefing, not the prompt itself, because the prompt will be hidden inside the code of your program. You will just need to copy and paste the briefing into a text box and you will get the output.
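A minimal sketch of what Nacho describes, with the engineered prompt hidden in code and only the briefing exposed as input; call_llm is a hypothetical helper standing in for whichever LLM API you actually use.

```python
SYSTEM_PROMPT = (
    "You are a journalist who writes articles for a blog. "
    "You receive short briefings about something that happened, "
    "and you expand each briefing into a pleasant, three-paragraph blog post."
)

def write_blog_post(briefing: str) -> str:
    # The prompt engineering lives here; the user only ever pastes in a briefing.
    full_prompt = f"{SYSTEM_PROMPT}\n\nBriefing:\n{briefing}\n\nBlog post:"
    return call_llm(full_prompt)  # hypothetical call to your LLM provider of choice
```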
Calde: So then you can decide the parameters yourself, right? Like, for example, the length of the text or the tone of the text that you want. When you engineer your prompt, you can choose which parts the user or the UI passes to the system as variables.
Nacho: Exactly. Because in the end, the UI in which you decide the tone and things like that will just fill some empty places that you had in your original prompt. So you can say "write this text in (blank)", leave that blank space for the language or whatever, and then you just choose it. One important thing to take into account is that these large language models do not know how to count. It's amazing, because they can write poems, but they are unable to count. So if you tell them "write just 2,000 words", they are unable to control themselves. They will produce words and words and words, and sometimes they will stop at 2,000, but they don't know how to count. So take that into account when it comes to prompt engineering.
Calde: Basic math, right? They can’t do the simplest things.
Nacho: Yes, they struggle to do complicated math, but sometimes they are much better at that than at basic math. And yeah, there's another thing with LLMs, which is that they have so many parameters that they sometimes dedicate part of them to memorizing things. Memorizing in this context means that if you get this input, it's like an IF clause: if you get this input, then produce this output, period. And yes, sometimes they know how to multiply two numbers because they memorized it. By the way, just as we did when we were in school, right? So fair enough. But yeah, it's amazing how they struggle with those simple things.
Calde: Great. We talked about prompt engineering, right? And then what would embeddings be, then?
Nacho: Well, embeddings, as I said before, are a way to represent the semantics of not just a word but also a piece of text. They are becoming a key element in these applications that are known as RAG, retrieval-augmented generation. We are talking about systems that are basically designed around a knowledge database. So, for example, you have all the knowledge of your company as files, or as a Confluence site or something like that, and you want to ask a question, you want the answer to one of your questions. The typical way to do that, and you guys can tell me more because you are product engineers, was just writing a few words into a search text box, and you got, I don't know, ten or twenty places in which those words were being used. And then you basically tried to construct the answer yourself.
In a RAG system, the interaction is completely different, and that's fascinating, because you just ask a question as if you were talking to a human being, and what the system is doing inside is translating your question into an embedding, then looking across the different sources of information that could answer that question and finding the one that is closest to your question in the embedding space. By doing that, you collect the context, the context that you as a human would use to answer the question yourself, and then you use that, combined with an instruction, in a language model to get the answer in a human-like way. If you think about it, that's amazing. That's something new for us. It's redefining the way in which we search for things, and that's why most people right now are using ChatGPT, for example, for solving problems related to coding: instead of looking up the documentation of the programming language to figure out exactly how to solve something, they prefer to just paste the buggy piece of code and let the model figure out by itself how to fix it. The thing to be careful about is that when you interact with something like ChatGPT, you are not using a RAG system; you're just prompting a large language model. A RAG system is different from the large language model alone: it combines both the language model that follows instructions and the embeddings generator. But, for example, when you use Bing search, you are basically using that. It's a huge RAG system that extracts embeddings from your questions and looks them up.
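Putting the pieces together, here is a hedged sketch of that RAG flow: embed the question, retrieve the closest piece of knowledge, and hand both to an LLM. sentence-transformers is again an assumed choice for embeddings, and call_llm a hypothetical stand-in for an LLM API.

```python
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

knowledge_base = [
    "Employees get 20 vacation days per year.",
    "The VPN must be used when working from public networks.",
    "Expense reports are due by the 5th of each month.",
]
kb_vectors = embedder.encode(knowledge_base)

def answer(question: str) -> str:
    # 1) Retrieval: find the document closest to the question in embedding space.
    question_vector = embedder.encode(question)
    best = int(util.cos_sim(question_vector, kb_vectors).argmax())
    context = knowledge_base[best]
    # 2) Generation: let the LLM answer using only the retrieved context.
    prompt = f"Answer the question using only this context:\n{context}\n\nQuestion: {question}"
    return call_llm(prompt)  # hypothetical LLM call
```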
Vico: So Nacho, you were telling us about RAG (retrieval-augmented generation), and I was wondering if you could walk us through some of the other most common applications for LLMs. We've talked about RAG, but there are definitely other applications that I'd love to hear your thoughts about; we're seeing many being used in several products. So I think it's also a good way to spark ideas for those who are listening to us.
Nacho: Yeah, sure. Some of the new applications that we are seeing are what are called few shot learning applications. So we have already discussed what supervised learning is, what unsupervised learning is, what self-supervised learning is. Well, let’s talk now about few shot learning.
Few-shot learning refers to the process of learning with only a few samples. It's like you have two or three examples, and then you have a model that solves everything in the context of the samples you used as examples. And if you think about it, that's exactly what you can do by prompt engineering a model. For example, let's say that you want to do sentiment analysis on the comments that you get about your product. The only thing that you need to do right now is to write a prompt, take the comments as input, and ask the model in the prompt to say whether each was a positive or a negative response. You just show one comment and say this was a negative comment, then you show another comment and say this was a positive comment, and there you are: you have something that solves the problem for you. And this is just an example, but you can do whatever you want. At Arionkoder we have used language models for things like determining the engineering role that is most likely to be associated with a profile, for instance: you just take a LinkedIn profile and you get the output based on that. Things like that are relatively easy to do with this idea of few-shot learning.
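A minimal sketch of few-shot sentiment analysis through prompting, as described above: a couple of labeled examples go into the prompt, followed by the new comment. call_llm remains a hypothetical stand-in for an LLM API.

```python
FEW_SHOT_PROMPT = """Classify each product comment as POSITIVE or NEGATIVE.

Comment: "The app crashes every time I open it."
Sentiment: NEGATIVE

Comment: "Support answered in five minutes, amazing!"
Sentiment: POSITIVE

Comment: "{comment}"
Sentiment:"""

def classify_sentiment(comment: str) -> str:
    # The two labeled examples are the "few shots"; no model training happens at all.
    return call_llm(FEW_SHOT_PROMPT.format(comment=comment)).strip()

print(classify_sentiment("Setup took forever and nothing worked."))  # expected: NEGATIVE
```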
Then there are other applications that are multimodal applications of large language models. When we talk about multimodal, we are talking about taking not just text as input but also images or video or things like that. So you can imagine things that are able to detect the frames of a video that contain some information, some specific information, or when a character appears on a video or things like that, or even using these generative models that we are seeing right now, like Midjourney or Stable Diffusion or things like that. They are grounded in large language models and a diffusion model that is able to produce the output. And last but not least, we have these dialog-tuned language models, this conversational AI that we are seeing deployed everywhere.
Vico: Everywhere. Yes, everywhere.
Nacho: It's like the new way to interact with a computer is as a chatbot, writing questions and things into a text box. And this is basically changing the way in which we do many tasks, like, for example, customer service. I think that 90% of current systems that do customer service, or that relied on humans for customer service, are now using chatbots instead, because it's just a matter of detecting when you need a human, when an answer is too difficult to be produced by a machine learning model, and then connecting a human with the system. And you can do that now in a very, very straightforward way. So that's, I think, the easiest application to see. But there you have it, another application.
Vico: Today we're seeing many of these applications as chatbots, still on the reading and writing side. There are also a few that are directly conversational, that you talk to. What's your take on those more spoken tools and on changing that type of interface?
Nacho: Well, that's another example of a multimodal application. When it comes to talking with a computer directly, you are using a model that translates sound into text, and then you just connect the text with a large language model, and from there you have the answer. What really fascinates me is the fact that with prompt engineering you can also produce not just text, but code and instructions, and then you can run those instructions directly. I think that's at the core of many products that we are seeing right now. For example, even in Microsoft Windows you can now have agents that can solve tasks for you in the operating system. You just instruct something like "change my screensaver", and internally what the AI model is doing is producing not the code, but the steps that then get mapped to code to produce the output that you are looking for. And that's amazing.
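A minimal sketch of that voice flow, speech-to-text first and then the LLM, using the open-source whisper package as an illustration; call_llm is still a hypothetical stand-in for an LLM API.

```python
import whisper

speech_model = whisper.load_model("base")

def voice_assistant(audio_path: str) -> str:
    transcription = speech_model.transcribe(audio_path)["text"]  # sound -> text
    return call_llm(transcription)                               # text -> answer
```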
But I have a question for you right now, which is how do you guys, as product engineers and as product experts, see this environment changing our lives?
Calde: Well, I think LLMs are starting to be seen everywhere, and in a lot of cases, you know, the initial implementations aren't too powerful right now, but they are still interesting. I also think about what they are bringing for the future, and we can see that in a lot of marketing messages, really, in the way they present LLMs and AI tools. I think they try to convey the way they see this evolving in the future. But it's just a marketing promise for now.
For example, I'm thinking about Miro; I use Miro every day. And they have introduced this big series of features that have to do with adding AI, right? And the AI is supposed to assist you in solving things, but I feel the outcomes still need to be refined. They are really exposing that tool as a way of learning, maybe as a way of showing what they can do, or maybe just, you know, the rush and the fever for the new thing.
Nacho: Yeah. And speaking about Miro, because Miro in particular is intended for graphic-based interaction, let's say: you're just putting up stickies and connecting them, and that interaction is quite straightforward. It's the same way in which we collaborate on a whiteboard, for instance. But do you believe that these language-based features that they are adding right now are useful, or do you think it's like forcing a technology into something that doesn't need it at all? I don't have an opinion myself.
Calde: I have one. If it worked like the marketing promised, I think these features would be awesome, right? And they would be really useful for us, because what we do when we work in Miro is try to organize ideas. We put ideas on stickies just to visualize our thinking in small chunks, right? And then we try to relate them: we do affinity mapping, we create flowcharts, we try to put ideas together or separate them, and those are operations that could be really interesting for an LLM to solve, or at least to assist with. But when you try it with an actual series of stickies, the output that you get is not what you expected, or not something you can work with. There's a huge topic there about how you interact with this technology and what you are expecting from it. And I think that, from the perspective of a product designer, we should keep our promises a little bit more down to earth: not promise that we'll solve everything, but frame it more as a collaboration with users rather than a magic solution. It's more like something that will meet you in the middle.
Nacho: Because I guess that there are some issues around how much your users trust this technology. If they start seeing that the outputs they get are not exactly the ones that were promised, then that damages the credibility of your solution, right? It probably creates a negative feedback loop.
Calde: Yes. And I think that's a problem we as product designers have to think about, right? It's not that we just connect something new and that thing does the magic. It's more about connecting things in a way that fits how you already work: what are you doing, what is your process, and so on. But I think there are changes coming, and we're really excited to have this working for us in the future. I'm just talking about the things that I tried in Miro. And when I think, for example, of summarizing transcripts of interviews, that is something that we need to do, because we sit down for interviews with users, and it's an interesting use case, right? Being able to summarize the main points of a conversation matters because it's a lot of output and it can include a lot of things; as a human you already have your own highlights, you are already thinking "okay, this is important" about that conversation. So here we have a pattern where technology could add something, right? That could be interesting: a generated summary with things that you maybe missed, or some extra points that you didn't consider.
Vico: I'm actually excited to see specific applications in this regard, because today we have, like, general summaries, and I do agree it's pretty powerful to reduce time and get quick results. But I wonder if there's even greater potential if we have LLMs specifically trained around human behavior and even bias, with much more focus on looking for behavioral patterns in language and being aware of cultural context.
That would be an interesting potential field for impacting product design at the ethnographic step, when we're doing research, because there's a part of making sense of human behavior when you're going through that interview, right? When you're trying to decide, okay, what are the thought patterns this person is applying when using a product. I'm really just future-thinking here, but imagine if we could do that: find the patterns that are being applied, the biases, even the cognitive processes or fears around an interview.
Calde: And at the same time, I think that's one of the things that are still not solved, right? Or maybe this is just an idea, but models are trained based on human learnings, and human learnings have biases. So they have some flaws specifically in that part, right?
Vico: A hundred percent. For me, that's where they have the biggest limitation, right?
Calde: Yeah.
Vico: How much context can they manage, and how much understanding do they actually have of human behavior, or even of their own bias? And we're doubling the difficulty there: as humans it's already hard to see our own biases, so asking a machine learning model, a large language model, to be self-aware of its own biases is asking the impossible, without a doubt.
Nacho: Yes, because there is no awareness in machine learning models.
Calde: We're going to save that for a future chapter, right?
Nacho: Yes, I think we should do that. Yeah. And there's one thing that maybe is better left for the next episode, but I would like us to at least scratch the surface of it. Don't you guys believe that now that we have AI, and generative AI and LLMs in particular, companies are trying to push that technology regardless of the use case? Don't you think they are desperate to put AI on their websites without taking a moment to think?
Calde: It's the new thing, right? And we have gone through this before in the history of technology: companies trying to incorporate new things in ways that still don't work, but that are good marketing. It's part of the history of the internet. First the initial webpages that didn't really have any purpose, then understanding how you can use the web as a tool for information and for easing processes in different situations, then web applications and Web 2.0, and then mobile applications. It's part of the history of technology: people start trying to add the new thing to their products and wonder how to make the most of it.
And I think we are in this initial phase with generative AI; they're all trying to figure out how, right? They are trying to think of which use cases can be useful and to incorporate the technology in that way. And still, we are not seeing a lot of success. I mean, there is a lot of success for ChatGPT and some specific applications, but in general tools, in the tools that we use day to day, I think we are just starting to scratch the surface, right? The first use cases maybe aren't that good or don't get to real outcomes, but they eventually will, and that's when the platform shift really starts to happen, right? Because that will probably move everything to this new platform that is LLMs and generative AI. I have the feeling that it's starting to happen.
Nacho: And if you guys had to advise a company that is considering adopting AI, if you had to recommend a way to do that, a way to start thinking about AI as a solution, what would you do?
Calde: Well, the first thing for me would be to think of it not as something that's just plug-and-play, because if you think of it that way, you are really missing the point. You'll be trying to include it just to integrate it in any way, and it would be even worse than what we are seeing these products do. Like Figma or Miro: it would be just an extra feature, just to say you have it, right? That you're doing AI. And that's happening to a lot of companies right now.
So any new product feature starts from solving a problem, right? Any product feature that really contributes to a product solves a problem in a more effective way than the existing tools do. So in the background it's, again, about users, right? About how this technology will help with user goals and needs in a particular context, solving the problem in a more efficient way. So it's getting back to product design in that sense. And I think we need to understand more about the models, and ML engineers need to understand more about product design, in order to get to good solutions. So it's back to shared understanding, and maybe that's the goal of this podcast, right? To make the understanding of the technology and of product design with AI something that is shared.
Nacho: That’s great, that’s great.
Vico: In the end, it all dials back to having teams that are driven by the whole context, right? Are we being strategic in thinking about what my product actually solves for my users, how I could make it even better and more efficient, and planning a more strategic path to actually consider it? It will depend a lot on the type of problem and the type of value proposition that I have as a product.
But the team that is involved in exploring this should be aware of the potential, of the possibilities, and build this shared language of, hey, how do we translate that problem into at least a proof of concept, and build a roadmap as we experiment. I think this path of experimentation is what some companies are going through, and it makes a lot of sense, but a lot of them were rushing into it without thinking about how it actually connects with everything, right? They have like one…
Nacho: Yeah. Yeah. Because in the end, this is a solution for a problem. So you need to figure out the problem before trying to apply the solution. Otherwise, you're putting the solution before the problem even appears, before even knowing if the problem is there. And if solving that problem is not adding value to the user, then it doesn't make any sense.
But on the other hand, I get Calde’s point. I think I had never sat down to think in those terms, in terms of, hey, let’s leave some space for experimentation, because sometimes even the users don’t know they have a specific pain, and they need to see something new appear that solves something they had never stopped to think about. So it’s a tradeoff, right, between the two things: between this new solution that I have and can use for so many things, and the pains that users have and that I was unable to solve. So it’s about connecting the dots.
Vico: There’s an interesting paradox: some of the companies that sat and listened to what the customer said were the ones that ended up falling behind and disappearing, because all they did was listen to what the customer said. So there’s a dilemma there around being forward-thinking. And this also reminded me of a blog post we have around MAYA, the “Most Advanced Yet Acceptable” principle: the most advanced form you can put forward that users will still accept. So it’s about striking a balance, right? Between creating new spaces and giving people what they’re waiting for.
Calde: Yes. And it’s also the new toy, right? The new toy that a lot of people want to play with. Everybody wants to get in, everybody wants to get their hands dirty with it, and maybe they stick to really simple applications. That has happened with all kinds of technology. Remember the beer-drinking app from the early mobile phone era? It was just an animation of a beer, and you could play with it as if your phone were a glass of beer: you tilted the screen toward your mouth and the beer tilted into your mouth too. It was the dumbest possible application, just playing with that particular new thing the phones had. And I think a lot of people are still at that stage with AI, with images. Most of us right now just have to find a way to apply it, not only for playful things but for things that are actually useful. That’s starting to happen a lot.
Nacho: Yeah. And the funny thing is that it’s a technology that is easy to use. Going back to the very beginning of this podcast, we are talking about just coming up with instructions. I think that we as humans are really good at instructing someone else to do our dirty work. So I guess that in the context of innovating with large language models, it’s just a matter of exploiting that side of us and coming up with instructions to get the output that we want. So yeah, I think it’s full of opportunities, but we are running out of time. So I would like to save a few minutes to talk about the ethical implications or considerations around large language models, and the kinds of things that we, either as machine learning experts or as product experts, need to think about.
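To make that “coming up with instructions” idea concrete, here is a minimal sketch of prompting an LLM programmatically, assuming the OpenAI Python client; the model name, system instruction, and prompt text are illustrative assumptions, not something discussed in the episode, and any provider’s chat API would work the same way.

```python
# A minimal sketch: the "instruction" given to an LLM is just natural language
# describing the task. Assumes the OpenAI Python client (openai >= 1.0) and an
# OPENAI_API_KEY set in the environment; model choice is illustrative only.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # hypothetical choice for the example
    messages=[
        # System message: how the assistant should behave.
        {"role": "system", "content": "You are a concise product-strategy assistant."},
        # User message: the actual instruction we want carried out.
        {"role": "user", "content": "List three user problems a design tool could solve with an LLM."},
    ],
)

print(response.choices[0].message.content)
```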
My take on that is that, well, I think Vico mentioned it before. When it comes to training these models, we are talking about data, and the data contains our biases. They’re intrinsic, along with all the nasty things we do on the Internet because we don’t have to show our faces there. They are contained in the data, and therefore they are captured by the algorithms. So those biases, in my opinion, are something to be worried about. Also, the fact that the versions of AI we are using right now are not really steerable, which means they go wherever they want and we can only partially control where they are going, with filters that we put before the prompt. That really worries me a little bit. Also, the carbon footprint of AI is something we need to be aware of: these humongous models are trained on GPUs that are fed with fossil fuel-based energy, and that should worry us as a civilization. And finally, the fact that we are putting such a powerful technology in the hands of basically everyone in the world as users; not as makers, but as users, people might come up with malicious use cases that we need to watch out for. For example, now we can write fake news in a relatively easy way, and that will really help nasty people create chaos and political disorder and things like that. I think we should be aware of that. But I want to know your take on the ethical considerations of AI.
Calde: Yeah. I think this last topic you mentioned is one of the most worrying things for me, and I think the underlying problem is what main goals you are putting behind the technology, right? Are you putting community goals in front of it, like society’s goals, users’ goals, or are you just optimizing for business? That’s a consideration to make, and I feel we have to go way beyond that, involving a group of people to understand how our model can be used for bad things. That’s the red team idea. I’ve read about people doing exactly that with AI: being part of a team whose job is to figure out how this could affect society as a whole and how it could end up producing bad outcomes even when you’re trying to do something good. So it’s really interesting, but I think this is one of the most relevant problems, because we have this new technology out there, we can use it, and it can also be used for harmful activities, for things that are dangerous to society.
Nacho: Vico, what’s your take on this?
Vico: I was thinking back a little to what we said at the beginning. We create representations of the world from what we learn and what we understand, and more and more, that basic information comes from the Internet. Generative AI, and in particular large language models, with their ability to create content at massive scale, are opening a gap in our ability as humans to make sense of the world, as we become more and more unsure of what’s real and what’s not. And there’s a huge difficulty around this, not only around fake news, but in how this actually accelerates harmful activities and puts our whole system at risk.
The first step for anyone trying to target us in cybersecurity is to collect information about us, and LLMs actually accelerate that: the whole reconnaissance process of gathering information about us and learning from it. And I’m also thinking of product developers and how this opens a huge door on the security side, how exploring code, finding vulnerabilities, hunting around, and deconstructing my information is accelerated with applications like these.
I wonder if we’ll see anything like that. I know we’re far from there, but I do get a little worried about it. Without a doubt, though, my biggest concern is how we have handed out a massive capacity for building narratives around what’s happening in the world, and the narratives we consume are how we build our image of the world. So my biggest fear is whether we are all aware that the world we’re consuming is not necessarily what’s actually happening around us. Being really critical and keeping our minds open to question the news we come across is going to be essential, and hopefully we’ll start seeing clear ways of tagging content, like “Hey, this was created with AI,” or other mechanisms that allow for traceability of what these tools produce, right? At some point, to be more transparent about what is being created.
Nacho: Yeah, well, there’s a lot we could discuss about this, and probably some of these topics will be covered in the next episode. But I would like to thank you guys, Vico and Calde, and of course the listeners out there. If you made it to this point, that means you have listened to our episode all the way to the end. So that’s cool! Thank you very much for your company today.
So in this show we covered the fundamentals of LLMs, and we learned a lot about how these models are trained and how they are reshaping our products today. In our next episode, as I said before, we are going to cover some new things; we’re going to add a new link to the chain. So stay tuned. Please leave us a review on Apple Podcasts or Spotify, or comment in the box below if you’re watching us on YouTube. And remember that Chain of Thoughts seeks to build a community of product designers, developers, and AI experts to keep creating great digital tools for customers and users.
So to keep this community going, remember to like, follow, and subscribe to Chain of Thoughts on Spotify, YouTube, and all social media, and share it with your own followers using the Chain of Thoughts podcast hashtag on X and LinkedIn. You can also follow Arionkoder on these platforms to stay updated about the show and what we’re doing as a company.
And yeah, this was Chain of Thoughts, the first episode.
Vico: See you around!
Calde: Bye bye!