How to divide roles in product development that utilizes LLMs
Lately, I’ve been doing a lot of planning for products and features that use LLMs. In fact, that feels like most of what I do. The problem is not coming up with ideas; there are plenty of ideas that seem feasible and interesting, and the real question is how to deliver them quickly and safely while balancing cost and quality.
In this post, I would like to write about my trial and error in the development process of LLM-based products, especially the division of roles between PM and developer.
This trial and error started from the observation that prompts inevitably become complex when planning a product around an LLM with strong reasoning capabilities, such as ChatGPT. It is as if the prompt itself has become a programming language. A prompt is very often not a stand-alone artifact but works in integration with existing systems. As I was writing this, OpenAI announced function calling, which pins down the output format by specifying a JSON schema. I think this will push LLMs even further toward being a programming language.
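To make that concrete, here is a rough sketch of what function calling looks like. The function and parameter names below are invented for illustration, not from the original post; the point is that the expected output is described with a JSON Schema, and the model returns arguments conforming to it instead of free-form text.

```python
import json

# Illustrative function definition: the schema constrains the model's output.
move_function = {
    "name": "suggest_move",
    "description": "Suggest the next Othello move with an in-character comment.",
    "parameters": {
        "type": "object",
        "properties": {
            "row": {"type": "integer", "minimum": 0, "maximum": 7},
            "col": {"type": "integer", "minimum": 0, "maximum": 7},
            "comment": {"type": "string"},
        },
        "required": ["row", "col", "comment"],
    },
}

# In a real call, move_function would be passed to the chat completion API,
# and the model's reply would contain a JSON string of arguments like this:
raw_arguments = '{"row": 2, "col": 3, "comment": "That edge looks tempting!"}'
move = json.loads(raw_arguments)
print(move["row"], move["col"], move["comment"])
```

Because the arguments arrive as schema-conforming JSON, the surrounding program can parse them directly instead of scraping free text, which is exactly what makes prompts start to behave like an interface.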
On the other hand, the more complex the prompts become, the harder the question of who should design and develop them, and how. I would like to consider this issue.
As an example, consider developing an AI partner for an Othello game
This may sound sudden, but imagine playing a game of Othello together with a character (not as an opponent, but as a companion). For a service built around AI characters, the shared experience is very important. Until now, personalizing that experience for each combination of AI character and user has been difficult, because every scenario had to be carefully designed and developed by hand. This is changing with the emergence of LLMs. So here is an example.
To generate a character’s statements, you need to feed in the character’s settings, background knowledge, and the current state of the Othello board, so that an appropriate conversation can be generated.
The following is a very rough sketch of such a prompt.
You are a very good actor, and you're currently playing a character as described below:

Character Settings: '''{description}'''
Character Typical Lines: '''{typical_lines}'''

Your manner of speaking and the content of your speech should strictly adhere to the definitions outlined in the Character Settings and the Character's Typical Lines.

At present, you and your user are engaging in a game of Othello against an opponent. You derive pleasure from strategizing and discussing the game with the user.

It's important that you reply as the character, sharing your own feelings, thoughts, and opinions. Try to avoid merely providing explanatory responses, unnecessary advice, or uninteresting comments unless the user specifically requests concrete advice.

Current Game Status: '''{game_status}'''

Let's begin!
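To underline the "prompt as a programming language" point, here is a minimal sketch of how such a template might be filled in at runtime. The template is abbreviated and the sample values are invented for illustration:

```python
# Abbreviated version of the prompt above, with the same placeholders.
PROMPT_TEMPLATE = (
    "You are a very good actor, and you're currently playing a character "
    "as described below:\n"
    "Character Settings: '''{description}'''\n"
    "Character Typical Lines: '''{typical_lines}'''\n"
    "Current Game Status: '''{game_status}'''\n"
    "Let's begin!"
)

# Sample values, invented for illustration:
prompt = PROMPT_TEMPLATE.format(
    description="A cheerful pirate captain who loves board games.",
    typical_lines="Arr, a fine move! / Yo-ho-ho, the sea favors the bold!",
    game_status="Black: 2 stones, White: 2 stones. Black to move.",
)
print(prompt)
```

Each placeholder is computed by surrounding code (character database, game engine, memory store), which is why the prompt cannot be designed in isolation from the system around it.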
Even if the character settings themselves can be passed in as-is, not all of the character’s typical lines and background knowledge can be included when there is too much of it, because of token limits. Therefore, it is necessary to retrieve an optimal subset of that data based on the state of the game and past conversations, taking relevance, recency of memory, and importance into account.
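As a sketch of what that selection might look like, here is a naive scoring function combining relevance, recency, and importance. The weights, the keyword-overlap relevance measure, and the word-count token estimate are all assumptions for illustration; a real system would more likely use embeddings and a proper tokenizer:

```python
import math
import time

def score_memory(memory, query_keywords, now):
    """Combine relevance, recency, and importance into one score."""
    # Relevance: naive keyword overlap (a real system might use embeddings).
    words = set(memory["text"].lower().split())
    relevance = len(query_keywords & words) / max(len(query_keywords), 1)
    # Recency: exponential decay by age in hours.
    age_hours = (now - memory["timestamp"]) / 3600
    recency = math.exp(-0.1 * age_hours)
    # Importance: assumed to be assigned (0..1) when the memory is stored.
    return relevance + recency + memory["importance"]

def select_memories(memories, query_keywords, now, budget_tokens=200):
    """Greedily pick the highest-scoring memories that fit the token budget."""
    ranked = sorted(
        memories, key=lambda m: score_memory(m, query_keywords, now), reverse=True
    )
    picked, used = [], 0
    for m in ranked:
        cost = len(m["text"].split())  # very rough token estimate
        if used + cost > budget_tokens:
            continue
        picked.append(m)
        used += cost
    return picked

now = time.time()
memories = [
    {"text": "the user prefers corner openings", "timestamp": now - 3600, "importance": 0.9},
    {"text": "the weather was nice last week", "timestamp": now - 7 * 24 * 3600, "importance": 0.1},
]
print([m["text"] for m in select_memories(memories, {"corner", "openings"}, now)])
```

Even in this toy version, the recent, relevant, important memory ranks first, while old small talk falls to the bottom and is the first to be dropped when the token budget runs out.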
Also, when inserting the state of the game, you need trial and error to figure out which format the model understands best. Is it better to input the Othello board in an ASCII-art-like style, or is JSON better?
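For concreteness, the two candidate formats might look like this. Both encodings are invented for illustration; which one the model actually handles better is exactly the question that needs testing:

```python
import json

# Initial Othello position: "." empty, "B" black, "W" white.
board = [["." for _ in range(8)] for _ in range(8)]
board[3][3], board[4][4] = "W", "W"
board[3][4], board[4][3] = "B", "B"

def to_ascii(board):
    """Render the board in an ASCII-art-like style."""
    header = "  a b c d e f g h"
    rows = [f"{i + 1} " + " ".join(row) for i, row in enumerate(board)]
    return "\n".join([header] + rows)

def to_json(board):
    """Render the board as JSON: lists of [row, col] coordinates per color."""
    stones = {"black": [], "white": []}
    for r, row in enumerate(board):
        for c, cell in enumerate(row):
            if cell == "B":
                stones["black"].append([r, c])
            elif cell == "W":
                stones["white"].append([r, c])
    return json.dumps(stones)

print(to_ascii(board))
print(to_json(board))
```

The ASCII form is compact and visually spatial, while the JSON form is unambiguous and easy to validate; the only way to know which the model reasons about more reliably is to try both.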
Furthermore, even if a format is settled on and tested successfully in one state, it may not produce the expected output with different data, such as a different character or a different game state. And even if each individual output is fine, whether the experience stays consistent over the course of a game is another matter. For example, a common pattern is that the quality of the conversation as a whole depends on how memory is implemented.
Testing and writing specifications with only one or two prompts in such a situation is very inefficient and is likely to be followed by many improvement iterations. So, how can the development process be made smoother? I see prototyping as the solution.
Prototyping in the design process
What I am trying now is to have the PM do the prototyping directly in the design phase. As mentioned above, what can be written in a design document is, at best, a prompt specification imagined in the PM’s head; the actual design is very difficult to validate without something that actually works, i.e., a prototype.
If there were an established set of best practices for prompts, you might be able to see in advance what would and would not work without prototyping. At this point, however, the best practices themselves are in flux, with new discoveries being made every day. For example, there is Chain of Thought (CoT), which spells out the thought process in the prompt to improve output accuracy, and a number of methods have since been proposed that improve on simple CoT. Writing specifications based on fixed knowledge is very wasteful, because what was best last week may be different this week.
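As a concrete illustration of the kind of technique involved, here is what a simple Chain-of-Thought-style addition to the Othello prompt might look like. The wording is my own sketch, not a recommended best practice:

```python
# Plain instruction: ask directly for an answer.
base_prompt = (
    "Current board: {game_status}\n"
    "Which move should we play next?"
)

# CoT variant: ask the model to externalize intermediate reasoning steps
# before committing to an answer.
cot_prompt = base_prompt + (
    "\nBefore answering, think step by step:\n"
    "1. List the legal moves for our side.\n"
    "2. For each move, note how many stones it flips and whether it "
    "hands the opponent a corner.\n"
    "3. Only then pick the best move and explain it in character."
)
print(cot_prompt)
```

Whether this particular decomposition helps, hurts, or needs a different structure for a given model is precisely the kind of thing that only prototyping reveals.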
Even if you understand a good method in theory, you cannot always apply it as-is, just like design patterns in a programming language: it needs to be customized and applied as appropriate. Prototyping that checks actual inputs and outputs across various patterns is therefore very important. If so, it follows from a very simple idea that things will go more smoothly if the PM does the prototyping themselves.
Approach of not dividing roles as much as possible
When planning a product or feature that utilizes an LLM, I build my own prototypes for at least the parts related to the LLM. In the process, I check the behavior of the dynamic prompts to see whether they deliver the expected value. If I get a good feel for it during that examination phase, I summarize the plan in a brief document as usual and share it with the developers for discussion, along with the prototype’s source code. In other words, the approach is to expand the scope of what each person can handle as much as possible, rather than dividing roles.
Fortunately, ChatGPT (GPT-4) can generate quite a bit of working code, as long as it is common logic in a common language. It is not production-safe code, but it is sufficient for verifying value before development starts. If development for release can begin with that value already verified, a great deal of unnecessary trial and error can be avoided.
I myself had never written a single line of Python until a few months ago, but thanks to GPT-4, I’m getting by. Once I managed it, I started to enjoy it more and more, and little by little I began writing code myself. In addition, much of the OSS related to LLMs is written in Python, and I have come to be able to read that source code as well. Reading source code sometimes gives me new ideas, and I feel that this change in the division of roles has created a very good cycle.
At the end
To summarize briefly: in addition to deciding what to build and why, the PM of a product that utilizes an LLM might want to be able to prototype it with the help of an LLM (those who can do so without an LLM are of course not required to use one). If you can build your own prototypes, I think the process after that will be smoother and delivery faster. Of course, the exact opposite approach, i.e., a developer covering the PM’s role, would have the same effect.
Interestingly, at rinna, where I now work, such role overlap is happening naturally: I, the PM, write code and do prototyping, people from the research team act as PMs, people from the development team come up with plans, and so on. I don’t know if it’s a coincidence, but I find it an interesting phenomenon that managers are the first to start overlapping roles. That said, I have no intention of advocating that everyone should expand their roles. People with deep expertise will always be needed, and deep expertise cannot be replaced by the current level of AI.
On the other hand, given that what can be done with LLMs is still being explored in a positive sense (and can still be expanded), and that an LLM itself can be a good assistant, I suspect the current way of proceeding with development and dividing roles may not be optimal. However, I still do not know what the optimal form would be. So for now, I will keep overlapping roles myself and find out through trial and error what kind of role-sharing works best.