2024-11-15

Creating the world’s best open-source roleplay LLM on Bittensor

2024-11-15

My last blog post explained how Bittensor works at a high-level. With this piece I want to highlight what we’re doing specifically at Subnet 11 - Dippy Roleplay. I hope this is a useful resource for potential subnet owners!

Subnet 11 is just one of 40 different subnets, each of which is tackling a unique AI research/product problem. Subnet 11 exists to create the world’s best open-source roleplay LLM through Bittensor’s incentive mechanism.

You can chat with this cute cat (named Guru) on the Dippy iOS App!

‍

What do you mean by Roleplay LLM? How is that different from LLMs like GPT-4?

Roleplay LLMs are created to power empathetic AI companions. In contrast, Productivity LLMs like GPT-4 are created for tasks like data analysis, classification, writing code etc. While GPT-4/Claude/Gemini serve business use-cases, roleplay LLMs serve the consumer use-case of entertainment.

Roleplay LLMs are currently the biggest consumer use-case of AI. Character AI is the world’s most used AI product after ChatGPT/Gemini. Character recently released a blog post saying they get 20,000 queries per second, which is 20% of Google Search's volume 🤯

Similar Web Data: C.ai is the #121st most visited website in the world with ~300M monthly page views

‍

Some of the most popular Character AI chatbots!

‍

On Character’s platform, you can chat with therapists, fitness coaches, anime characters etc. They have raised hundreds of millions of dollars to train their roleplay LLM from scratch and they open-source nothing. We do not know their model weights, architecture or size. They decide what they want to censor with no transparency to the end user. This is dangerous considering the scale at which Character operates. Their AI model is used by hundreds of millions of users, and we have no idea how it works under the hood.

Companies like Meta and Mistral are doing great open-source work by releasing state of the art productivity LLMs like Llama 3 and Mixtral. These open-source models perform at par with GPT-4/Claude/Gemini. However, there’s no comparable open-source roleplay LLM alternative to C.ai. We want to create that alternative through the power of Bittensor.

All the state of the art roleplay LLMs are centralised, with very limited effort dedicated to open-source research

‍

Cool, so I understand what a roleplay LLM is, but how do you evaluate one? And what does it even mean to create a “good” roleplay LLM?

Great question! Productivity LLMs like GPT have been studied extensively in academic literature. The benchmarks for evaluating productivity LLMs began to take shape in the 2010s. MMLU, HellaSwag and TruthfulQA are just some of the commonly used benchmarks today.

Unfortunately, no similar benchmarks exist to evaluate roleplay LLMs given how new they are. There’s also very little academic research done on roleplay LLMs. It’s a brand new research problem, with very limited academic literature published by centralised AI chatbot companies like Chai Research and Replika AI.

Since there’s no well-known benchmarks for roleplay LLMs, how does your subnet evaluate them? It seems like you are solving a brand new research problem through Bittensor.

Yes, that’s right! Our AI team has expertise with roleplay LLMs through building out the Dippy iOS App and this experience has helped us come up with an initial model evaluation criteria. But considering this is a brand new research problem, we are open to criticism as we iterate.

Everything in Bittensor is open-sourced, so you can post on our subnet 11 channel to suggest improvements. The developers in the ecosystem are world-class and some of the miners on our subnet hail from companies like Anthropic and Google.

At high-level, this is how our subnet works:

Miners use fine tuning techniques, or MergeKit, to train, fine tune, or merge models to create a unique roleplay LLM. These models would be submitted to a shared Hugging Face pool.

Validators assess model performance on various metrics that like engagingness, coherence, creativity and rank the submissions

On the pinned messages in subnet channel 11 you will find our W&B dashboard and model leaderboard to track top performing models and various other analytics. Our GitHub Repo also outlines the whole system in greater detail.

Sounds great, but how does one use the models created by miners on your subnet? And how do we even know if your roleplay model evaluation criteria is any good?

The easiest way right now would be to download the models from the leaderboard and run them yourself. Unfortunately, this isn’t straightforward. Moreover, roleplay models require special system prompting to unlock their true capabilities, so even if someone could get it set up themselves, they may not be able to utilize it to the fullest extant.

By the end of July, we will release a front-end, for anyone to test out the best miner submitted models on our subnet. This will clearly showcase a) the value generated by our subnet and b) how models are improving over time as we update our evaluation criteria.

Early designs of our front-end (slated for release end of this month!)

‍

Who will be the end user of your subnet? How will it generate revenue?

We launched our subnet less than 2 months ago, so have a lot of progress to make on our path to creating the world’s best open-source roleplay LLM.

Within ~6 months we expect the model created by our subnet to power the Dippy app. At this point, other start-ups and indie developers will be able to use it for their AI companion apps. The AI companion category is the fastest growing product segment in consumer AI and is expected to generate $70-$150 billion in annualized revenue by 2030.

Investor Sami Kassab outlines subnet monetization very well in his research report. He explain how developers will pay for access to the models through APIs provided by the subnet validators. Payment for API access will occur offchain, directly between the developer and the validator. Validators have the flexibility to accept various forms of payment, including fiat currencies, making it as easy to use as a traditional web2 API.

Monetization is one of our north stars, but we’re still many months away from generating substantial revenue. Corcel is a great example of a world-class API product built entirely on top of Bittensor.

Corcel has an epic product built on top of Bittensor and generates substantial revenue

‍

This sounds epic! How do I get involved?

A great place to start would be the Bittensor Dev Docs and the Bittensor Discord Server. Each channel in the discord server is operated by a subnet owner and in the “pinned” section of each subnet channel you will find detailed info on how to get started as a miner. If you want to work on building great roleplay models, check out channel 11! TaoStats is the best website to visualize the whole ecosystem.

Bittensor offers a great opportunity to contribute to the open-source AI movement and my hope with this blog is to demystify the protocol for many!

‍