How to Use Custom Large Language Models in AutoGen


Background

AutoGen natively supports only foreign LLMs such as OpenAI, Claude, and Mistral; it has no built-in support for domestic (Chinese) LLMs. Some domestic models perform quite well, however, especially once price is taken into account: their cost-performance ratio is excellent. Over the past couple of days I have been figuring out how to integrate them.

Although AutoGen does not directly support domestic LLMs, it does support custom model classes. You can refer to the official blog post: AutoGen with Custom Models: Empowering Users to Use Their Own Inference Mechanism.

However, the sample code in that post is not very intuitive, so in this post I record the concrete steps for integrating a domestic LLM and provide working sample code.

Custom Model Class

AutoGen allows for the definition of custom model classes as long as they adhere to its protocol.

The specific protocol requirements are in autogen.oai.client.ModelClient, and the code is as follows:

class ModelClient(Protocol):
    """
    A client class must implement the following methods:
    - create must return a response object that implements the ModelClientResponseProtocol
    - cost must return the cost of the response
    - get_usage must return a dict with the following keys:
        - prompt_tokens
        - completion_tokens
        - total_tokens
        - cost
        - model

    This class is used to create a client that can be used by OpenAIWrapper.
    The response returned from create must adhere to the ModelClientResponseProtocol but can be extended however needed.
    The message_retrieval method must be implemented to return a list of str or a list of messages from the response.
    """

    RESPONSE_USAGE_KEYS = ["prompt_tokens", "completion_tokens", "total_tokens", "cost", "model"]

    class ModelClientResponseProtocol(Protocol):
        class Choice(Protocol):
            class Message(Protocol):
                content: Optional[str]

            message: Message

        choices: List[Choice]
        model: str

    def create(self, params: Dict[str, Any]) -> ModelClientResponseProtocol: ...  # pragma: no cover

    def message_retrieval(
        self, response: ModelClientResponseProtocol
    ) -> Union[List[str], List[ModelClient.ModelClientResponseProtocol.Choice.Message]]:
        """
        Retrieve and return a list of strings or a list of Choice.Message from the response.

        NOTE: if a list of Choice.Message is returned, it currently needs to contain the fields of OpenAI's ChatCompletion Message object,
        since that is expected for function or tool calling in the rest of the codebase at the moment, unless a custom agent is being used.
        """
        ...  # pragma: no cover

    def cost(self, response: ModelClientResponseProtocol) -> float: ...  # pragma: no cover

    @staticmethod
    def get_usage(response: ModelClientResponseProtocol) -> Dict:
        """Return usage summary of the response using RESPONSE_USAGE_KEYS."""
        ...  # pragma: no cover

To put it simply, this protocol has four requirements (a minimal skeleton that satisfies all of them follows the list):

  1. The custom class must have a create() method whose return value implements ModelClientResponseProtocol.
  2. It must have a message_retrieval() method that processes the response and returns a list of strings or message objects.
  3. It must have a cost() method that returns the cost incurred by the request.
  4. It must have a get_usage() method that returns a dictionary whose keys come from ["prompt_tokens", "completion_tokens", "total_tokens", "cost", "model"]. This is mainly used for usage analysis; if you don't need it, an empty dictionary can be returned.
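To make the four requirements concrete, here is a minimal sketch of a protocol-conforming client. The EchoModelClient name and its canned "echo" reply are made up purely for illustration; it calls no real backend, it just shows where each required method fits:

from types import SimpleNamespace


class EchoModelClient:
    """Minimal sketch of a client satisfying autogen.oai.client.ModelClient."""

    def __init__(self, config, **kwargs):
        self.model = config.get("model", "echo-model")

    def create(self, params):
        # Requirement 1: return an object shaped like ModelClientResponseProtocol.
        messages = params.get("messages", [])
        last = messages[-1].get("content", "") if messages else ""
        message = SimpleNamespace(content=f"echo: {last}", function_call=None)
        choice = SimpleNamespace(message=message)
        return SimpleNamespace(choices=[choice], model=self.model)

    def message_retrieval(self, response):
        # Requirement 2: return a list of strings (or message objects).
        return [choice.message.content for choice in response.choices]

    def cost(self, response) -> float:
        # Requirement 3: report the cost; 0 when there is nothing to bill.
        return 0.0

    @staticmethod
    def get_usage(response):
        # Requirement 4: usage summary; an empty dict disables usage tracking.
        return {}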

Practical Example

Here, I’m using the Claude model hosted by UNIAPI (an LLM proxy), but the same code works just as well for domestic LLMs.

The code is as follows:

"""
This code demonstrates how to customize a model. This model is based on UniAPI,
but any LLM that supports HTTPS calls can apply the following code.
"""

from autogen.agentchat import AssistantAgent, UserProxyAgent
from autogen.oai.openai_utils import config_list_from_json
from types import SimpleNamespace
import requests
import os


class UniAPIModelClient:
def __init__(self, config, **kwargs):
print(f"CustomModelClient config: {config}")
self.api_key = config.get("api_key")
self.api_url = "https://api.uniapi.me/v1/chat/completions"
self.model = config.get("model", "gpt-3.5-turbo")
self.max_tokens = config.get("max_tokens", 1200)
self.temperature = config.get("temperature", 0.8)
self.top_p = config.get("top_p", 1)
self.presence_penalty = config.get("presence_penalty", 1)

print(f"Initialized CustomModelClient with model {self.model}")

def create(self, params):
headers = {
"Authorization": f"Bearer {self.api_key}",
"Content-Type": "application/json",
}

data = {
"max_tokens": self.max_tokens,
"model": self.model,
"temperature": self.temperature,
"top_p": self.top_p,
"presence_penalty": self.presence_penalty,
"messages": params.get("messages", []),
}

response = requests.post(self.api_url, headers=headers, json=data)
response.raise_for_status() # Raise an exception for HTTP errors

api_response = response.json()

# Convert API response to SimpleNamespace for compatibility
client_response = SimpleNamespace()
client_response.choices = []
client_response.model = self.model

for choice in api_response.get("choices", []):
client_choice = SimpleNamespace()
client_choice.message = SimpleNamespace()
client_choice.message.content = choice.get("message", {}).get("content")
client_choice.message.function_call = None
client_response.choices.append(client_choice)

return client_response

def message_retrieval(self, response):
"""Retrieve the messages from the response."""
choices = response.choices
return [choice.message.content for choice in choices]

def cost(self, response) -> float:
"""Calculate the cost of the response."""
# Implement cost calculation if available from your API
response.cost = 0
return 0

@staticmethod
def get_usage(response):
# Implement usage tracking if available from your API
return {}


config_list_custom = config_list_from_json(
"UNIAPI_CONFIG_LIST.json",
filter_dict={"model_client_cls": ["UniAPIModelClient"]},
)

assistant = AssistantAgent("assistant", llm_config={"config_list": config_list_custom})
user_proxy = UserProxyAgent(
"user_proxy",
code_execution_config={
"work_dir": "coding",
"use_docker": False,
},
)

assistant.register_model_client(model_client_cls=UniAPIModelClient)
user_proxy.initiate_chat(
assistant,
message="Write python code to print hello world",
)

If you want to switch to another model, the only requirement is that the model can be called over HTTP: replace self.api_url = "https://api.uniapi.me/v1/chat/completions" with your own endpoint.
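One lightweight way to do that without touching the original class is to subclass it and override the endpoint in the constructor. MyLLMModelClient and the example.com URL below are placeholders, not a real provider; whatever name you choose must also be used in register_model_client and in the model_client_cls field of the config file:

class MyLLMModelClient(UniAPIModelClient):
    def __init__(self, config, **kwargs):
        super().__init__(config, **kwargs)
        # Placeholder endpoint: substitute your provider's chat-completions URL.
        self.api_url = "https://api.example.com/v1/chat/completions"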

Before running the sample code above, you need to create the UNIAPI_CONFIG_LIST.json file and make sure the program can read it. Its format is as follows:

[
    {
        "model": "claude-3-5-sonnet-20240620",
        "api_key": "xxxxxxxxxxxxxxxxxxxxxxxxxxx",
        "temperature": 0.8,
        "max_tokens": 4000,
        "model_client_cls": "UniAPIModelClient"
    }
]

In fact, this JSON is simply the configuration for the LLM, specifying the necessary parameters. The value of model_client_cls must match the name of the custom model class exactly.
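If you want to verify that the file is picked up correctly, a quick check is to load it with the same filter used in the example and print the result; with the JSON above, the filtered list should contain exactly one entry:

from autogen.oai.openai_utils import config_list_from_json

config_list = config_list_from_json(
    "UNIAPI_CONFIG_LIST.json",
    filter_dict={"model_client_cls": ["UniAPIModelClient"]},
)
print(len(config_list))          # expect 1
print(config_list[0]["model"])   # expect "claude-3-5-sonnet-20240620"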

That’s all for how to use custom LLMs in AutoGen.

In this post I only provided concrete sample code without an in-depth interpretation. If you're interested, you can read the official documentation.

Here, I have to complain that the AutoGen documentation is not very good: many sample snippets are outdated and not kept in sync with the code, which caused me quite a few pitfalls.