Getting Stable JSON Responses from LLM APIs – 梁工微型SaaS

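When you ask an LLM API to return JSON reliably, the results are often disappointing. The reply may not be valid JSON at all, or its shape may keep missing the structure you asked for. Here are some ways to deal with that: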

1. Use Pydantic

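Pydantic is a Python library for data validation and settings management. You define a model, and Pydantic checks that incoming data matches the expected format. This is especially useful when you interact with an API that sends and receives JSON.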

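Below is a Python example that uses Pydantic when calling a ChatGPT-style API and checks that the returned JSON has the expected structure. The example is hypothetical: the real ChatGPT API requires authentication and specific parameters, so adjust it according to the actual API documentation.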

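from pydantic import BaseModel, Field
import requests

# Request model: define the fields according to the actual API documentation
class ChatGPTRequest(BaseModel):
    message: str = Field(..., examples=["Hello, how can I help you today?"])
    # Other parameters the API may require...

# Response model: also defined according to the actual API documentation
class ChatGPTResponse(BaseModel):
    id: int
    text: str
    # Other fields the API may return...

# Placeholder URL for the ChatGPT API (hypothetical, not a real endpoint)
CHAT_GPT_API_URL = "http://api.chatgpt.com/chat"

# Send a request to the ChatGPT API
def send_request_to_chatgpt(message: str) -> ChatGPTResponse:
    # Build the request object (Pydantic validates the fields here)
    request_data = ChatGPTRequest(message=message)

    # Pass a dict to json= so requests serializes it exactly once
    # (passing request_data.json() would send a JSON string wrapped in quotes).
    # model_dump() / model_validate() below are the Pydantic v2 names.
    response = requests.post(CHAT_GPT_API_URL, json=request_data.model_dump(), timeout=30)

    # Check the status code
    if response.status_code != 200:
        raise Exception(f"Request failed with status code {response.status_code}")

    # Parse the body as JSON
    response_data = response.json()

    # Validate against the response model; raises pydantic.ValidationError
    # if a field is missing or has the wrong type
    return ChatGPTResponse.model_validate(response_data)

# Usage example
try:
    response = send_request_to_chatgpt("Hello, how can I help you today?")
    print(response)
except Exception as e:
    print(str(e))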

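In this example we first define two Pydantic models, ChatGPTRequest and ChatGPTResponse, which describe the request and response data. The function send_request_to_chatgpt takes a string, builds a request object, serializes it, and sends it to the ChatGPT API with the requests library. The key step is validating the reply with ChatGPTResponse: if the returned JSON is missing a field or has the wrong type, Pydantic raises a ValidationError instead of letting malformed data through.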

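Note that you need to adjust the fields of the request and response models, the API URL, and any other parameters to match the actual API documentation, and adapt the error handling to your own needs.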
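The example above validates an HTTP response, but the more common case is that the model's text reply is itself supposed to be JSON. A minimal sketch of validating that text with Pydantic and asking again on failure, assuming Pydantic v2 (model_validate_json) and a hypothetical call_llm(prompt) -> str function that you supply for whatever model you use. Feeding the validation errors back into the prompt is one common retry strategy, not the only one:

```python
from typing import Callable, List, Type, TypeVar

from pydantic import BaseModel, ValidationError

T = TypeVar("T", bound=BaseModel)


class Titles(BaseModel):
    titles: List[str]


def generate_validated(
    call_llm: Callable[[str], str],  # hypothetical: wraps your LLM client, returns raw text
    prompt: str,
    model_cls: Type[T],
    max_retries: int = 3,
) -> T:
    """Ask for JSON, validate it with Pydantic, and feed errors back on failure."""
    current_prompt = prompt
    last_error = None
    for _ in range(max_retries):
        raw = call_llm(current_prompt)
        try:
            # Parses the string as JSON and checks it against the model's fields and types;
            # both syntax errors and schema mismatches raise ValidationError
            return model_cls.model_validate_json(raw)
        except ValidationError as exc:
            last_error = exc
            # Append the validation errors so the next attempt can correct them
            current_prompt = (
                f"{prompt}\n\nYour previous reply was invalid:\n{exc}\n"
                "Reply with JSON only."
            )
    raise ValueError(f"No valid response after {max_retries} attempts: {last_error}")
```

With a stub that first returns "not json" and then '{"titles": ["A", "B"]}', generate_validated(stub, "Give me titles", Titles) returns a Titles object on the second attempt.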

Reference: https://github.com/hassancs91/Google-Gemeni-Consistent-JSON-Response

2. Use instructor

Code example: https://github.com/hassancs91/AI-Tools-Pydantic-Instructor

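import os
from typing import List, Type

import instructor
from openai import OpenAI
from pydantic import BaseModel

# Replace with your own output schema
class Titles(BaseModel):
    titles: List[str]

# instructor.from_openai (instructor >= 1.0) wraps the client so that
# create() accepts response_model; older releases used instructor.patch(client)
open_ai_client = instructor.from_openai(
    OpenAI(api_key=os.environ["OPENAI_API_KEY"])
)

def structured_generator(openai_model: str, prompt: str, custom_model: Type[BaseModel]) -> BaseModel:
    # instructor parses and validates the reply into custom_model before returning it
    result = open_ai_client.chat.completions.create(
        model=openai_model,
        response_model=custom_model,
        messages=[{"role": "user", "content": f"{prompt}, output must be in json"}],
    )
    return result

# Example values (assumptions): use any chat model your account can access
openai_model = "gpt-3.5-turbo-1106"
prompt = "Generate 5 titles for a blog post about stable JSON output from LLMs"

result = structured_generator(openai_model, prompt, Titles)
print(result.titles)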

3. Extract with regular expressions

Ask the model to return Markdown, then pull out the fields you need with regular expressions:

import re

# Assume this is a Markdown-formatted string returned by the ChatGPT API
markdown_response = """
# Response from ChatGPT
- **ID**: `123`
- **Message**: `Hello, how can I help you today?`
- **Additional Info**:
  - *Author*: `ChatGPT`
  - *Timestamp*: `2024-05-16T00:00:00Z`
"""

# Regular expressions for the JSON-like fields.
# The asterisks must be escaped: an unescaped "**" makes re raise "multiple repeat".
id_pattern = r"- \*\*ID\*\*: `(\d+)`"
message_pattern = r"- \*\*Message\*\*: `([^`]+)`"

# Search for matches
id_match = re.search(id_pattern, markdown_response)
message_match = re.search(message_pattern, markdown_response)

# Check whether both fields were found
if id_match and message_match:
    id_value = id_match.group(1)
    message_value = message_match.group(1)

    # Print the extracted data
    print(f"ID: {id_value}")
    print(f"Message: {message_value}")
else:
    print("No match found.")
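When the Markdown reply contains the JSON itself inside a ```json fence (as chat models often produce), a single regex can cut out the fence body and hand it to json.loads. A minimal sketch; the function name extract_fenced_json is hypothetical, and it falls back to parsing the whole reply when no fence is present:

```python
import json
import re
from typing import Any

# Matches ```json ... ``` (or a bare ``` ... ```) and captures the body lazily
FENCE_RE = re.compile(r"```(?:json)?\s*(.*?)\s*```", re.DOTALL)


def extract_fenced_json(reply: str) -> Any:
    """Return the parsed JSON inside the first code fence, or parse the whole reply."""
    match = FENCE_RE.search(reply)
    body = match.group(1) if match else reply.strip()
    # Raises json.JSONDecodeError if the content is still not valid JSON
    return json.loads(body)
```

Because the match is non-greedy, only the first fenced block is parsed; any prose before or after it is ignored.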

4. Return XML instead

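For example, instead of asking for JSON, ask the model to wrap each value in tags such as <value></value>. In practice this tends to make the output more reliable.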
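Tagged values are easy to pull out even when the model adds chatter around them. A minimal sketch using a non-greedy regex rather than a full XML parser, on the assumption that the reply is not guaranteed to be well-formed XML (a missing root element or a stray "&" would make xml.etree reject it). The function name extract_tag and the tag names are only examples:

```python
import re
from typing import List


def extract_tag(reply: str, tag: str) -> List[str]:
    """Return the text inside every <tag>...</tag> pair, in order of appearance."""
    pattern = rf"<{re.escape(tag)}>(.*?)</{re.escape(tag)}>"
    # DOTALL lets a value span several lines; .*? stops at the nearest closing tag
    return [m.strip() for m in re.findall(pattern, reply, flags=re.DOTALL)]
```

An empty list means the tag never appeared, which is a simple signal to retry the request.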

5. Don't ask for overly complex output; split it into several requests for different parts of the data

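JSON output usually fails when the task is beyond what the model can handle reliably. If a complex structure keeps failing, simplify what you ask for: for example, split it into 2-3 requests that each return a simple structure, and the output will generally come back correctly.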
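A minimal sketch of the split-request idea: instead of one prompt that returns a nested object, each request asks for a single flat key and the results are merged in code. call_llm is a hypothetical function that sends a prompt and returns the raw reply, and the three fields are only examples:

```python
import json
from typing import Any, Callable, Dict


def ask_in_steps(call_llm: Callable[[str], str], topic: str) -> Dict[str, Any]:
    """Build one result from several small, single-purpose JSON requests."""
    # Each step asks for one flat key, which is easier for the model to get right
    steps = {
        "title": f'Give a title for an article about {topic}. Reply as JSON: {{"title": "..."}}',
        "keywords": f'List 3 keywords for {topic}. Reply as JSON: {{"keywords": ["..."]}}',
        "summary": f'Summarize {topic} in one sentence. Reply as JSON: {{"summary": "..."}}',
    }
    result: Dict[str, Any] = {}
    for key, prompt in steps.items():
        data = json.loads(call_llm(prompt))
        # Keep only the field this step was responsible for
        result[key] = data[key]
    return result
```

Each step can fail and be retried independently, and in a real pipeline each small reply could also be validated with a Pydantic model as in section 1.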

6. Enable JSON mode when calling the API

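To prevent invalid-JSON errors and improve results, when calling gpt-4-1106-preview or gpt-3.5-turbo-1106 (or later models that support it) you can set response_format to { type: "json_object" } to enable JSON mode. With JSON mode enabled, the model is constrained to generate only strings that parse as valid JSON. This guarantees valid syntax, not that the object follows your schema, and the API also expects the word "JSON" to appear somewhere in the messages.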
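In the official openai Python SDK (v1 style), the same setting is passed as response_format={"type": "json_object"}. A minimal sketch, assuming that SDK and an OPENAI_API_KEY environment variable; the helper names build_json_mode_request and chat_json are hypothetical. The request is built by a separate function so it can be inspected without a network call:

```python
import json
from typing import Any, Dict


def build_json_mode_request(model: str, user_prompt: str) -> Dict[str, Any]:
    """Keyword arguments for client.chat.completions.create with JSON mode enabled."""
    return {
        "model": model,
        # JSON mode: the model may only emit a string that parses as JSON
        "response_format": {"type": "json_object"},
        "messages": [
            # JSON mode expects the conversation itself to ask for JSON as well
            {"role": "system", "content": "You are a helpful assistant. Reply in JSON."},
            {"role": "user", "content": user_prompt},
        ],
    }


def chat_json(model: str, user_prompt: str) -> Any:
    # Imported lazily so the request builder above works without the SDK installed
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(**build_json_mode_request(model, user_prompt))
    # Valid JSON syntax is guaranteed, but not that it matches your schema
    return json.loads(response.choices[0].message.content)
```

For example, chat_json("gpt-3.5-turbo-1106", "List 3 colors as {\"colors\": [...]}") returns a parsed dict; pairing it with Pydantic validation covers the schema side.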

Validate the returned JSON against a model:

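from typing import List
from pydantic import BaseModel

# generate_text, model_to_json, extract_json, validate_json_with_model and
# json_to_pydantic are helper functions from the referenced repository (see link below)

class TitlesModel(BaseModel):
    titles: List[str]

topic = "remote work"  # example topic

base_prompt = f"Generate 5 Titles for a blog post about the following topic: [{topic}]"

# Serialize an example instance so the prompt shows the exact JSON shape expected
json_model = model_to_json(TitlesModel(titles=['title1', 'title2']))

optimized_prompt = base_prompt + f'. Please provide a response in a structured JSON format that matches the following model: {json_model}'

# Generate content using the modified prompt
gemini_response = generate_text(optimized_prompt)

# Extract the JSON object(s) from the LLM's response
json_objects = extract_json(gemini_response)

# Validate the response against the model
validated, errors = validate_json_with_model(TitlesModel, json_objects)

if errors:
    # Handle errors (e.g., log them, raise an exception, or retry)
    print("Validation errors occurred:", errors)
else:
    model_object = json_to_pydantic(TitlesModel, json_objects[0])
    # Use the validated data
    for title in model_object.titles:
        print(title)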

Reference code: https://github.com/hassancs91/Google-Gemeni-Consistent-JSON-Response/blob/main/title_generator_tool.py
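If you would rather not pull in the repository's helpers, here is a hypothetical minimal stand-in for two of them, assuming Pydantic v2. The names are reused for readability only; this is not the repository's implementation. extract_json uses the standard library's json.JSONDecoder.raw_decode, which parses one JSON value starting at a given index and reports where it ended, to pull every JSON object out of surrounding prose:

```python
import json
from typing import Any, List, Tuple, Type

from pydantic import BaseModel, ValidationError


class TitlesModel(BaseModel):
    titles: List[str]


def extract_json(text: str) -> List[Any]:
    """Find every top-level JSON object embedded in free text."""
    decoder = json.JSONDecoder()
    objects: List[Any] = []
    pos = text.find("{")
    while pos != -1:
        try:
            # Parse one JSON value starting at pos; end is the index just past it
            obj, end = decoder.raw_decode(text, pos)
            objects.append(obj)
            pos = text.find("{", end)
        except json.JSONDecodeError:
            # Not valid JSON at this brace; try the next one
            pos = text.find("{", pos + 1)
    return objects


def validate_json_with_model(
    model_cls: Type[BaseModel], objects: List[Any]
) -> Tuple[List[BaseModel], List[str]]:
    """Validate each object; return (valid model instances, error messages)."""
    validated: List[BaseModel] = []
    errors: List[str] = []
    if not objects:
        errors.append("no JSON object found")
    for obj in objects:
        try:
            validated.append(model_cls.model_validate(obj))
        except ValidationError as exc:
            errors.append(str(exc))
    return validated, errors
```

Nested objects are returned only as part of their enclosing object, because scanning resumes after the end of each parsed value.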

7. Go version: repair the broken JSON after the fact

https://github.com/RealAlexandreAI/json-repair

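package main

import (
    "fmt"
    "log"

    jsonrepair "github.com/RealAlexandreAI/json-repair"
)

func main() {
    // broken JSON string from LLM
    in := "```json {'employees':['John', 'Anna', ```"

    // RepairJSON returns the repaired string and an error
    out, err := jsonrepair.RepairJSON(in)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(out)

    // output:	{"employees":["John","Anna"]}
}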
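For Python projects, a heavily simplified illustration of what such a repair step does (strip code fences, fix quotes, close unbalanced brackets, drop trailing commas) might look like the sketch below. It is not a port of json-repair and handles far fewer cases; in particular, turning every single quote into a double quote will corrupt strings that contain apostrophes. The function name naive_repair_json is hypothetical:

```python
import json
import re
from typing import Any


def naive_repair_json(text: str) -> Any:
    """Best-effort fix for common LLM JSON mistakes, then parse."""
    s = text.strip()
    # Drop Markdown code fences such as ```json ... ```
    s = re.sub(r"```(?:json)?", "", s).strip()
    # Naive: treat single quotes as double quotes (breaks on apostrophes)
    s = s.replace("'", '"')
    # Track which brackets are still open, ignoring characters inside strings
    stack = []
    in_str = False
    escaped = False
    for ch in s:
        if in_str:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_str = False
        elif ch == '"':
            in_str = True
        elif ch in "{[":
            stack.append("}" if ch == "{" else "]")
        elif ch in "}]" and stack:
            stack.pop()
    if in_str:
        s += '"'  # close a string the model cut off
    # Remove a dangling comma at the end, then close the open brackets
    s = s.rstrip().rstrip(",")
    s += "".join(reversed(stack))
    # Remove trailing commas directly before } or ]
    s = re.sub(r",\s*([}\]])", r"\1", s)
    return json.loads(s)
```

Run on the same broken input as the Go example, naive_repair_json returns {"employees": ["John", "Anna"]}.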