How to Use the Assistants API Beta - Hello, AI world

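The Assistants API is OpenAI's API for building AI assistants without having to wire everything up yourself.

Plenty of apps will be built with GPTs, but I think this API will see heavy use when you need more complex features or want to use GPT inside your own service.

I expect many developers to build their own AI apps with it.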

https://platform.openai.com/docs/assistants

Playground

Before you write any code, you can build and try out any assistant in the Playground.


https://platform.openai.com/playground

It looks well made. You can add Functions directly, use Code Interpreter, Retrieval, and file uploads, and even inspect the logs.

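How it works

Everything starts with an assistant. On top of that you create a thread.

A thread is something like a session, or memory, between the user and your app. You add the user's messages to it and then start a run.

A run decides things like whether to use tools, then sets up and executes the steps.

When the run completes, the Assistant's message is added to the thread.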


It's a little involved, and the design feels quite object-oriented. (It's not terribly hard, though.)
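To make the assistant, thread, message, run relationship concrete, here is a toy in-memory model. It is only a sketch of the flow described above and does not use the SDK; every class and function name in it (ToyMessage, ToyThread, ToyAssistant, toy_run) is made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ToyMessage:
    role: str      # "user" or "assistant"
    content: str


@dataclass
class ToyThread:
    # A thread is just the ordered message history of one conversation.
    messages: List[ToyMessage] = field(default_factory=list)

    def add_user_message(self, content: str) -> None:
        # Adding a message does not trigger any work by itself.
        self.messages.append(ToyMessage("user", content))


@dataclass
class ToyAssistant:
    name: str
    # Stand-in for the model: maps the latest user text to a reply.
    respond: Callable[[str], str]


def toy_run(assistant: ToyAssistant, thread: ToyThread) -> str:
    """Execute one run: read the thread, then append the assistant's reply."""
    last_user = next(m for m in reversed(thread.messages) if m.role == "user")
    reply = assistant.respond(last_user.content)
    thread.messages.append(ToyMessage("assistant", reply))
    return "completed"


echo = ToyAssistant(name="Echo", respond=lambda text: f"You said: {text}")
toy_thread = ToyThread()
toy_thread.add_user_message("hello")
status = toy_run(echo, toy_thread)
print(status, [m.role for m in toy_thread.messages])
```

The point to notice is that the thread only changes in two places: when the user adds a message, and when a run finishes and appends the assistant's reply.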

Running it in Python

Now let's run it in Python!

First, upgrade the Python SDK to v1.2.

pip install --upgrade openai

I made a folder and created a file called app.py in it.

app.py

from openai import OpenAI
  
client = OpenAI(api_key="YOUR_OPEN_AI_API_KEY")

Replace YOUR_OPEN_AI_API_KEY with your own key. (Always take care not to expose your key.)
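One way to keep the key out of the source file is to read it from an environment variable. The following is a minimal sketch, assuming the key has been exported as OPENAI_API_KEY in the shell; the helper name load_api_key is my own and not part of the SDK.

```python
import os


def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in an environment variable, or fail loudly."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key


# Usage (assumes the key was exported in the shell beforehand):
# from openai import OpenAI
# client = OpenAI(api_key=load_api_key())
```

The SDK also picks up OPENAI_API_KEY on its own when api_key is omitted, so the helper mainly buys you a clearer error message.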

Creating an Assistant

Let's build the math tutor bot from the example.

assistant = client.beta.assistants.create(
    name="Math Tutor",
    instructions="You are a personal math tutor. Write and run code to answer math questions.",
    tools=[{"type": "code_interpreter"}],
    model="gpt-4-1106-preview"
)

print(assistant)

Put the chatbot's guidelines in instructions. For model, use gpt-3.5-turbo-1106 or gpt-4-1106-preview, which you need in order to use the retrieval tool.

Run it with python app.py.

Assistant(id='asst_abcdefg', created_at=1699943063, description=None, file_ids=[], instructions='You are a personal math tutor. Write and run code to answer math questions.', metadata={}, model='gpt-4-1106-preview', name='Math Tutor', object='assistant', tools=[ToolCodeInterpreter(type='code_interpreter')])

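This is how the assistant gets created. Once created it is stored on OpenAI's side, so you only need to do this once; comment the code out afterwards.

You need the id to fetch the assistant later, so save it somewhere.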
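Instead of commenting the creation code in and out, you could cache the id in a local file. This is a rough sketch under my own assumptions: the function name get_or_create_assistant_id and the file name assistant_id.txt are hypothetical, and client is expected to be the OpenAI client created earlier.

```python
from pathlib import Path


def get_or_create_assistant_id(client, id_file: str = "assistant_id.txt") -> str:
    """Reuse a saved assistant ID, creating the assistant only on the first call."""
    path = Path(id_file)
    if path.exists():
        saved = path.read_text().strip()
        if saved:
            return saved
    # No saved ID yet: create the assistant once and remember its ID.
    assistant = client.beta.assistants.create(
        name="Math Tutor",
        instructions="You are a personal math tutor. Write and run code to answer math questions.",
        tools=[{"type": "code_interpreter"}],
        model="gpt-4-1106-preview",
    )
    path.write_text(assistant.id)
    return assistant.id
```

The returned id can then be passed to client.beta.assistants.retrieve in place of a hardcoded string.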

Creating a thread

Now let's fetch (retrieve) the assistant we made and create a thread.

# retrieve assistant
assistant = client.beta.assistants.retrieve("YOUR_ASSISTANT_ID")

thread = client.beta.threads.create()

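Replace YOUR_ASSISTANT_ID with your own ID.

The recommendation is to create one thread per user when the user starts a conversation. It works like a session, so if the conversation moves to a completely different context, you can just create a new one.

You can add messages and files to a thread. There is reportedly no size limit on the messages you add. The API manages the context for you, for example by truncating the history so that each request fits within the model's context window. (Convenient...)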
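The one-thread-per-user idea could look roughly like this. It is a sketch, not something from the docs: ThreadRegistry is a hypothetical helper, it keeps the mapping only in memory, and a real service would persist thread IDs in a database.

```python
from typing import Dict


class ThreadRegistry:
    """Keep one thread ID per user, in memory (hypothetical helper)."""

    def __init__(self, client):
        self._client = client
        self._thread_ids: Dict[str, str] = {}

    def thread_id_for(self, user_id: str) -> str:
        # Create the thread lazily, the first time this user shows up.
        if user_id not in self._thread_ids:
            self._thread_ids[user_id] = self._client.beta.threads.create().id
        return self._thread_ids[user_id]

    def reset(self, user_id: str) -> None:
        # Forget the old thread so the next message starts a fresh context.
        self._thread_ids.pop(user_id, None)
```

Calling reset for a user corresponds to the "start a completely different conversation" case: the next thread_id_for call creates a new thread.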

Adding a message to the thread

A message can contain text or files.

...
thread = client.beta.threads.create()

message = client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="I need to solve the equation `3x + 11 = 14`. Can you help me?",
)

Notice that the message is added under the thread's messages.

Adding a message does not execute anything by itself. After this you have to start a run.

Running the assistant

Now we need to create a run on the thread. Within the run, the assistant gets to decide things like whether to use a tool or answer right away.

run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id,
    instructions="Please address the user as Jane Doe. The user has a premium account.",
)

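instructions is an optional value for when a particular run needs different guidance. The example tells the assistant to address the user as Jane Doe. Note that, according to the API reference, a value passed here overrides the assistant's own instructions for that run rather than being appended to them.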

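Checking the Run's status

When you create a run, it starts in the queued status. You have to check the status periodically until it becomes completed. This is a bit inconvenient, but streaming and notifications for each step are said to be coming soon.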

run = client.beta.threads.runs.retrieve(
  thread_id=thread.id,
  run_id=run.id
)

print(run)

Add this and run it.

Run(id='run_J6ZLwK5MrOmhnDzeBNJxC87x', assistant_id='asst_dBpG4GnZgwM9VZ7abgNCsbkw', cancelled_at=None, completed_at=None, created_at=1699945983, expires_at=1699946583, failed_at=None, file_ids=[], instructions='Please address the user as Jane Doe. The user has a premium account.', last_error=None, metadata={}, model='gpt-3.5-turbo-1106', object='thread.run', required_action=None, started_at=1699945984, status='in_progress', thread_id='thread_MbJ8AtAeyuaOufjyxaQRtDyb', tools=[ToolAssistantToolsRetrieval(type='retrieval')])

This prints the Run Python object. The status shows in_progress. We're done once this status becomes completed.

There is no streaming yet, so let's write a simple piece of Python that polls every second and stops once the status is completed.

import time  # needed for time.sleep below

run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)
# print(run)

# Poll once per second until the run reports "completed".
while True:
    if run.status == "completed":
        break
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)
    print(run)
    time.sleep(1)

This way the loop exits once the status is completed. And when we run it...

...

Run(id='run_eq5pJM00PZF0laqiy8fdkLCG', assistant_id='asst_dBpG4GnZgwM9VZ7abgNCsbkw', cancelled_at=None, completed_at=1699946536, created_at=1699946531, expires_at=None, failed_at=None, file_ids=[], instructions='Please address the user as Jane Doe. The user has a premium account.', last_error=None, metadata={}, model='gpt-3.5-turbo-1106', object='thread.run', required_action=None, started_at=1699946531, status='completed', thread_id='thread_iJzoLZiYgbkr5j5gQwvmi2zp', tools=[ToolAssistantToolsRetrieval(type='retrieval')])

After several checks, the status becomes completed and completed_at now holds a value.
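One caveat with the simple loop: it only checks for completed, so it would spin forever if the run ended in failed, cancelled, or expired. Below is a slightly more defensive sketch. The helper name wait_for_run, the default interval, and the timeout are my own choices, and client is assumed to be the OpenAI client from earlier.

```python
import time

# Statuses after which a run will not change any more.
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "expired"}


def wait_for_run(client, thread_id: str, run_id: str,
                 interval: float = 1.0, timeout: float = 120.0):
    """Poll a run until it reaches a terminal status or needs tool output."""
    deadline = time.monotonic() + timeout
    while True:
        run = client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run_id)
        # "requires_action" means the run is waiting for function-call results,
        # so polling further would not help; hand it back to the caller.
        if run.status in TERMINAL_STATUSES or run.status == "requires_action":
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Run {run_id} is still {run.status} after {timeout}s")
        time.sleep(interval)
```

A call would look like run = wait_for_run(client, thread.id, run.id), followed by a check that run.status is "completed" before reading the messages.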

Finally, let's check whether the assistant's message was added to the thread!

while True:
    if run.status == "completed":
        break
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)
    # print(run) - commented out
    time.sleep(1)

messages = client.beta.threads.messages.list(thread_id=thread.id)
print(messages)

As shown below, the response has been added at the top.

SyncCursorPage[ThreadMessage](
    data=[
        ThreadMessage(
            id='msg_ltHvFCw15RanhJljUKjcqcAI',
            assistant_id='asst_dBpG4GnZgwM9VZ7abgNCsbkw',
            content=[
                MessageContentText(
                    text=Text(
                        annotations=[],
                        value='Sure, I can help you with that. To solve the equation `3x + 11 = 14`, we can start by isolating the variable `x`. \n\nFirst, we subtract 11 from both sides of the equation:\n\n\[3x + 11 - 11 = 14 - 11\]\n\nThis simplifies to:\n\n\[3x = 3\]\n\nThen, we divide both sides by 3:\n\n\[x = 1\]\n\nSo, the solution to the equation is \(x = 1\).'
                    ),
                    type='text'
                )
            ],
            created_at=1699946783,
            file_ids=[],
            metadata={},
            object='thread.message',
            role='assistant',
            run_id='run_Oz28TihquaTdWyuM50qNmsQR',
            thread_id='thread_4NJuABVmUvLOoGaKsSYCZIZ3'
        ),
        ThreadMessage(
            id='msg_KpShIC6RduFqhX4NMMCDuE9e',
            assistant_id=None,
            content=[
                MessageContentText(
                    text=Text(
                        annotations=[],
                        value='I need to solve the equation `3x + 11 = 14`. Can you help me?'
                    ),
                    type='text'
                )
            ],
            created_at=1699946780,
            file_ids=[],
            metadata={},
            object='thread.message',
            role='user',
            run_id=None,
            thread_id='thread_4NJuABVmUvLOoGaKsSYCZIZ3'
        )
    ],
    object='list',
    first_id='msg_ltHvFCw15RanhJljUKjcqcAI',
    last_id='msg_KpShIC6RduFqhX4NMMCDuE9e',
    has_more=False
)

Sure, I can help you with that. To solve the equation `3x + 11 = 14`, we can start by isolating the variable `x`.

First, we subtract 11 from both sides of the equation:

\[3x + 11 - 11 = 14 - 11\]

This simplifies to:

\[3x = 3\]

Then, we divide both sides by 3:

\[x = 1\]

So, the solution to the equation is \(x = 1\).

x is indeed 1 for 3x + 11 = 14! (It looks like a step-by-step prompt is built in by default.)
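Printing the whole page object is noisy, so in practice you probably just want the reply text. Here is a small sketch that pulls it out of the structure shown in the output above; the function name latest_assistant_text is my own, and it relies on the list being newest-first, as it is in that output.

```python
from typing import Optional


def latest_assistant_text(messages) -> Optional[str]:
    """Return the text of the newest assistant message, or None if there is none."""
    for message in messages.data:  # newest message comes first
        if message.role != "assistant":
            continue
        # A message can hold several content parts; keep only the text ones.
        parts = [part.text.value for part in message.content if part.type == "text"]
        return "\n".join(parts)
    return None
```

With the messages object from the code above, print(latest_assistant_text(messages)) would print only the assistant's answer.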

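That's how you call and use it.

You can combine various tools this way and get creative. You can use OpenAI's built-in tools, Code Interpreter and Knowledge Retrieval, and plug in external tools that you build or use through Function calling.

Token management and retrieval can be tedious or hard for developers to implement themselves, and the API seems to have been designed with that in mind. It really feels like they listened to developers.

If I get the chance, I'll write up the tools next time!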

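Additional notes

Files used by code_interpreter or retrieval are uploaded and deleted through the file endpoint. In the SDK, use the client.files object.
Files can be managed at the assistant level and at the thread level.
You can upload up to 20 files, each limited to 512MB. The total per organization is 100GB.
Retrieval is priced at $0.20/GB/assistant/day, that is, per gigabyte, per day, per assistant. Embedding should cost almost nothing after the first pass, so my guess is that the price comes from the vector DB. One assistant with 10GB comes to about $60 a month. (Source: Assistants API | OpenAI Help Center)
Code Interpreter is priced at $0.03 per session. A session lasts one hour.
Data and files sent to OpenAI are not used for model training, and you can delete your data whenever you need to.
Anyone with the API key can access the resources of the entire organization. So for a production deployment, you will probably need to check permissions yourself and restrict access to specific resources. For a really sensitive environment, creating a separate organization seems like a good idea.
Coming soon: streaming, notifications, a DALL-E tool, and the ability to attach images to user messages.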
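The pricing notes above are easy to turn into a quick estimate. This sketch uses only the two prices quoted in the notes; the function names are mine, and the 30-day month is an assumption.

```python
RETRIEVAL_USD_PER_GB_PER_ASSISTANT_PER_DAY = 0.20
CODE_INTERPRETER_USD_PER_SESSION = 0.03


def retrieval_cost(gigabytes: float, assistants: int = 1, days: int = 30) -> float:
    """Storage cost of Retrieval files in USD."""
    return RETRIEVAL_USD_PER_GB_PER_ASSISTANT_PER_DAY * gigabytes * assistants * days


def code_interpreter_cost(sessions: int) -> float:
    """Cost of Code Interpreter sessions in USD."""
    return CODE_INTERPRETER_USD_PER_SESSION * sessions


print(f"${retrieval_cost(10):.2f}")          # one assistant, 10 GB, 30 days
print(f"${code_interpreter_cost(100):.2f}")  # 100 one-hour sessions
```

The first line reproduces the roughly $60 a month figure for one assistant with 10GB. Because the price is per assistant, attaching the same 10GB to three assistants would triple it.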

Full code

import time  # used by time.sleep in the polling loop

from openai import OpenAI

client = OpenAI(api_key="YOUR_OPEN_AI_API_KEY")

# assistant = client.beta.assistants.create(
#     name="Math Tutor",
#     instructions="You are a personal math tutor. Write and run code to answer math questions.",
#     tools=[{"type": "code_interpreter"}],
#     model="gpt-4-1106-preview"
# )

# retrieve assistant
assistant = client.beta.assistants.retrieve("YOUR_ASSISTANT_ID")

thread = client.beta.threads.create()

message = client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="I need to solve the equation `3x + 11 = 14`. Can you help me?",
)

run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id,
    instructions="Please address the user as Jane Doe. The user has a premium account.",
)

while True:
    if run.status == "completed":
        break
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)
    # print(run)
    time.sleep(1)

messages = client.beta.threads.messages.list(thread_id=thread.id)
print(messages)