note
This article was last updated on February 26, 2024, 11 months ago. The content may be out of date.
Previously, we introduced the basic operations of a Telegram bot: how to create and configure one, and how to use it to interact with users. This time, we talk about how to use Gemini to generate responses.
Quick Start
We can test the Gemini API directly on the official website:
We can modify some settings on the fly:
- Model: which model we’re using. Different models are suitable for different tasks.
- Temperature: Controls the balance between creativity and focus in the generated output. Higher temperature leads to more adventurous output and vice versa.
- Stop Sequence: Generation will stop when specified output is generated. Useful for controlling output length and creating a clear ending.
- Safety Settings: Adjusts the restriction regarding harmful and explicit content.
- Top K: Limits sampling at each generation step to the k most probable tokens.
- Top P: Limits sampling to the smallest set of most probable tokens whose cumulative probability reaches the threshold p.
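To make the last two settings concrete, here is a minimal sketch of top-k and top-p filtering over a toy distribution. The token type and the probabilities are made up for illustration; this is not how Gemini implements sampling internally, only the general idea behind the two knobs.

```go
package main

import (
	"fmt"
	"sort"
)

// token pairs a candidate token with its probability (illustrative type).
type token struct {
	Text string
	Prob float64
}

// byProbDesc returns a copy of tokens sorted from most to least probable.
func byProbDesc(tokens []token) []token {
	sorted := append([]token(nil), tokens...)
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].Prob > sorted[j].Prob })
	return sorted
}

// topK keeps the k most probable tokens.
func topK(tokens []token, k int) []token {
	sorted := byProbDesc(tokens)
	if k < 0 {
		k = 0
	}
	if k < len(sorted) {
		sorted = sorted[:k]
	}
	return sorted
}

// topP keeps the smallest set of most probable tokens whose
// cumulative probability reaches p.
func topP(tokens []token, p float64) []token {
	sorted := byProbDesc(tokens)
	var sum float64
	for i, t := range sorted {
		sum += t.Prob
		if sum >= p {
			return sorted[:i+1]
		}
	}
	return sorted
}

func main() {
	candidates := []token{{"cat", 0.5}, {"dog", 0.3}, {"fish", 0.15}, {"rock", 0.05}}
	fmt.Println(topK(candidates, 2))
	fmt.Println(topP(candidates, 0.75))
}
```

In both calls only "cat" and "dog" survive: they are the two most probable tokens, and together they already cover more than 75% of the probability mass.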
Integration with Telegram
We can refer to the quickstart to familiarize ourselves with the API. As mentioned before, we’re only interested in handling text and image messages.
For a simple chatbot, we want the bot to be able to respond to different types of messages automatically.
Handling Message Types
There are several types of messages in Telegram, judging from the optional fields of the message type.
For text messages, it’s simple: text is not empty in this case.
For image messages, it’s complicated due to the way Telegram handles images:
Any message can contain at most one image, which is available in several sizes that we can retrieve. There is an optional caption. If we send multiple images, they are sent as several messages with one image each. The messages we receive will have a media_group_id that indicates which group the image belongs to.
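The rules above can be sketched as a small classifier. The message and photoSize types below are trimmed-down stand-ins for the Telegram types, and the kind constants and both helper functions are hypothetical names used only for illustration.

```go
package main

import "fmt"

// photoSize mirrors the subset of Telegram's photo size fields used here.
type photoSize struct {
	FileId        string
	Width, Height int
}

// message mirrors the subset of message fields needed to tell types apart.
type message struct {
	Text         string
	Caption      string
	MediaGroupId string
	Photo        []photoSize
}

type kind int

const (
	kindUnsupported kind = iota
	kindText
	kindSingleImage
	kindMediaGroup
)

// classify inspects the optional fields to decide how to handle a message.
func classify(m message) kind {
	switch {
	case len(m.Photo) > 0 && m.MediaGroupId != "":
		return kindMediaGroup
	case len(m.Photo) > 0:
		return kindSingleImage
	case m.Text != "":
		return kindText
	default:
		return kindUnsupported
	}
}

// largestPhoto returns the size with the largest pixel area.
// It reports false when the message carries no photo.
func largestPhoto(sizes []photoSize) (photoSize, bool) {
	if len(sizes) == 0 {
		return photoSize{}, false
	}
	best := sizes[0]
	for _, s := range sizes[1:] {
		if s.Width*s.Height > best.Width*best.Height {
			best = s
		}
	}
	return best, true
}

func main() {
	m := message{
		Caption: "what is this?",
		Photo:   []photoSize{{"small", 90, 67}, {"large", 1280, 960}},
	}
	fmt.Println(classify(m) == kindSingleImage)
	best, _ := largestPhoto(m.Photo)
	fmt.Println(best.FileId)
}
```

Comparing areas avoids relying on the order in which the sizes are listed; simply taking the last element, as the snippets later in this article do, is shorter.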
tip
There are two different ways to handle messages with multiple images. We can use a timeout or let the user confirm all images are sent. For simplicity, we’ll let users confirm all images are sent.
In this case, we can use InlineKeyboardMarkup to display a button for users to press to confirm. callback_data will contain the information that tells us which media group is completely uploaded.
We’ll need to save the ids of these images for later use.
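As a sketch of what that button looks like on the wire, the reply markup can be built with encoding/json rather than by concatenating strings, which also takes care of escaping. The struct and function names here are our own; only the JSON field names (inline_keyboard, text, callback_data) and the 1 to 64 byte limit on callback_data come from the Bot API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// inlineKeyboardButton and inlineKeyboardMarkup model only the fields
// needed for a single confirmation button.
type inlineKeyboardButton struct {
	Text         string `json:"text"`
	CallbackData string `json:"callback_data"`
}

type inlineKeyboardMarkup struct {
	InlineKeyboard [][]inlineKeyboardButton `json:"inline_keyboard"`
}

// doneKeyboard returns the reply_markup JSON for a single "done" button
// whose callback data identifies the media group.
func doneKeyboard(callbackData string) (string, error) {
	// Telegram accepts between 1 and 64 bytes of callback data.
	if n := len(callbackData); n < 1 || n > 64 {
		return "", fmt.Errorf("callback data must be 1-64 bytes, got %d", n)
	}
	b, err := json.Marshal(inlineKeyboardMarkup{
		InlineKeyboard: [][]inlineKeyboardButton{{
			{Text: "done", CallbackData: callbackData},
		}},
	})
	return string(b), err
}

func main() {
	markup, err := doneKeyboard("media_group:123")
	if err != nil {
		panic(err)
	}
	fmt.Println(markup)
}
```

The length check matters because the callback data carries the key of the media group; a key longer than 64 bytes would be rejected by Telegram.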
Handling Messages
When we receive a message, we’ll first need to find out who sent it in order to handle it properly.
- For text and image messages, the id of the from field indicates the user id.
- For callback data, sent when the user confirms all images are uploaded, it’s the id of the from field of callback_query.
// From describes the Telegram user who triggered an update.
type From struct {
	// Telegram ids may exceed 32 bits; int is 64 bits wide on 64-bit platforms.
	Id           int    `json:"id"`
	IsBot        bool   `json:"is_bot"`
	FirstName    string `json:"first_name"`
	LastName     string `json:"last_name"`
	LanguageCode string `json:"language_code"`
}

// Message holds only the fields needed for text and photo messages.
type Message struct {
	MessageId int  `json:"message_id"`
	From      From `json:"from"`
	Chat      struct {
		Id        int    `json:"id"`
		FirstName string `json:"first_name"`
		LastName  string `json:"last_name"`
		Type      string `json:"type"`
	} `json:"chat"`
	Date     int    `json:"date"`
	Text     string `json:"text"`
	Entities []struct {
		Offset int    `json:"offset"`
		Length int    `json:"length"`
		Type   string `json:"type"`
	} `json:"entities"`
	MediaGroupId string `json:"media_group_id"`
	// Photo lists the available sizes of a single image.
	Photo []struct {
		FileId       string `json:"file_id"`
		FileUniqueId string `json:"file_unique_id"`
		FileSize     int    `json:"file_size"`
		Width        int    `json:"width"`
		Height       int    `json:"height"`
	} `json:"photo"`
	Caption string `json:"caption"`
}

// CallbackQuery is received when a user presses an inline keyboard button.
type CallbackQuery struct {
	Id   string `json:"id"`
	From From   `json:"from"`
	Data string `json:"data"`
}

// Update carries either a message or a callback query.
type Update struct {
	UpdateId      int           `json:"update_id"`
	Message       Message       `json:"message"`
	CallbackQuery CallbackQuery `json:"callback_query"`
}

// extractFrom returns the id of the user behind an update, whether it is
// a message or a callback query.
func extractFrom(update Update) int {
	if update.Message.From.Id != 0 {
		return update.Message.From.Id
	}
	return update.CallbackQuery.From.Id
}
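To see both lookup paths in action, here is a minimal sketch that decodes two hand-written sample updates and extracts the sender id. The payloads are made up and heavily trimmed, and the lowercase types and the senderID helper are reduced stand-ins for the types above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// from and update keep only the fields needed to identify the sender.
type from struct {
	Id int `json:"id"`
}

type update struct {
	Message struct {
		From from `json:"from"`
	} `json:"message"`
	CallbackQuery struct {
		From from   `json:"from"`
		Data string `json:"data"`
	} `json:"callback_query"`
}

// senderID decodes a raw update and returns the id of the user behind it.
// A missing message decodes to zero values, so the callback query is the
// fallback.
func senderID(raw []byte) (int, error) {
	var u update
	if err := json.Unmarshal(raw, &u); err != nil {
		return 0, err
	}
	if u.Message.From.Id != 0 {
		return u.Message.From.Id, nil
	}
	return u.CallbackQuery.From.Id, nil
}

func main() {
	// Made-up sample payloads, trimmed to the relevant fields.
	textUpdate := []byte(`{"update_id":1,"message":{"from":{"id":1001},"text":"hi"}}`)
	callbackUpdate := []byte(`{"update_id":2,"callback_query":{"id":"x","from":{"id":1002},"data":"media_group:42"}}`)

	for _, raw := range [][]byte{textUpdate, callbackUpdate} {
		id, err := senderID(raw)
		fmt.Println(id, err)
	}
}
```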
Then we’ll handle these messages accordingly.
Text Messages
We assume messages of this type are part of a chat. We retrieve the chat history for this user and generate the response. If a response is successfully generated, the text and the response are appended to the chat history. We use the model gemini-pro.
note
Gemini has a limit on the number of input tokens. If a chat is too long, the model will not generate any more output. In this case, we need to discard some of the history.
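One way to discard history is to drop the oldest exchanges until the rest fits a budget. The sketch below is an assumption about how this could be done, not the bot’s actual code: the turn type, the trimHistory function, and the cost function are hypothetical, and a real bot would plug in a proper token counter instead of the byte length used in the demo.

```go
package main

import "fmt"

// turn is one entry in the chat history (illustrative type).
type turn struct {
	Role string // "user" or "model"
	Text string
}

// trimHistory drops the oldest user/model exchange (two turns at a time)
// until the total cost fits the budget. Dropping whole exchanges keeps the
// remaining history alternating between user and model.
func trimHistory(history []turn, budget int, cost func(string) int) []turn {
	total := 0
	for _, t := range history {
		total += cost(t.Text)
	}
	for len(history) >= 2 && total > budget {
		total -= cost(history[0].Text) + cost(history[1].Text)
		history = history[2:]
	}
	return history
}

func main() {
	// Byte length stands in for a real token count here.
	cost := func(s string) int { return len(s) }
	history := []turn{
		{"user", "first question"},
		{"model", "first answer"},
		{"user", "second question"},
		{"model", "second answer"},
	}
	trimmed := trimHistory(history, 30, cost)
	fmt.Println(len(trimmed), trimmed[0].Text)
}
```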
tip
We’re using Redis to store chat history. Note that the chat history can also be built manually.
Messages with One Image
Because Gemini doesn’t support multi-turn conversations involving images yet, we just generate the response and forward it to the user. We use the model gemini-pro-vision.
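var parts []genai.Part
// The caption, if any, becomes the text prompt that accompanies the image.
if update.Message.Caption != "" {
	parts = append(parts, genai.Text(update.Message.Caption))
}
// The photo field lists several sizes of the same image; the last one is used.
var ok bool
parts, ok = appendImagePart(writer, update.Message.Chat.Id, parts, update.Message.Photo[len(update.Message.Photo)-1].FileId)
if !ok {
	return
}
model := chatClient.GenerativeModel("gemini-pro-vision")
// Set the safety settings before creating the stream: the request is built
// when GenerateContentStream is called, so later changes would not apply.
model.SafetySettings = contentSafetySettings
var (
	ctx  = request.Context()
	iter = model.GenerateContentStream(ctx, parts...)
	sb   strings.Builder
)
generateResponseStream(iter, &sb, writer, update.Message.Chat.Id)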
note
Telegram only sends the file id of the image. To download the image, we need to first get the link of the image, then download it.
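// GetFileResponse is the response of the getFile method.
type GetFileResponse struct {
	Ok     bool `json:"ok"`
	Result struct {
		FileId       string `json:"file_id"`
		FileUniqueId string `json:"file_unique_id"`
		FileSize     int    `json:"file_size"`
		FilePath     string `json:"file_path"`
	} `json:"result"`
}

// getFileLink resolves a file id to a download link.
func getFileLink(fileID string) (string, error) {
	resp, err := http.PostForm("https://api.telegram.org/bot"+botToken+"/getFile", url.Values{
		"file_id": []string{fileID},
	})
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	var r GetFileResponse
	if err = json.NewDecoder(resp.Body).Decode(&r); err != nil {
		return "", err
	}
	// A failed call still decodes cleanly but carries no file path
	// (errors.New needs the errors package).
	if !r.Ok || r.Result.FilePath == "" {
		return "", errors.New("getFile failed for file id " + fileID)
	}
	return "https://api.telegram.org/file/bot" + botToken + "/" + r.Result.FilePath, nil
}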
Messages Belonging to a Media Group
As said before, messages with images are not saved in the chat history. We have to wait for the user to confirm all the images are uploaded. Meanwhile, the image id from each message will be saved.
if update.Message.MediaGroupId != "" {
	var (
		mediaGroupKey     = "media_group:" + update.Message.MediaGroupId
		mediaGroupOnceKey = "media_group_once:" + update.Message.MediaGroupId
		// INCR returns 1 only for the first message of this media group.
		once   = rdb.Incr(mediaGroupOnceKey).Val() == 1
		pipe   = rdb.Pipeline()
		values []any
	)
	if once {
		// The chat id is pushed to the head of the list, so the callback
		// handler finds it at index 0.
		pipe.LPush(mediaGroupKey, update.Message.Chat.Id)
		sendMessageInlineWithInlineKeyboard(writer, update.Message.Chat.Id, "press done when all photos are uploaded", "{\"inline_keyboard\":[[{\"text\":\"done\",\"callback_data\":\""+mediaGroupKey+"\"}]]}")
	}
	// Entries are prefixed so the callback handler can tell captions from file ids.
	if update.Message.Caption != "" {
		values = append(values, "caption:"+update.Message.Caption)
	}
	values = append(values, "file_id:"+update.Message.Photo[len(update.Message.Photo)-1].FileId)
	pipe.RPush(mediaGroupKey, values...)
	// Both keys expire, so abandoned media groups are cleaned up automatically.
	pipe.Expire(mediaGroupOnceKey, timeout)
	pipe.Expire(mediaGroupKey, timeout)
	_, _ = pipe.Exec()
	return
}
note
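We send a prompt telling users to press the button after all images are uploaded. This prompt is only sent once. INCR is very useful here: it atomically increments a counter and returns the new value, so only the first message of a media group sees 1.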
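The “only once” behavior can be sketched without a Redis server. The counter type below is an in-memory stand-in for INCR, written purely for illustration; the real bot talks to Redis, and the firstSeen helper is a hypothetical name.

```go
package main

import (
	"fmt"
	"sync"
)

// counter is an in-memory stand-in for Redis INCR (illustration only).
type counter struct {
	mu sync.Mutex
	n  map[string]int64
}

func newCounter() *counter { return &counter{n: make(map[string]int64)} }

// Incr atomically increments the value at key and returns the new value.
// Missing keys start at 0, so the first call returns 1.
func (c *counter) Incr(key string) int64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.n[key]++
	return c.n[key]
}

// firstSeen reports whether this is the first message of the media group.
func firstSeen(c *counter, mediaGroupID string) bool {
	return c.Incr("media_group_once:"+mediaGroupID) == 1
}

func main() {
	var (
		c       = newCounter()
		wg      sync.WaitGroup
		mu      sync.Mutex
		prompts int
	)
	// Ten photos of the same media group arrive concurrently.
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if firstSeen(c, "42") {
				mu.Lock()
				prompts++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	fmt.Println("prompts sent:", prompts) // prints "prompts sent: 1"
}
```

Because the increment and the read happen as one atomic step, exactly one of the concurrent handlers observes the value 1, no matter how the messages interleave.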
Callback Messages
After a user has confirmed that all images in a media group are uploaded, we can generate a response using the image ids saved for that media group.
entries := rdb.LRange(update.CallbackQuery.Data, 0, -1).Val()
// The list is empty if the media group has expired or the key is unknown.
if len(entries) == 0 {
	return
}
var (
	parts        []genai.Part
	chatID, _    = strconv.Atoi(entries[0]) // the chat id sits at the head of the list
	captionCount int
	captionIdx   int
)
for _, entry := range entries[1:] {
	prefix, val, _ := strings.Cut(entry, ":")
	switch prefix {
	case "caption":
		parts = append(parts, genai.Text(val))
		captionCount++
		captionIdx = len(parts) - 1
	case "file_id":
		var ok bool
		parts, ok = appendImagePart(writer, chatID, parts, val)
		if !ok {
			return
		}
	}
}
// If there is exactly one caption, move it to the front of the parts.
if captionCount == 1 {
	caption := parts[captionIdx]
	copy(parts[1:captionIdx+1], parts[:captionIdx])
	parts[0] = caption
}
model := chatClient.GenerativeModel("gemini-pro-vision")
// Set the safety settings before creating the stream: the request is built
// when GenerateContentStream is called, so later changes would not apply.
model.SafetySettings = contentSafetySettings
var (
	ctx  = request.Context()
	iter = model.GenerateContentStream(ctx, parts...)
	sb   strings.Builder
)
if generateResponseStream(iter, &sb, writer, chatID) {
	answerCallbackQueryInline(writer, update.CallbackQuery.Id)
}
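The caption reordering near the end of the snippet is easier to follow in isolation. Below is a minimal sketch of the same move-to-front step on plain strings, presumably done so that the prompt precedes the images; moveToFront is a hypothetical helper name.

```go
package main

import "fmt"

// moveToFront moves the element at idx to the front of the slice in place,
// shifting the elements before it one position to the right.
func moveToFront(parts []string, idx int) {
	if idx <= 0 || idx >= len(parts) {
		return
	}
	v := parts[idx]
	// copy handles overlapping source and destination correctly.
	copy(parts[1:idx+1], parts[:idx])
	parts[0] = v
}

func main() {
	parts := []string{"img1", "img2", "describe these", "img3"}
	moveToFront(parts, 2)
	fmt.Println(parts) // prints [describe these img1 img2 img3]
}
```

The relative order of the images is preserved; only the caption changes position.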
Generating a Response
To make the bot feel more responsive, we use the streaming APIs. Both chat and one-shot image-to-text generation return the same iterator type, so a single function handles both.
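// transformResponseError turns a candidate that did not finish normally
// into an error, so callers only need to check one value.
func transformResponseError(err error, resp *genai.GenerateContentResponse) error {
	if err != nil {
		return err
	}
	for _, c := range resp.Candidates {
		if c.FinishReason != genai.FinishReasonStop {
			return &genai.BlockedError{
				Candidate: c,
			}
		}
	}
	return nil
}

// generateResponseStream sends the first chunk as a new message and edits
// that message with the accumulated text for every later chunk.
// It reports whether the whole response was delivered.
func generateResponseStream(iter *genai.GenerateContentResponseIterator, sb *strings.Builder, writer http.ResponseWriter, chatID int) bool {
	var messageID int
	for {
		resp, err := iter.Next()
		err = transformResponseError(err, resp)
		if err == iterator.Done {
			return true
		}
		if err != nil {
			if messageID == 0 {
				sendMessageInline(writer, chatID, err.Error())
			} else {
				_ = updateMessage(chatID, messageID, "generate response error: "+err.Error())
			}
			return false
		}
		// Skip chunks without text instead of panicking on them.
		if len(resp.Candidates) == 0 || resp.Candidates[0].Content == nil || len(resp.Candidates[0].Content.Parts) == 0 {
			continue
		}
		text, ok := resp.Candidates[0].Content.Parts[0].(genai.Text)
		if !ok {
			continue
		}
		sb.WriteString(string(text))
		if messageID == 0 {
			messageID, err = sendMessage(chatID, sb.String())
			if err != nil {
				sendMessageInline(writer, chatID, err.Error())
				return false
			}
		} else {
			err = updateMessage(chatID, messageID, sb.String())
			if err != nil {
				return false
			}
		}
	}
}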
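The send-then-edit pattern can be tried without Telegram or Gemini. In the sketch below, the messenger interface and the recorder type are hypothetical stand-ins for the bot’s sendMessage and updateMessage helpers, and a plain slice of strings plays the role of the response iterator.

```go
package main

import (
	"fmt"
	"strings"
)

// messenger abstracts the two calls used while streaming
// (hypothetical interface, for illustration only).
type messenger interface {
	Send(text string) (messageID int, err error)
	Edit(messageID int, text string) error
}

// recorder is a fake messenger that records every call it receives.
type recorder struct{ calls []string }

func (r *recorder) Send(text string) (int, error) {
	r.calls = append(r.calls, "send:"+text)
	return 1, nil
}

func (r *recorder) Edit(messageID int, text string) error {
	r.calls = append(r.calls, "edit:"+text)
	return nil
}

// stream sends the first chunk as a new message, then edits that message
// with the accumulated text for each later chunk.
func stream(chunks []string, m messenger) (string, error) {
	var (
		sb        strings.Builder
		messageID int
	)
	for _, c := range chunks {
		sb.WriteString(c)
		if messageID == 0 {
			id, err := m.Send(sb.String())
			if err != nil {
				return sb.String(), err
			}
			messageID = id
			continue
		}
		if err := m.Edit(messageID, sb.String()); err != nil {
			return sb.String(), err
		}
	}
	return sb.String(), nil
}

func main() {
	r := &recorder{}
	full, err := stream([]string{"Hel", "lo, ", "world"}, r)
	fmt.Println(full, err)
	for _, call := range r.calls {
		fmt.Println(call)
	}
}
```

The user sees one message that grows as chunks arrive, instead of waiting for the complete response.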