Writing a Chatbot Powered by Google Gemini Part II

note

This article was last updated on February 26, 2024, 5 months ago. The content may be out of date.

We previously introduced the basic operations of Telegram bots: how to create and configure a bot, and how to use it to interact with users. This time, we discuss how to use Gemini to generate responses.

Quick Start

We can test the Gemini API directly on the official website.

We can modify some settings on the fly:

Model: Which model to use. Different models suit different tasks.
Temperature: Controls the balance between creativity and focus in the generated output. A higher temperature produces more adventurous output; a lower one produces more predictable output.
Stop Sequence: Generation stops when the specified sequence is produced. Useful for controlling output length and creating a clear ending.
Safety Settings: Adjusts how strictly harmful and explicit content is restricted.
Top K: Limits the candidates at each generation step to the k most probable tokens.
Top P: Only considers the most probable tokens whose cumulative probability stays within the threshold p.
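
To make Temperature, Top K, and Top P more concrete, here is a minimal sketch of how these settings are commonly described as acting on a token distribution. It is an illustration only: Gemini's actual sampler may differ, applying Top K before Top P is an assumption, and `token`, `softmax`, and `filterTopKTopP` are hypothetical names.

```go
package main

import (
	"math"
	"sort"
)

// token is a hypothetical candidate with its raw score (logit).
type token struct {
	Text  string
	Logit float64
}

// softmax converts logits to probabilities after dividing them by the
// temperature, which must be greater than zero. A lower temperature
// sharpens the distribution; a higher one flattens it.
func softmax(tokens []token, temperature float64) []float64 {
	probs := make([]float64, len(tokens))
	maxLogit := math.Inf(-1)
	for _, t := range tokens {
		maxLogit = math.Max(maxLogit, t.Logit)
	}
	var sum float64
	for i, t := range tokens {
		// Subtracting the maximum keeps Exp numerically stable.
		probs[i] = math.Exp((t.Logit - maxLogit) / temperature)
		sum += probs[i]
	}
	for i := range probs {
		probs[i] /= sum
	}
	return probs
}

// filterTopKTopP returns the indices that survive Top K and then Top P,
// most probable first. k <= 0 disables the Top K step.
func filterTopKTopP(probs []float64, k int, p float64) []int {
	idx := make([]int, len(probs))
	for i := range idx {
		idx[i] = i
	}
	sort.SliceStable(idx, func(a, b int) bool { return probs[idx[a]] > probs[idx[b]] })
	if k > 0 && k < len(idx) {
		idx = idx[:k]
	}
	var cumulative float64
	for i, j := range idx {
		cumulative += probs[j]
		if cumulative >= p {
			return idx[:i+1]
		}
	}
	return idx
}
```

The next token would then be sampled only from the surviving indices, which is why small values of k or p make the output more focused.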

Integration with Telegram

We can refer to the quickstart to familiarize ourselves with the API. As mentioned before, we’re only interested in handling text and image messages.

For a simple chatbot, we want the bot to be able to respond to different types of messages automatically.

Handling Message Types

Telegram has several types of messages, distinguished by which optional fields of the message type are set.

Text messages are simple: the text field is not empty.

Image messages are more complicated because of the way Telegram handles images:

A message can contain at most one image, which is provided in several sizes that we can retrieve, plus an optional caption. If a user sends multiple images, they arrive as several messages with one image each. Each of these messages has a media_group_id that indicates which group the image belongs to.
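
Since each image arrives in several sizes, we need to pick one of them. The code later in this article takes the last element of the photo array; the sketch below instead picks the variant with the most pixels, so it does not depend on ordering. `photoSize`, `largestPhoto`, and `largestPhotoFromJSON` are hypothetical names used for illustration.

```go
package main

import "encoding/json"

// photoSize holds the fields of one entry of the photo array that we need.
type photoSize struct {
	FileId string `json:"file_id"`
	Width  int    `json:"width"`
	Height int    `json:"height"`
}

// largestPhoto returns the file_id of the variant with the most pixels,
// or "" if the message carries no photo.
func largestPhoto(sizes []photoSize) string {
	var best photoSize
	for _, s := range sizes {
		if s.Width*s.Height > best.Width*best.Height {
			best = s
		}
	}
	return best.FileId
}

// largestPhotoFromJSON decodes a raw photo array and picks the largest variant.
func largestPhotoFromJSON(raw []byte) (string, error) {
	var sizes []photoSize
	if err := json.Unmarshal(raw, &sizes); err != nil {
		return "", err
	}
	return largestPhoto(sizes), nil
}
```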

tip

There are two different ways to handle messages with multiple images. We can use a timeout or let the user confirm all images are sent. For simplicity, we’ll let users confirm all images are sent.

In this case, we can use InlineKeyboardMarkup to display a button for users to press to confirm. The button's callback_data tells us which media group has been completely uploaded.
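
As a rough sketch, the markup for such a button can be built with encoding/json rather than by concatenating strings, which avoids escaping mistakes. The struct and function names here are hypothetical; the field names follow the Telegram Bot API, which limits callback_data to 1-64 bytes.

```go
package main

import (
	"encoding/json"
	"errors"
)

type inlineKeyboardButton struct {
	Text         string `json:"text"`
	CallbackData string `json:"callback_data"`
}

type inlineKeyboardMarkup struct {
	InlineKeyboard [][]inlineKeyboardButton `json:"inline_keyboard"`
}

// doneKeyboard builds the reply_markup JSON for a single "done" button
// whose callback_data identifies the media group.
func doneKeyboard(callbackData string) (string, error) {
	// Telegram rejects callback_data outside the 1-64 byte range.
	if len(callbackData) == 0 || len(callbackData) > 64 {
		return "", errors.New("callback_data must be 1-64 bytes")
	}
	markup := inlineKeyboardMarkup{
		InlineKeyboard: [][]inlineKeyboardButton{
			{{Text: "done", CallbackData: callbackData}},
		},
	}
	b, err := json.Marshal(markup)
	if err != nil {
		return "", err
	}
	return string(b), nil
}
```

The length check matters because the media group ID becomes part of the callback data, so a long key prefix could push it over the limit.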

We’ll need to save the IDs of these images for later use.

Handling Messages

When we receive a message, we first need to find out who sent it in order to handle it properly.

For text and image messages, the id of the from field is the user ID.
For callback data sent when the user confirms that all images are uploaded, it is the id of the from field of callback_query.

type From struct {
	Id           int    `json:"id"`
	IsBot        bool   `json:"is_bot"`
	FirstName    string `json:"first_name"`
	LastName     string `json:"last_name"`
	LanguageCode string `json:"language_code"`
}

type Message struct {
	MessageId int  `json:"message_id"`
	From      From `json:"from"`
	Chat      struct {
		Id        int    `json:"id"`
		FirstName string `json:"first_name"`
		LastName  string `json:"last_name"`
		Type      string `json:"type"`
	} `json:"chat"`
	Date     int    `json:"date"`
	Text     string `json:"text"`
	Entities []struct {
		Offset int    `json:"offset"`
		Length int    `json:"length"`
		Type   string `json:"type"`
	} `json:"entities"`
	MediaGroupId string `json:"media_group_id"`
	Photo        []struct {
		FileId       string `json:"file_id"`
		FileUniqueId string `json:"file_unique_id"`
		FileSize     int    `json:"file_size"`
		Width        int    `json:"width"`
		Height       int    `json:"height"`
	} `json:"photo"`
	Caption string `json:"caption"`
}

type CallbackQuery struct {
	Id   string `json:"id"`
	From From   `json:"from"`
	Data string `json:"data"`
}

type Update struct {
	UpdateId      int           `json:"update_id"`
	Message       Message       `json:"message"`
	CallbackQuery CallbackQuery `json:"callback_query"`
}

func extractFrom(update Update) int {
	if update.Message.From.Id != 0 {
		return update.Message.From.Id
	}
	return update.CallbackQuery.From.Id
}

Then we’ll handle these messages accordingly.
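
The dispatch order implied by the following sections can be sketched as a small classifier. This is a simplified illustration, not the article's handler: `miniUpdate` is a hypothetical, trimmed view of the Update type, and the order of the checks (callback, then media group, then single photo, then text) is an assumption that mirrors the early return used for media groups later on.

```go
package main

type updateKind int

const (
	kindUnknown updateKind = iota
	kindText
	kindPhoto
	kindMediaGroup
	kindCallback
)

// miniUpdate carries only the fields needed to tell the message types apart.
type miniUpdate struct {
	Text         string // message.text
	PhotoCount   int    // len(message.photo)
	MediaGroupId string // message.media_group_id
	CallbackData string // callback_query.data
}

// classify decides which handler an update should go to.
func classify(u miniUpdate) updateKind {
	switch {
	case u.CallbackData != "":
		return kindCallback
	case u.MediaGroupId != "":
		// Checked before the single-photo case: grouped images are
		// collected first and answered only after confirmation.
		return kindMediaGroup
	case u.PhotoCount > 0:
		return kindPhoto
	case u.Text != "":
		return kindText
	default:
		return kindUnknown
	}
}
```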

Text Messages

We assume messages of this type are part of a chat. We retrieve the chat history for this user and generate the response. If a response is generated successfully, the text and the response are appended to the chat history. We use the gemini-pro model.

flowchart TD
    Message --> B[Extract User ID]
    B ---> C[Retrieve Chat History]
    C ---> D{Generate Response}
    D ---> |Successful|E[Append History]
    D ---> |Unsuccessful|F[Display Error Message]

var (
    historyKey = "chat_history:" + strconv.Itoa(fromID)
    model      = chatClient.GenerativeModel("gemini-pro")
    cs         = model.StartChat()
    history    = rdb.LRange(historyKey, -32, -1).Val()
    ctx        = request.Context()
)
model.SafetySettings = contentSafetySettings
for _, entry := range history {
    k, v, _ := strings.Cut(entry, ":")
    cs.History = append(cs.History, &genai.Content{
        Parts: []genai.Part{genai.Text(v)},
        Role:  k,
    })
}

var (
    iter = cs.SendMessageStream(ctx, genai.Text(update.Message.Text))
    sb   strings.Builder
)
if generateResponseStream(iter, &sb, writer, update.Message.Chat.Id) {
    pipe := rdb.Pipeline()
    pipe.RPush(historyKey, "user:"+update.Message.Text, "model:"+sb.String())
    pipe.LTrim(historyKey, -32, -1)
    pipe.Expire(historyKey, timeout)
    _, _ = pipe.Exec()
}

note

Gemini limits the number of input tokens. If a chat grows too long, the model stops generating output, so we need to discard some of the older history in this case.
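
The code above simply keeps the last 32 entries. As an alternative sketch, history could be trimmed against a size budget. The byte length used here is only a crude stand-in for the token count, which is an assumption of this example; `trimHistory` is a hypothetical helper operating on the same "role:text" entries.

```go
package main

// trimHistory drops the oldest entries until the total size fits maxBytes.
// Entries are dropped in pairs so that a user turn and the model reply
// that answers it are discarded together.
func trimHistory(history []string, maxBytes int) []string {
	total := 0
	for _, entry := range history {
		total += len(entry)
	}
	for len(history) >= 2 && total > maxBytes {
		total -= len(history[0]) + len(history[1])
		history = history[2:]
	}
	return history
}
```

Dropping whole pairs keeps the remaining history alternating between user and model turns, starting with a user turn.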

tip

We’re using Redis to store the chat history. Note that the chat history can be built manually, by appending entries to the History field of the chat session.

Messages with One Image

Because Gemini doesn’t support multi-turn conversations involving images yet, we just generate the response and forward it to the user. We use the model gemini-pro-vision.

var parts []genai.Part
if update.Message.Caption != "" {
    parts = append(parts, genai.Text(update.Message.Caption))
}

var ok bool
parts, ok = appendImagePart(writer, update.Message.Chat.Id, parts, update.Message.Photo[len(update.Message.Photo)-1].FileId)
if !ok {
    return
}

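model := chatClient.GenerativeModel("gemini-pro-vision")
// Set the safety settings before creating the stream; the request is
// built when GenerateContentStream is called.
model.SafetySettings = contentSafetySettings

var (
    ctx  = request.Context()
    iter = model.GenerateContentStream(ctx, parts...)
    sb   strings.Builder
)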

generateResponseStream(iter, &sb, writer, update.Message.Chat.Id)

note

Telegram only sends the file ID of the image. To download the image, we first need to get its link, then download it.

type GetFileResponse struct {
	Ok     bool `json:"ok"`
	Result struct {
		FileId       string `json:"file_id"`
		FileUniqueId string `json:"file_unique_id"`
		FileSize     int    `json:"file_size"`
		FilePath     string `json:"file_path"`
	} `json:"result"`
}

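// getFileLink resolves a file ID to a download URL. It needs the errors
// package in addition to encoding/json, net/http and net/url.
func getFileLink(fileID string) (string, error) {
	resp, err := http.PostForm("https://api.telegram.org/bot"+botToken+"/getFile", url.Values{
		"file_id": []string{fileID},
	})
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	var r GetFileResponse
	if err = json.NewDecoder(resp.Body).Decode(&r); err != nil {
		return "", err
	}
	// Do not build a link from an unsuccessful response.
	if !r.Ok || r.Result.FilePath == "" {
		return "", errors.New("getFile failed for file id " + fileID)
	}
	return "https://api.telegram.org/file/bot" + botToken + "/" + r.Result.FilePath, nil
}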
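
The appendImagePart helper used above is not shown in this article. One step it presumably has to perform after downloading the bytes is telling Gemini which image format they are in. A minimal sketch of that step, assuming the format is sniffed from the data with the standard library (`imageFormat` is a hypothetical name, not the article's code):

```go
package main

import (
	"net/http"
	"strings"
)

// imageFormat sniffs the leading bytes of data and returns the image
// subtype ("jpeg", "png", ...), or "" when the data is not an image.
func imageFormat(data []byte) string {
	const prefix = "image/"
	mime := http.DetectContentType(data)
	if !strings.HasPrefix(mime, prefix) {
		return ""
	}
	return strings.TrimPrefix(mime, prefix)
}
```

The returned subtype could then be passed along with the bytes when building the image part; rejecting non-image data early avoids sending a request that is bound to fail.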

Messages Belonging to a Media Group

As mentioned before, messages with images are not saved in the chat history. We have to wait for the user to confirm that all images are uploaded. Meanwhile, the file ID of the image in each message is saved.

if update.Message.MediaGroupId != "" {
    var (
        mediaGroupKey     = "media_group:" + update.Message.MediaGroupId
        mediaGroupOnceKey = "media_group_once:" + update.Message.MediaGroupId
        once              = rdb.Incr(mediaGroupOnceKey).Val() == 1
        pipe              = rdb.Pipeline()
        values            []any
    )
    if once {
        pipe.LPush(mediaGroupKey, update.Message.Chat.Id)
        sendMessageInlineWithInlineKeyboard(writer, update.Message.Chat.Id, "press done when all photos are uploaded", "{\"inline_keyboard\":[[{\"text\":\"done\",\"callback_data\":\""+mediaGroupKey+"\"}]]}")
    }
    if update.Message.Caption != "" {
        values = append(values, "caption:"+update.Message.Caption)
    }
    values = append(values, "file_id:"+update.Message.Photo[len(update.Message.Photo)-1].FileId)
    pipe.RPush(mediaGroupKey, values...)
    pipe.Expire(mediaGroupOnceKey, timeout)
    pipe.Expire(mediaGroupKey, timeout)
    _, _ = pipe.Exec()
    return
}

note

We send a prompt telling users to press the button after all images are uploaded. This prompt is sent only once, which is where INCR is useful: it returns 1 only for the first message of a media group.
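
The "send only once" idea can be illustrated without Redis. The sketch below imitates the relevant part of INCR, an atomic increment that returns the new value, with an in-memory map. `onceCounter` is a hypothetical type for illustration; unlike Redis, it works only within a single process and never expires its keys.

```go
package main

import "sync"

// onceCounter keeps one counter per key, guarded by a mutex so that
// concurrent handlers see a consistent value.
type onceCounter struct {
	mu     sync.Mutex
	counts map[string]int
}

func newOnceCounter() *onceCounter {
	return &onceCounter{counts: make(map[string]int)}
}

// first increments the counter for key and reports whether this was the
// first increment, like checking that INCR returned 1.
func (c *onceCounter) first(key string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[key]++
	return c.counts[key] == 1
}
```

Because the increment and the check happen atomically, only one of several messages arriving at the same time can observe the value 1, so the prompt is not duplicated.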

Callback Messages

After a user has confirmed that all images in a media group are uploaded, we can generate a response using the saved image IDs for that media group.

var (
    parts        []genai.Part
    entries      = rdb.LRange(update.CallbackQuery.Data, 0, -1).Val()
    chatID, _    = strconv.Atoi(entries[0])
    captionCount int
    captionIdx   int
)
for i, entry := range entries[1:] {
    prefix, val, _ := strings.Cut(entry, ":")
    switch prefix {
    case "caption":
        parts = append(parts, genai.Text(val))
        captionCount++
        captionIdx = i
    case "file_id":
        var ok bool
        parts, ok = appendImagePart(writer, chatID, parts, val)
        if !ok {
            return
        }
    }
}
if captionCount == 1 {
    caption := parts[captionIdx]
    copy(parts[1:captionIdx+1], parts[:captionIdx])
    parts[0] = caption
}

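model := chatClient.GenerativeModel("gemini-pro-vision")
// Set the safety settings before creating the stream; the request is
// built when GenerateContentStream is called.
model.SafetySettings = contentSafetySettings

var (
    ctx  = request.Context()
    iter = model.GenerateContentStream(ctx, parts...)
    sb   strings.Builder
)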

if generateResponseStream(iter, &sb, writer, chatID) {
    answerCallbackQueryInline(writer, update.CallbackQuery.Id)
}
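
The snippet above moves a single caption to the front of the parts so that the text comes before the images. The same shift, isolated on a plain string slice for clarity (`moveToFront` is a hypothetical helper, not part of the article's code):

```go
package main

// moveToFront moves parts[idx] to position 0 and shifts the elements
// that were before it one slot to the right, preserving their order.
// Out-of-range indices and idx == 0 leave the slice unchanged.
func moveToFront(parts []string, idx int) {
	if idx <= 0 || idx >= len(parts) {
		return
	}
	item := parts[idx]
	// copy handles overlapping source and destination correctly.
	copy(parts[1:idx+1], parts[:idx])
	parts[0] = item
}
```

For example, moving index 2 of ["img1", "img2", "caption", "img3"] yields ["caption", "img1", "img2", "img3"].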

Generating a Response

To make the bot feel more responsive, we use the streaming APIs. Both chat and one-shot image-and-text generation return the same iterator type, so they share the same response handling.

func transformResponseError(err error, resp *genai.GenerateContentResponse) error {
	if err != nil {
		return err
	}

	for _, c := range resp.Candidates {
		if c.FinishReason != genai.FinishReasonStop {
			return &genai.BlockedError{
				Candidate: c,
			}
		}
	}
	return nil
}

func generateResponseStream(iter *genai.GenerateContentResponseIterator, sb *strings.Builder, writer http.ResponseWriter, chatID int) bool {
	var (
		messageID int
	)
	for {
		resp, err := iter.Next()
		err = transformResponseError(err, resp)
		if err == iterator.Done {
			return true
		}

		if err != nil {
			if messageID == 0 {
				sendMessageInline(writer, chatID, err.Error())
			} else {
				_ = updateMessage(chatID, messageID, "generate response error: "+err.Error())
			}
			return false
		}

		var text = string(resp.Candidates[0].Content.Parts[0].(genai.Text))
		sb.WriteString(text)
		if messageID == 0 {
			messageID, err = sendMessage(chatID, sb.String())
			if err != nil {
				sendMessageInline(writer, chatID, err.Error())
				return false
			}
		} else {
			err = updateMessage(chatID, messageID, sb.String())
			if err != nil {
				return false
			}
		}
	}
}
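
The loop above edits the Telegram message for every streamed chunk. One possible refinement, sketched here as an assumption rather than as part of the article's code, is to skip edits that would not change the text (Telegram rejects those as "message is not modified") and to batch small chunks. `editBuffer` and its fields are hypothetical names.

```go
package main

import "strings"

// editBuffer accumulates streamed chunks and decides when an edit is
// worth sending. Use it through a pointer; it embeds a strings.Builder.
type editBuffer struct {
	sb       strings.Builder
	sentLen  int // length of the text last pushed to Telegram
	minDelta int // minimum number of new bytes before another edit
}

// add appends a chunk and returns the full text so far, plus whether an
// edit should be pushed now.
func (b *editBuffer) add(chunk string) (string, bool) {
	b.sb.WriteString(chunk)
	text := b.sb.String()
	delta := len(text) - b.sentLen
	if delta == 0 || delta < b.minDelta {
		return text, false
	}
	b.sentLen = len(text)
	return text, true
}

// flush returns the full text and whether part of it is still unsent.
// It should be called once the stream is done.
func (b *editBuffer) flush() (string, bool) {
	text := b.sb.String()
	pending := len(text) != b.sentLen
	b.sentLen = len(text)
	return text, pending
}
```

In the streaming loop, updateMessage would be called only when add or the final flush reports true, which reduces the number of requests sent to Telegram.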
