GPT4ALLと日本語で会話したい

2023年4月11日

gpt4allとの会話の画像 > あなたについて教えてください。 Please tell me about you. 私はAI言語モデルであり、機械学習習やディープニューラルネットワークなどの自然言語処理(NLP)技術を通じて、ユーザから与えられたプロンプトに基づいてテキストを生成することが目的です。> どんなことができますか？例を示してください。 What can I do? Please give me an example. 私は言語モデルであり、私の能力は使用のコンテキストに依存していますが、一般的には、「お気に入りの色を説明してください」や「2+3は何ですか?」などのプロンプトに基づいてテキストを生成するなど、さまざまなタスクを実行するように私に尋ねるか、指示することができます特定のパターンに従う特定の文を生成するように指示することもできます。

とおもったら、すでにやってくれている方がいた。

【Python】DeepL API を介して GPT4All と日本語で対話する│株式会社シーポイントラボ

　題名の通りです。DeepL API による翻訳を用いて、オープンソースのチャットAIである GPT4All と対話 ...

https://cpoint-lab.co.jp/article/202304/23883/

DeepL APIなどもっていないので、FuguMTをつかうことにした。これで、LLMが完全ローカル、それも日本語でうごくぞ…。

import signal
import subprocess
from time import sleep
 
import re

from transformers import pipeline

from prompt_toolkit import prompt
from prompt_toolkit.formatted_text import HTML
from prompt_toolkit.key_binding import KeyBindings
from prompt_toolkit.keys import Keys

kb = KeyBindings()

@kb.add(Keys.Tab)
def _(event):
　　event.app.exit(result=event.app.current_buffer.text)

# テキストの翻訳を行う関数
def translate(text: str, input_lang='en') -> str:
　　text = remove_ansi_escape_sequence(text)
　　if text == '' or text is None:
　　　　return ''
　　if input_lang=='en':
　　　　translation = ej_translator(text.lstrip(">"))
　　else:
　　　　translation = je_translator(text.lstrip(">"))
　　if translation == '' or translation is None:
　　　　return text
　　return translation
 
 
# ANSIエスケープシーケンスを削除する関数
def remove_ansi_escape_sequence(text: str) -> str:
　　return re \
　　　　.compile(r'\x1B(?:[@-Z\\-_]|\[[0-?]*[ -/]*[@-~])') \
　　　　.sub('', text)


#複数行受け取り
def prompt_continuation(width, line_number, wrap_count):
　　"""
　　The continuation: display line numbers and '->' before soft wraps.
　　Notice that we can return any kind of formatted text from here.
　　The prompt continuation doesn't have to be the same width as the prompt
　　which is displayed before the first line, but in this example we choose to
　　align them. The `width` input that we receive here represents the width of
　　the prompt.
　　"""
　　if wrap_count > 0:
　　　　return " " * (width - 3) + "-> "
　　else:
　　　　text = ("- %i - " % (line_number + 1)).rjust(width)
　　　　return HTML("<strong>%s</strong>") % text


ej_translator = pipeline("translation", model="staka/fugumt-en-ja")
je_translator = pipeline("translation", model="staka/fugumt-ja-en")

if __name__ == '__main__':
　　# 対話するプログラムを起動する
　　process = subprocess.Popen(
　　　　[r'C:\xxxx\gpt4all\chat\gpt4all-lora-quantized-win64.exe'],
　　　　stdin=subprocess.PIPE,
　　　　stdout=subprocess.PIPE,
　　　　cwd=r'C:\xxxx\gpt4all\chat'
　　)
 
 
　　# プロセスをクリーンアップする関数
　　# Python終了時にこの関数からプロセスをキルすることで、メモリやCPUをがっつり食べるプロセスが残るのを防ぐ
　　def cleanup():
　　　　process.terminate()
　　　　sleep(5)
 
 
　　signal.signal(signal.SIGTERM, cleanup)
　　init_load = True
　　try:
　　　　output = ''
　　　　while True:
　　　　　　# プログラムからの出力を読めるだけ読む
　　　　　　output += process.stdout.readline().decode('utf-8')
　　　　　　if output == '' and process.poll() is not None:
　　　　　　　　break
　　　　　　# 起動時のメッセージはフィルタリングしない
　　　　　　if init_load:
　　　　　　　　print(output)
　　　　　　　　init_load = False
　　　　　　else:
　　　　　　　　# 出力をフィルタリングする
　　　　　　　　filtered_output = translate(output, input_lang='en')
　　　　　　　　filtered_output = str(filtered_output).replace("[{'translation_text': '", "").replace("'}]", "")
　　　　　　　　# フィルタリングされた出力を表示
　　　　　　　　print(filtered_output)
　　　　　　output = ''
 
　　　　　　# ユーザー入力を読み取る
　　　　　　user_input = prompt(
　　　　'> ', multiline=True, prompt_continuation=prompt_continuation, key_bindings=kb
　　)

　　　　　　user_input = translate(user_input, input_lang='ja')
　　　　　　user_input = re.sub(r"\[{'translation_text': ['\"]", "", str(user_input))
　　　　　　user_input = re.sub(r"['\"]}]", "", user_input)
　　　　　　#str(user_input).re.sub("[{'translation_text': ['\"]", "").re.sub("['\"]}]", "")
　　　　　　print(user_input)
　　　　　　# 入力をプログラムに渡す
　　　　　　process.stdin.write((user_input + '\n').encode('utf-8'))
　　　　　　process.stdin.flush()
　　finally:
　　　　# プログラムの終了
　　　　signal.signal(signal.SIGTERM, signal.SIG_IGN)
　　　　signal.signal(signal.SIGINT, signal.SIG_IGN)
　　　　cleanup()
　　　　signal.signal(signal.SIGTERM, signal.SIG_DFL)
　　　　signal.signal(signal.SIGINT, signal.SIG_DFL)

複数行入力も対応した。タブキーで決定する。プログラムはchatGPTに雑に編集させたので、なんかおかしいとこあるかも。