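Prerequisites:
- An OpenAI API key
- An API proxy domain (set up with Cloudflare)
- A Baidu PaddlePaddle AI Studio account
- The Word file to translate
Code obtained from a private chat with ChatGPT: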
import requests
from docx import Document
from docx.shared import Pt
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm
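def chatgpt(prompt):
    """Send one prompt to the completions endpoint and return the generated text."""
    h = {
        'Content-Type': 'application/json',
        'Authorization': 'Bearer YOUR_OPENAI_KEY'  # replace with your OpenAI key
    }
    d = {
        "model": "text-davinci-003",  # since retired by OpenAI; substitute a completions-capable model that is currently available
        "prompt": prompt,
        "max_tokens": 1000,
        "temperature": 0
    }
    u = 'https://YOUR_PROXY_DOMAIN/v1/completions'  # replace with your API proxy domain
    r = requests.post(url=u, headers=h, json=d, verify=True, timeout=30).json()
    if 'choices' in r:
        return r['choices'][0]['text']
    return ''  # no 'choices' in the response (e.g. an API error): fall back to an empty string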
# Read the Word document
doc = Document('/home/aistudio/[EN]利用数据共享工具实现循环经济:以产品数字护照为例.pdf.docx')
# Create a new Word document to save the translated results
new_doc = Document()
# Create a ThreadPoolExecutor
executor = ThreadPoolExecutor(max_workers=20)
# Process paragraphs
prompts = [f'Translate the following English text to Chinese: {paragraph.text}' for paragraph in doc.paragraphs]
for prompt, translation in tqdm(zip(prompts, executor.map(chatgpt, prompts)), total=len(prompts)):
    # Add the original text and its translation as separate paragraphs in the new document, and set paragraph spacing
    new_para = new_doc.add_paragraph()
    new_para.add_run(prompt.replace('Translate the following English text to Chinese: ', ''))
    new_para.paragraph_format.space_after = Pt(12)  # spacing is set via paragraph_format
    new_para = new_doc.add_paragraph()
    new_para.add_run(translation or '')  # guard against a missing translation
    new_para.paragraph_format.space_after = Pt(12)
# Process tables
prompts = [f'Translate the following English text to Chinese: {cell.text}' for table in doc.tables for row in table.rows for cell in row.cells]
for prompt, translation in tqdm(zip(prompts, executor.map(chatgpt, prompts)), total=len(prompts)):
    # Add the original text and its translation to new single-cell tables in the new document
    new_cell = new_doc.add_table(1, 1).cell(0, 0)
    new_cell.text = prompt.replace('Translate the following English text to Chinese: ', '')
    new_cell = new_doc.add_table(1, 1).cell(0, 0)
    new_cell.text = translation or ''  # guard against a missing translation
# Save the new document
new_doc.save('[EN]利用数据共享工具实现循环经济:以产品数字护照为例.docx')
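One weak spot in the script above is that a single timeout or network error inside chatgpt() raises an exception that surfaces while iterating over executor.map and stops the whole run. A minimal sketch of a retry wrapper follows, assuming that retrying with exponential backoff is acceptable for this API. The names with_retries, attempts and base_delay are hypothetical and not part of the original code.

```python
import time


def with_retries(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Wrap func so that failed calls are retried with exponential backoff.

    A call counts as failed when it raises an exception or returns None.
    After the last attempt the wrapper gives up and returns an empty string.
    """
    def wrapper(*args, **kwargs):
        for attempt in range(attempts):
            try:
                result = func(*args, **kwargs)
                if result is not None:
                    return result
            except Exception:
                pass  # treat any error as a failed attempt
            if attempt < attempts - 1:
                # Wait base_delay, then 2x, then 4x, ... before the next try
                sleep(base_delay * 2 ** attempt)
        return ''
    return wrapper
```

It could be plugged in as executor.map(with_retries(chatgpt), prompts), so that one failed request costs a few seconds instead of the whole translation.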
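Run it in a Baidu PaddlePaddle AI Studio notebook.
The results and the process are shown at https://aistudio.baidu.com/projectdetail/6708613
Once you have the output, you still need to adjust the text formatting and clean out the AI's nonsense.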
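Part of that cleanup might be avoidable. The script sends a prompt even when a paragraph or cell is empty, and a completion model that receives only the instruction may invent text. This is an assumption about where some of the nonsense comes from, not something verified against the original run. A rough sketch that skips blank strings while keeping the order intact is shown below; translate_nonblank and its parameters are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def translate_nonblank(texts, translate, max_workers=20):
    """Translate only the non-blank strings; blank ones map to ''.

    Returns a list with the same length and order as texts.
    """
    results = [''] * len(texts)
    # Remember the original position of every string worth translating
    todo = [(i, t) for i, t in enumerate(texts) if t.strip()]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        translated = pool.map(translate, [t for _, t in todo])
        for (i, _), out in zip(todo, translated):
            # Strip the leading/trailing whitespace that models often emit
            results[i] = (out or '').strip()
    return results
```

Here translate would be a function that builds the prompt and calls the API, for example lambda t: chatgpt('Translate the following English text to Chinese: ' + t).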
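PS
For everyday use, I recommend "Immersive Translate" (沉浸式翻译), which gives better results.
I had planned to keep tinkering and call the Baidu ERNIE Bot (文心一言) API (here via the API as simplified by Dify), but the Qianfan platform currently gives individually registered users a QPS of only 5, so translation is very slow. I filed a ticket, and customer service said that without going through a business manager, all they can do for now is pass the feedback on to the project team. Baidu has done a decent job of opening up paid API tokens, but there is still a gap compared with OpenAI, which is burning cash to expand. PaddlePaddle AI Studio is good, but compared with Google Colab it still falls short in network environment configuration and platform openness.
End of rant...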
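For the QPS limit of 5 mentioned in the PS, the 20-thread pool used for OpenAI would simply run into the quota. A minimal sketch of a client-side limiter that spaces requests evenly is given below, assuming the quota is counted per request start. RateLimiter and its parameters are hypothetical names, and nothing here is specific to the Qianfan or Dify APIs.

```python
import threading
import time


class RateLimiter:
    """Space out calls so that roughly at most `qps` of them start per second."""

    def __init__(self, qps, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / qps
        self.clock = clock
        self.sleep = sleep
        self.lock = threading.Lock()
        self.next_slot = clock()

    def wait(self):
        # Reserve the next free time slot under the lock, then sleep outside it
        # so that other threads can reserve their own slots in the meantime.
        with self.lock:
            now = self.clock()
            slot = max(self.next_slot, now)
            self.next_slot = slot + self.interval
        delay = slot - now
        if delay > 0:
            self.sleep(delay)
```

A shared limiter = RateLimiter(5) with limiter.wait() as the first line of the request function would keep all worker threads within the quota. At 5 requests per second, a document with 1,000 paragraphs still needs at least 200 seconds, which matches the slowness complained about above.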