
2023-09-28-insights

Two interesting papers today.

Identifying the Risks of LM Agents with an LM-Emulated Sandbox

Research from the University of Toronto and collaborating institutions. The authors observe that current AI-Agent technology fundamentally relies on tool calls, but as these systems scale up and language models become more capable, testing the agents grows more expensive and the real-world impact of their tool calls grows larger. The authors therefore propose adding an intermediate layer (ToolEmu) before tool execution that simulates the outcome of each call (rather than actually executing it), substantially reducing the cost of testing.
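To make the core idea concrete, here is a minimal sketch: instead of dispatching the agent's tool call to the real tool, an intermediate layer asks a language model to emulate the tool's output. The function names, prompt, and routing logic below are illustrative assumptions, not the paper's actual implementation or API.

```python
# Minimal sketch of the ToolEmu idea: intercept the agent's tool call and let a
# language model emulate the tool's output instead of executing the tool for real.
# All names here (llm_complete, emulate_tool_call, run_agent_step) are hypothetical.

import json
from typing import Callable, Dict


def llm_complete(prompt: str) -> str:
    """Placeholder for any LM completion call (e.g. a chat-completions client)."""
    raise NotImplementedError


def emulate_tool_call(tool_name: str, tool_spec: str, tool_input: dict) -> str:
    """Ask an LM to produce a plausible observation for a tool call,
    instead of executing the tool and incurring real-world side effects."""
    prompt = (
        "You are emulating a tool inside a sandbox.\n"
        f"Tool: {tool_name}\n"
        f"Specification: {tool_spec}\n"
        f"Input: {json.dumps(tool_input)}\n"
        "Return a realistic output that the real tool could produce."
    )
    return llm_complete(prompt)


def run_agent_step(action: dict, real_tools: Dict[str, Callable], sandbox: bool = True) -> str:
    """Route one agent action either to the emulator (cheap, side-effect free)
    or to the real tool (expensive, with real-world consequences)."""
    name, args = action["tool"], action["args"]
    if sandbox:
        spec = real_tools[name].__doc__ or ""
        return emulate_tool_call(name, spec, args)
    return real_tools[name](**args)
```

Because the emulated run never touches the real tool, risky trajectories can be surfaced cheaply before any real execution happens.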

Testing revealed a high recall, with 68.8% of the errors identified in ToolEmu leading to actual tool-invocation errors. The authors also found that even the strongest AI-Agents make errors in 23.9% of their tool invocations, indicating that AI-Agents still need further improvement.

(And they cited our work, ToolLLM and BMTools. A win for us!)

Jointly Training Large Autoregressive Multimodal Models

A research team from Meta AI has, for the first time, explored using a Joint Autoregressive Mixture to enable a single model to generate both image and text outputs. This is a highly innovative piece of work.
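A rough sketch of what "one autoregressive model for both modalities" can look like, under the assumption that text keeps its usual tokenizer, images are mapped to discrete codes by a VQ-style image tokenizer, and a single decoder-only transformer does next-token prediction over the interleaved sequence. The vocabulary sizes, layer counts, and module names below are illustrative assumptions, not Meta's actual JAM architecture.

```python
# Sketch: one transformer, one shared vocabulary covering text tokens and
# discrete image codes, trained with ordinary next-token prediction on
# interleaved sequences. Purely illustrative, not the paper's code.

import torch
import torch.nn as nn

TEXT_VOCAB = 32_000                 # assumed text vocabulary size
IMAGE_VOCAB = 8_192                 # assumed number of discrete image codes
VOCAB = TEXT_VOCAB + IMAGE_VOCAB    # image tokens are offset past the text ids


class TinyMultimodalLM(nn.Module):
    """A single decoder-only transformer shared by text and image tokens."""

    def __init__(self, d_model: int = 256, n_layers: int = 4, n_heads: int = 4):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, VOCAB)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # Causal mask so each position attends only to earlier tokens.
        seq_len = tokens.size(1)
        mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
        hidden = self.blocks(self.embed(tokens), mask=mask)
        return self.lm_head(hidden)  # logits over the joint text+image vocabulary


# Next-token prediction on an interleaved text/image-token sequence, so the same
# weights learn to continue text with either more text or image codes.
model = TinyMultimodalLM()
interleaved = torch.randint(0, VOCAB, (2, 16))   # stand-in interleaved batch
logits = model(interleaved[:, :-1])
loss = nn.functional.cross_entropy(
    logits.reshape(-1, VOCAB), interleaved[:, 1:].reshape(-1)
)
loss.backward()
```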

The next generation of GPT-4V?