Reposted from The Verge: “DeepMind says its new AI coding engine is as good as an average human programmer”
@Alex Castro / The Verge
Introduction
DeepMind has created an AI system named AlphaCode that it says “writes computer programs at a competitive level.” The Alphabet subsidiary tested its system against coding challenges used in human competitions and found that its program achieved an “estimated rank” placing it within the top 54 percent of human coders. The result is a significant step forward for autonomous coding, says DeepMind, though AlphaCode’s skills are not necessarily representative of the sort of programming tasks faced by the average coder.
Oriol Vinyals, principal research scientist at DeepMind, told The Verge over email that the research was still in the early stages but that the results brought the company closer to creating a flexible problem-solving AI: a program that can autonomously tackle coding challenges that are currently the domain of humans only. “In the longer-term, we’re excited by [AlphaCode’s] potential for helping programmers and non-programmers write code, improving productivity or creating new ways of making software,” said Vinyals.
1. AlphaCode could be used to create coding assistants, and one day write its own software
AlphaCode was tested against challenges curated by Codeforces, a competitive coding platform that shares weekly problems and issues rankings for coders similar to the Elo rating system used in chess. These challenges are different from the sort of tasks a coder might face while making, say, a commercial app. They’re more self-contained and require a wider knowledge of both algorithms and theoretical concepts in computer science. Think of them as very specialized puzzles that combine logic, maths, and coding expertise.
In one example challenge that AlphaCode was tested on, competitors are asked to find a way to convert one string of random, repeated s and t letters into another string of the same letters using a limited set of inputs. Competitors cannot, for example, just type new letters but instead have to use a “backspace” command that deletes several letters in the original string. You can read a full description of the challenge below:
An example challenge titled “Backspace” that was used to evaluate DeepMind’s program. The problem is of medium difficulty, with the left side showing the problem description, and the right side showing example test cases. Image: DeepMind / Codeforces
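To make the puzzle concrete, here is a minimal Python sketch of one well-known way to solve it. It is not AlphaCode's output. The full problem statement is not reproduced in the text, so the sketch assumes the rules as published on Codeforces: `s` and `t` name two strings of lowercase letters, you type `s` one character at a time, and instead of typing a character you may press Backspace, which deletes the most recently typed character that is still present (or does nothing if none remain). The function name `can_type` is hypothetical.

```python
def can_type(s: str, t: str) -> bool:
    """Return True if t can be produced while typing s with optional backspaces.

    Assumed rule: pressing Backspace *instead of* typing s[i] also removes
    the previously kept character, so a skipped position costs two characters.
    """
    i, j = len(s) - 1, len(t) - 1
    # Scan both strings from the right, greedily matching the tail of t.
    while i >= 0 and j >= 0:
        if s[i] == t[j]:
            # Keep s[i]: it supplies t[j].
            i -= 1
            j -= 1
        else:
            # Replace s[i] with Backspace, which also erases s[i - 1].
            i -= 2
    # Success only if every character of t was matched; any leftover
    # prefix of s can always be erased or skipped.
    return j < 0
```

For instance, `can_type("ababa", "ba")` returns `True`, while `can_type("ababa", "bb")` returns `False`. Scanning from the right works because each mismatch forces a fixed two-character skip, so there is never a choice to backtrack over.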
Ten of these challenges were fed into AlphaCode in exactly the same format they’re given to humans. AlphaCode then generated a large number of possible answers and winnowed these down by running the code and checking the output just as a human competitor might. “The whole process is automatic, without human selection of the best samples,” Yujia Li and David Choi, co-leads of the AlphaCode paper, told The Verge over email.
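The generate-then-filter idea described above can be sketched in a few lines of Python. This is an illustration only, not DeepMind's pipeline: candidate programs are stood in for by plain Python callables, and every name here (`passes_examples`, `filter_candidates`, the three demo candidates) is hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Candidate = Callable[[str], str]   # stand-in for a generated program
Example = Tuple[str, str]          # (input text, expected output text)


def passes_examples(candidate: Candidate, examples: Sequence[Example]) -> bool:
    """Run a candidate on every example; a crash or wrong output rejects it."""
    for given, expected in examples:
        try:
            if candidate(given) != expected:
                return False
        except Exception:
            return False
    return True


def filter_candidates(candidates: Sequence[Candidate],
                      examples: Sequence[Example]) -> List[Candidate]:
    """Keep only the candidates that reproduce all example outputs."""
    return [c for c in candidates if passes_examples(c, examples)]


# Three toy "generated" answers to a task of reversing a string.
def reverse_ok(text: str) -> str:
    return text[::-1]


def identity_wrong(text: str) -> str:
    return text


def crashes(text: str) -> str:
    return text[10]  # raises IndexError on short inputs


EXAMPLES = [("abc", "cba"), ("", "")]
survivors = filter_candidates([reverse_ok, identity_wrong, crashes], EXAMPLES)
```

Here only `reverse_ok` survives. Passing the visible examples does not prove a candidate is correct, which is why a filter like this narrows the pool without replacing the final judging.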
AlphaCode was tested on 10 challenges that had been tackled by 5,000 users on the Codeforces site. On average, it ranked within the top 54.3 percent of responses, and DeepMind estimates that this gives the system a Codeforces Elo of 1238, which places it within the top 28 percent of users who have competed on the site in the last six months.
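For readers unfamiliar with Elo-style ratings, the sketch below shows the standard chess Elo expected-score formula, which turns a rating gap into a predicted result. It is offered as background only: Codeforces uses its own Elo-like scheme whose details are not given in the text, so this should not be read as the calculation behind the 1238 figure. The function name `elo_expected_score` is hypothetical.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under standard chess Elo.

    A 400-point rating gap corresponds to 10-to-1 odds; equal ratings give 0.5.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Assumed illustrative opponent rated 400 points above 1238.
underdog = elo_expected_score(1238, 1638)
```

Under this formula a player rated 1238 facing one rated 1638 has an expected score of 1/11, or about 0.09, and the two players' expected scores always sum to 1.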
“I can safely say the results of AlphaCode exceeded my expectations,” Codeforces founder Mike Mirzayanov said in a statement shared by DeepMind. “I was sceptical [sic] because even in simple competitive problems it is often required not only to implement the algorithm, but also (and this is the most difficult part) to invent it. AlphaCode managed to perform at the level of a promising new competitor.”
An example interface of AlphaCode tackling a coding challenge. The input is given as it is to humans on the left and the output generated on the right. Image: DeepMind
DeepMind notes that AlphaCode’s skill set is currently applicable only within the domain of competitive programming but that its abilities open the door to creating future tools that make programming more accessible and one day fully automated.
Many other companies are working on similar applications. For example, Microsoft and the AI lab OpenAI have adapted the latter’s language-generating program GPT-3 to function as an autocomplete program that finishes strings of code. (Like GPT-3, AlphaCode is also based on an AI architecture known as a transformer, which is particularly adept at parsing sequential text, both natural language and code.) For the end user, these systems work just like Gmail’s Smart Compose feature, suggesting ways to finish whatever you’re writing.
2. AI systems have been criticized for producing buggy, vulnerable code
A lot of progress has been made developing AI coding systems in recent years, but these systems are far from ready to just take over the work of human programmers. The code they produce is often buggy, and because the systems are usually trained on libraries of public code, they sometimes reproduce material that is copyrighted.
In one study of an AI programming tool named Copilot developed by code repository GitHub, researchers found that around 40 percent of its output contained security vulnerabilities. Security analysts have even suggested that bad actors could intentionally write and share code with hidden backdoors online, which then might be used to train AI programs that would insert these errors into future programs.
Challenges like these mean that AI coding systems will likely be integrated slowly into the work of programmers, starting as assistants whose suggestions are treated with suspicion before they are trusted to carry out work on their own. In other words: they have an apprenticeship to carry out. But so far, these programs are learning fast.