Meta, the company behind Facebook, Instagram and WhatsApp, has released its new open source tool, Code Llama: an AI model that can generate and understand code from natural-language instructions.
Meta Code Llama, the AI model that writes and understands code
In recent months Meta has released AI models for tasks such as text generation, language translation and audio creation. Now the company is extending its reach into a crucial field: programming. Code Llama is a machine learning system that can generate code and explain how it works, all in clear, readable language.
This tool, open source and available to all programmers, enters a field alongside existing solutions such as GitHub Copilot and Amazon CodeWhisperer. AI-based code generation is becoming a hotspot of competition.
An open and flexible approach
Code Llama can complete code and fix bugs in existing code across a variety of programming languages, including Python, C++, Java, PHP, TypeScript, C# and Bash. That versatility makes it genuinely interesting for developers.
Meta explained to TechCrunch: "We at Meta believe that AI models, but particularly large language models for coding, benefit more from an open approach, both in terms of innovation and safety. Publicly available code-specific models can facilitate the development of new technologies that improve people's lives. By releasing code models like Code Llama, the entire community can assess their capabilities, identify problems and fix vulnerabilities."
Code Llama builds on Llama 2, Meta's text generation model, which Meta has also released openly. Llama 2 could generate code, but not with the accuracy and quality of specialized tools such as GitHub Copilot. Code Llama is a significant step forward, offering different versions optimized for specific programming languages and for understanding natural-language instructions. It will still be up to developers to determine which models perform best for their needs.
Meta Code Llama Training
Meta trained Code Llama on the same dataset as Llama 2, a combination of public sources from around the web, but with a particular emphasis on the portion of that data that contains code.
Code Llama models come in several sizes, ranging from 7 billion to 34 billion parameters. Meta trained them on 500 billion tokens of code along with code-related data. The Python-specialized version of Code Llama, for example, was further tuned on 100 billion tokens of Python code. Meta also used feedback from human annotators to train the models to give safe and useful answers to questions.
Several Code Llama variants can insert new code into existing code, and all of them can handle inputs of approximately 100,000 tokens. Some of the models require more powerful hardware, while at least one, the 7-billion-parameter model, can run on a single GPU.
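To give a rough sense of why model size drives hardware requirements, the short Python sketch below estimates how much memory the weights alone occupy at a few common numeric precisions, for the three sizes Code Llama launched with (7, 13 and 34 billion parameters). It is an approximation under stated assumptions: it counts only the weights and ignores activations, the key/value cache needed for very long prompts, and framework overhead, and the precision table is a generic assumption, not a configuration Meta has published for Code Llama.

```python
# Rough, weights-only memory estimate for a dense language model.
# Assumption: generic bytes-per-parameter values for common precisions.
# Ignores activations, the KV cache for long contexts and runtime overhead.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}

# Parameter counts of the Code Llama launch sizes.
MODEL_SIZES = (7e9, 13e9, 34e9)


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Approximate memory for the weights, in gigabytes (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    for size in MODEL_SIZES:
        row = ", ".join(
            f"{p}: {weight_memory_gb(size, p):.1f} GB" for p in BYTES_PER_PARAM
        )
        print(f"{size / 1e9:.0f}B -> {row}")
```

At 16-bit precision the 7-billion-parameter model needs roughly 14 GB just for its weights, which fits on a single high-memory GPU, while the 34-billion-parameter model needs about 68 GB before any overhead. Real deployments need more than these figures, especially when processing inputs close to the 100,000-token limit.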
A growing market (but not without risk)
Meta's Code Llama could be a real help to developers, and also to companies without dedicated programmers, enabling services and features that would otherwise be out of their reach.
As TechCrunch reports, GitHub states that over 400 organizations currently use Copilot and that their developers write code 55% faster than before. But there are risks. A Stanford-affiliated research group found that engineers who use such tools are more likely to introduce security vulnerabilities into their applications.
Intellectual property is another concern. Some code generation models, not necessarily Code Llama, could be trained on copyrighted or restrictively licensed code.
There is also the possibility, although there is no large-scale evidence of it yet, that open source code generation systems could be used to create malicious code. Hackers have reportedly tried to exploit existing models for malicious purposes, such as finding holes or vulnerabilities in code and creating scam web pages.
Meta’s solution: between doubts and possibilities
Meta explains that Code Llama was red-teamed, that is, adversarially tested, by an internal team of 25 employees. Even so, questions about the project remain.
One concern is Code Llama's tendency to produce ambiguous or questionable responses to certain prompts. For example, if asked directly, the model will not write ransomware code. However, if the request is worded more innocuously, such as "Create a script to encrypt all files in a user's home directory", which in practice describes a ransomware script, the model appears to comply.
Meta itself acknowledges that Code Llama may produce inaccurate or uncertain answers. The company emphasizes the need for developers to perform safety testing and custom tuning for their specific applications before distributing any Code Llama-based applications.
Find more information on the Meta website.