{"id":203,"date":"2024-04-13T18:36:37","date_gmt":"2024-04-13T18:36:37","guid":{"rendered":"https:\/\/responsiblepraxis.ai\/?page_id=203"},"modified":"2024-04-13T18:36:53","modified_gmt":"2024-04-13T18:36:53","slug":"anisa-habib-github-copilot","status":"publish","type":"page","link":"https:\/\/responsiblepraxis.ai\/?page_id=203","title":{"rendered":"Anisa Habib &#8211; GitHub Copilot"},"content":{"rendered":"\n<p>1<br>Anisa Habib<br>Honors 3700<br>7 Dec 2023<br>GitHub Copilot: AI Audit<br>GitHub Copilot is a cloud-based artificial intelligence tool developed by Microsoft<br>subsidiary GitHub and OpenAI that aims to assist users by autocompleting code. Released on 29<br>June 2021, it is currently available to individual users or businesses that purchase a subscription.<br>It is possible to add Copilot as an extension in Visual Studio Code, Visual Studio, Vim, Neovim,<br>Azure Data Studio, and JetBrains integrated development environments (IDEs). Copilot\u2019s<br>primary objective is to serve as an AI pair programmer and help programmers code faster<br>through suggesting code completion and writing code in response to natural language prompts.<br>The OpenAI Codex, a descendant of GPT-3, is the model that powers GitHub Copilot.<br>According to the model\u2019s official description, \u201cits training data contains both natural language<br>and billions of lines of source code from publicly available sources, including code in public<br>GitHub repositories\u201d (Zaremba et al., 2021). While OpenAI Codex is most capable in Python, it<br>is also proficient in over a dozen other programming languages. As of 2021, it has a memory of<br>14KB for Python code. Compared to GPT-3\u2019s 4KB memory, GitHub Copilot can take into<br>account over 3 times as much contextual information while performing any task (Zaremba et al.,<br>2021). The authors of OpenAI Codex argue that computer programming can be thought of as<br>having two major tasks: breaking a problem down into simpler problems and mapping these<br>problems to existing code libraries, APIs, or functions (Zaremba et al., 2021). GitHub Copilot<br>and OpenAI Codex aim to alleviate the barrier that is this second task through analyzing the<br>context of the user\u2019s input and suggesting code.<br>Drawing context from the user\u2019s private code and comments, the algorithm suggests line<br>completion or even entire blocks of code to the user. The tool allows the user to manually edit<br>2<br>and cycle through alternative suggestions, autofill repetitive code, and create unit tests. This all<br>contributes to GitHub\u2019s goal of allowing developers to \u201cquickly discover alternative ways to<br>solve problems, write tests, and explore APIs without having to tediously tailor a search for<br>answers\u2026 across the internet\u201d (Friedman, 2021). Some small studies from GitHub report that<br>professional developers using Copilot take 55% less time to complete programming tasks than<br>those that do not (Kalliamvakou, 2022). The official webpage also reports that over 37,000<br>companies have adopted GitHub Copilot and that 55% of developers prefer the technology<br>according to a StackOverflow 2023 survey.<br>While the technology is promising, the quality and security of anything developed in part<br>by an AI tool are important aspects to consider. Since Copilot\u2019s release, there have been many<br>discussions concerning its security and impact on education. 
One study reports having successfully "extracted 2,702 hard-coded credentials from Copilot and 129 secrets from CodeWhisperer under the black-box setting, among which at least 3.6% and 5.4% secrets are real strings from GitHub repositories" (Huang et al., 2023). Researchers from New York University's Center for Cybersecurity found that Copilot-generated code "contained security vulnerabilities about 40% of the time" (Pearce et al., 2021). In other words, it is possible to extract sensitive information such as account credentials from code generated with Copilot, which raises serious privacy and security concerns. Additionally, since Copilot is trained on billions of lines of public code, its training data likely includes almost every popular introductory university programming assignment (Claburn, 2022). While solutions to these assignments could already be found online with some effort, educators could previously rely on plagiarism checkers to determine whether students were cheating. GitHub Copilot, however, "actually generates novel solutions ... that are superficially different enough that they plausibly could have come from a student" (Claburn, 2022). This makes it more difficult for educators and recruiters to catch cheating and to gauge students' actual ability and understanding.

As with ChatGPT and other LLMs, many in the field consider Copilot a valuable tool that can increase productivity by automating repetitive tasks. Others, however, hold that Copilot produces weak, insecure code and threatens the integrity of educational and business environments. This audit of GitHub Copilot focuses on the quality of its output in terms of performance and security, and further explores the extent to which that output may be considered plagiarism. In other words: what are the limits of GitHub Copilot, and how do those limits affect user privacy, security, and integrity?

Blue Sky Audit

In a world with unlimited time and resources, it would be ideal to conduct a thorough analysis of the bias and inaccuracies present in the training data of GitHub Copilot and OpenAI Codex. Information on how this data was collected, any sensitive information it may include, and the evaluation metrics used to detect bias are key features of any algorithm (Diakopoulos, 2014). It is possible that the training data was not sufficiently filtered or tested against all possible inaccuracies or biases. As the models were trained on both natural language and publicly available code scraped from the internet, any human-generated bias and error present in the training dataset could replicate itself in the model's output. Such an analysis, however, would require a great deal of time and advanced data scraping skills.

Measuring "integrity" and quality of output also requires analyzing the general quality of Copilot's suggestions. A feasible approach is to prompt Copilot a large number of times with a variety of programming tasks and test the accuracy and security of the results. Researchers at Wuhan University conducted a study that analyzed 435 code snippets generated by Copilot from GitHub projects and used multiple security scanners to identify vulnerabilities (Fu et al., 2023). A number of similar small studies of Copilot-generated code have been conducted in the past few years (Huang et al., 2023; Pearce et al., 2021). These studies mostly focused on hard-coded secrets, i.e., embedded credentials, plaintext passwords, and other sensitive information left in source code, of the kind sketched below.
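To make that target concrete, the following is a minimal Python sketch of the pattern such studies probe for. The variable name and credential value are invented placeholders for illustration, not examples taken from any of the cited studies.

    import os

    # Vulnerable pattern: the credential is hard-coded in the source file, so
    # anyone who can read the repository (or a model trained on it) can
    # recover it. The value below is an invented placeholder.
    DB_PASSWORD = "p4ssw0rd-EXAMPLE"

    # Safer pattern: read the secret from the environment at runtime, so it
    # never appears in committed code at all.
    DB_PASSWORD = os.environ.get("DB_PASSWORD")

Once such a string is committed to a public repository, it can be scraped, memorized by a model trained on that repository, and later regurgitated as a "suggestion."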
Further testing of the security and accuracy of Copilot-generated code would be beneficial. With unlimited time and resources, this audit would prompt Copilot with thousands of tasks consisting of generating secure code and completing assignments at varying levels of difficulty. By testing and analyzing the resulting output, we could gain a greater understanding of the extent to which using Copilot affects the security and quality of a programmer's code.

Proof-of-Concept Audit

To conduct this audit, I subscribed to a free trial of the Copilot Individual plan on a brand new GitHub account and installed the Copilot extension in Visual Studio Code. When subscribing to Copilot, GitHub prompts users to select whether they would like Copilot to allow suggestions drawn from public code and whether they will allow GitHub to use their code snippets for product improvements. I chose to allow suggestions from public code, and I did not allow my code to be used for future model training.

Given limited time, skills, and resources, this audit analyzed a small set of code generation prompts. My first task consisted of using Copilot to create a login page for a simple PHP application; through this task, I aimed to analyze the security of the tool's code suggestions for basic applications. For my second task, I asked Copilot to complete my solutions to 15 different HackerRank problems. HackerRank is an online archive of programming problems similar to those used in some university classes, and many companies also use HackerRank to assist with technical recruitment. Solutions to HackerRank problems are tested for both accuracy and performance. By tasking Copilot with these problems, I aimed to gauge the quality of Copilot's output and whether it may be used to violate programmer integrity.

Task 1: Secure Application Code Generation

PHP is an open-source scripting language that can be used to write websites and all kinds of web-based applications and services; Wikipedia, WordPress, and Etsy are just a few examples of prominent websites written in the language. To begin auditing Copilot's suggestions for a simple PHP login form, I started with a completely blank project. After I had written two lines, Copilot quickly recommended an entire block of code, lines 7-21 as shown in the image below. While this AI-generated code does compile and complete the task, it already contains security issues. The query on line 10 uses the exact values entered by the user, which makes it vulnerable to SQL injection attacks. An SQL injection attack is a common web hacking technique that involves placing malicious code in SQL statements via web page input (W3Schools). A hacker could easily execute arbitrary SQL commands to gain access to user passwords or other sensitive information. Instead, Copilot should have suggested code that sanitizes or parameterizes user input before building the query.

To guard against malicious actors, it is also important never to store plaintext passwords in a database (W3Schools). When I prompted Copilot to fill in the code for an account registration page, it did suggest applying an MD5 hash to the original password entered by the user, as shown on line 18 in the image below. However, MD5 is a weak choice: it is fast and unsalted, and for many applications this level of protection is not enough, so professional developers may prefer more robust password-hashing methods. It is also important to note that, paired with this registration form, the login page Copilot previously suggested would not work correctly: the login query searches for the plaintext password entered by the user, which would never match the stored hash. A sketch of the safer pattern Copilot did not suggest follows below.
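The audit's PHP screenshots are not reproduced here, so as an illustration only, here is a minimal Python sketch of that safer pattern: parameterized queries bind user input as data so it cannot alter the SQL statement, and the same salted hash function is applied at registration and at login so the stored and submitted values actually compare. The table schema and function names are invented for this sketch.

    import hashlib
    import os
    import sqlite3

    def hash_password(password, salt):
        # PBKDF2 is salted and deliberately slow; unlike a bare MD5 digest,
        # it makes brute-force and rainbow-table attacks far more costly.
        return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000).hex()

    def register(conn, username, password):
        salt = os.urandom(16)
        # Parameterized query: the user's input is bound as data rather than
        # spliced into the SQL string, so it cannot change the statement.
        conn.execute(
            "INSERT INTO users (username, salt, password_hash) VALUES (?, ?, ?)",
            (username, salt, hash_password(password, salt)),
        )
        conn.commit()

    def login(conn, username, password):
        row = conn.execute(
            "SELECT salt, password_hash FROM users WHERE username = ?",
            (username,),
        ).fetchone()
        if row is None:
            return False
        salt, stored_hash = row
        # Hash the submitted password exactly as it was hashed at
        # registration, so the comparison is hash-to-hash.
        return hash_password(password, salt) == stored_hash

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (username TEXT PRIMARY KEY, salt BLOB, password_hash TEXT)")
    register(conn, "alice", "correct horse battery staple")
    print(login(conn, "alice", "correct horse battery staple"))  # True
    print(login(conn, "alice", "wrong password"))                # False

In PHP itself, the equivalent precautions would be prepared statements together with the built-in password_hash and password_verify functions.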
Another alarming discovery is that when I attempted to add an "author" comment to my code, GitHub Copilot suggested auto-completing full names that were not mine.

Task 2: Problem Solutions

Algorithm practice problems on HackerRank are rated as easy, medium, or hard. For this audit, I prompted GitHub Copilot to complete my solutions to 5 randomly selected problems from each of the easy, medium, and hard categories. As OpenAI Codex is reported to suggest code most accurately in Python, all problems were solved using Python. Every problem on HackerRank has multiple test cases that must be passed for a solution to be accepted, and these tests measure both accuracy and performance (HackerRank). A summary of the results is shown in Table 1.

Table 1: Summary of HackerRank Problems Solved by GitHub Copilot Code Generation

    Difficulty  Problem                        Solved?   Required significant editing?
    easy        simple array sum               solved    N
    easy        minimax sum                    solved    N
    easy        day of the programmer          solved    N
    easy        subarray division              solved    N
    easy        minimum distances              solved    N
    medium      queens attack II               solved    Y
    medium      encryption                     solved    N
    medium      extra long factorials          solved    N
    medium      climbing the leaderboard       unsolved  Y
    medium      3D surface area                unsolved  Y
    hard        morgan and a string            solved    N
    hard        dfs edges                      solved    Y
    hard        dijkstra: shortest reach II    unsolved  Y
    hard        beautiful 3 set                unsolved  Y
    hard        traveling salesman in a grid   unsolved  Y

Copilot-generated code was able to solve 100% of the selected easy problems, all of which have success rates above 90% on HackerRank. Once I began typing a solution in Visual Studio Code, Copilot quickly suggested entire code blocks that completed these simple tasks, and the suggested code even included comments explaining what each line accomplishes. One of these generated solutions is depicted in the image below.
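That screenshot is not reproduced here; as a stand-in, the following is a reconstruction, written from the problem statement rather than copied from Copilot's actual output, of the kind of commented Python solution Copilot produced for an easy problem such as Simple Array Sum:

    def simpleArraySum(ar):
        # Return the sum of the integers in the array.
        return sum(ar)

    if __name__ == "__main__":
        # HackerRank supplies the element count on the first line and the
        # space-separated integers on the second.
        n = int(input())
        ar = list(map(int, input().rstrip().split()))
        print(simpleArraySum(ar))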
For the selected medium-difficulty problems, Copilot was able to solve 60%. Of the three block suggestions it offered for the Climbing the Leaderboard problem, none accurately addressed the problem's context or provided a correct solution. As shown in the image below, one Copilot suggestion even changed the correct 'ranked' and 'player' variable names to 'scores' and 'alice', names that appear nowhere in the surrounding code. This particular problem has only a 60.73% success rate on HackerRank. For the Queen's Attack II problem, with a 68.03% success rate, Copilot similarly generated three different solutions; only one was accepted, and only after I made some small edits to the output. Copilot did accurately generate solutions to the Encryption and Extra Long Factorials problems, which have 91.97% and 95.58% success rates on HackerRank, respectively. For contrast with the failed Climbing the Leaderboard suggestions, a correct approach is sketched below.
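This sketch is illustrative and was written for this audit, not produced by Copilot; it keeps the problem's actual 'ranked' and 'player' parameter names and uses dense ranking with binary search to stay within HackerRank's performance limits.

    import bisect

    def climbingLeaderboard(ranked, player):
        # Dense ranking: equal scores share a rank, so deduplicate first.
        unique = sorted(set(ranked))  # ascending unique scores
        results = []
        for score in player:
            # Count of unique scores less than or equal to this player score.
            idx = bisect.bisect_right(unique, score)
            # Each unique score strictly greater than ours pushes us down one
            # rank, and ranks are 1-based.
            results.append(len(unique) - idx + 1)
        return results

    print(climbingLeaderboard([100, 100, 50, 40, 40, 20, 10], [5, 25, 50, 120]))
    # [6, 4, 2, 1]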
Copilot-generated code and suggestions assisted in solving 40% of the selected hard-difficulty problems. Copilot provided accurate suggestions for my starter solutions to Morgan and a String and DFS Edges. However, it failed to produce any valuable suggestions for the remaining problems, all of which have success rates of around 50% or less on HackerRank. When given a natural-language description of these problems, Copilot was unable to generate an accurate solution. When I attempted solutions with my own code, Copilot continued to suggest various line and block auto-completions; while some of these were useful for quickly correcting syntax and auto-completing details such as closing parentheses and variable names, most were incorrect and did not complete what I intended to write.

Discussion

While GitHub Copilot is a practical tool that can enhance developer efficiency (Kalliamvakou, 2022), it is imperative to exercise caution, especially when handling sensitive data or building applications with security requirements. The tool's efficacy is evident in its resolving approximately 67% of the selected algorithmic problems, particularly those with high success rates on HackerRank. As some individuals upload their HackerRank solutions to public GitHub repositories, it is likely that some of the suggested code is taken directly from these users. However, the accuracy of Copilot's suggestions diminishes on more complex tasks, and there is noteworthy concern regarding the generation of insecure code for common web applications.

GitHub Copilot can only be as secure, accurate, and unbiased as its data set. This is true of any generative AI tool: massive data sets and predictive analytics do not always reflect objective truth (Crawford, 2013). OpenAI Codex is trained on billions of lines of both natural and programming languages, the majority scraped from public sources. The tool may not prioritize secure programming practices, often opting for generic or "most probable" suggestions. It is crucial to recognize that the biases and limitations present in the training data can be reflected in the generated code. As Kate Crawford writes in "The Hidden Biases in Big Data," "data and data sets are not objective; they are creations of human design. We give numbers their voice, draw inferences from them, and define their meaning through our interpretations. Hidden biases in both the collection and analysis stages present considerable risks, and are as important to the big-data equation as the numbers themselves" (Crawford, 2013). Look no further than OpenAI's ChatGPT and Meta's Galactica for examples of large language models' tendency to assert prejudice and falsehood as fact. These LLMs "are not really knowledgeable beyond their ability to capture patterns of strings of words and spit them out in a probabilistic manner" (Heaven, 2022).

In its FAQ, GitHub warns users that "you should always use GitHub Copilot together with good testing and code review practices and security tools, as well as your own judgment." Code generated by Copilot should be used only as a starting point: the model cannot be aware of an entire application's context or the programmer's intent. Developers must exercise their own judgment and refrain from relying solely on Copilot-generated code, as the tool's fluency may lead less experienced developers to believe that the suggested code is always correct. While the tool may allow new developers to "cheat" through simpler tasks, its limitations become apparent on more challenging assignments, which demand substantial editing and scrutiny by the developer; for example, Copilot's suggestions for more complex algorithms, such as the Dijkstra: Shortest Reach II problem, required extensive modification. This caution becomes particularly crucial when handling sensitive information. No one is fully protected from data leakage: GitGuardian reportedly detected 10 million new secrets in public GitHub commits in 2022 (GitGuardian, 2023). While this audit did not extensively evaluate hard-coded secrets, Copilot's lack of secure programming recommendations was evident. Developers must be mindful that a model can only be as secure and efficient as its training data set. GitHub Copilot's capabilities are limited, but it need not compromise the security and integrity of developers who remain cautious of those limitations.

Works Cited

Claburn, T. (2022, August 19). GitHub Copilot: Perfect for cheating in CompSci exercises? The Register. https://www.theregister.com/2022/08/19/copilot_github_students/

Fu, Y., et al. (2023, October 3). Security weaknesses of Copilot generated code in GitHub. Wuhan University. https://arxiv.org/abs/2310.02059

GitGuardian. (2023). State of Secrets Sprawl Report 2023. https://www.gitguardian.com/state-of-secrets-sprawl-report-2023

GitHub. (n.d.). GitHub Copilot. https://github.com/features/copilot

HackerRank. (n.d.). Solve algorithms code challenges. https://www.hackerrank.com/domains/algorithms

Huang, Y., et al. (2023, September 14). Neural code completion tools can memorize hard-coded credentials. Hong Kong University. https://arxiv.org/abs/2309.07639

Kalliamvakou, E. (2022, September 7). Research: Quantifying GitHub Copilot's impact on developer productivity and happiness. The GitHub Blog. https://github.blog/2022-09-07-research-quantifying-github-copilots-impact-on-developer-productivity-and-happiness/

Pearce, H., et al. (2021, August 20). Asleep at the keyboard? Assessing the security of GitHub Copilot's code contributions. IEEE Symposium on Security and Privacy 2022. https://arxiv.org/abs/2108.09293

Radchenko, V. (2023, April 22). GitHub Copilot security concerns. Medium. https://vlad-rad.medium.com/github-copilot-security-conserns-d4209f0d5c28

Rawat, A. (2022, April 21). GitHub Copilot: All you need to know. Medium. https://medium.com/analytics-vidhya/github-copilot-all-you-need-to-know-8e6fc1d5ccc

Segura, T. (2023, October 12). Yes, GitHub's Copilot can leak (real) secrets. GitGuardian Blog. https://blog.gitguardian.com/yes-github-copilot-can-leak-secrets/

W3Schools. (n.d.). SQL injection. https://www.w3schools.com/sql/sql_injection.asp
Zaremba, W., Brockman, G., & OpenAI. (2021, August 10). OpenAI Codex. https://openai.com/blog/openai-codex