Brian Arcos-Flores – Capital One’s Fraud Detection AI

Brian Arcos-Flores

Honors Praxis: Responsible AI 

AI Audit

For my AI Audit, I chose to investigate Capital One’s AI fraud detection software, which uses a combination of AI and ML models to detect and report fraud when it identifies suspicious or inconsistent behavior in a customer’s banking activity. Typically, the software examines the frequency of transactions, the amounts spent, and the locations of transactions, and compares them against the customer’s existing history stored on Capital One’s servers. The use of ML and AI in this process is not new, but Capital One has championed it as a key benefit of its service: the customer does not need to call support to report fraud. The company notes that victims of fraud often don’t realize they have been wronged until it is too late, which is where the detection software comes into play. Within minutes, Capital One’s software automatically investigates potential fraud, and if it determines that customer activity is suspicious and shows signs of fraud, it alerts the customer through warning messages in the companion Capital One Mobile App. The customer can then lock their card and validate legitimate transactions without needing to call Capital One. Auditing a financial AI matters because such systems can exacerbate existing inequalities in ways that directly impact people’s lives.
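To make this description concrete, here is a minimal, hypothetical sketch of how the signals the software reportedly weighs (transaction amount, location, and frequency) could be combined into a suspicion score. The features, weights, and thresholds are my own illustrative assumptions; Capital One’s actual models are proprietary.

```python
# Hypothetical fraud-scoring sketch. The signals (amount vs. the customer's
# history, unfamiliar location, transaction velocity) mirror the ones
# described above; all weights and thresholds are illustrative assumptions.
from dataclasses import dataclass
from statistics import mean, stdev

@dataclass
class Transaction:
    amount: float
    city: str

def suspicion_score(txn: Transaction, history: list[Transaction],
                    txns_today: int) -> float:
    """Return a rough 0-1 score; higher means more anomalous."""
    amounts = [t.amount for t in history]
    avg = mean(amounts)
    sd = stdev(amounts) if len(amounts) > 1 else 1.0
    # Amount signal: standard deviations above the customer's usual spend.
    amount_z = max(0.0, (txn.amount - avg) / (sd or 1.0))
    # Location signal: a city this customer has never transacted in before.
    new_city = txn.city not in {t.city for t in history}
    # Frequency signal: unusually many transactions in one day.
    high_velocity = txns_today > 10
    score = 0.2 * amount_z + (0.4 if new_city else 0.0) + (0.3 if high_velocity else 0.0)
    return min(1.0, score)

# A score above some threshold (say 0.7) would trigger an in-app warning.
```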

Historically, banking institutions have been used as weapons to restrict and discriminate against minority groups in America. Following the creation of the HOLC, restrictive covenants, and the practice of redlining, financial inequality has been a source of conflict that persists today. Investigating financial AI and its interactions with minority groups is crucial to understanding the extent to which bias and discrimination continue to exist in these avenues of American life. People can identify racial discrimination in unjust loan practices, but what happens when secret algorithms and software are used in the decision process, in the background, without a person’s knowledge? There is clear value in analyzing these systems to ensure trust and ethical service between customer and institution, a relationship already steeped in mistrust and doubt. My audit of Capital One’s fraud detection software and its far-reaching impacts on other facets of economic services is focused on one question: is Capital One’s AI fraud detection software biased for or against rich or poor customers in particular geographic locations?

Ideally, in my blue-sky audit, to determine whether Capital One’s fraud detection software is unbiased, we could begin by looking directly at the customer data the software uses to evaluate suspicious activity. According to Capital One, suspicious transactions and activity are compared against what the company’s developers have deemed the ‘normal’ behavior of the customer, which is stored on its own servers. To figure out whether bias exists in the system, it would be important to learn precisely which calculations determine what counts as ‘suspicious’ behavior and whether this standard makes sense at scale. It is important to recognize that affluence, prosperity, inequality, and what counts as ‘normal behavior’ vary dramatically by geographic location; customers in poorer areas may not have access to what Capital One defines as ‘trusted merchants.’ In my blue-sky audit, I would further explore this aspect of the software’s development, examining the qualities and standards that go into defining terms and criteria like ‘normal behavior,’ since language itself is not exempt from bias.
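One concrete test such an audit could run, assuming access to labeled outcomes, is to compare how often legitimate transactions are wrongly flagged across geographic or income groups. The sketch below is illustrative; the column names (`zip_income_tier`, `flagged`, `was_fraud`) are hypothetical placeholders for whatever data the audit could obtain.

```python
# Sketch of a disparate-impact check for the blue-sky audit. Column names
# (zip_income_tier, flagged, was_fraud) are hypothetical placeholders.
import pandas as pd

def false_positive_rate_by_group(df: pd.DataFrame, group_col: str) -> pd.Series:
    """Share of legitimate transactions wrongly flagged, per group."""
    legit = df[~df["was_fraud"]]
    return legit.groupby(group_col)["flagged"].mean()

# Usage: false_positive_rate_by_group(transactions, "zip_income_tier")
# A persistent gap between income tiers would be direct evidence for the
# geographic bias the audit question asks about.
```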

Additionally, Capital One states directly that it collects information whenever customers use its websites and services, and it is fairly candid about the extent of this collection in its Privacy Policy. Collecting account, identity, demographic, and marketing-behavior data is typical for this type of institution, but the implications of its use are just as important. It is not uncommon for apps and corporations to shift their advertising to fit a target demographic, and I would be interested in investigating whether Capital One’s targeted advertising is connected to its fraud detection software or to other in-house systems used to build a profile of the customer. From there, we could analyze and categorize the types of ads served to customers by geographic location, yearly income, and credit history, and determine whether the pattern reflects a biased and unfair perspective magnified by the algorithm.
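As a sketch of that last step: tally which ad categories are served to which customer segments and compare the mixes. The column names below are assumptions for illustration, not fields from any Capital One dataset.

```python
# Hypothetical ad-audit step: the distribution of ad categories per customer
# segment. Column names (income_bracket, ad_category) are assumed.
import pandas as pd

def ad_mix_by_segment(ads: pd.DataFrame) -> pd.DataFrame:
    """Rows: customer segment; columns: ad category; values: share of ads."""
    counts = ads.groupby(["income_bracket", "ad_category"]).size().unstack(fill_value=0)
    return counts.div(counts.sum(axis=1), axis=0)

# If low-income segments see mostly high-fee product ads while affluent
# segments see premium rewards-card ads, that skew is worth flagging.
```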

However, Capital One also states that it collects information about customers from third-party data sources, business partners, and publicly available data services for fraud prevention. This raises concerns over the neutrality of these datasets and their sources: if the data was collected without the user’s knowledge, Capital One’s own stance on transparency falls short the moment it relies on data that is not its own. I would be interested in learning more about the origin of the datasets the company uses and in conducting an additional audit to determine whether the data was acquired ethically and in what context it was collected. For instance, if this data originates from corporations entrenched in data collection through advertising, web tracking, and user-profile creation, there are real concerns about the ethics of the data and the biases reflected in it.

For my proof-of-concept audit, I set out to explore Capital One’s software that the company has cited as relevant to its fraud detection algorithm, and to investigate potential avenues of concern regarding bias and discrimination. Capital One’s innovative use of AI and ML technology is most evident in its approach to fraud detection. A conventional investigation typically begins with a fraud alert, followed by an investigative process in which Capital One determines whether fraud has occurred.

With the use of AI and ML technology, however, an incoming alert is run through the ML model, which routes it into one of three parallel tracks: a streamlined investigative process, a normal investigative process, or a prioritized investigative process. In this model, it is worth noting that the reference data used to identify fraudulent behavior is based on hundreds of thousands of previous investigations, which serve to validate both ‘normal behavior’ and fraudulent activity. Through this process, the ML model surfaces the assessments it deems most likely to be cases of laundering or fraud, allowing investigators to trim their workload and focus on specific instances of fraud. Using algorithms to cut down unnecessary labor for workers is not uncommon, but the adoption of this technique does raise concerns, especially the ethical implications of letting a model decide which cases matter most.
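A minimal sketch of that triage step follows: an alert’s model score routes it into one of the three tracks. The scoring model and the thresholds are illustrative assumptions, not Capital One’s actual system.

```python
# Hypothetical alert-triage routing. The three tracks mirror the process
# described above; the thresholds are illustrative assumptions.
from enum import Enum

class Track(Enum):
    STREAMLINED = "streamlined"   # low risk: largely automated handling
    NORMAL = "normal"             # standard investigation
    PRIORITIZED = "prioritized"   # high risk: escalated to investigators first

def route_alert(fraud_probability: float) -> Track:
    """Route an alert based on the ML model's estimated fraud probability."""
    if fraud_probability >= 0.8:
        return Track.PRIORITIZED
    if fraud_probability >= 0.3:
        return Track.NORMAL
    return Track.STREAMLINED

# e.g. route_alert(0.92) -> Track.PRIORITIZED
```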

Additionally, as part of the audit, I wanted to learn more about Capital One’s approach to AI and whether its development accounts for internal bias or data imbalance. Arpit Bansal’s “MetaBalance: High-Performance Neural Networks for Class-Imbalanced Data” highlights a problem ML tools often run into when working with class-imbalanced datasets. For banking institutions and fraud detection, data reflecting real occurrences of fraud and laundering is substantially scarcer than data from valid transactions. This unequal representation matters when designing fraud detection algorithms: if the fraud examples are too few and too narrow, the model only gets better at recognizing fraud that resembles its training data. Rolled out to the public, such software would catch only the specific kinds of fraud represented in its training data, not general instances of fraudulent activity. Conversely, if the dataset is too large and vague, the same algorithm may keep assigning false verdicts of fraud to legitimate transactions. To the customer, the service is rendered useless and ineffective if it cannot guarantee a high success rate in detecting fraud. The researchers argue that to mitigate data imbalance, ML tools can adopt practices like data augmentation, optimizer adjustments, and label smoothing to improve the software’s capabilities. This attention to bias in datasets is an instance of Capital One adopting an ethical value: reconstructing its calculative systems and programs to account for biased and incomplete data.
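To illustrate one common mitigation from this family of techniques (a generic one, not MetaBalance itself), the sketch below reweights classes so that rare fraud examples count more during training, using scikit-learn’s built-in `class_weight` option on synthetic data.

```python
# Class-imbalance mitigation sketch using class reweighting. This is a
# generic stand-in for the techniques discussed above, not Capital One's
# or the MetaBalance paper's exact method.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 5))
y = (rng.random(10_000) < 0.01).astype(int)   # ~1% fraud: heavy imbalance

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# class_weight="balanced" scales each class inversely to its frequency, so
# missing a rare fraud case costs the model as much as a false alarm.
model = LogisticRegression(class_weight="balanced", max_iter=1000)
model.fit(X_tr, y_tr)
print(model.score(X_te, y_te))
```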

Capital One has also established itself as a leading banking institution that shares technological innovations with partners in the field, emphasizing its strength in technology to better serve its customers. Its commitment to open-source software development and project sharing has contributed to a greater accumulation of knowledge and accessibility around AI and ML tools, as developers can collaborate on software and other companies can build on existing work rather than starting from scratch. Capital One’s open-source approach is a big step in the right direction, particularly for transparency and accessibility around software that institutions use on customers without their knowledge. An open-source mindset pushes companies to consider the ethical programming of AI and ML tools, as customers, researchers, and writers can hold them accountable for how software is developed and rolled out.

We have learned about AI software like COMPAS, an algorithm whose selling point was calculating the recidivism risk of defendants, scores that judges then used at sentencing. Concerns about bias in its dataset and unfair risk scores have been raised repeatedly since its inception, calling into question whether the algorithm’s output should be used as evidence against someone when courts cannot access the software directly to verify its validity and neutrality. The unfortunate reality is that its dataset, software, and development process cannot be examined because powerful interests keep them closed, which limits active efforts to discredit its influence and use. Capital One’s approach of developing and sharing software publicly pushes against this practice of hiding algorithms behind a business’s bottom line or claims of ‘proprietary’ protection. If more companies and developers adopted Capital One’s vision for software and machine learning development, a movement could arise advocating that AI and ML tools used in medicine, sentencing, facial recognition, and generative applications be open-source and available to investigate, building trust between user and tool.

Beyond the matter of transparency, Capital One’s developer program is also helping to bring educational and programming tools to a wider audience, and with a mindset committed to ethical and open-source development, Capital One’s partners can come to reflect a similar stance on transparency and accessible tools. However, there are real concerns about Capital One’s decision to publicize and share ML and AI software with a broader audience. With the rise of OpenAI and its famous ChatGPT, selling AI and ML software to the public has enabled a wave of startups and companies to expand and adopt the principles and ideas of their contemporaries. This often means that behaviors, practices, and attitudes toward AI development are adopted without concern for their implications and long-term impacts. The datasets behind these large-scale models are filled with biased data and, depending on their geographic source, reflect a particular worldview that is not widely shared and, at its most extreme, is violent and discriminatory. While wider access to AI is good for programming and software development, these same institutions must also ensure that startups and partners know how to use the shared data and resources ethically. Capital One does an excellent job of providing articles, podcasts, and tools to interested users, but this material says little about how to build AI ethically and ensure a healthy, positive rollout to the public. There is room for more work here, as well as in spreading information about ethical AI development to any field joining the race toward generative AI and ML.

Capital One Dev Exchange also offers partners access to the Enhanced Decisioning Data (EDD) API, which lets Capital One and its partners work together to combat online fraud by submitting information about their transactions. EDD transaction information is then used alongside Capital One’s internal systems and models to build and refine the software that detects fraud. Partners have access to two types of operations: they can submit transaction information to Capital One so it can be collected as data, and/or they can receive findings from the EDD data, which return information and a verdict on what was submitted. Most notable, however, is the information the EDD API asks of the user: private details like the customer’s billing and shipping addresses, identity, client credentials, card type, card digits and card activity, partner status and risk score, city code, geographic information, device information, name, age, email address, and much more. If an official Capital One partner requests the results of the EDD data, they receive conclusions based on their submission, such as whether the user’s first and last name, phone number, and email address match the information in Capital One’s database. The partner can also see the risk score and why a transaction was declined.
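To make this data flow concrete, here is a hypothetical illustration of what a partner’s submission and the returned findings might look like. The field names, values, and response shape are my own assumptions for illustration; they are not Capital One’s documented EDD API schema.

```python
# Hypothetical EDD-style data flow. Every field name and value below is an
# illustrative assumption, not Capital One's actual API schema.
import json

submission = {
    "customer": {
        "first_name": "Jane",
        "last_name": "Doe",
        "email": "jane@example.com",
        "billing_address": "123 Main St",
        "shipping_address": "123 Main St",
    },
    "card": {"type": "credit", "last_four": "1234"},
    "transaction": {"amount": 89.99, "city_code": "NYC", "device_id": "abc-123"},
}

# Findings a partner might receive back: match indicators, a risk score,
# and a reason when a transaction is declined.
findings = {
    "first_last_name_match": True,
    "phone_number_match": False,
    "email_address_match": True,
    "risk_score": 72,
    "decline_reason": "phone number does not match records",
}
print(json.dumps(findings, indent=2))
```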

This service is interesting because it gives the user direct access to some of the software that Capital One shares publicly with customers and partners. The website also includes a step-by-step guide to using the software, along with what the output should look like at each stage. At the same time, based on what we have learned about privacy, there are real concerns about collecting a customer’s personal and banking information, especially when they don’t know how it will be used. While the EDD API can show the user the calculations, form, and verdict on whether fraud occurred, this type of transparency isn’t reflected in Capital One’s fraud detection software itself, nor, more broadly, in the rest of Capital One’s technological activity. Capital One has been the subject of data breaches and hacking controversies, most notably the 2019 cyber incident, in which a hacker accessed more than 4,500 customer credit card records and 140,000 Social Security numbers, affecting more than 100 million US customers.

Building on concerns over Capital One’s data collection, there is also the matter of how the institution uses that data. A customer who banks with Capital One can also take advantage of the Capital One Mobile App and its benefits, such as the chatbot Eno, fingerprint recognition, purchase notifications and alerts, reward redemption, and online check deposit. Unfortunately, these features are locked behind the mobile app, so a customer concerned about privacy and personal information would not have access to them. The practice of withholding valuable features from customers is not limited to banking; it also appears in online media like YouTube, Facebook, and Twitter, where users can opt in to share their data and behavior with companies in exchange for a better-curated service. The main selling points are ease of use, efficient service, and higher-quality recommendations relevant to the customer. On the advertising side, Capital One also offers a Chrome browser plug-in called Capital One Shopping, which offers online spenders targeted savings by recommending cheaper sites or discount codes to apply to their purchases. The extension works by collecting customer behavior, tracking the sites they visit, and then scouring the web for cheaper alternatives or coupons. In this instance, the user grants the extension access to their information and data in exchange for a positive experience: discounted costs.
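As a simplified sketch of that last mechanism: given a product the customer is viewing and its current price, look up cheaper listings elsewhere. The catalog below is a stand-in; a real extension would track live browsing activity and query retail data at scale.

```python
# Simplified sketch of the price-comparison step described above. The static
# catalog is a placeholder for the live retail data a real extension queries.
CATALOG = {
    "wireless headphones": [("SiteA", 59.99), ("SiteB", 49.99), ("SiteC", 54.50)],
}

def cheaper_alternatives(product: str, current_price: float) -> list[tuple[str, float]]:
    """Return listings cheaper than what the customer is about to pay."""
    listings = CATALOG.get(product, [])
    return sorted(((s, p) for s, p in listings if p < current_price),
                  key=lambda sp: sp[1])

# cheaper_alternatives("wireless headphones", 59.99)
# -> [('SiteB', 49.99), ('SiteC', 54.50)]
```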