Capital One Invests in Snowflake Data Lakes for AI

Hello and welcome to Protocol Enterprise! Today: how cloud-based data lakes are transforming the way Capital One thinks about AI, how the war in Ukraine could threaten the supply of critical chipmaking materials, and the week ahead for enterprise technology.

Spin up

It’s a story as old as time, or at least as old as the mainframe. Legacy but business-critical applications are preventing 79% of companies surveyed by 2nd Watch from moving into the modern era, but 91% of them acknowledged that they will need to modernize these applications to remain competitive.

Great Lakes (data)

Skittish, reluctant, hesitant – downright scared. Financial services firms have been all of these and more when it comes to migrating their heavily regulated, data-heavy businesses from legacy systems to the cloud.

But while some banks and credit card providers are just dipping their toes in, Capital One has been “all in” on the public cloud since 2015, according to Mike Eason, the company’s senior vice president and CIO for enterprise data and machine learning. These days, Eason and his team of 1,800 engineers and technicians are busy developing a self-service data pipeline and platform, with tools that let internal staff access data to build and train machine learning models.

Protocol sat down with Eason this week to discuss why the data lake is making a difference, why the company wants to automate the way it explains its AI models, and its efforts to expand Capital One’s in-house team of 11,000 engineers.

Capital One has a data lake. Why is it necessary? What’s unique about what you can do in a cloud data lake environment?

From a macro perspective, the cost of data and computation is greatly reduced. When we were on-premises, we were using the Teradatas of the world and others, and the cost of compute and storage was drastically different from what it is today.

We’re a big credit card provider, and over the holidays we can spin up more compute and more storage to handle the different loads while everybody’s doing their holiday shopping. That aspect of the cloud has been phenomenally important to us, and just a game changer.

From a lake perspective, the amount of data we can capture and use in our models is hugely different, like exponentially different. The lake provides a copy of everything for us, and it’s the only place where all the data will be.

So we use a combination of the lake and Snowflake, with Snowflake handling some traditional, structured warehouse data.

What kinds of data points or data sources would flow into the lake versus a more structured environment?

The lake is everything. It receives and keeps a copy of all company data. So we built a data pipeline to publish our data. As an end user, you can then decide: I want to publish the data, so it’s going to go to the lake, but I also want to publish these attributes or this data to Snowflake.

Or – and this is something we just created – I might want to put data into some type of low-latency operational database that our operational systems can reach, or our models can reach.

So it’s a single pipeline that can publish to many different places. It is a simpler, more self-service platform for end users who publish data. The lake is the copy of everything. And then there may be a subset of things needed in Snowflake for reporting, doing general analysis, and merging data together.

And then there’s the low-latency environment for very fast back-end models, such as making a fraud decision in the moment, when you use the data to determine whether Kate’s transaction is going to go through.
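To make the "single pipeline, many destinations" idea concrete, here is a minimal Python sketch. It is not Capital One's implementation: the `Sink` class, the `publish` function, the destination names and the sample record are all hypothetical, and in-memory lists stand in for the lake, Snowflake and the low-latency store. It only illustrates one publish step that sends the full record to the lake and a chosen subset of attributes elsewhere.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Set

Record = Dict[str, Any]


@dataclass
class Sink:
    """Hypothetical destination; an in-memory list stands in for real storage."""
    name: str
    # Attribute names this sink accepts; None means "keep the whole record".
    attributes: Optional[Set[str]] = None
    rows: List[Record] = field(default_factory=list)

    def write(self, record: Record) -> None:
        if self.attributes is None:
            self.rows.append(dict(record))
        else:
            subset = {k: v for k, v in record.items() if k in self.attributes}
            self.rows.append(subset)


def publish(record: Record, sinks: List[Sink]) -> None:
    """Send one record through a single pipeline to every configured sink."""
    for sink in sinks:
        sink.write(record)


# The lake keeps everything; the other two sinks receive chosen attributes only.
lake = Sink("lake")
warehouse = Sink("warehouse", attributes={"account_id", "amount"})
low_latency = Sink("low_latency_store", attributes={"account_id", "merchant"})

record = {"account_id": 1, "amount": 42.5, "merchant": "grocer", "raw": "payload"}
publish(record, [lake, warehouse, low_latency])
```

The publisher declares once which attributes go where, and the lake always ends up with the complete copy.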

What is an example of a low-latency use for a data lake?

Fraud is a prime example. You swipe the card, and we have less than 100 milliseconds to determine whether it is a fraudulent transaction. And you want as much data and as many data points as possible to make this decision.
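As a rough illustration of what a 100-millisecond budget means in code, the Python sketch below times a fraud decision against a deadline. Everything in it is an assumption for illustration: the `decide` function, the feature lookup, the scoring function and the 0.9 risk threshold are hypothetical, and a real system would sit behind a low-latency data store rather than in-process stubs.

```python
import time
from typing import Any, Callable, Dict

BUDGET_SECONDS = 0.100  # the "less than 100 milliseconds" from the interview


def decide(
    transaction: Dict[str, Any],
    lookup_features: Callable[[int], Dict[str, Any]],
    score: Callable[[Dict[str, Any], Dict[str, Any]], float],
    threshold: float = 0.9,  # hypothetical cut-off, not from the source
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[str, Any]:
    """Score one transaction and report whether the decision met the budget."""
    start = clock()
    features = lookup_features(transaction["account_id"])
    risk = score(transaction, features)
    elapsed = clock() - start
    return {
        "approve": risk < threshold,
        "risk": risk,
        "within_budget": elapsed < BUDGET_SECONDS,
    }


# Stub lookup and model, standing in for a feature store and a trained model.
def stub_features(account_id: int) -> Dict[str, Any]:
    return {"avg_amount": 40.0}


def stub_score(txn: Dict[str, Any], feats: Dict[str, Any]) -> float:
    # Toy rule: risk grows with how far the amount exceeds the account average.
    return min(1.0, txn["amount"] / (feats["avg_amount"] * 10))


result = decide({"account_id": 1, "amount": 42.5}, stub_features, stub_score)
```

The more data points the lookup returns, the richer the score can be, but every extra read has to fit inside the same budget, which is why the data needs to live in a fast operational store.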

Read the rest of the interview with Mike Eason here.

- Kate Kaye



Chipmakers prepare for Russian-Ukrainian conflict

The White House has urged the chip industry to develop workarounds for potential materials disruptions that could arise from political tension between the United States and Russia, Reuters reported on Friday.

Ukraine and Russia produce neon and palladium, respectively, both important materials in chip manufacturing. According to materials consultancy Techcet, chip factories use neon from Ukraine for laser gases in lithography, and companies use palladium from Russia for chip packaging as well as in some sensors and memory.

As reports surfaced on Friday that Russia could invade Ukraine as early as next week, White House National Security Council staffer Peter Harrell has been in contact with chip companies, urging them to find other sources of material, according to Reuters. The chip equipment industry association also recently surveyed its members about their exposure to materials from the region, Reuters reported. A supply disruption could further prolong the protracted chip shortage.

- Max A. Cherney

Coming next week

Intel will hold its annual Investor Day on Thursday, led by CEO Pat Gelsinger and other senior executives, and Max will be there with coverage of the event.

Fresh off the collapse of its Arm deal, Nvidia will release its fourth-quarter results on Wednesday.

Dropbox is expected to release its fourth-quarter results on Thursday.

Amplitude will release its fourth-quarter results on Wednesday.

Thanks for reading – see you Monday!
