NODE - A life in data


Dun & Bradstreet is an organisation that sits in rarefied air; the company provides commercial data, analytics and business insight services for around 90% of Fortune 500 companies, a group that accounts for something in the region of two-thirds of U.S. GDP.

The backbone of the company is its Data Cloud, which Dun & Bradstreet says is the most detailed of its kind. Comprising more than 330 million business records and more than 120 million hierarchy members, with 375 million data elements updated every day, it provides a depth and breadth of business information that is arguably unrivalled.

The data is used by some of the world’s leading organisations to make decisions on credit risk, vendor management, marketing and lead generation, compliance and master data management.

The man behind the numbers is the company’s Chief Data Scientist Anthony Scriffignano, who is something of a legend in the data community. Scriffignano, who was recognised as the U.S. Chief Data Scientist of the Year 2018 by the Chief Data Officer Club, is routinely invited to provide thought leadership for senior executives and high-level government officials globally.

Recently, he briefed the U.S. National Security Telecommunications Advisory Committee and contributed to three separate reports to the President, on Big Data Analytics, Emerging Technologies Strategic Vision, and Internet and Communications Resilience. In short, he is a man in demand.

All of which means that when Digital Bulletin was offered the opportunity to pick Scriffignano’s brains on all things data, we jumped at the chance.

Speaking from Dun & Bradstreet’s HQ in Millburn, New Jersey, Scriffignano says he has worked in every part of the organisation that touches data over the last two decades, “which is just another way of saying I’m old,” he jokes.


“We’ve been doing big data since before big data was a term, way before. We curate arguably the largest global database of its type, which is updated millions of times a day. It comes from hundreds of countries where there are different laws, languages and rules around data localisation,” he says.

“We have to be on top of all of that, while data regulation is changing all over the world, so our data strategy is pretty much focused on governance, compliance and quality.”

Scriffignano breaks off from the narrative, keen to address an untruth about how Dun & Bradstreet collects, processes and uses its vast records of data.

“A lot of the data that we have never gets seen outside of these four walls, but there’s a common misconception that we just buy a load of data from around the world and put it in a database and sell it – nothing could be further from the truth.

“We have to make a lot of the data through imprints, triangulation, adjudication, and we take great information from data that never gets exposed, like signals and detailed pieces of trade information that simply can’t leave the company.

“There is a huge effort that goes into making sense of data to uncover opportunity and risk, fraud adjudication, and patterns of behaviour.”

Score settled, Scriffignano moves on to the increasing role of disruptive technology and how it can be utilised in the field of big data. He is effusive in his belief in artificial intelligence (AI) and how it is being leveraged in the world of data, listing off a number of use cases carried out by Dun & Bradstreet.

“We are using AI to curate the data, which again looks at triangulation around the truth and can be very helpful to use conversational linguistics,” he says. “We’re also using something called semantic disambiguation, which is looking at unstructured data and trying to discern which elements of what we think are true are true, because not all data comes nicely packaged with a dictionary that tells you how to use it.

“Most of the really good stuff is raw and you have to be able to use it in that form – AI is extremely useful for that.”


Scriffignano and his team are also using AI for advanced anomaly detection in a bid to work out what the “bad guys” will do next.

“The first thing the best bad guys will do if they suspect that they’re being watched is change their behaviour. If you’re just doing simple modelling on prior bad behaviour, you’re going to be modelling on ways that the bad guys no longer behave. It’s called an observer effect,” he says.

“AI and solutions that are not improvised – so not based on the past – can be extremely helpful in finding emerging types of behaviour that might represent new maleficence. So, advanced anomaly detection is absolutely being revolutionised by not only the availability of data, but also the availability of compute power.

“Deep learning is a great example of that; it is not a super new concept, but you need an awful lot of data and an awful lot of horsepower to do something like that, and we didn’t have that available to us when these ideas were being conceived. There were examples but they were really theoretical, but the results we’re seeing now are real.”
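
To make the contrast concrete, here is a minimal sketch of an anomaly check that does not learn from labelled examples of past bad behaviour. It is not Dun & Bradstreet's method, which the interview does not describe; it is a standard robust-statistics technique (the modified z-score, built on the median and the median absolute deviation), and the function name, the threshold of 3.5 and the sample figures are all assumptions chosen for illustration.

```python
from statistics import median


def robust_anomalies(values, threshold=3.5):
    """Return the indices of values that sit unusually far from the median.

    Hypothetical helper: it uses the modified z-score, so it needs no
    labelled history of earlier fraud, only the data in front of it.
    """
    med = median(values)
    abs_dev = [abs(v - med) for v in values]
    mad = median(abs_dev)  # median absolute deviation
    if mad == 0:
        # No measurable spread, so the score is undefined; flag nothing.
        return []
    # 0.6745 rescales the MAD so scores are comparable to ordinary z-scores.
    return [i for i, d in enumerate(abs_dev) if 0.6745 * d / mad > threshold]


# Illustrative payment amounts (invented); the last one is out of pattern.
payments = [102, 98, 101, 99, 100, 97, 103, 100, 560]
print(robust_anomalies(payments))
```

Because the check compares each value with the rest of the current data rather than with a catalogue of known offences, a change in tactics does not automatically hide an outlier, although a toy like this is far simpler than the deep learning Scriffignano refers to.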

But away from Dun & Bradstreet, companies are finding it difficult to come to terms with what could be considered – to the likes of Scriffignano, at least – pretty elementary data challenges. He riffs on a number of topics, including the challenge of aligning cloud and data strategies, but settles on data silos as an issue to drill down into.

“Companies are absolutely struggling. Almost every single day I talk to someone that has an issue where there’s a corpus of data that would help that issue, but they either don’t have access, don’t know about it or can’t see it, and the list goes on. And many times, this is in the same enterprise, which is even sadder. And this happens in corporate America, it happens in academia, medical research – it happens everywhere,” he says.

“Sometimes siloing is a necessary evil to protect privacy or legal considerations, very often it’s just noise or the way we set up our systems, or departments not communicating very well, so a lot of siloing happens by accident.

“The challenge is that data is increasing at an immeasurable rate; data begets data, so this whole data is the new oil thing is silly, because oil does not beget oil.”

Scriffignano asks himself rhetorically: “So what do you do about it?”

He breaks data down into three buckets: the data a company has and can see; the data a company knows exists and can access if it’s prepared to invest time and resource; and data that is simply not accessible.

“When you’re talking about problems, you want to try to get an estimate of the size and the quality of each of those corpora of data. If we’re going to make a decision about enterprise risk and you know that there is data sitting in the office, you have to decide whether going and getting it is going to change the decision you’re going to make or not,” says Scriffignano.

“Companies can’t start the conversations saying: ‘okay, let’s bring all the data together’, there’s too much data and they don’t have time, so you have to purposefully choose the data you want, and you have to know what you didn’t use and why you didn’t use it. That is the best strategy I can advise.”
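
The test Scriffignano describes, whether fetching more data would change the decision, can be sketched in a few lines. This is only an illustration under stated assumptions: the decision is reduced to a single risk estimate compared with a cut-off, and the function name, its parameters and the example numbers are hypothetical rather than anything taken from the interview.

```python
def fetching_could_change_decision(current_estimate, threshold, plausible_shift):
    """Return True if data not yet fetched could flip the decision.

    Assumptions (all illustrative): an entity is flagged as risky when its
    estimate is at or above `threshold`, and the unseen data could move
    the estimate by at most `plausible_shift` in either direction.
    """
    low = current_estimate - plausible_shift
    high = current_estimate + plausible_shift
    # The decision can only change if the plausible range straddles the cut-off.
    return low < threshold <= high


# Estimate well below the cut-off: the extra data cannot change the outcome.
print(fetching_could_change_decision(0.42, 0.60, 0.10))

# Estimate close to the cut-off: the extra data is worth going to get.
print(fetching_could_change_decision(0.55, 0.60, 0.10))
```

Recording the inputs to a check like this is also one way of keeping track of what data was left out and why.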

Another factor companies are struggling to get to grips with is the ever-changing landscape of data governance, rules and regulations, says Scriffignano, who draws parallels with an idea from Lewis Carroll’s Alice stories, no less.

“There are a number of GDPR-like regulations around the world, and companies that ignore them will find themselves in a lot of trouble,” he says.

“In computer science, there is what is known as a Red Queen Problem, which comes from Alice in Wonderland, during which Alice says to the Red Queen: ‘this is a really odd place, I’m running and running and I don’t seem to be getting anywhere’. The Queen replies: ‘That’s the kind of place this is, you have to run as fast as you can just to stay where you are’.

“With a lot of this regulation we find ourselves in a Red Queen Problem, we can’t stop running, stop everything and assess the regulations and figure out how to comply with it and start up again, you always have to keep running.

“But what I would say is that everyone has been given notice. Companies need to step up what they’re doing and look at those broader issues. They need to develop a strategy that is inclusive of the regulations they are concerned about, but also includes future regulations that they know are going to come. Otherwise you’ll never get out from behind it.”

With the interview drawing to a close, Scriffignano is keen to address the future and, in particular, a new generation of graduates and employees that will solve the data challenges of tomorrow.

“We have to look forward because if we don’t we’ll just continue to have the same conversations,” he says. “My advice is to be humble, because the problem you’re working on is very likely to be bigger than it looks. Don’t be convinced that you have a view on everything, bring other people in and expand the table you’re working on.

“Secondly, be minimalistic and break problems down into chunks. You’re not going to be able to implement AI across the enterprise, it’s too big an initiative, so think about what the problem is you are trying to solve and how is AI going to address those issues, what should you solve first, then break it down from there.

“And lastly, when you’re driving ahead and solving problems, remember to pick your head up from time to time. I used to play water polo and while you might be able to swim faster by just keeping your head down, it does mean the ball is going to hit you in the head every now and then, which can be embarrassing.”