Data Governance and Integrity: Pharma, Biotech, and the Data Deluge (Part 1)

Data powers today’s leading businesses. But why not drug developers?

On the surface, industries like aerospace, mining, retail, and energy don’t appear to have much in common. But they all share one critical attribute: top businesses in every one of these sectors have learned how to turn their vast volumes of data into a powerful, performance-driving resource.

For many industries, “big data” is already old news. They’ve moved well beyond marveling at the scale of information they can generate, and started exploring innovative ways to plug their data generation engines straight back into their organizations – in the form of predictive algorithms, adaptive processes, AI-enabled workflows, and more. 

But drug developers? Far too many of them are worried about where they’re going to store all those paper files. In an era defined by the power of data, drug developers are struggling to adapt, much less keep up. 

And yet, there’s growing reason to believe they can, and will. Over the next few weeks we’ll explore why and how, with a new series on data governance, quality, and integrity in the drug development industry. Starting with this first chapter of our story, we’ll dig into the unique challenges facing pharma companies, biotech businesses, and their digital transformation, and the steps they can take to modernize their information infrastructure and achieve true digital maturity. 

So let’s dive in, right at the root of the problem: unstructured data.

Standing on water: Data governance, integrity, and the challenge of unstructured data

Let’s start with some good news: in an era when “data is the new oil,” drug developers and manufacturers have drilled a well that would make Daniel Plainview blush. The ISPE “recently” reported that a single mid-sized biomanufacturing facility can generate anywhere from 0.5 to 10 petabytes of data every year – and that was in 2018. Just for reference, 1 petabyte is 4 times the content of the US Library of Congress. 

But then there’s the bad news: there’s an enormous difference between tapping data and tapping its value. In fact, 70% of that continuously generated data goes completely unused. For the most part, the life sciences’ information geyser delivers precious few valuable resources. Instead, it produces ever-expanding, nominally “managed” data slicks.

Why? Simple: to put a resource to use, it has to be in a usable form. The vast majority of drug development data isn’t, even as enormous volumes of it continue to be produced – only to be trapped in paper documents, PDFs, Microsoft Office files, and employees’ minds. No sooner is this data generated than it is marooned in these unstructured formats, where it can only be accessed with tedious, time-consuming, and all-too-frequently duplicated effort.

We have an unstructured data problem: no unified knowledge base for chemistry, manufacturing, and controls (CMC)

The impact of that challenge goes far beyond untapped organizational value or lost institutional knowledge. 

Scattered, disconnected data sources can be impossibly hard to manage to regulatory standards like 21 CFR Part 11. Bridging islands in a fragmented, analog data ecosystem takes time – so much, in fact, that key data-sharing workflows like tech transfer now take 18 months or longer, on average. And drug developers awash in unstructured data often struggle to fully understand and articulate their own processes, much less how they need to be managed and controlled – leading to underpowered control strategies replete with unknown, unmitigated risks. 

How big are those risks? We don’t want to name names, but just ask a pharma company that had its biosimilar rejected for the third time because of data integrity issues. Or a beleaguered global CDMO whose recent control-related Form 483 observations have roiled customer product flow.

Just these two examples – along with many other recent ones – show how urgently drug developers and biomanufacturers need to evolve their policies and strategies for data governance. It’s never been more important for these businesses to establish a comprehensive approach to both generating and managing their data in ways that support effective, holistic data integrity – before they discover just how vulnerable their data ecosystems have become.

As we’ll see in this new series, developing that kind of approach is one of the smartest investments drug developers and manufacturers can make in their business – but it’s one that requires thoughtful planning, careful management, and serious commitment. 

The first step, though, doesn’t start with the data itself. It starts with the mindset of the people creating it – and how businesses can help them escape one of the industry’s most common data integrity traps.

Squaring off with ALCOA+: Why it’s time to evolve industry’s legacy framework

To develop truly modern data governance methods, drug developers and manufacturers need to start with a fundamental question: what constitutes “good” data management?

For decades now, the answer to that question has been simple: it means keeping data attributable, legible, contemporaneous, original, and accurate (ALCOA), as well as complete, consistent, enduring, and available (the “+” in ALCOA+). It’s a framework that served the industry well at a time when information sources were limited and blockbuster small molecules stacked up new data in large but predictable batches. 
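
To make those attributes concrete, here is a minimal sketch of how a few of them might be checked on a single recorded entry. Everything in it is an assumption for illustration: the `Entry` fields, the `alcoa_gaps` helper, and the five-minute window for “contemporaneous” are hypothetical, and it covers only three of the nine attributes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Entry:
    """A single recorded observation (hypothetical field names)."""
    value: str
    recorded_by: Optional[str]   # attributable: who recorded it
    observed_at: datetime        # when the event actually happened
    recorded_at: datetime        # when it was written down
    is_original: bool            # original record or a verified true copy


def alcoa_gaps(entry, max_delay=timedelta(minutes=5)):
    """Return the ALCOA attributes this entry fails (illustrative subset only)."""
    gaps = []
    if not entry.recorded_by:
        gaps.append("attributable")
    # The allowed delay is an arbitrary assumption, not a regulatory value.
    if entry.recorded_at - entry.observed_at > max_delay:
        gaps.append("contemporaneous")
    if not entry.is_original:
        gaps.append("original")
    return gaps


late_entry = Entry(
    value="pH 7.1",
    recorded_by=None,
    observed_at=datetime(2024, 1, 1, 10, 0),
    recorded_at=datetime(2024, 1, 1, 10, 30),
    is_original=True,
)
print(alcoa_gaps(late_entry))  # ['attributable', 'contemporaneous']
```

Checks like these are simple to write for one entry in one system. The harder question is how they hold up once the same data lives in many places at once.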

Fast forward to an era of arcane biologic compounds, personalizable therapies, IoT-enabled manufacturing facilities, and sprawling global partner networks. Today, in an industry with vastly more, more complex, and more heterogeneous data to synthesize, the ALCOA framework is beyond brittle. When data is pouring in from countless sources, and needs to be leveraged across teams, facilities, and projects, who can limit “well-controlled data” to the latest time-stamped PDF with a legible signature? 

And yet, many drug developers and manufacturers are still struggling to break free from this document-centric mindset and imagine how ALCOA can evolve for a new era. The result: persistent friction and flaws in many drug development and manufacturing workflows. As famed FDA inspector Peter Baker observed recently, nearly 80% of recent CDER warning letters are the result of rigid adherence to ALCOA+ principles. 

Shifting the assumed definition of effective data governance – moving it beyond instinctive alignment with ALCOA – will be the first mile of our industry’s long path to digital maturity. Today, many stakeholders still stand firm behind that framework, even as oceans of new data flood their servers and flow between silos with little control.

Tomorrow’s data governance: What does the “right” structure look like?

Evolving our industry’s conception of “good” data management will be the catalyst for many transformative benefits – not the least of which will be a lower risk of regulatory blowback. After all, well-managed, consistently structured data is key to nearly every golden fleece in drug development: automated reporting and compliance, streamlined tech transfers, continuous manufacturing, and more. 

There are two key dimensions of that structure, which we’ll delve into across this series: 

  • Where data is structured: Consolidating data resources in a single source of truth (SSOT) where they can be secured, integrated, and readily accessed in compliant ways – as well as prepared for use in more advanced applications.
  • How data is structured: Generating and maintaining data in a properly labeled, easily usable form that follows principles like FAIR (findable, accessible, interoperable, reusable), so that any appropriate stakeholder can readily and efficiently leverage that resource.

Both of these factors are crucial for pharma and biotech businesses that want to protect the quality and integrity of their drug development data. An effective SSOT transforms a core challenge of ALCOA+ – isolated, static documents as either the only or competing sources of truth – by creating canonical data records that can be accessed, linked, analyzed, and leveraged without a single compliance misstep. Using a framework like FAIR turns free data electrons – dispersed, disconnected, and chaotic – into a usable charge that can power many different applications. 
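
As a rough illustration of how those two dimensions fit together, the sketch below pairs a toy in-memory “single source of truth” with records that carry FAIR-style metadata. It is not a reference implementation: the `ProcessRecord` and `SingleSourceOfTruth` names, the fields, and the sample values are all hypothetical, and a real system would add access control, audit trails, and persistent storage.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ProcessRecord:
    """One canonical, structured data point (all field names are hypothetical)."""
    record_id: str   # findable: unique, persistent identifier
    attribute: str   # what was measured, e.g. "pH"
    value: float
    unit: str        # interoperable: explicit unit, not buried in free text
    source: str      # reusable: provenance of the value
    tags: frozenset = field(default_factory=frozenset)  # findable: searchable labels


class SingleSourceOfTruth:
    """Toy registry that allows exactly one canonical record per identifier."""

    def __init__(self):
        self._records = {}

    def register(self, record):
        # Reject competing copies instead of silently keeping two "truths".
        if record.record_id in self._records:
            raise ValueError(f"{record.record_id} already has a canonical record")
        self._records[record.record_id] = record

    def get(self, record_id):
        # accessible: every caller retrieves the same record by its identifier
        return self._records[record_id]

    def find(self, tag):
        # findable: look records up by label rather than by opening documents
        return [r for r in self._records.values() if tag in r.tags]


ssot = SingleSourceOfTruth()
ssot.register(ProcessRecord("BATCH-001/pH", "pH", 7.1, "pH units",
                            "bioreactor probe", frozenset({"batch-001", "upstream"})))
ssot.register(ProcessRecord("BATCH-001/temp", "temperature", 36.8, "degC",
                            "bioreactor probe", frozenset({"batch-001", "upstream"})))

print(sorted(r.attribute for r in ssot.find("batch-001")))  # ['pH', 'temperature']
```

The design choice worth noticing is that a second “BATCH-001/pH” record is refused outright. That is the difference between one canonical record and a folder of competing documents.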

Implementing either of those structures requires a robust data governance strategy that dictates how a business will generate, manage, and deploy its information resources. It also demands a clear methodology for how those policies and methods will be implemented – including how an organization will make the leap from document-centric compliance and quality control to data-focused efficiency and adaptability. 

So what should those strategies and methods look like? That, dear reader, is for next time. 

Coming up next: Laying the foundation of the data quality pyramid

So how do we get from here to there? How can drug developers lay the groundwork of data-centricity in their organizations, and ensure that their knowledge and information resources flow directly into usable structures – not unmanageable lakes? 

In the next chapter of our series, we’ll explore what that process looks like for businesses that want to escape the gravitational pull of paper, paper-on-glass, and all their Adobe and Microsoft analogues. Stay tuned to find out where that process starts – with building an organizational data culture – and why that foundation is essential to strong data governance and consistent data integrity.
