Structuring CMC Data: Chapter 3 in Our Series on Drug Development Data

There’s a special challenge waiting in drug development’s data ecosystem.

The next time you get dressed in the morning, imagine this: You walk into your closet, look for an outfit, and find nothing but large piles of fiber. 

That’s right: this morning, all of your clothes have been transformed into the bundles of individual threads that make them up. Cotton, wool, rayon (go on, admit it), silk, you name it. Your clothes are in there, somewhere. But today, you have to put them all together before you can wear them.

Panicking? I would be!

Have you been transported overnight to your new Kafkaesque reality? Hardly. For someone responsible for CMC workstreams, this experience is an average day with their data resources. 

If you’ve been following our series on data governance and integrity, you may know the feeling: you have a regulatory submission coming up, or a report due to leadership, or a risk assessment to hand off to MSAT, and the data you need to create that deliverable is… here, somewhere. In a tangled mass of spreadsheets, PDFs, attachments, and OneDrive folders. And it’s up to you to piece it all together.

Even by drug development’s 3.5-σ standards for data management, CMC data faces particularly acute challenges – ones that continue to delay and degrade many different critical workflows in the development cycle. But as we’ll see in the third chapter of this series, there’s hope: a new structured approach to managing CMC information that promises to provide a roadmap for modernizing many forms of drug development data.

Let’s take a look!

CMC data is essential to product success. So why isn’t it managed that way?

Despite its vital importance – underpinning every successful regulatory submission, tech transfer, and development pathway – CMC data rarely get the white-glove treatment they deserve. If they’re managed in any purposeful way at all. 

Today, far too many drug developers have servers and cabinets full of paper and electronic documents, each one full of countless information “threads” that have to be painstakingly woven into the form of reports, analyses, and submissions. That process can already take weeks or months, and is only growing more burdensome as data volumes exponentially expand every year.

And yet, every day, CMC stakeholders add new “threads” to the mass: more spreadsheets, more paper documents, more PDFs, Word docs, and folders. “Throw it on SharePoint” and “can you email me those results” are as much a part of CMC vocabulary as “have we established temperature parameters for that bioreaction” and “are our CQAs and CPPs fully aligned at this step?”


For most drug developers, reliance on outdated paper and electronic formats is so ingrained it’s hardly noticed day-to-day. But the cumulative impact on industry performance is becoming inescapable: spiking rates of 483s and other regulatory setbacks, extended time-to-market, and painfully protracted development steps.

As we explored in the second chapter of this series, many drug developers have awakened to the scale of this challenge. At a foundational level, frameworks like FAIR may help the industry bring desperately needed structure to its data resources. But new paradigms like that take root over time, as organizations rethink their fundamentals of data management and work to build a data culture on principles of modern data governance. What can CMC leaders do now to break today’s vicious cycle of save, upload, forget, redo?

Quite a lot, as a growing number of drug developers are discovering. And it begins with a simple but profound shift of focus: from the documents that continue to slow the industry’s digital development, to the data in those documents.

From collation to structure: How CMC programs need to shift their focus

Today, CMC programs’ many paper and electronic documents are a worst-of-both-worlds challenge: each one contains its own isolated bundle of information threads that have to be sorted and untangled, and that then need to be coordinated with numerous other bundles. Is it any wonder that so many regulatory submissions struggle to cross the compliance finish line?

There has to be a better way – and there is. 

The premise is simple: instead of creating countless, isolated bundles that need to be meticulously unraveled, focus on the individual threads of information from the start. At QbDVision, we like to refer to those threads as the “atomic data” at the core of every document. And that is where the magic happens.


Ask yourself what seems like an equally simple question: what is a QTPP? Is it a PDF referenced in countless reports? Is it one of the files you know you need to include in your handoff to MSAT? An attachment you’ve fished out of your inbox a dozen times? Or is it every one of your individual CQAs, CMAs, CPPs, and all the key decisions they anchor – incidentally rolled into a document?

Once you unbundle all those critical data points, every one of them can be easily linked to the upstream, downstream, and lateral steps it influences. And then, all of a sudden, you’ve transformed isolated documents into linked nodes of interconnected data points that can be aggregated into structured datasets – with a host of powerful applications.
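To make the idea of linked nodes a little more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the DataPoint and KnowledgeGraph names, the example keys, and the links between them are hypothetical, and this is not a description of how QbDVision or any other product stores its data. It simply shows data points as nodes, "influences" relationships as links, and two things you can do once that structure exists: trace what a data point affects downstream, and aggregate points of one kind into a dataset.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPoint:
    """One 'atomic' piece of CMC information (hypothetical model)."""
    key: str   # unique identifier, e.g. "CPP-temp"
    kind: str  # category such as "QTPP", "CQA", "CPP", or "CMA"
    name: str


class KnowledgeGraph:
    """Data points as nodes, 'influences' relationships as directed links."""

    def __init__(self):
        self.points = {}
        self.influences = defaultdict(set)  # key -> keys it influences

    def add(self, point):
        self.points[point.key] = point

    def link(self, source, target):
        # Record that `source` influences `target`.
        if source not in self.points or target not in self.points:
            raise KeyError("add both data points before linking them")
        self.influences[source].add(target)

    def downstream(self, key):
        """Every data point reachable from `key`, found breadth-first."""
        seen, queue = {key}, deque([key])
        while queue:
            for nxt in self.influences[queue.popleft()]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return [self.points[k] for k in sorted(seen - {key})]

    def by_kind(self, kind):
        """Aggregate all data points of one kind into a simple dataset."""
        return [p for p in self.points.values() if p.kind == kind]


# Illustrative data only: names and relationships are invented.
graph = KnowledgeGraph()
graph.add(DataPoint("QTPP-efficacy", "QTPP", "Target efficacy"))
graph.add(DataPoint("CQA-potency", "CQA", "Potency"))
graph.add(DataPoint("CPP-temp", "CPP", "Bioreactor temperature"))
graph.add(DataPoint("CMA-purity", "CMA", "Raw material purity"))
graph.link("CPP-temp", "CQA-potency")
graph.link("CMA-purity", "CQA-potency")
graph.link("CQA-potency", "QTPP-efficacy")

affected = [p.key for p in graph.downstream("CPP-temp")]
print(affected)
```

In this toy example, asking what a change to the temperature parameter touches returns the potency CQA and, through it, the QTPP element. That is the kind of question a folder of PDFs can only answer by hand.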

Just like that, we’re not talking about threads: we’re talking about a continuous, cohesive digital fabric that can span your entire CMC program, save your workforce countless hours of work, and accelerate critical steps across the development cycle. 

For CMC programs struggling under a growing deluge of data, this can be a transformational approach that unlocks remarkable new levels of robustness, innovation, productivity, and speed. But of course, it’s only feasible if it aligns with regulatory requirements for each stage of the journey to market. 

So once we’ve created a digital fabric of CMC data, how can we begin to layer it in ways that support continual compliance?

Aligning CMC data structures with regulatory expectations

At QbDVision, bringing structure to CMC data is what we do best. We’ve had the pleasure of doing so for drug developers of every size, across a wide variety of product types and therapeutic areas.

Here’s a closer look at a typical knowledge framework that we deploy for our customers, and the categories of data that can be tracked and connected:

Once in place, though, these connected data sets do more than provide a centralized, integrated data hub for your CMC program. They provide the foundation of a vertically integrated knowledge base rooted in ICH guidelines:

  • Fundamental patient, product, and process information naturally make up the base layer of that structure (ICH Q8).
  • Our data framework integrates QRM principles by design, making risk assessments and their outputs a fundamental layer of the knowledge base (ICH Q9).
  • Data and analytics generated during the development cycle can then be fed directly back into the knowledge base as well, enabling CMC leads to iteratively refine related risk assessments, specifications, and requirements (ICH Q9).
  • As general recipes evolve into site-specific recipes, all the tangible assets involved in executing them can be added to the knowledge structure (ICH Q10, ICH Q12).
  • And finally, all these layers together support control and lifecycle management strategies that secure and guide the product’s compliance over time (ICH Q8 through Q12).
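The layering above can also be written down as a small lookup, which is one way to see how a structured knowledge base lets you ask "which layers support this guideline?" The sketch below is illustrative only: the layer names paraphrase the list above, the LAYERS structure and both helper functions are hypothetical, and the last layer is expanded to Q8, Q9, Q10, Q11, and Q12 on the assumption that the range in the list covers every guideline between Q8 and Q12.

```python
# Layer names paraphrase the list above; the structure is illustrative.
LAYERS = [
    ("Patient, product, and process information", {"Q8"}),
    ("Risk assessments and their outputs", {"Q9"}),
    ("Development data and analytics", {"Q9"}),
    ("Site-specific recipes and tangible assets", {"Q10", "Q12"}),
    ("Control and lifecycle management strategies",
     {"Q8", "Q9", "Q10", "Q11", "Q12"}),
]


def layers_for(guideline):
    """Return the layers tagged with a given ICH guideline, in order."""
    return [name for name, guidelines in LAYERS if guideline in guidelines]


def guidelines_covered():
    """Return every guideline referenced by at least one layer."""
    covered = set().union(*(guidelines for _, guidelines in LAYERS))
    return sorted(covered, key=lambda q: int(q[1:]))


q9_layers = layers_for("Q9")
print(q9_layers)
print(guidelines_covered())
```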


Structure in one dimension begets structure in all: fully, purposefully integrated project data provide the perfect foundation for integrated layers of knowledge that protect data integrity and compliance over time. That multidimensional framework is an integral feature of QbDVision, and not just essential to our customers’ alignment with ICH guidelines – it also supports multiple applications that further unlock the value of structured CMC data. 

…but more on that next time.

Coming up next: Unlocking the power of automated compliance

As with any valuable information resource, structure is just the first step for CMC data. Once it’s in place, you can start to reveal the real value: a host of cost-saving, workflow-accelerating, and insight-generating use cases that can all be built on a foundation of consistently structured data.

In the fourth chapter of this series, we’ll dive into one of the most powerful applications for CMC programs: automated reporting and compliance. Can you say “weeks of tedious data gathering and collation reduced to a few keystrokes”? Next time, you’ll see how.



Ready to start structuring your product & process data?

No need to wait for the rest of our series. Reach out to our experts today to learn how we can help.

