
The million faces of Data

Will it blend? That is the question.

For those of you who don’t know the reference, Google it for your daily entertainment.

The research question is simple enough, and can be tested over a wide range of inputs: material composition, hardness, size, shape, etc.

But the more you drill into this question, the more nuance emerges, and the answer becomes complex.

What DOESN’T blend? Will a viscous liquid blend? What happens to composite materials? How fine a powder counts as a “yes”?

In Pharma we always ask complex questions, and get complex (a.k.a. vague) answers.

Take Identity for example. Is this The protein?

You’d expect a simple Yes/No would do. But in Biology, as always, nothing is simple.

Our analytical methods are like the four blind men describing an Elephant. Each senses one part and provides just a partial description of the whole.

That's why we always use orthogonal methods when we test our product, and that's why the release profile gives only part of the picture.

To fully appreciate the attributes of a protein, we do characterization tests, which go a few levels deeper than routine, to sense additional parts of the Elephant.

We end the day with enormous amounts of data in all shapes and forms: numbers, colors, chromatograms, biological responses and many more.

You'd think this is heaven for a data scientist, but the reality is far from it.

We are good at generating data, but not so good at contextualizing it and adding metadata, so we end up with petabytes of fragmented and oftentimes inaccessible data.

That's why we have a long way to go before we can fully employ the power of Big Data to help us, once and for all, describe the Elephant.


Join us this Thursday (11 May) to hear more about the CMC Data Elephant and what the industry and authorities are doing to (start to) tame it.

Register here:

