Customer Information Management

Data Quality Strategies: Stemming a Tide of Bad Data

by Scott Arnett, Director of Product Management, Customer Information Management, Pitney Bowes

Consumer-facing companies today are privy to an enormous wealth of customer data that could be utilised to improve business. It’s crucial that this abundance of information doesn’t become an embarrassment of riches, however, as the rapid pace of collection makes it difficult for organisations to glean actionable insights from constantly expanding data stockpiles.

The concept of quality over quantity must be applied to customer information management. Organisations need to "separate the wheat from the chaff", just as they would to keep the data in a CRM system high quality, but now at a much bigger scale. Businesses that instead take an indiscriminate approach to collection are under the dangerous impression that any and all data must present a truly worthwhile opportunity - a mindset that can quickly drain analytical resources.

Big-data platforms like Hadoop and Spark are becoming essential in this situation, as they help businesses integrate and optimise data across their operations and from many different big-data sources. These platforms enable organisations to bridge data silos, operate at enormous speed and scale, and integrate data to best capture, analyse and take advantage of today’s information boom.

The problem is that poor data preparation can drive bad or uninformed decisions. Big-data technologies can capture and store the flood of communication collected through social and mobile channels, and they are often discussed in terms of volume and velocity, but flawed data results in poor analytics. That is why I like to say the most critical "v" is veracity of data. Organisations need to dig deeper to ensure that data sets from new sources don’t include duplicate records or inaccurate information, or lack critical customer location and relationship attributes. Gartner analyst Ted Friedman recently tweeted, "Winning organisations will stop collecting data and start connecting data." I couldn’t agree more.

Organisations need to integrate data quality processing directly into Hadoop- or Spark-based platforms so that they can cleanse and link data at extraordinary speed and scale, and analyse critical information for real-time, trusted insights. This will allow them to share a richer, more accurate "single view of the customer" across their organisations.

At Pitney Bowes, we’ve been hard at work solving exactly this challenge, drawing on more than 30 years’ experience helping clients deal with similar data quality challenges at scale, long before big data became trendy. The latest release of our flagship Spectrum Technology Platform enables our clients to deploy Big Data Parsing and Big Data Matching capabilities directly into Hadoop- and Spark-based environments, delivering the same quality our clients have come to expect, regardless of the environment. Our Data Federation and Integration capabilities extend their investment in traditional data warehousing and business intelligence environments with data from these new big data stores, such as HDFS, Couchbase, MongoDB, Cassandra, Neo4j and Hive, to name a few, delivering insights into current systems without any disruption. Our market-leading Master Data Management solution, underpinned by a NoSQL graph database, allows organisations to deliver contextually relevant insights to systems of records, interactions and insights by linking ALL customer information across the enterprise, along with customers’ digital footprints, in real time.