Location Intelligence, Spatial Data Analysis | Pitney Bowes
Location Intelligence for Big Data comes with big challenges
By the Pitney Bowes Big Data team (Francica, Winterton, McKenzie, Kernighan)
Location-based information has an inherent tendency toward unwieldiness. To be useful, data must contain a coordinate pair of latitude and longitude. Data suppliers might also embellish, or "geoenrich," data with yet more descriptive attributes, such as time or demographic characteristics. The objective is usually to prepare the data for what the organization actually needs, such as analytics, and ultimately to uncover the best answer to whatever business challenge is most pressing.
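As a minimal sketch of these two ideas, a record that is only accepted with a valid coordinate pair and a step that geoenriches it with extra attributes, consider the following. The `LocationRecord` class, the `geoenrich` helper and the attribute names are hypothetical illustrations, not a Pitney Bowes API, and the sample values are made up.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class LocationRecord:
    """Hypothetical record type: a latitude/longitude pair plus descriptive attributes."""
    latitude: float
    longitude: float
    attributes: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # A record is only usable if its coordinate pair is within valid ranges.
        if not -90.0 <= self.latitude <= 90.0:
            raise ValueError(f"latitude out of range: {self.latitude}")
        if not -180.0 <= self.longitude <= 180.0:
            raise ValueError(f"longitude out of range: {self.longitude}")


def geoenrich(record: LocationRecord, **extra: Any) -> LocationRecord:
    """Return a new record with extra attributes (e.g. time, demographics) merged in."""
    return LocationRecord(record.latitude, record.longitude, {**record.attributes, **extra})


# Illustrative values only.
raw = LocationRecord(40.7128, -74.0060, {"source": "sensor-17"})
enriched = geoenrich(raw, observed_at="12:00", median_income_band="C")
print(enriched.attributes)
```

Returning a new record rather than mutating the original keeps the raw data intact, so the enrichment step can be rerun or audited later.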
But before an organization can get started, it must first be able to assess the large volume of location-based data streaming in from sensors, social media, financial transactions, utility infrastructure and other "data in motion." Depending on how many records are involved, processing can take hours or even days, and once the data runs into billions of sensor data points, the available computing performance may simply be insufficient.
We know the three "Vs" of big data: volume, variety and velocity. But what about "variability," which accounts for change over time? Some current Earth observation satellite systems, for example, image the same area of the Earth every day. What could change? Temperature, precipitation, man-made intrusions or sunlight angle, all of which can alter the measurements captured from soil, vegetation or structures. Then there is "veracity": keeping the data accurate. The simplest example is contact information that enters a marketing automation system with false names and inaccurate details. How many times have you seen Mickey Mouse entered into a client database or mailing list?
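A veracity check can start very simply. The sketch below flags contacts that are empty or match a list of known placeholder names after normalization. The `PLACEHOLDER_NAMES` list and both helper functions are hypothetical; a real data quality process would combine many more rules than a single blocklist.

```python
# Hypothetical blocklist of placeholder names commonly typed into web forms.
PLACEHOLDER_NAMES = {"mickey mouse", "donald duck", "test test", "asdf asdf"}


def normalize_name(name: str) -> str:
    """Lower-case, drop non-letter characters and collapse whitespace."""
    kept = "".join(ch for ch in name.lower() if ch.isalpha() or ch.isspace())
    return " ".join(kept.split())


def is_suspect_contact(first: str, last: str) -> bool:
    """Flag contacts that are empty or match a known placeholder name."""
    full = normalize_name(f"{first} {last}")
    return not full or full in PLACEHOLDER_NAMES


contacts = [("Mickey", "Mouse"), ("Ada", "Lovelace"), ("  MICKEY ", "mouse!"), ("", "")]
flags = [is_suspect_contact(first, last) for first, last in contacts]
print(flags)  # [True, False, True, True]
```

Normalizing before comparison catches variants such as odd capitalization or stray punctuation, and using `str.isalpha` rather than an ASCII-only pattern keeps accented letters in genuine names.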
The journey to delivering valuable data insight can be divided into three buckets: go faster, improve analytics and better organize the data. Let's look at how each of these tactics can help manage the problems of location-based big data.
Go faster
The need for faster geoprocessing can be met with a data flow accelerator. Here, the solution is to build data quality, geocoding and geospatial calculations natively into a big data framework such as Hadoop.
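The core idea of pushing geocoding into a distributed framework is data parallelism: split the input, geocode each piece independently, then merge the results. The sketch below imitates that map/reduce shape on one machine, with threads standing in for cluster nodes. The `REFERENCE` table, its coordinates and all function names are hypothetical placeholders for a real geocoding engine, not the Pitney Bowes SDK.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional, Tuple

Coord = Tuple[float, float]

# Hypothetical reference table standing in for a real geocoding engine.
REFERENCE: Dict[str, Coord] = {
    "1 example ave springfield": (39.80, -89.64),
    "22 sample rd shelbyville": (39.41, -88.79),
}


def normalize(address: str) -> str:
    """Data quality step: canonicalize case, commas and spacing before lookup."""
    return " ".join(address.lower().replace(",", " ").split())


def geocode_chunk(chunk: List[str]) -> List[Tuple[str, Optional[Coord]]]:
    """Map step: geocode every address in one chunk; None marks a failed match."""
    return [(addr, REFERENCE.get(normalize(addr))) for addr in chunk]


def split_records(records: List[str], n: int) -> List[List[str]]:
    """Split records round-robin into n chunks, one per worker."""
    return [records[i::n] for i in range(n)]


def geocode_all(records: List[str], workers: int = 4) -> Dict[str, Optional[Coord]]:
    """Geocode chunks in parallel, then merge the per-chunk results (reduce step)."""
    chunks = split_records(records, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_chunk = list(pool.map(geocode_chunk, chunks))
    return {addr: coord for chunk in per_chunk for addr, coord in chunk}


addresses = ["1 Example Ave, Springfield", "22 Sample Rd, Shelbyville", "999 Unknown Ln, Nowhere"]
result = geocode_all(addresses, workers=2)
print(result)
```

Because each chunk is processed without reference to any other, the same functions could in principle be distributed across many machines; in a framework such as Hadoop, the splitting and merging would be handled by the framework rather than by `split_records`.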
A major property and casualty (P&C) insurance company, for example, made better pricing decisions by building a nine-billion-row dataset describing risk for each address in the United States. The solution used a geocoding SDK for Hadoop, with which Pitney Bowes geocoded nine billion records in approximately 10 hours. The insurer then analyzed fire risk by finding the three closest fire stations, by drive time, to each U.S. address, a task involving 1.8 billion route calculations that ran in less than six hours.
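For scale, nine billion records in about 10 hours works out to roughly 250,000 geocodes per second, and 1.8 billion routes in under six hours to more than 83,000 route calculations per second. The logic of the fire-station query itself can be sketched on a toy road network: run Dijkstra's shortest-path algorithm outward from each station, then keep the three smallest drive times for each address. The graph, node names and travel times below are invented for illustration; a production system would rely on a dedicated routing engine and real road data.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, float]]]  # node -> [(neighbor, minutes)]


def undirected(edges: List[Tuple[str, str, float]]) -> Graph:
    """Build an adjacency list where every road can be driven in both directions."""
    graph: Graph = {}
    for u, v, minutes in edges:
        graph.setdefault(u, []).append((v, minutes))
        graph.setdefault(v, []).append((u, minutes))
    return graph


def drive_times_from(graph: Graph, source: str) -> Dict[str, float]:
    """Dijkstra's algorithm: shortest drive time from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for nbr, minutes in graph.get(node, []):
            nd = d + minutes
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


def k_closest_stations(graph: Graph, stations: List[str], addresses: List[str],
                       k: int = 3) -> Dict[str, List[Tuple[float, str]]]:
    """For each address, return the k stations with the shortest drive time to it."""
    per_station = {s: drive_times_from(graph, s) for s in stations}
    closest = {}
    for addr in addresses:
        reachable = [(times[addr], s) for s, times in per_station.items() if addr in times]
        closest[addr] = heapq.nsmallest(k, reachable)
    return closest


roads = [("S1", "A", 4), ("S2", "A", 2), ("S3", "A", 7),
         ("S4", "B", 1), ("A", "B", 3), ("S3", "B", 5)]
closest = k_closest_stations(undirected(roads), ["S1", "S2", "S3", "S4"], ["A", "B"])
print(closest["A"])  # [(2.0, 'S2'), (4.0, 'S1'), (4.0, 'S4')]
```

Searching outward from the stations, which are far fewer than addresses, means one shortest-path run per station covers every address; drive times are measured from station to address, which is the direction a fire engine travels.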
Get better organized
The volume of data also poses a challenge for storage and organization, and poorly organized data hampers analytics. The solution is to organize a data lake geospatially. A spatially enabled data lake lets users customize the settings of individual processes and connect those processes into workflows. It also simplifies integration with related data stores, and users can control and customize the business logic in each process. For example, when a user aggregates XY-point records with different attributes from mobile devices into grids, they often want to choose which statistics to calculate on selected attributes (e.g., the mean or median of wireless signal strength).
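The grid aggregation described above can be sketched in a few lines: assign each XY point to a square cell, then apply whichever statistic the user selects to the chosen attribute. This assumes planar XY coordinates (for example, already projected), since a grid laid out in raw latitude/longitude degrees does not have equal-area cells. The field names and signal readings are illustrative.

```python
import math
from collections import defaultdict
from statistics import mean, median
from typing import Callable, Dict, List, Tuple

Cell = Tuple[int, int]


def grid_cell(x: float, y: float, cell_size: float) -> Cell:
    """Map an XY point to the integer index of the square grid cell containing it."""
    return (math.floor(x / cell_size), math.floor(y / cell_size))


def aggregate_to_grid(points: List[Dict[str, float]], attribute: str, cell_size: float,
                      stat: Callable[[List[float]], float] = mean) -> Dict[Cell, float]:
    """Group points by grid cell and apply a user-selected statistic to one attribute."""
    buckets: Dict[Cell, List[float]] = defaultdict(list)
    for p in points:
        buckets[grid_cell(p["x"], p["y"], cell_size)].append(p[attribute])
    return {cell: stat(values) for cell, values in buckets.items()}


# Illustrative wireless signal readings (dBm) from mobile devices.
points = [
    {"x": 0.2, "y": 0.3, "signal_dbm": -70.0},
    {"x": 0.8, "y": 0.1, "signal_dbm": -80.0},
    {"x": 0.5, "y": 0.9, "signal_dbm": -95.0},
    {"x": 1.4, "y": 0.2, "signal_dbm": -60.0},
]
means = aggregate_to_grid(points, "signal_dbm", cell_size=1.0, stat=mean)
medians = aggregate_to_grid(points, "signal_dbm", cell_size=1.0, stat=median)
print(medians)  # {(0, 0): -80.0, (1, 0): -60.0}
```

Passing the statistic as a function is what lets the same aggregation step serve different business rules: swapping `mean` for `median` makes cell (0, 0) less sensitive to the single weak -95 dBm reading.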
Again taking an example from insurance, underwriting a policy becomes easier if the organization can better analyze the details surrounding a catastrophic weather event. The insurer would use the technology to apply a storm risk value to every address in the U.S., which quickly identifies all houses affected by the event.
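One way to picture "identifying all houses affected by the event" is a point-in-polygon test of every address against the storm's footprint. The sketch below uses the standard even-odd (ray casting) rule on planar coordinates; the footprint, address IDs and the 0.8 risk value are invented for illustration, and a real system would use a spatial index rather than testing every address against every polygon.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # planar (x, y), e.g. projected coordinates


def point_in_polygon(pt: Point, polygon: List[Point]) -> bool:
    """Even-odd (ray casting) test: toggle on each edge crossed by a ray cast to the right."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can be crossed;
        # this also guarantees y2 != y1, so the division below is safe.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def apply_storm_risk(addresses: Dict[str, Point], footprint: List[Point],
                     risk: float) -> Dict[str, float]:
    """Assign the event's risk value to every address inside the storm footprint."""
    return {aid: risk for aid, pt in addresses.items() if point_in_polygon(pt, footprint)}


footprint = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
addresses = {"addr-1": (1.0, 1.0), "addr-2": (5.0, 1.0), "addr-3": (3.5, 3.9)}
impacted = apply_storm_risk(addresses, footprint, risk=0.8)
print(impacted)  # {'addr-1': 0.8, 'addr-3': 0.8}
```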
Improve location analytics
Geocoding faster and organizing data better get users to the actual analysis more quickly. The Pitney Bowes method adds a common ID to link datasets together. The ID is attached to the user's own data, such as a customer record, or to one of Pitney Bowes' pre-calculated datasets. Typically, the data is linked to a postal address, a geographic administrative boundary such as a postal code area, or a custom grid. These geoenrichment options make it possible to analyze location-based data at different levels of geographic resolution, which improves the interpretation of results.
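The value of a shared ID is that one join unlocks every geographic level attached to it. The sketch below joins customer records to a reference table on a common address ID and then counts customers by postal code area and by grid cell. The table layout, IDs and codes are hypothetical and do not represent the format of any Pitney Bowes dataset.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical reference table keyed by a persistent address ID.
ADDRESS_REF: Dict[str, Dict[str, str]] = {
    "A001": {"postal_code": "PC-100", "grid_cell": "G-12"},
    "A002": {"postal_code": "PC-100", "grid_cell": "G-13"},
    "A003": {"postal_code": "PC-200", "grid_cell": "G-13"},
}

customers: List[Dict[str, str]] = [
    {"customer": "c1", "address_id": "A001"},
    {"customer": "c2", "address_id": "A002"},
    {"customer": "c3", "address_id": "A003"},
    {"customer": "c4", "address_id": "A002"},
]


def enrich(records: List[Dict[str, str]], ref: Dict[str, Dict[str, str]]) -> List[Dict[str, str]]:
    """Join records to the reference table on the shared address ID (inner join)."""
    return [{**r, **ref[r["address_id"]]} for r in records if r["address_id"] in ref]


def count_by(records: List[Dict[str, str]], level: str) -> Dict[str, int]:
    """Roll records up to one geographic level, e.g. postal code area or grid cell."""
    return dict(Counter(r[level] for r in records))


joined = enrich(customers, ADDRESS_REF)
print(count_by(joined, "postal_code"))  # {'PC-100': 3, 'PC-200': 1}
print(count_by(joined, "grid_cell"))    # {'G-12': 1, 'G-13': 3}
```

The same four customers look concentrated in one postal area but split differently across grid cells, which illustrates why analyzing at more than one resolution can change how results are read.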
As an example, Pitney Bowes is helping a large mortgage company give its bankers real-time access to detailed property information during their initial conversations with potential clients, so they can make better early decisions about whether a mortgage opportunity is viable and select the right mortgage product.
But what happens when information about a specific property address changes? The ownership, parcel boundary, political district or street name could change. Property attributes change as farmland is developed into condos, apartment complexes or housing subdivisions. Flood boundaries change. Demographic areas change. Because the property is associated with a unique ID, however, information about it is not lost when these attributes change.
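A minimal sketch of why a stable ID protects property information: attribute changes are recorded as dated snapshots under the same ID, so both the current state and any past state remain retrievable. The `PropertyHistory` class, the property ID and the dates are hypothetical, and the sketch assumes changes arrive in chronological order with ISO-formatted dates (which sort correctly as strings).

```python
import bisect
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class PropertyHistory:
    """Hypothetical store of dated attribute snapshots for one property ID."""
    dates: List[str] = field(default_factory=list)  # ISO dates, ascending
    snapshots: List[Dict[str, Any]] = field(default_factory=list)

    def record(self, effective: str, changes: Dict[str, Any]) -> None:
        """Append a snapshot that carries forward the prior state plus the changes."""
        if self.dates and effective <= self.dates[-1]:
            raise ValueError("changes must be recorded in chronological order")
        base = self.snapshots[-1] if self.snapshots else {}
        self.dates.append(effective)
        self.snapshots.append({**base, **changes})

    def as_of(self, date: str) -> Dict[str, Any]:
        """Return a copy of the attributes in effect on the given date (empty if none)."""
        i = bisect.bisect_right(self.dates, date)
        return dict(self.snapshots[i - 1]) if i > 0 else {}


registry: Dict[str, PropertyHistory] = defaultdict(PropertyHistory)
registry["P-42"].record("2010-01-01", {"land_use": "farmland", "street": "Old Mill Rd"})
registry["P-42"].record("2018-06-01", {"land_use": "residential subdivision"})
registry["P-42"].record("2020-03-15", {"street": "Meadow Way"})

print(registry["P-42"].as_of("2015-01-01"))
# {'land_use': 'farmland', 'street': 'Old Mill Rd'}
print(registry["P-42"].as_of("2021-01-01"))
# {'land_use': 'residential subdivision', 'street': 'Meadow Way'}
```

Even after the street is renamed, lookups still go through "P-42", so nothing linked to the property is orphaned by the change.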
Big data visualization
Location analytics can be underappreciated without the ability to visualize the results. A major mobile telecommunications company is using Pitney Bowes' mapping solutions to show subscriber-confirmed network performance, using it as an acquisition tool that demonstrates quality of service to prospective new subscribers. Visualizations help identify the business rules needed to create operational workflows and reveal proximity relationships that traditional graphs and charts might miss.
Location analyses of big data come with big challenges. When investing in these big data frameworks, organizations must weigh their expected business needs and identify the use cases that yield the best results. With the right solution architecture, many of these challenges can be turned into superior outcomes.