Location Intelligence | Pitney Bowes
When is geocoding a Big Data problem?
By Rose Winterton
Geocoding allows us to connect a real-world location (longitude and latitude) to a text-based address. For many use cases this can be done as a batch job on a standard server, or even as a cloud-based API transaction. The answer to the question “When is geocoding a Big Data problem?” can be as simple as knowing where the data is stored.
Large organizations are moving from application-centric stacks, where the data and the processing are organized by use case or department, to centralized data lakes from which different use cases and applications can be served. When you think about a process as core as spatializing your data, it becomes clear that the process should operate where the data resides, rather than the data being brought to a specialized application for processing. Geocoding natively within Hadoop via Hive or Spark lets the data stay in HDFS. Once geocoded, the data can be used for any number of applications within the organization.
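As a rough illustration of processing the data where it resides, the sketch below geocodes records one partition at a time, which is the shape of function that Spark's mapPartitions expects. The reference table, the function names and the coordinates are hypothetical placeholders, not an actual geocoding engine; a real geocoder would parse addresses and match them against a full reference dataset.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

# Hypothetical reference data: normalized address -> (longitude, latitude).
# A production geocoder would use a full address dataset and fuzzy matching.
REFERENCE: Dict[str, Tuple[float, float]] = {
    "1 main st, springfield": (-89.65, 39.80),
    "22 oak ave, riverton": (-108.38, 43.02),
}


def normalize(address: str) -> str:
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(address.lower().split())


def geocode_partition(
    rows: Iterable[Tuple[str, str]],
) -> Iterator[Tuple[str, Optional[float], Optional[float]]]:
    """Geocode one partition of (record_id, address) rows.

    Unmatched addresses yield None coordinates rather than being dropped,
    so downstream teams can see which records still need attention.
    """
    for record_id, address in rows:
        lon_lat = REFERENCE.get(normalize(address))
        if lon_lat is None:
            yield (record_id, None, None)
        else:
            yield (record_id, lon_lat[0], lon_lat[1])


# Local stand-in for a single partition. On a cluster, the same function could
# be passed to an RDD's mapPartitions so each worker handles its own blocks.
partition = [("p1", "1 Main St,  Springfield"), ("p2", "99 Unknown Rd, Nowhere")]
results = list(geocode_partition(partition))
```

Because the function only consumes an iterator of rows and yields enriched rows, the data never has to leave the cluster to be geocoded, and the output can be written back alongside the source records for other teams to use.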
Take the example of a property and casualty insurer, where various teams may need the property locations as the basis for different applications or analyses. An underwriting team might assess the property locations for risk from natural hazards such as flood or hurricane. Event response or claims teams might compare them against a wildfire or hailstorm footprint. Claims adjusters may look for property locations that fall outside known event footprints to flag potentially fraudulent claims. The marketing team may have a different use case again: proactively identifying areas to target for new business.
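The claims adjuster check can be sketched with a simple point-in-polygon test once the properties are geocoded. This is a minimal sketch using the standard ray-casting method on planar coordinates; the footprint polygon, claim IDs and coordinates are invented for illustration, and a production system would more likely use a spatial library and real event footprints.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: count how many polygon edges a horizontal ray
    from the point crosses; an odd count means the point is inside."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical hailstorm footprint (a simple rectangle) and geocoded claims.
footprint: List[Point] = [(-105.0, 39.0), (-104.0, 39.0), (-104.0, 40.0), (-105.0, 40.0)]
claims: Dict[str, Point] = {
    "claim-1": (-104.5, 39.5),  # inside the footprint
    "claim-2": (-103.0, 39.5),  # east of the footprint
}

# Claims located outside the footprint are candidates for closer review.
outside = sorted(cid for cid, loc in claims.items() if not point_in_polygon(loc, footprint))
```

The same geocoded coordinates could feed the underwriting and marketing analyses without being geocoded again, which is the point of doing the work once inside the shared data store.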
One use case within a large financial institution has pushed us to view geocoding differently. They view geocoding as a data process the same as any other and therefore believe it belongs inside their Big Data framework where all their data is stored. With adoption of Big Data becoming more mainstream, this is a trend that’s here to stay.