When we think about data in computer technology terms, we think of the bits stored in databases or files. Data can be structured in tables with rows and columns, semi-structured in formats like JSON or XML, or unstructured and stored in various formats. Data can be text or binary, such as photos or videos.
Those bits become information when they are retrieved and examined to find patterns or to build a higher-level understanding of what they represent. Information is the first level of semantic organization, processing, and categorization of data. For example, knowing the total number of orders in the previous month is helpful information for an organization selling products.
We create knowledge when we compare the understanding brought by this information with other available information, a step that is necessary for any informed action. To continue our example, knowing whether the number of orders last month grew compared to the same month a year ago is valuable knowledge for an organization trying to increase its sales.
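To make the progression concrete, here is a minimal Python sketch, assuming the raw data is simply a list of order dates. The function names (`orders_in_month`, `year_over_year_growth`) and the sample dates are hypothetical and purely illustrative. Counting a month's orders turns data into information; comparing that count with the same month a year earlier turns information into knowledge.

```python
from datetime import date

# Hypothetical raw data: one date per order (illustrative values, not real figures).
orders = [
    date(2022, 3, 2), date(2022, 3, 18),
    date(2023, 3, 1), date(2023, 3, 9), date(2023, 3, 27),
    date(2023, 4, 3),
]


def orders_in_month(order_dates, year, month):
    """Information: the total number of orders placed in a given month."""
    return sum(1 for d in order_dates if d.year == year and d.month == month)


def year_over_year_growth(order_dates, year, month):
    """Knowledge: compare a month's order count with the same month a year earlier.

    Returns the fractional change, or None when the earlier month had no orders
    (growth is undefined in that case).
    """
    current = orders_in_month(order_dates, year, month)
    previous = orders_in_month(order_dates, year - 1, month)
    if previous == 0:
        return None
    return (current - previous) / previous


growth = year_over_year_growth(orders, 2023, 3)
print(f"March orders vs. a year ago: {growth:+.0%}")  # prints "March orders vs. a year ago: +50%"
```

In practice the counting step would usually run as a query inside a data platform rather than over an in-memory list; the sketch only illustrates the conceptual step from a raw count to a comparison.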
We can probably all agree that there is no shortage of data being created. Many technologies are available for examining and processing data and turning it into information. Snowflake’s Data Cloud is one we’re strongly invested in at Infostrux.
What strikes me is that if data and information are abundant, then the rate-limiting step for most organizations must be the creation of knowledge.
A lack of knowledge, and the time it takes to acquire it, can significantly affect a business’s ability to make quick, vital decisions that influence growth, market position, customer satisfaction, and more.
Companies obsess over KPIs, OKRs, and SMART goals, yet they don't have reliable ways to measure the metrics they identified as key.
Without a reliable way to measure the effectiveness of those goals and initiatives, companies are left to rely on intuition, anecdotes, or past experience.
Companies that implement data-driven decision cultures and processes are generally more successful. However, there is no silver bullet that guarantees success, and there is such a thing as “high confidence, bad answers,” which an improperly designed or poorly implemented data platform can produce.
The path toward a solution starts with centralizing the company’s knowledge and democratizing access to it. To that end, we need data and information brought together in one central place, along with any knowledge gained previously.
Even though we know this, most businesses create silos and hoard data at the department level instead of making it easier to share data across departments or with their partners.
Fear or laziness often drives this: “Security requires that we prevent unauthorized access to financial data, so we need to segregate it into a data mart for financial reporting that only the finance team can use.” I have also often seen protectionism create silos disguised behind a policy excuse: “IT lacks a solid data governance policy, so we have no proper way to share our data outside my department.”
Then there are technical challenges, too: “Loading more data into our EDW will slow down our analytics, so we have to do the processing outside first and load only the aggregated data. This takes a lot of developer time, which we don’t have.”
Data silos are an old problem, and many have tried to solve it on the product side for a long time. I was lucky to participate in some of those attempts in the second half of the 2000s and the early 2010s. I have been reflecting a lot lately on those experiences, and this post attempts to share some of those reflections.
Of course, we built great solutions uniquely enabled by the cloud and automated many processes using principles like infrastructure as code, CI/CD, and policy as code. Ultimately, however, the value of the technology was limited without the cultural change.
For this reason, I am motivated to drive a similar cultural change with Infostrux and our partner, Snowflake. We’re making progress towards incentivizing organizations to break data silos by making it possible to get all of the data in one place and removing performance barriers for processing and analyzing that data.
Concepts like governed data sharing, which make it easy for business units and vendors in a supply chain to access and integrate their data, might incentivize businesses to break their data silos.
The proliferation of cloud platforms and SaaS products used by organizations creates integration challenges. Businesses find themselves dealing with many more data sources than before, and getting data out of some of those sources is difficult.
To take a specific example from e-commerce, businesses with omnichannel strategies, which offer their products through their own digital properties as well as through channels like Amazon Marketplace, Facebook, and Instagram, find it hard to understand the performance of their investments because these platforms are not incentivized to give their customers full access to their data.
Unfortunately, these platforms’ business models are built around obtaining and retaining large amounts of data about their users and suppliers. Transparently sharing that data with their customers is not part of the service.
Digital transformation puts a lot of pressure on organizations to deploy more technology, adopt more software, instrument more of their products, etc. Suddenly, organizations that previously ran their business on the data from a limited set of systems like ERP, CRM, and accounting find themselves collecting a plethora of data across many systems and measuring all aspects of their business, products, users, etc.
Bringing that data together and creating valuable insights from it has given rise to a whole new industry, with its own technologies and approaches, that we know as Big Data, Data Analytics, Data Science, and so on.
This growth in data pushes organizations to do more data analysis and make “data-driven decisions.” Organizations often discover they’ve fallen into the “high confidence, bad answers” trap by building solutions that don’t work well enough because of missing or unreliable data.
This will be an ongoing race in which the goalposts keep moving. Product innovation, new models for analyzing data, new tools for integrating data, and efforts from businesses like ours will all be needed to stay ahead of the curve.
At Infostrux, we help you deal with the undifferentiated heavy lifting required to improve data reliability and break data silos.