I bet you didn’t realize that Bruce Lee knew a lot about why it’s so hard to share data.
“Boards don’t hit back.”
-Bruce Lee on practice vs. real life.
— Collaborative Fund (@collabfund) January 4, 2018
As many of you know, I practice martial arts, and one of my main ones is iaido, the practice of swinging a sword.
It’s really an amazing practice to watch. Leaving the level of detail aside, it’s not enough to have a “perfect cut”. What you need is a perfect cut every time.
That’s where Bruce Lee’s idea comes into play.
It’s not hard, it’s complicated
If you have data that I need, in theory it’s easy to share it. Make a SQL call, get the data. Want to make it simpler? Make a data API and get the data. No big deal. If you were evaluating what’s needed in the abstract, you’d say “we can do that, simple.”
How many times have we heard “oh, we can do that easily,” followed by people moving on from even thinking about it until the solution doesn’t happen? Notice, I didn’t say ‘until the solution doesn’t work.’ We know it will work, but that’s different from making it happen. Let me explain.
It’s easy to do it once. It’s complicated to manage data sharing as a repeatable, permissionless process. In plain English, it’s not about one app accessing your data, it’s about anyone in the organization being able to do it while you govern access and continue to meet the appropriate SLAs. Here’s an example of what I mean.
Let’s say we’re in a big organization.
You own a database that you have because it’s part of your application, or department, or something.
I have a need for that data… but I’m not a part of your application team, your department, or whatever.
If I ask you for the data, the things you need to consider are:
- If David starts using my database, will it impact the performance of the thing I care about?
- We never have enough budget, why should I use my budget to help David for “free”?
- If what David does requires that I upgrade (speed, performance, scale) who pays for that?
- If I give access to David, maybe everyone will need access, and I know I can’t support that.
- Speaking of support, what if David needs help… I don’t have time (or a charter) for solving David’s problems.
- I love David, but if he messes with my main application because I was kind enough to share data I’ll lose my job. Is it worth risking my job “for the good of the company”?
This goes way beyond “how do we make a SQL call, or create a data API?” and is a bigger problem than the technical “how.” It’s usually solved “politically.” Those with influence can make things happen.
But don’t we want to share data to get a greater return? To be more competitive? To deliver on a better experience?
Yes, yes, and hell yes.
But the business model isn’t there. At least, it’s not there when one is trying to repurpose a siloed business solution (with data in the silo) into a data platform.
Keep in mind, don’t confuse a data platform with big data or with a data lake. In my mind, big data is about finding insights and patterns in large datasets. A data lake is a general way of storing data to make sure that response time is fast without sacrificing data integrity (so perhaps caching information closer to where it’s needed and dealing with how to make sure the latest updates are cached, etc).
But I digress.
Boards don’t hit back, and databases don’t exist in the abstract. We need to share data to innovate, and we also need to ensure that those using the data don’t impact other applications using it (along with the other concerns I’ve listed above). The way we’ve solved the “impact problem” in the past was to simply not share. Or, better said, not share too much (“don’t tell anyone I’ve given you this… now go away”).
But, that’s exactly the opposite of the outcome we want. It doesn’t make sense that we’d behave the way we do, unless there’s more concern over the issues I raise above than the technical concerns of providing access (which are actually quite simple).
People think IoT requires new sensors and all that… and in part it does. However, there’s so much data collected and available today that’s simply not used. Don’t believe me? Next time you have a failure, look me in the eyes and tell me that the data that would have warned you of an imminent problem doesn’t exist. The problem is that the data is too hard to get in real-world settings, where it’s not just about the SQL query (or API call) but about “how the heck does all this stuff work together even when we don’t report to the same person?”
This is why a data platform is needed. By abstracting away from the actual data store, you can provide policy-based access that scales properly and delivers the level of governance necessary to balance regulatory/privacy concerns with business competitiveness.
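To make that a little more concrete, here is a minimal sketch of what “policy-based access” in front of a data store could look like. Everything in it is an assumption for illustration: the `Policy` and `DataPlatform` names, the per-consumer read budget, and the in-memory dict standing in for the owner’s database are all hypothetical, not a description of any real product.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    """What one consumer may do. All fields are hypothetical."""
    datasets: frozenset[str]  # datasets this consumer is allowed to read
    max_reads: int            # read budget that protects the owner's SLA


@dataclass
class DataPlatform:
    """Sits between consumers and the owner's store (here, a plain dict)."""
    store: dict[str, list[dict]]
    policies: dict[str, Policy]
    reads_used: dict[str, int] = field(default_factory=dict)

    def read(self, consumer: str, dataset: str) -> list[dict]:
        policy = self.policies.get(consumer)
        # Governance: no policy, or dataset not covered, means no access.
        if policy is None or dataset not in policy.datasets:
            raise PermissionError(f"{consumer} may not read {dataset}")
        # Impact control: stop a consumer that has used up its budget.
        used = self.reads_used.get(consumer, 0)
        if used >= policy.max_reads:
            raise RuntimeError(f"{consumer} exceeded its read budget")
        self.reads_used[consumer] = used + 1
        # Return copies so a consumer cannot modify the owner's records.
        return [dict(row) for row in self.store[dataset]]


platform = DataPlatform(
    store={"bus_positions": [{"bus": 12, "stop": "Main St"}]},
    policies={"david": Policy(datasets=frozenset({"bus_positions"}), max_reads=2)},
)
rows = platform.read("david", "bus_positions")
```

The point of the sketch is that the owner’s worries from the list above (who gets access, how much load they create, whether they can break anything) become explicit policy that the platform enforces, rather than favors negotiated one person at a time.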
Anyways, a short post with some food for thought based on some work I’ve been doing on Smart City data platforms like this one for transport data. Stay warm.