It has become very apparent that access to high-quality, proprietary data is a crucial part of creating a system that learns and gets better over time.
However, in industries such as ours, giving up all your data to a 3rd party is like dropping your drawbridge and allowing others to walk right across the protective moat you've spent so much blood, sweat, tears and money building and defending.
We are already seeing the big leaders in the field of AI sharing and open-sourcing their machine learning systems. For example, late last year, Google open-sourced their TensorFlow system. This is the same technology that powers their speech recognition, image search, and more.
So why would Google, Facebook, etc. give away the results of millions or possibly billions of dollars of research that could potentially create competition? The simple answer is that they have access to more training data than most competitors. Just look at this image showing the domains Android apps are sending data to: you'll notice Google and Facebook domains are #1 and #2.
When building a Machine Intelligence System, part of the process is giving the system training data to learn from. In general, the more data the system can use for learning, the "smarter" and more accurate it gets.
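To make the "more data" point a bit more concrete, here's a minimal sketch in Python. It is a toy, not a real learning system: it simply estimates a click-through rate from simulated impressions and measures how far off the estimate is at different data volumes. The `TRUE_CTR` value and the function names are made up for illustration.

```python
import random

# Assumed "real" click-through rate; an invented number for illustration only.
TRUE_CTR = 0.05


def estimate_ctr(n_impressions: int, rng: random.Random) -> float:
    """Estimate the CTR from n simulated impressions."""
    clicks = sum(1 for _ in range(n_impressions) if rng.random() < TRUE_CTR)
    return clicks / n_impressions


def mean_abs_error(n_impressions: int, trials: int = 50, seed: int = 0) -> float:
    """Average absolute estimation error over several simulated runs."""
    rng = random.Random(seed)
    errors = [
        abs(estimate_ctr(n_impressions, rng) - TRUE_CTR) for _ in range(trials)
    ]
    return sum(errors) / trials


if __name__ == "__main__":
    # Print the average error for increasing amounts of data.
    for n in (100, 1_000, 10_000):
        print(n, round(mean_abs_error(n), 4))
```

In this toy setup the average error shrinks as the number of impressions grows. Real models are far more complex, but the same general pattern is why data volume matters so much.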
We are in an industry that has been very data-driven for years and has been moving rapidly toward high levels of automation and machine intelligence. To ensure that you remain relevant, your job is to amass and retain full control of as much relevant data as possible.
In the very near future, everyone is going to have access to easy-to-use personal Intelligent Machines that will assist in the optimization and monetization of impressions and clicks. Google's TensorFlow is already compact enough to run directly on your phone, and Apple's iOS 10 will have machine learning tech built into the software to add intelligence to the OS without sending data to Apple's servers.
In highly competitive industries, we expect the need for on-premise, self-managed Intelligent Bots to become an even bigger deal. These are bots that can train and learn over time on locally available impression, click and conversion data, without sending it off to a 3rd party.
The secret sauce that sets you apart from competitors will be the valuable, unique data you generate with each impression, click and conversion you pay for. Some key points to keep in mind are:
- How easy is it to get access to the raw data to train your systems?
- How many years' worth of data do you have access to for training?
- Does a 3rd party have access to your data, and if so, how likely are they to use that data to train their own AI systems to compete with you?
- What are you doing to prevent data leakage?
As I mentioned earlier, Machine Learning is better with more data. There will come a time when the algorithms and systems available will be able to operate and learn with fewer data points. For example, how is it that a toddler can identify all cats after just seeing and interacting with the family cat? Current systems need tens of thousands of images of cats to be able to do what a toddler can do with just a few data points.
We don't know how far off we are from being able to train a system with minimal data, but until then there will be the occasional need for a central system to collect massive amounts of data, learn from it and make what it learns available to everyone.
There's already technology out there to enable this to happen in a secure, private way. Apple's machine learning systems will use something called Differential Privacy to anonymize data while still allowing them to provide valuable services to end users.
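To give a feel for how this kind of privacy can work, here's a minimal sketch of randomized response, a classic technique that satisfies differential privacy at the level of each individual report. This is not Apple's implementation, just an illustration. The function names, the `epsilon` value and the simulated 3% conversion rate are all assumptions made up for the example.

```python
import math
import random
from typing import Sequence


def truth_probability(epsilon: float) -> float:
    """Probability of reporting the true value for a given privacy budget."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomize(true_bit: bool, epsilon: float, rng: random.Random) -> bool:
    """Report the true bit with probability p, otherwise report its opposite.

    Each individual report is deniable, because it may have been flipped.
    """
    if rng.random() < truth_probability(epsilon):
        return true_bit
    return not true_bit


def estimate_rate(reports: Sequence[bool], epsilon: float) -> float:
    """Recover an unbiased estimate of the true rate from the noisy reports.

    observed = rate * p + (1 - rate) * (1 - p), solved here for rate.
    """
    p = truth_probability(epsilon)
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


if __name__ == "__main__":
    rng = random.Random(42)
    epsilon = 1.0  # smaller epsilon means more noise and more privacy
    # Simulated users, roughly 3% of whom converted (invented data).
    truth = [rng.random() < 0.03 for _ in range(100_000)]
    reports = [randomize(bit, epsilon, rng) for bit in truth]
    print("estimated conversion rate:", round(estimate_rate(reports, epsilon), 4))
```

The central system only ever sees the noisy reports, yet with enough of them it can still estimate the overall rate. No single report reveals for certain what an individual user actually did.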
Expect to hear more about this tech as the need to keep proprietary data private becomes more important to marketers looking to maintain a competitive edge.
Our vision is that in the next few months and years, you will have access to Intelligent Bots that run on your own on-premise or cloud servers. The system will ingest and learn from all your data sources, such as the trackers and marketing tools you use, automatically boot up GPU instances for accelerated training, then shut them down when done. The result (an updated brain) then gets sent to your Intelligent Bot for use on live traffic.
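Here's a rough sketch of what that loop could look like, under heavy assumptions. Every name in it (`GpuInstance`, `gpu_instance`, `IntelligentBot`, `retrain`) is hypothetical, no real cloud provider API is called, and the "training" step is just a per-placement click-through rate standing in for a real model. The point is only the shape of the flow: ingest, boot, train, shut down, publish.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class GpuInstance:
    """Hypothetical stand-in for a cloud GPU machine."""
    running: bool = False


@contextmanager
def gpu_instance():
    """Boot an instance for the duration of training, then always shut it down."""
    instance = GpuInstance(running=True)  # stands in for "boot up"
    try:
        yield instance
    finally:
        instance.running = False  # stands in for "shut down", even on errors


def ingest(sources: dict) -> list:
    """Merge event rows from every data source (trackers, marketing tools)."""
    return [row for rows in sources.values() for row in rows]


def train(events: list, instance: GpuInstance) -> dict:
    """Toy training step: compute the click-through rate per placement."""
    if not instance.running:
        raise RuntimeError("training requires a running instance")
    totals = {}
    for event in events:
        imps, clicks = totals.get(event["placement"], (0, 0))
        totals[event["placement"]] = (
            imps + event["impressions"],
            clicks + event["clicks"],
        )
    return {p: clicks / imps for p, (imps, clicks) in totals.items() if imps > 0}


@dataclass
class IntelligentBot:
    """Hypothetical bot that serves live traffic using the latest model."""
    model: dict = field(default_factory=dict)

    def load(self, model: dict) -> None:
        self.model = dict(model)

    def best_placement(self) -> str:
        return max(self.model, key=self.model.get)


def retrain(sources: dict, bot: IntelligentBot) -> GpuInstance:
    """Run one full cycle and return the (now stopped) instance."""
    events = ingest(sources)
    with gpu_instance() as instance:
        model = train(events, instance)
    bot.load(model)  # the "updated brain" goes to the bot
    return instance
```

Using a context manager for the GPU instance is a deliberate choice in this sketch: the shutdown step runs even if training fails, so you are not left paying for idle hardware.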
As you may have guessed, we are investing heavily in AI and ML. As we have always done, a lot of our tech will be open-sourced over time to help advance and grow the industry, and some will remain proprietary.
There's a LOT of work and learning still left to do. We'll share our progress as we move closer to our goals.