Conversations with Watson (How IBM’s Watson works)
I had the opportunity to spend one whole (intense) day with professionals from all over Europe at the IBM Research Center in Dublin. The main guest? Watson. The idea was to introduce Watson to a selected group of institutions and start building awareness and interest around this game changer. In this brief post, I will try to explain how Watson works and how it could change the way we relate to data.
As you might know, computing has gone through 3 major breakthroughs.
- 1900 tabulation
- 1950 programmable
- 2011 cognitive
Watson lives in the last era, and it represents IBM’s involvement in great challenges for humankind, such as landing a man on the moon, creating Deep Blue or winning Jeopardy with a somewhat earlier version of the Watson I met in Dublin.
How does Watson work?
First, it has to understand natural language. In order to do this, Watson will “read massively”. Once it understands, it will give suggestions, each with a confidence level (something similar to Google’s world-famous PageRank). Watson doesn’t actually learn by himself; you teach him through trial and error.
So, let’s see how it moves (keep in mind that this is done with massively parallel computing).
You ask a question and Watson applies its NLP (natural language processing) skills. The question should be written in a very natural way. Something like "When will I get my next raise?" will do perfectly.
Question analysis -> Primary search -> Hypothesis generation -> Hypothesis evidence search -> Synthesis -> Answer confidence -> Final answer.
That is "the algorithm" shown step by step.
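To make the chain above a bit more concrete, here is a minimal Python sketch of the same steps. It is not IBM's implementation: the function names, the stopword list, the toy corpus (Green Fork is a made-up restaurant) and the keyword-overlap scoring are all my own assumptions, chosen only to show how a question can flow through to a ranked answer with a confidence value.

```python
import re

# Assumption: a tiny stopword list and a toy corpus, invented for illustration.
STOPWORDS = {"a", "in", "is", "of", "the", "what", "which"}

CORPUS = [
    "Green Fork is a vegetarian restaurant in Providencia.",
    "Santiago is the capital of Chile.",
    "Providencia is a district with many cafes.",
]


def words(text):
    """Lowercase a text and return its set of words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def analyze_question(question):
    """Question analysis: keep only the content words."""
    return words(question) - STOPWORDS


def primary_search(keywords, corpus):
    """Primary search: keep passages that share at least one keyword."""
    return [passage for passage in corpus if keywords & words(passage)]


def generate_hypotheses(passages):
    """Hypothesis generation: here every retrieved passage is a candidate."""
    return list(passages)


def evidence_score(hypothesis, keywords):
    """Hypothesis evidence search: share of keywords the candidate contains."""
    if not keywords:
        return 0.0
    return len(keywords & words(hypothesis)) / len(keywords)


def answer(question, corpus):
    """Run all the steps and return (candidate, confidence), best first."""
    keywords = analyze_question(question)
    passages = primary_search(keywords, corpus)
    hypotheses = generate_hypotheses(passages)
    scored = [(h, evidence_score(h, keywords)) for h in hypotheses]
    # Synthesis and answer confidence: rank candidates by their score.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranked = answer("Which vegetarian restaurant is in Providencia?", CORPUS)
final_answer, confidence = ranked[0]
```

With this toy data, the Green Fork passage comes out first because it matches every keyword in the question, while the passage about Santiago is never retrieved at all.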
However, Watson is still naive, which is why things like this happened:
Watson has to be properly fed.
Watson is just like a kid. In order to train him, you have to feed him. That means you have to load data into his corpus. The data should not exceed 150 GB (and yes, this is still big data) and should be cleaned before it is uploaded. Let’s talk about how you teach Watson.
- Be specific: South America, Chile, Santiago, Restaurants, Vegetarian Restaurants.
- Upload clean data: Word, PDF and HTML. However, if you want to do it properly, go for .txt and Word files. If you want to feed Watson the best, format the Word files with title, heading and body.
- Don’t repeat. Don’t upload several documents that cover the same thing. The info is there; let Watson understand it and don’t confuse him.
- Start training him. The biggest point with Watson is natural language and precise responses. How do you do that? Well, you talk to Watson.
(1) Start writing natural language questions and wait for the feedback.
Watson will ask: Did you ask that already? Was it written differently? Would you like to unify those questions into one?
(2) If the question is clear, then Watson will give you suggestions for the answer. Same move: you select which one(s) are good. Then you save.
Every time you upload new information to the corpus, you need to train Watson. You talk to him repeatedly. You correct him and you make him improve. You have to invest resources and serious time to train Watson to answer questions like:
- Which one is the best vegetarian restaurant in Providencia?
- Is there a Michelin award winner in Santiago for veg food?
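The back and forth described above (ask a question, let Watson check whether you already asked it in other words, then pick the good answers and save) can be sketched in a few lines of Python. This is only my own illustration: the TrainingSession class, the word-overlap similarity and the 0.6 threshold are assumptions, not how Watson actually matches questions.

```python
import re


def tokens(text):
    """Lowercase a question and return its set of words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def similarity(a, b):
    """Jaccard similarity: shared words divided by all words used."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


class TrainingSession:
    """Hypothetical helper that stores the answers a trainer approved."""

    def __init__(self, threshold=0.6):
        # Assumption: questions at or above this similarity count as the same.
        self.threshold = threshold
        self.approved = {}  # canonical question -> approved answers

    def find_similar(self, question):
        """Step (1): was this already asked, just written differently?"""
        for known in self.approved:
            if similarity(question, known) >= self.threshold:
                return known
        return None

    def teach(self, question, candidates, good_indexes):
        """Step (2): keep the candidates the trainer marked as good."""
        canonical = self.find_similar(question) or question
        chosen = [candidates[i] for i in good_indexes]
        self.approved.setdefault(canonical, []).extend(chosen)
        return canonical


session = TrainingSession()
q1 = "Which one is the best vegetarian restaurant in Providencia?"
q2 = "Which is the best vegetarian restaurant in Providencia?"
q3 = "Is there a Michelin award winner in Santiago for veg food?"

first = session.teach(q1, ["answer A", "answer B"], [0])
second = session.teach(q2, ["answer C"], [0])  # unified with q1
third = session.teach(q3, ["answer D"], [0])   # stored as a new question
```

Here the second question is folded into the first one because they share almost all of their words, while the Michelin question is stored on its own.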
How do we build the corpus?
The corpus is the mind of Watson. With a poorly built corpus, Watson is far from brilliant. Let’s see how to do it properly.
- Narrow the scope: South America > Chile > Santiago > Restaurants > Vegetarian Restaurants.
- Use a few quality documents (less is more).
- Upload files under 10 GB (remember, we are talking about Word or text files, so that is a lot of info, perhaps 100 to 150 files).
- Do not duplicate documents (that would confuse Watson).
- Do not deploy the corpus too often (do it only once or twice a week).
- Upload Word, PDF or HTML files.
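Some of the rules in this list can be checked automatically before you upload anything. The Python sketch below is my own and has nothing to do with IBM's tooling: check_corpus is a hypothetical helper, the list of file extensions is my reading of "Word, PDF or HTML" plus plain text, and I treat the 10 GB figure as a limit on the whole folder.

```python
import hashlib
from pathlib import Path

# Assumption: these extensions stand for "Word, PDF, HTML and text files".
ALLOWED_SUFFIXES = {".txt", ".doc", ".docx", ".pdf", ".htm", ".html"}
# Assumption: the "under 10 GB" figure applies to the folder as a whole.
MAX_TOTAL_BYTES = 10 * 1024**3


def check_corpus(folder, max_total_bytes=MAX_TOTAL_BYTES):
    """Return a list of problems found in a folder of corpus documents."""
    problems = []
    seen = {}  # content hash -> name of the first file with that content
    total = 0
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        if path.suffix.lower() not in ALLOWED_SUFFIXES:
            problems.append(f"{path.name}: unsupported file type")
            continue
        # Reads the whole file into memory, which is fine for a sketch.
        data = path.read_bytes()
        total += len(data)
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen:
            problems.append(f"{path.name}: duplicate of {seen[digest]}")
        else:
            seen[digest] = path.name
    if total > max_total_bytes:
        problems.append(
            f"corpus is {total} bytes, over the {max_total_bytes} byte limit"
        )
    return problems
```

Note that the hash only catches exact copies. Two different documents talking about the same thing would still get through, and that part remains a human job.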
How do I know I have a good document to upload?
Not everything is good. It is the same as with your body: if you always eat fast food, you will get slow and tired fast. Watson works pretty much the same way. Feed him wisely.
- Titles are everything.
- Nested tables do not make Watson happy.
- Have clean and well-organized content (title, heading 1, 2 or 3, body, etc.).
A good document to put in Watson’s corpus:
- Is validated and has its changes recorded.
- Is well structured.
- Has a built-in vocabulary.
- Has had its header and footer removed, if it is an HTML file.
- Is relevant to the questions asked.
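For HTML files, a few of these points (a title, headings, no nested tables) can be tested with Python's standard html.parser module. Again, this is just a sketch under my own assumptions: review_html and StructureCheck are names I made up, and the three checks are only a rough translation of the advice above, not an official Watson validator.

```python
from html.parser import HTMLParser


class StructureCheck(HTMLParser):
    """Collects a few structural facts while the parser walks the tags."""

    def __init__(self):
        super().__init__()
        self.table_depth = 0
        self.has_title = False
        self.has_heading = False
        self.has_nested_table = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.has_title = True
        elif tag in ("h1", "h2", "h3"):
            self.has_heading = True
        elif tag == "table":
            self.table_depth += 1
            # A table opened while another one is still open is nested.
            if self.table_depth > 1:
                self.has_nested_table = True

    def handle_endtag(self, tag):
        if tag == "table" and self.table_depth > 0:
            self.table_depth -= 1


def review_html(html):
    """Return the list of structural issues found in an HTML document."""
    checker = StructureCheck()
    checker.feed(html)
    checker.close()
    issues = []
    if not checker.has_title:
        issues.append("missing title")
    if not checker.has_heading:
        issues.append("missing headings")
    if checker.has_nested_table:
        issues.append("nested table")
    return issues
```

An empty list means the file passes these three checks. Whether the content is validated and relevant to the questions is still something you have to judge yourself.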
When you have done this, you might ask 2,000 questions, then move to 600 better questions and manually re-teach Watson. Once you have reached around 50% accuracy in the answers, jump to SPSS to continue the training.
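The accuracy figure here is simply the share of test questions for which Watson's answer was judged correct. As a small worked example in Python (the function names are mine, and the 0.5 target is just the "around 50%" mentioned above):

```python
TARGET_ACCURACY = 0.5  # the "around 50%" figure mentioned above


def accuracy(judgements):
    """Share of answers judged correct; judgements is a list of booleans."""
    if not judgements:
        return 0.0
    return sum(judgements) / len(judgements)


def ready_for_next_stage(judgements, target=TARGET_ACCURACY):
    """True once the measured accuracy reaches the target."""
    return accuracy(judgements) >= target
```

So if 300 of your 600 refined questions get a good answer, accuracy is 300 / 600 = 0.5 and you have reached the threshold.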
How does Watson earn a living?
Unlike Google, IBM doesn't print money, and therefore it has decided not to play the way Google does. IBM has come up with a revenue-sharing strategy in which apps using Watson as “the engine” split revenue between IBM and the organization behind the app. In this way, IBM hopes to bet on talented entrepreneurs, universities and companies ready to specialize and jump on the cognitive computing bandwagon.
Who is already in?
The ecosystem is still far from big, but here are a few names of companies working with Watson. These consumer applications are mainly centered on finance and healthcare for now, but there is room for every aspect of human behaviour.
- Chef Watson
- Red Ant
- Welltok
- Pathway Genomics
- Point of care
- Sell Point
- Sofie (LifeLearn)
During the day, we witnessed two applications of Watson: one was for wealth management and the other was the collaboration with Bon Appétit called Chef Watson. Both were quite unimpressive, truth be told. But then again, we cannot judge a baby boy.
What do I think?
I can’t help but compare IBM with Google or Apple. Siri, Google Voice, Cortana and Echo all have natural language processing capabilities. So what is the big deal with Watson?
(1) It learns. Watson will keep improving as time goes by. It will learn from mistakes and even make connections that we hadn't thought about.
(2) It is precise. Google gives you millions of pages when you input a query. Think of Watson as an “I’m feeling lucky” thing. And not even that, because it will not give you a website, but a concrete answer (validated by sources and facts).
(3) It works with your information. This is the biggest feature for me. This is the game changer. If you are a company, you will put in your emails, call logs, tweets, CRM, databases and everything else that is text (as Watson grows, you will add video, sound or even smell). No one gives you this now. Not even almighty Google. When Watson gets properly trained, it might elevate the data to a whole new level. What about smart data? Watson might take us there.
Watson will turn 1 year old on January 9, 2015. There is so much to be done. But when you think about everything IBM’s team has put together in a few months, it’s awesome. The Watson engine we saw is completely new; it is not even the same one that was used on Jeopardy. Give this kid 10 years and be awed.
If you are interested in cognitive computing, I recommend the book Smart Machines, written by John E. Kelly III, director of IBM Research, and Steve Hamm, a writer at IBM and a former business and technology journalist. There, you will understand a lot about what IBM is doing and, let me tell you, it’s ambitious (an example? An operating system for a whole city, which in this case is Rio de Janeiro).
I think the business implications this big smart data brings to the table are absolutely unpredictable, yet attractive. It is up to us to build it together.