With machine learning being one of te big things at the moment, I thought I’d cast my mind back to my first ever c programming assignment at university – write the game of Pangolins. The game is based on the 20 Questions game, whereby the user thinks of an object and the machine aims to guess that object by asking simple yes/no style questions – ideally less than 20. the system starts off by knowing about only a single object – a small ant-eating mammal called a Pangolin.
Each time a user thinks of something the system isn’t aware of, it learns from this. The internal implementation of this is just a simple set of nodes, which can either be a question, or an object. A question node has two pointers to a yes and a no node. It’s probably easiest to illustrate with a walkthough. I created a little demo app which can be accessed here on apex.oracle.com. The sample code to create can be found at the bottom of this post.
We start off with a single entry – and we are therefore asked “are you thinking of a Pangolin”?
So assume we were actually thinking of a pencil, so we say no. The system then asks us what were we actually thinking of. Let’s tell it so.
Next we are asked to give a yes/no question that will distinguish between a pencil and a pangolin.
And clearly the answer for that is No.
So now our system knows about both pangolins and pencils. So the next time around it starts off by asking us Does it eat ants?, and then depending on our answer it will decide whether it thinks we’re thinking of either a pangolin or an ant. So if I’m now thinking of a telephone, I would click no, and the system would then ask if I was thinking of a pencil. So no again, and now I am asked to provide a question to differentiate between a pencil and a phone. I might say Does it have a screen or simliar.
So now with my three objects, I have a binary tree that looks something like this:
so you can see very quickly that the more and more information that is fed into the system, the more “intelligent” it becomes. More-so, once we start getting large trees, we can then start to do things like height-balance them, so that we reduce number of questions to each leaf node. We can even eliminate questions completely (i.e. any question trail that doesn’t lead to a leaf node that is an object can be removed). For example, given the following tree:
It would take us three questions to determine whether the user was thinking about a phone or a computer monitor. However if we balance that tree as follows, we only ever have to ask a maximum of two questions for each object node.
Now guessing what a user is thinking about isn’t all that much use, and in the real world things are a lot more complicated than yes/no answers, however you should be able to see here that even with a massively simple structure and concepts, we can do some reasonably interesting things. Things don’t need to be input manually from users all the time – in some systems we could store the states of different signals/sensor value ranges in such a tree structure to arrive at resolution quickly (automating the resolution should the issue arise again). Each time the system experiences an unknown, it can gain a little bit more intelligence.
Flow chart systems similar to this are used in many aspects of life – doctors use them to make diagnosis of conditions etc. However in many cases, new information isn’t fed back into the system as it happens.
Please feel free to have a mess around in the APEX Application. Sensible contributions to the knowledge base are welcomed!