Effective Maximum a Posteriori (MAP) Estimation in Machine Learning

I am borrowing an example from Tom Mitchell’s video lecture to share some ideas on how we can use MAP effectively and objectively. Here is the equation for the outcomes of coin flips, where our coin may not be an ideal coin (that’s the very reason we build an intelligent machine to estimate probabilistic outcomes):

θMAP = arg maxθ P(D|θ) P(θ) = (α1 + β1 − 1) / ((α1 + β1 − 1) + (α0 + β0 − 1))
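As a quick sanity check, here is a minimal sketch of this estimate, assuming α1 and α0 are the observed counts of heads and tails and β1, β0 are the prior pseudocounts (the example numbers are made up for illustration):

```python
def map_estimate(alpha1, alpha0, beta1, beta0):
    """MAP estimate of the heads probability under a Beta(beta1, beta0) prior,
    after observing alpha1 heads and alpha0 tails."""
    num = alpha1 + beta1 - 1
    den = (alpha1 + beta1 - 1) + (alpha0 + beta0 - 1)
    return num / den

def mle_estimate(alpha1, alpha0):
    """Maximum-likelihood estimate: the raw fraction of observed heads."""
    return alpha1 / (alpha1 + alpha0)

# 3 heads and 7 tails observed, with a prior worth 50 imagined heads and 50 tails
print(map_estimate(3, 7, 50, 50))  # pulled strongly toward the prior belief of 0.5
print(mle_estimate(3, 7))          # 0.3, driven only by the data
```

Notice how the pseudocounts act exactly like extra observations, which is why the discussion below compares β values directly against α values.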

Here α1 and α0 are the observed counts of heads and tails, while β1 and β0 are pseudocounts encoding our prior knowledge of the coin. How we choose the β values from that prior knowledge has interesting consequences.

  1. If the β values are very large compared to the counts (α values) our intelligent machine has accumulated so far, there are several consequences:
    1. The machine is heavily biased towards our prior knowledge. Biased here means closer to practical human judgment.
    2. The machine learns slowly because it is heavily biased right from the beginning: it will take a great many α observations to offset the impact of the β values. So this machine cannot adapt to a change in the environment fast enough; if someone tampers with our original coin, the machine will keep giving wrong outcomes for a long time.
    3. A great advantage is that we can use the machine to predict outcomes from day one, since it already carries human knowledge.
  2. When the β values are low:
    1. The machine adapts to any change in the environment very fast. The impact of the β values quickly becomes negligible, so the equation becomes virtually the same as the MLE estimate and the machine behaves exactly like an MLE learner.
    2. It may not be suitable to use from the beginning, since early estimates rest on very little evidence.
    3. It is a good choice when we do not have a substantial amount of prior knowledge of the environment.
  3. Moderate β values
    1. This is the hardest case.
    2. It determines how adaptable or resilient we want our machine to be.
    3. When do we want to start using the machine’s predictions?
    4. It’s a continuous process. We can drop old values from the total set (β ∪ α) to keep the machine’s brain plastic enough to meet our requirements. The pruning algorithm chosen will produce different kinds of learning machines, just as there are different kinds of intelligent people.
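The adaptation trade-off in points 1 and 2 above can be made concrete with a small deterministic sketch (the β values and the tampered coin are made up for illustration): after the same run of 20 heads, a heavy prior barely moves from 0.5 while a light prior is already tracking the new coin.

```python
def map_estimate(alpha1, alpha0, beta1, beta0):
    """MAP estimate of heads probability: alpha = observed counts, beta = prior pseudocounts."""
    return (alpha1 + beta1 - 1) / ((alpha1 + beta1 - 1) + (alpha0 + beta0 - 1))

# Suppose the coin has been tampered with and now always lands heads.
heads_observed = 20

heavy = map_estimate(heads_observed, 0, 500, 500)  # strong prior: adapts slowly
light = map_estimate(heads_observed, 0, 3, 3)      # weak prior: adapts quickly

print(round(heavy, 3))  # still close to the prior belief of 0.5
print(round(light, 3))  # already close to 1.0, tracking the tampered coin
```

Running the two cases side by side shows exactly why large β values make the machine slow to notice a tampered coin.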
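One simple way to realize the pruning idea in point 4 is a sliding window over recent flips, so that old observations fall out of the α counts while the β pseudocounts stay fixed (the class name and window size below are arbitrary illustrations, not a standard algorithm):

```python
from collections import deque

class WindowedCoinLearner:
    """Keeps only the most recent `window` flips plus the prior pseudocounts,
    so old evidence is pruned and the estimate stays plastic."""
    def __init__(self, beta1, beta0, window=100):
        self.beta1, self.beta0 = beta1, beta0
        self.flips = deque(maxlen=window)  # 1 = heads, 0 = tails

    def observe(self, flip):
        self.flips.append(flip)  # deque silently evicts the oldest flip when full

    def estimate(self):
        a1 = sum(self.flips)            # heads still inside the window
        a0 = len(self.flips) - a1       # tails still inside the window
        num = a1 + self.beta1 - 1
        den = num + (a0 + self.beta0 - 1)
        return num / den

learner = WindowedCoinLearner(beta1=3, beta0=3, window=10)
for flip in [0] * 50 + [1] * 50:  # the coin is tampered with halfway through
    learner.observe(flip)
print(learner.estimate())  # only the last 10 flips (all heads) remain in the counts
```

Different eviction policies (window size, random forgetting, decaying weights) would give the different "kinds of learning machines" described above.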

Please share anything that would make this article better. I wish we could prune our β values in such a way as to keep our capacity to adapt and learn at an optimal level.
