I am borrowing an example from Tom Mitchell’s video lecture to share some ideas on how we can effectively and objectively use MAP. Here goes the equation for outcomes of coin flips where our coin may not be an ideal coin (that’s the only reason we are making an intelligent machine to find probabilistic outcomes):

θ^{MAP} = arg max_{θ} **P**(D|θ) **P**(θ) = **(**α_{1} + β_{1} – 1**) / ** **(**α_{1} + β_{1} – 1**) (**α_{0} + β_{0} – 1**)**

How we choose **β values** from our previous knowledge of coin can have interesting facts.

- If
**β values**are very large compared to our current usages (**α values**) of our intelligent machine, it has several aspects- The machine is
**heavily biased towards our prior knowledge**. Biased here means more**practical in human sense**. - The machine
**learns slowly**as it’s heavily biased**right from the beginning**. It will take a lot of α values to eventually offset the impact of β values. So, this machine cannot adapt to the change of environment fast enough. If we try to temper or tamper our original coin, the machine will continue to give wrong outcomes for a long time. - A great advantage is we can start to use the machine to predict outcomes
**from day one**, as it already has human knowledge.

- The machine is
- When
**β values**are low- The machine adapts to any change in environment very fast at the beginning. Eventually any impact of
**β values**will become negligible. So the equation becomes virtually the same as MLE estimate equation, and the machine will have exactly the same behavior - May not be suitable to use from the beginning.
- Good when we don’t have substantial amount of prior knowledge of the environment.

- The machine adapts to any change in environment very fast at the beginning. Eventually any impact of
- Moderated
**β values**- This is the
**hardest part.** - It denotes how much adaptable or resilient we want our machine to be.
- When do we want to start using machine’s predictions?
- It’s a
**continuous process**. We can take out old values of the total set (**β ∪ α)**to keep the machine’s brain plastic enough to meet our requirements. The pruning algorithm will make different kinds of learning machines, just like different kinds of intelligent people.

- This is the

Please, share anything to make this article better. I wish we could prune our **β values **in such a way to keep our capacity to adapt and learn to an optimal level.