Definition:
- Calculating some useful quantity from a joint probability distribution
- We have a joint probability distribution and wish to find P(Q∣e1,e2,...)
- Evidence variables are the observed variables that condition the query
- Query variables are the variables whose values we wish to learn
- Hidden variables are variables that are neither query nor evidence
- O(d^n) time complexity to find a probability by enumerating the full joint (n variables, domain size d)
- Ex:
- Posterior probability: P(Q∣E1=e1,...,Ek=ek), the query given the evidence
- Most likely explanation: argmax_q P(Q=q∣E1=e1,...)
Basic Inference:
- ex:
- P(B∣+j,+m)∝P(B,+j,+m)=∑_{e,a}P(B,e,a,+j,+m)=∑_{e,a}P(B)P(e)P(a∣B,e)P(+j∣a)P(+m∣a)
- P(B∣+j,+m)=P(B,+j,+m)/P(+j,+m); the two are proportional, and the denominator is easy to recover by normalizing, so we ignore it for now
- the sum over hidden variables grows exponentially with their number
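As a sketch of the enumeration above, the burglary network B,E → A → J,M can be queried in a few lines of Python. The CPT numbers below are illustrative assumptions, not values from these notes:

```python
# Enumeration for P(B | +j, +m); CPT numbers are illustrative assumptions.
P_B = {True: 0.001, False: 0.999}          # P(+b)
P_E = {True: 0.002, False: 0.998}          # P(+e)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(+a | B, E)
P_J = {True: 0.90, False: 0.05}            # P(+j | A)
P_M = {True: 0.70, False: 0.01}            # P(+m | A)

def unnormalized(b):
    # P(b, +j, +m) = sum over e, a of P(b) P(e) P(a|b,e) P(+j|a) P(+m|a)
    total = 0.0
    for e in (True, False):
        for a in (True, False):
            p_a = P_A[(b, e)] if a else 1.0 - P_A[(b, e)]
            total += P_B[b] * P_E[e] * p_a * P_J[a] * P_M[a]
    return total

scores = {b: unnormalized(b) for b in (True, False)}
z = sum(scores.values())                   # the "easy" denominator P(+j, +m)
posterior = {b: s / z for b, s in scores.items()}   # P(B | +j, +m)
```

With these illustrative numbers the posterior P(+b∣+j,+m) comes out near 0.28.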
Factors:
- Joint Distribution factor
- Joint distributions: entries sum to 1
- Selected joint: only the entries of the joint distribution consistent with some fixed values of some variables
- lowercase x is assigned (fixed), capital Y is varied
- the dimensionality of the table is the number of varying (capital) variables
- Conditional Distribution
- Single conditional: entries P(y∣x), sums to 1
- Family of conditionals: P(Y∣X): entries sum to |domain(X)|, one distribution summing to 1 per value of X
- varied x and y
- has multiple conditionals
- Specified Family: P(y∣X)
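The factor types above can be sketched as dictionaries keyed by assignments; the numbers are assumptions for illustration:

```python
# Family of conditionals P(T | R) as one table; illustrative numbers.
P_T_given_R = {
    ('+r', '+t'): 0.8, ('+r', '-t'): 0.2,
    ('-r', '+t'): 0.1, ('-r', '-t'): 0.9,
}

# Single conditional P(T | +r): fix r = +r, vary T; its entries sum to 1.
single = {t: p for (r, t), p in P_T_given_R.items() if r == '+r'}
assert abs(sum(single.values()) - 1.0) < 1e-9

# The whole family sums to |domain(R)| = 2: one distribution per value of R.
assert abs(sum(P_T_given_R.values()) - 2.0) < 1e-9
```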
Inference by Enumeration:
- Procedure: Join all factors, eliminate all hidden variables, normalize
- ex:
- R (rain) → T (traffic) → L (late)
- we have table for R, T|R and L|T
- and we wish to query P(L)
- Join all factors: similar to a database JOIN
- Build a new table by multiplying the conditional tables with the prior
- P(R)×P(T∣R)=P(R,T) then P(R,T)×P(L∣T)=P(R,T,L)
- Eliminate all hidden variables:
- Marginalize the joint table, ∑RP(R,T,L)=P(T,L)→∑TP(T,L)=P(L)
- Normalize so the entries sum to 1
- P(L)=∑_t∑_rP(L∣t)P(r)P(t∣r); evaluated inside-out: join on r, join on t, eliminate r, eliminate t
- This builds up the whole joint distribution before summing out the hidden variables: huge space complexity
- We can marginalize early instead, which is Variable Elimination
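The join-everything-then-marginalize procedure for the R → T → L example might look like this sketch; the CPT values are assumptions:

```python
# Inference by enumeration for P(L) in R -> T -> L; CPT values are assumptions.
P_R = {'+r': 0.1, '-r': 0.9}
P_T = {('+r', '+t'): 0.8, ('+r', '-t'): 0.2,
       ('-r', '+t'): 0.1, ('-r', '-t'): 0.9}   # P(T | R)
P_L = {('+t', '+l'): 0.3, ('+t', '-l'): 0.7,
       ('-t', '+l'): 0.1, ('-t', '-l'): 0.9}   # P(L | T)

# Join all factors into the full joint P(R, T, L): d^n entries.
joint = {(r, t, l): P_R[r] * P_T[(r, t)] * P_L[(t, l)]
         for r in ('+r', '-r') for t in ('+t', '-t') for l in ('+l', '-l')}

# Eliminate the hidden variables by summing them out: first R, then T.
P_TL, P_L_query = {}, {}
for (r, t, l), p in joint.items():
    P_TL[(t, l)] = P_TL.get((t, l), 0.0) + p
for (t, l), p in P_TL.items():
    P_L_query[l] = P_L_query.get(l, 0.0) + p
# With no evidence selected, P_L_query already sums to 1.
```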
Variable Elimination:
Procedure:
- Similar to inference by enumeration but we marginalize early to save space
- Join the factors mentioning one hidden variable, marginalize it out, join the next, marginalize, …
- With the same example:
- Join P(R)×P(T∣R)=P(R,T)→∑RP(R,T)=P(T)
- Join P(T)×P(L∣T)=P(T,L)→∑TP(T,L)=P(L)
- While there are still hidden variables, pick the hidden variable and join it and sum it out (eliminate)
- If there is evidence in the query, start with the factors that mention the evidence, and select only the rows consistent with the evidence when joining
- ex: query P(L∣+r): P(+r)→P(+r)×P(T∣+r)=P(T,+r)→P(T,+r)×P(L∣T)=P(L,T,+r)
- →∑TP(L,T,+r)=P(L,+r)
- If the final table is P(L,+r) rather than P(L∣+r), condition on the evidence
- P(+l∣+r)=P(+l,+r)/P(+r), where P(+r)=∑_LP(L,+r)
- Normalizing the final table therefore gives the answer
- ex: normalize over L so that ∑_LP(L∣+r)=1
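The P(L∣+r) walk-through above, sketched with assumed CPT values: select the evidence, join, sum out T, then normalize:

```python
# Variable elimination for P(L | +r); CPT values are illustrative assumptions.
P_R = {'+r': 0.1, '-r': 0.9}
P_T = {('+r', '+t'): 0.8, ('+r', '-t'): 0.2,
       ('-r', '+t'): 0.1, ('-r', '-t'): 0.9}   # P(T | R)
P_L = {('+t', '+l'): 0.3, ('+t', '-l'): 0.7,
       ('-t', '+l'): 0.1, ('-t', '-l'): 0.9}   # P(L | T)

# Select +r in every factor mentioning R, then join: f1(T) = P(+r) P(T | +r).
f1 = {t: P_R['+r'] * P_T[('+r', t)] for t in ('+t', '-t')}

# Join with P(L | T) and sum out the hidden T: f2(L) = sum_t f1(t) P(L | t).
f2 = {l: sum(f1[t] * P_L[(t, l)] for t in ('+t', '-t')) for l in ('+l', '-l')}

# f2 is P(L, +r); normalizing over L divides by z = P(+r) and yields P(L | +r).
z = sum(f2.values())
posterior = {l: p / z for l, p in f2.items()}
```

Note that z equals P(+r) here, so the final division is exactly the Bayes' theorem step from the notes.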
Variable Elimination Ordering:
- The order in which variables are eliminated can greatly reduce the complexity
- ex:
- With query P(Xn∣y1,...,yn) in the network Z→Xi, Xi→Yi, compare two elimination orders
- first way: eliminate Z last: join Z×X1×X2×... into Z·∏Xi, then multiply in each evidence factor on Yi and sum out the Xi
- at that point one factor mentions all the Xi and Z: 2^(n+1) entries (binary variables)
- second way: eliminate each Xi first: join each pair Xi×Yi (i.e. P(Xi∣Z)×P(yi∣Xi)) and sum out Xi, then join the resulting factors with Z one by one
- at the Xi×Yi steps each factor has at most 2^2 entries, and joining with Z and normalizing stays just as small
- Complexity is determined by the size of the largest factor generated during elimination
- There is no ordering that always results in small factors
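A back-of-the-envelope count of table sizes for the Z→Xi→Yi example (binary variables, with an assumed n = 20) shows why the ordering matters:

```python
# Largest-factor sizes for two elimination orderings; a counting sketch only.
n = 20  # assumed number of Xi / Yi pairs, all variables binary

# Bad ordering: join P(Z) with every P(Xi | Z) before summing anything out.
# The intermediate factor mentions Z and X1..Xn: 2^(n+1) entries.
bad_largest = 2 ** (n + 1)

# Good ordering: join each P(Xi | Z) with its evidence factor and sum Xi
# out immediately; no intermediate factor mentions more than {Xi, Z}.
good_largest = 2 ** 2

print(bad_largest // good_largest)   # 524288 -- the gap grows exponentially in n
```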