Unexpected equipment failures often have severe consequences and high costs. Such failures can be avoided through preventive replacement. We study a single component that deteriorates according to a compound Poisson process and fails when its degradation exceeds a failure threshold. An online sensor measures the degradation in real time, but the component can only be replaced during planned system downtimes. The degradation parameters vary from one component to the next and cannot be observed directly, so they must be learned from the real-time degradation signal. We model this situation as a partially observable Markov decision process (POMDP) so that decision making and learning are integrated. We show that all relevant information in the degradation signal can be represented by a three-dimensional vector, and we use this representation to collapse the original high-dimensional state space of the POMDP.
This allows us to compute optimal policies tractably and to prove that the optimal policy is a state-dependent control limit. The optimal control limit increases with the age of a component, but may decrease as a result of other information in the degradation signal. We demonstrate the value of integrating real-time learning and decision making on a large degradation data set of filaments of interventional X-ray machines. Integration leads to cost reductions of 10.50% compared to approaches that do not learn from the real-time signal and 4.28% compared to approaches that separate learning and decision making.
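
To make the setting concrete, the following is a minimal simulation sketch of a compound Poisson degradation path under a control-limit rule. It is an illustration only, not the policy derived in this work: it assumes exponentially distributed jump sizes, one planned downtime per unit of time, known parameters (so no learning), and a fixed rather than state-dependent control limit. The function name `simulate_lifetime` and all parameter names are hypothetical.

```python
import random


def simulate_lifetime(rate, mean_jump, failure_threshold, control_limit,
                      max_periods=10_000, rng=None):
    """Simulate one component until it fails or is preventively replaced.

    Assumptions (illustrative, not taken from the source):
    - shocks arrive as a Poisson process with intensity `rate` per period,
    - each shock adds an exponential jump with mean `mean_jump`,
    - the component is inspected only at the end of each period,
      which stands in for a planned downtime,
    - `control_limit` is fixed and does not depend on age or signal.

    Returns (periods_in_service, failed).
    """
    if rng is None:
        rng = random.Random()
    level = 0.0
    for period in range(1, max_periods + 1):
        # Sample the shocks of one period by accumulating exponential
        # inter-arrival times until they exceed the period length of 1.
        t = rng.expovariate(rate)
        while t < 1.0:
            level += rng.expovariate(1.0 / mean_jump)
            t += rng.expovariate(rate)
        if level >= failure_threshold:
            return period, True    # failed before it could be replaced
        if level >= control_limit:
            return period, False   # preventive replacement at downtime
    return max_periods, False


if __name__ == "__main__":
    rng = random.Random(0)
    runs = [simulate_lifetime(2.0, 1.0, 10.0, 7.0, rng=rng) for _ in range(1000)]
    failures = sum(failed for _, failed in runs)
    print(f"failed before replacement: {failures} of {len(runs)} components")
```

Lowering `control_limit` in this sketch trades fewer failures for shorter service lives, which is the basic tension a control-limit policy has to balance.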
