The insurer has been monitoring the pilot for issues such as hallucinated content or misinterpretation of claim facts. So far, Dearsley said Hollard has not observed hallucination problems and, in some situations, the tool has flagged potential leakage.

“In a few cases we’ve identified leakage where an excess wasn’t applied or potentially should have and vice versa – a customer had an excess applied that shouldn’t have,” Dearsley said, describing these as “almost unintended benefits.”

He said the system’s limits in reading tone and context remain a concern. “With no hallucinated content or incorrect information, the accuracy is there, but on the usefulness side, there’s still some opportunity [for improvement]. The inability to pick up on sentiment is probably the biggest piece of feedback we have identified leading to, in some cases, missing potential vulnerabilities,” he said.

The pilot points to a model where AI-generated summaries sit alongside existing case management processes, supporting file review and leakage detection while human staff continue to handle judgments about vulnerability and complex customer interactions.
