At this year's AI for Finance Symposium - the 2nd Workshop on LLMs and Generative AI for Finance (ACM ICAIF '25) - the focus was squarely on turning AI from prototypes into reliable workflow tools. Speakers from BlackRock, Fidelity, Lion Global, CLSA, IBM, and MIT Sloan converged on three themes: evaluation discipline, randomness control, and context engineering. Together, these themes captured how buy-side, sell-side, and research teams are tackling the practical challenges of bringing AI into production.
(1) Evaluation is becoming the core of AI deployment
Across buy- and sell-side teams, the focus has shifted from testing individual models to evaluating entire workflows. Speakers from Fidelity and Lion Global emphasized that answers must stay tied to evidence, behave consistently across multi-turn questions, and be testable with point-in-time data. Many firms described evaluation frameworks that combine reproducibility checks, backtestable retrieval, and clearer documentation of assumptions. The takeaway was consistent: the ability to evaluate models rigorously - not new model capability - determines whether AI reaches production in the investment process.
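To make the "backtestable retrieval" idea concrete, here is a minimal sketch of one such check: screening retrieved documents for lookahead leakage against an as-of date. The `RetrievedDoc` structure and `point_in_time_violations` helper are illustrative assumptions, not any firm's actual framework.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class RetrievedDoc:
    doc_id: str
    published: date  # publication date of the source document
    text: str

def point_in_time_violations(docs: list[RetrievedDoc], as_of: date) -> list[RetrievedDoc]:
    """Return any documents published after the as-of date.

    In a point-in-time backtest, a retrieval step that surfaces such
    documents is leaking future information into the answer.
    """
    return [d for d in docs if d.published > as_of]

# Hypothetical evaluation of one retrieval result for a backtest dated 2024-06-30
docs = [
    RetrievedDoc("10-K-2023", date(2024, 2, 15), "Annual report excerpt..."),
    RetrievedDoc("Q3-2024-call", date(2024, 10, 28), "Earnings call excerpt..."),
]
leaks = point_in_time_violations(docs, as_of=date(2024, 6, 30))
print([d.doc_id for d in leaks])  # the Q3 2024 call postdates the backtest date
```

A real framework would layer further checks on top, e.g. tying each claim in the answer back to a retrieved passage, but the date filter above is the simplest reproducibility guard such pipelines tend to start with.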