Description
Observational and simulation-based data provide ample evidence of strong links between the formation and evolution of a galaxy and the properties of its progenitor dark matter halo. The strongest and best-documented of these relationships is that with $M_{200}$, the total mass of all matter within the virial radius of the dark matter halo. However, galaxy properties show clear scatter at fixed $M_{200}$. This paper investigates the possible drivers of the scatter in the circumgalactic mass fraction ($f_{\rm CGM}$) of galaxies by applying machine learning techniques to the flagship EAGLE and IllustrisTNG simulations at $z=0$. We use an XGBoost (eXtreme Gradient Boosting) model in tandem with SHAP (SHapley Additive exPlanations), an explainable AI method that determines feature importances in trained machine learning models post hoc using Shapley values from cooperative game theory. Pairing XGBoost with SHAP exposes discrepancies between how strongly the input galaxy properties correlate with $f_{\rm CGM}$ and how much they actually drive it. Crucially, SHAP allows us to uncover the relative importance of the galaxy features in driving $f_{\rm CGM}$ within fixed halo mass ranges. We show that these relationships depend on the halo mass range that the galaxies inhabit, and that the importance of each relationship changes with halo mass.
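
To make the Shapley-value idea behind SHAP concrete, the following is a minimal sketch in pure Python. It is not the XGBoost model or the SHAP pipeline used in this work: `toy_model`, its three inputs, and the zero baseline are hypothetical stand-ins chosen so the result can be checked by hand. The sketch computes exact Shapley values by brute-force enumeration of feature subsets, which is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    A feature that is "absent" from a coalition is set to its baseline value.
    The cost grows as 2**n in the number of features, so this is a teaching
    sketch, not a practical explainer.
    """
    n = len(x)

    def value(subset):
        # Features in `subset` keep their real value; the rest use the baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n-|S|-1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = set(coalition)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    # Hypothetical response: a main effect of `a`, an a-b interaction,
    # and a third input `c` that has no influence at all.
    a, b, c = z
    return 2.0 * a + a * b + 0.0 * c


x = [1.0, 3.0, 5.0]          # hypothetical feature values for one object
baseline = [0.0, 0.0, 0.0]   # hypothetical reference point
phi = shapley_values(toy_model, x, baseline)
```

The attributions sum to the difference between the prediction and the baseline prediction, the interaction term is shared equally between `a` and `b`, and `c` receives zero attribution however large its value is. This is the sense in which a Shapley-based importance can differ from a simple correlation.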