A better metric might be “lumen-hours”: lumens multiplied by the number of hours you get them for. For a given LED, cell and flashlight form factor, the best light is the one that delivers the most lumen-hours in the mode you expect to use most often. That way, you’re accounting for every efficiency factor at once: LED, driver, spring resistance, cell model and whatnot.
Unfortunately, manufacturers don’t generally supply that information, but some flashlight reviewers publish runtime graphs of output over time, and you can work it out from those. Because output usually changes over the course of a run, you’re really calculating the area under the line rather than multiplying one lumen figure by one runtime. You also get the bonus of seeing how well-regulated the light is: flatter lines mean better regulation.
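If you read a handful of (time, lumens) points off a runtime graph, a rough area under the line can be estimated with the trapezoidal rule. The sketch below is only illustrative: the `lumen_hours` function and both sets of readings are hypothetical, not taken from any real review, and the accuracy depends entirely on how many points you read off and how well they follow the curve.

```python
def lumen_hours(samples):
    """Approximate the area under an output-vs-time runtime graph.

    samples: list of (time_hours, lumens) pairs read off the graph,
    sorted by time. Consecutive points are joined with straight lines
    (trapezoidal rule); a vertical step can be entered as two points
    with the same time, which contributes zero area.
    """
    total = 0.0
    for (t0, l0), (t1, l1) in zip(samples, samples[1:]):
        if t1 < t0:
            raise ValueError("samples must be sorted by time")
        # Area of the trapezoid between two readings.
        total += (t1 - t0) * (l0 + l1) / 2
    return total


# Hypothetical well-regulated light: flat 300 lm for 2 h,
# steps down to 100 lm, then cuts out at 3 h.
regulated = [(0.0, 300), (2.0, 300), (2.0, 100), (3.0, 100), (3.0, 0)]

# Hypothetical unregulated light: brighter at first, but output sags
# steadily until it is dead at 3 h.
unregulated = [(0.0, 400), (1.0, 250), (2.0, 120), (3.0, 0)]

print(lumen_hours(regulated))    # 700.0
print(lumen_hours(unregulated))  # 570.0
```

In this made-up comparison the unregulated light has the higher starting output, yet the flatter, regulated light delivers more lumen-hours over the same 3 hours, which is exactly the kind of difference a single “max lumens” figure hides.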