
Replications & Static plots

Overview

To generate static plots that account for replications, I adapted a script originally developed by @ctena so that it can produce images directly from the profiler output paths.

The summarize_and_plot.py script is designed to provide a clear and robust analysis of performance logs. Instead of working interactively through the dashboard, it runs in batch mode: you set it up once, execute it, and it generates a full set of plots and summaries in a chosen directory. This makes it especially handy when you need figures for reports or publications.

A key strength of the script is that it uses a trimmed mean to combine results from multiple replications. This smooths out random fluctuations and outliers (like an unusually slow run caused by a system hiccup), giving a more reliable picture of typical performance. The script also supports optional mapping files that simplify the often detailed section/subsection labels into higher-level groups, making results easier to interpret. You can also map entries to ignore in the mapping file to exclude them from the analysis.
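To illustrate the trimmed-mean idea, here is a minimal sketch (the function name, trim fraction, and sample data are illustrative, not taken from the actual script):

```python
import numpy as np

def trimmed_mean(values, trim_fraction=0.2):
    """Average after dropping trim_fraction of the values from each tail."""
    v = np.sort(np.asarray(values, dtype=float))
    k = int(len(v) * trim_fraction)  # number of values dropped at each end
    return v[k:len(v) - k].mean() if k else v.mean()

# Elapsed times (s) for one section across five replications;
# the last run is an outlier caused by a system hiccup.
times = [10.1, 9.8, 10.3, 10.0, 25.7]

plain = np.mean(times)          # pulled up by the outlier
robust = trimmed_mean(times)    # outlier discarded before averaging
```

The robust value stays close to the typical ~10 s runs, while the plain mean is dragged toward the 25.7 s outlier.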

What you need before running

  1. Profiler outputs, structured like this:
runs/
 ├── profiler_output_1/
 │    └── ....
 ├── profiler_output_2/
 │    └── ....
  • profiler_output_<replication> → one folder per replication.

  • Inside each, one profiler_data_time_size_<processes>.csv file with logs for each process count.

  2. Mapping file (optional): a CSV that renames sections and subsections into human-readable categories.
  • If a section/subsection is not found in the mapping, it will appear as Unknown.
  • If explicitly mapped to "ignore", that section will be excluded from results.
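The mapping rules above can be sketched as follows. The column names and sample labels are assumptions for illustration; the real mapping CSV may use different headers:

```python
import csv
from io import StringIO

# Hypothetical mapping.csv contents (real column names may differ).
mapping_csv = """section,subsection,group
da,read_obs,Observations
da,minimization,Solver
io,write_restart,ignore
"""

mapping = {}
for row in csv.DictReader(StringIO(mapping_csv)):
    mapping[(row["section"], row["subsection"])] = row["group"]

def resolve(section, subsection):
    """Unmapped labels fall back to 'Unknown'."""
    return mapping.get((section, subsection), "Unknown")

# Entries mapped to "ignore" are dropped from the results.
rows = [("da", "read_obs"), ("io", "write_restart"), ("io", "checkpoint")]
kept = [r for r in rows if resolve(*r) != "ignore"]
```

Here `("da", "read_obs")` resolves to its group, `("io", "checkpoint")` falls back to Unknown, and `("io", "write_restart")` is excluded entirely.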

How to run

At the bottom of the script, you can configure the paths inside the __main__ block:

if __name__ == '__main__':

    time_log_pattern = r'/path/to/runs/profiler_output_<replication>/profiler_output_time/profiler_data_time_size_<processes>.csv'
    mem_log_pattern = r'/path/to/runs/profiler_output_<replication>/profiler_output_time/profiler_data_time_size_<processes>.csv'

    image_directory = r'/path/to/output/images'

    mapping_path_for_time = "/path/to/mapping.csv"
    mapping_path_for_mem = "/path/to/mapping.csv"

    # Initialize the analysis object
    analysis_tool = HermesLogsAnalysis(
        time_log_pattern,
        mem_log_pattern,
        image_directory,
        mapping_path_for_time,
        mapping_path_for_mem,
        replications=2   # Number of replications
    )

    # Summarize results
    simple_times, simple_mem = analysis_tool.summarize()

    # Generate plots
    analysis_tool.plot_times(simple_times)
    analysis_tool.plot_mem(simple_mem)
    analysis_tool.plot_speedup(simple_times)
    analysis_tool.plot_efficiency(simple_times)

Key points to adapt:

  • time_log_pattern / mem_log_pattern: paths to profiler CSVs, where <replication> and <processes> will be replaced automatically.

  • image_directory: folder where plots will be saved.

  • mapping_path_for_time / mapping_path_for_mem: optional CSVs with section mappings.

  • replications: how many profiler replications you want to average over.
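Conceptually, the placeholder substitution in the log patterns works like this (the helper below is a sketch, not the script's actual implementation):

```python
# Pattern taken from the directory layout described above.
pattern = ("runs/profiler_output_<replication>/profiler_output_time/"
           "profiler_data_time_size_<processes>.csv")

def expand(pattern, replication, processes):
    """Substitute the <replication> and <processes> placeholders."""
    return (pattern
            .replace("<replication>", str(replication))
            .replace("<processes>", str(processes)))

expand(pattern, 1, 4)
# -> "runs/profiler_output_1/profiler_output_time/profiler_data_time_size_4.csv"
```

With, say, 2 replications and process counts of 1, 2, and 4, the script would read six CSV files in total.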

After editing, run the script located in the visualization package with:

python summarize_and_plot.py

This will generate the following plots in the output directory:

  • Times.png → execution time vs processes

  • Memory.png → memory usage vs processes

  • Speedup.png → scaling performance

  • Efficiency.png → parallel efficiency
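The last two plots follow the standard definitions speedup(p) = T(1) / T(p) and efficiency(p) = speedup(p) / p. A quick sketch with illustrative timings (not real profiler data):

```python
# Aggregated execution time (s) per process count; values are made up.
times = {1: 100.0, 2: 55.0, 4: 30.0, 8: 20.0}

t1 = times[1]  # serial baseline
speedup = {p: t1 / t for p, t in times.items()}
efficiency = {p: s / p for p, s in speedup.items()}
```

Perfect scaling would give speedup(p) = p and efficiency(p) = 1; efficiency dropping well below 1 at higher process counts signals diminishing returns.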

Edited by bgravalo