Why should you use it?

Switching to uhepp as an intermediate format between your analysis framework and plotting code can boost your productivity.

Today, data analysis in HEP is a team effort. Intermediate results are discussed in regular meetings. If you have presented your work in such an environment, you’ve probably received follow-up questions from your peers: Can you …? Here are some examples.

  • Can you zoom in on the ratio plot axis to see the difference more clearly?

  • On slide X, you are showing the proportion of processes A and B. Can you also create the plot showing the ratio of processes A and C?

  • Do you also have the plot with a finer/coarser/non-equidistant binning?

  • If you want to show this plot at the conference, you need to change the branding labels.

  • You could merge processes A, B, and C into a single “Other” category to make the plot a bit cleaner.

  • Can you also show the alternative Monte Carlo event generator B’s outline and compare it to generator A?

  • Have you looked at systematic uncertainty A? You could compare the variation to the nominal event yields to see if the effect is relevant/an artifact.

These are all fictional examples, but I am sure some sound familiar to you. Depending on your analysis framework, it might be rather tedious to satisfy these requests. In the worst case, you need to reprocess the full event dataset.

In the optimal case with uhepp, you don’t need to turn to the analysis framework anymore. You should have all the necessary data already in uhepp format. You can accomplish the requests in just a couple of lines of Python code or, in some cases, with a text editor. The best thing here is you can either change the settings only for a single plot or loop over all plots and apply a change universally.
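Why such requests become cheap once the binned yields are stored can be illustrated with plain Python. The following is a minimal sketch, not the uhepp API: the dictionary layout, the process names, and the helper functions `rebin` and `merge_processes` are hypothetical and only demonstrate the arithmetic behind "coarser binning" and "merge A, B, and C into Other". It assumes the counts exclude under- and overflow bins and that the new bin edges are a subset of the stored ones.

```python
def rebin(edges, counts, new_edges):
    """Sum fine-binned counts into coarser bins.

    Every entry of new_edges must also appear in edges, so the coarse
    bins may be non-equidistant but cannot split a stored bin.
    """
    index = {edge: i for i, edge in enumerate(edges)}
    missing = [edge for edge in new_edges if edge not in index]
    if missing:
        raise ValueError(f"new edges not in original binning: {missing}")
    positions = [index[edge] for edge in new_edges]
    return [sum(counts[lo:hi]) for lo, hi in zip(positions, positions[1:])]


def merge_processes(yields, names, merged_name):
    """Replace the listed processes by their bin-wise sum."""
    merged = [sum(values) for values in zip(*(yields[name] for name in names))]
    result = {name: counts for name, counts in yields.items() if name not in names}
    result[merged_name] = merged
    return result


# Hypothetical stored data: four fine bins between 0 and 200.
edges = [0, 50, 100, 150, 200]
yields = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [0.5, 0.5, 0.5, 0.5],
    "C": [2.0, 1.0, 0.0, 1.0],
    "Signal": [0.0, 1.0, 2.0, 0.0],
}

coarse_a = rebin(edges, yields["A"], [0, 100, 200])
cleaner = merge_processes(yields, ["A", "B", "C"], "Other")
```

Neither operation touches the event-level data, which is why the fine-binned yields are worth storing once and reusing.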

Besides optimizing your daily workflow, uhepp can also have a significant impact if you are a Ph.D. student. Typically, as a Ph.D. student in high-energy physics, you produce results over several years. Once you’ve reached a particular outcome, or after a predetermined time, you start writing the actual thesis. To document the results of all those years, you need the plots from back then. Naturally, settings like color schemes evolve over time. In your thesis, however, you want to present the material in a uniform way, much like a corporate identity.

If you are a Ph.D. student, ask yourself right now: how much time would it take to remake your old plots with a different color scheme? If the answer is more than 30 seconds, you should consider using uhepp.

The most prevalent storage and data format for histograms in high-energy physics is ROOT’s TH1. Although there is also a THStack object that contains styling options, you still need a significant amount of plotting code to go from stored histograms to the actual graphics file. The plotting code often contains hard-coded histogram names. The downside is that you need both the binary data and the plotting code to reproduce old plots. You might track your plotting code with version control software (e.g., git); however, you probably don’t track the binary ROOT files (and you shouldn’t). It is not unlikely that the plotting code has since evolved and become incompatible with the ROOT files’ content or naming conventions.

As a member of a large collaboration, for example, ATLAS or CMS, you need to adhere to labeling and branding rules. Internal plots that have not been reviewed need to contain the label Internal. Plots presented by students at a national conference need to include the label Work in progress. Plots that end up in a publication should not contain any labels besides the collaboration branding. Plots used in a thesis that are not taken from a publication must not have any label or branding. It is, therefore, a common task to change these labels. A single plot might be used in these different contexts with four different labels.

A secret “trick” circulating among students is to save the plots in EPS format. When you need to change the labels, you open the file in a text editor and replace the text. Although this might work [1], you should not have to lock yourself into one graphics format just to keep the flexibility to change labels. With uhepp, labels can be revised naturally, either with a text editor in JSON or YAML, or with the uhepp Python package.
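Changing a label in a structured text format can be sketched with nothing but the Python standard library. The example below is an illustration under assumptions, not the uhepp package: it assumes each plot is a JSON file whose label lives under a `badge` key with a `label` field (check the format specification for the actual field names), and the helpers `relabel` and `relabel_directory` are hypothetical.

```python
import json
from pathlib import Path


def relabel(plot, label):
    """Return a copy of the plot dict with the badge label replaced.

    Passing label=None removes the label entirely, e.g. for a
    publication plot that carries only the collaboration branding.
    The "badge"/"label" layout is an assumption of this sketch.
    """
    updated = dict(plot)
    badge = dict(updated.get("badge", {}))
    if label is None:
        badge.pop("label", None)
    else:
        badge["label"] = label
    updated["badge"] = badge
    return updated


def relabel_directory(directory, label):
    """Rewrite every *.json file in a directory in place.

    Returns the number of files that were rewritten.
    """
    count = 0
    for path in sorted(Path(directory).glob("*.json")):
        plot = json.loads(path.read_text())
        path.write_text(json.dumps(relabel(plot, label), indent=2))
        count += 1
    return count


# Hypothetical plot as it might look after loading from JSON.
internal_plot = {"badge": {"brand": "ATLAS", "label": "Internal"}}
conference_plot = relabel(internal_plot, "Work in progress")
publication_plot = relabel(internal_plot, None)
```

The same function serves a single plot or, through `relabel_directory`, a whole folder of plots, which mirrors the choice between a one-off change and a universal one.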

If you are not yet convinced, there is more to discover in the uhepp ecosystem: the central hub at https://uhepp.org/. The central hub offers a REST API. You can upload new plots or download existing plots in uhepp format via the API. At a superficial level, you can view the hub as a simple centralized storage service.

In high-energy physics, dataset sizes often go beyond what a single computer can handle. It is customary to employ a local computing cluster or the Worldwide LHC Computing Grid. Analysis results can be scattered across many different computers and file systems. Pushing results from the computing nodes to a centralized hub gives you a better, near-real-time overview of the results.

In addition, the centralized hub serves as an archive for plots. If you use uhepp throughout your time as a Ph.D. student, you have a single place where your plots are stored. You don’t need to worry that the plots might be scattered over several locations (the computing cluster, your laptop, the desktop machine in your office, your resource center). You already do the same with software: with every new commit or revision, you push the changes to a central hosting service.

The web interface at https://uhepp.org/ probably creates the most significant impact. The service previews an interactive version of each plot. Sharing plots with your colleagues is an essential task in a large collaboration. With uhepp, sharing plots becomes trivial: all you need to do is send a link. You can even link to collections of plots, which makes classical plot books in PDF format obsolete. Do you remember the Can you …? follow-up questions from above? If your peers know uhepp, they can satisfy their curiosity themselves. This creates a lot more transparency and trust in the results.

If you are convinced, or even if you are still not convinced, head over to the Getting started guide to get a more hands-on feeling of what it is like to work with uhepp.

[1] EPS files frequently contain a low-resolution preview of the rendered vector graphic. Changing the label in plain text will not update the preview.