Detecting anomalies in the LHC ATLAS detector, Polytechnique MAP583 2016/17
Description
James Catmore (UOslo), Imad Chaabane (LRI/UPSud), Sergei Gleyzer (UFlorida), Cécile Germain (LRI/UPSud), Isabelle Guyon (LRI/UPSud), Victor Estrade (LRI/UPSud), Balazs Kegl (LAL/CNRS), Edouard Leicht (LAL/CNRS), Gilles Louppe (NYU), David Rousseau (LAL/CNRS), Jean-Roch Vlimant (CalTech)

Introduction

Anomaly detection, where we seek to identify events or datasets that deviate from those normally encountered, is a common task in experimental particle physics. For example, two runs recorded on the same day with identical accelerator and detector conditions and the same trigger menu should not be statistically distinguishable. If they are, some unexpected systematic effect must be present which skews each event or a subset of the events, leading to a collective anomaly. There are many ways in which such problems can arise: for instance, the data acquisition or reconstruction software might be misconfigured, or some subcomponent of the detector might be malfunctioning. Conversely, an otherwise normal dataset may contain individual events which are somehow unusual. These point anomalies may arise from a problem with the detector, data acquisition, trigger or reconstruction that occurs only in very rare circumstances. In both cases it would be highly desirable to devise a mechanism that automatically scans every new dataset, detects any anomalous features, and alerts a human being so that a detailed investigation can follow. This is the subject of today's RAMP.
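
As an illustration of what "statistically distinguishable" means here, the sketch below compares one reconstructed feature between two runs with a Kolmogorov-Smirnov two-sample test. The synthetic arrays run_a and run_b and the significance threshold are assumptions made for the example only; they are not part of the RAMP data or evaluation.

# Sketch: flag a possible collective anomaly by comparing two runs.
# The data and threshold below are illustrative assumptions.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
run_a = rng.normal(loc=0.0, scale=1.0, size=10_000)   # reference run
run_b = rng.normal(loc=0.05, scale=1.0, size=10_000)  # run with a small systematic shift

statistic, p_value = ks_2samp(run_a, run_b)
if p_value < 0.01:
    print(f"Runs differ (KS statistic={statistic:.3f}, p={p_value:.2e}): possible collective anomaly")
else:
    print("No significant difference between the runs")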

The prediction task

The task is to devise a classifier that can distinguish the anomalous cases from the bulk of the data in a test dataset, having first trained the classifier on a labelled training dataset. Whilst the anomalous events are labelled in the training set, no distinction is made between the different types of distortion. In short, the challenge in this RAMP is to separate skewed data points from original data points.
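
As a rough illustration of the kind of submission expected, here is a minimal baseline sketch assuming a scikit-learn style fit/predict_proba interface; the exact class and method names required by the RAMP starting kit may differ.

# Sketch of a baseline classifier for original vs. skewed events.
# The Classifier interface shown here is an assumption; check the starting kit.
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


class Classifier:
    def __init__(self):
        # Standardise the features, then fit a random forest on the
        # labelled training events (original vs. skewed).
        self.clf = make_pipeline(
            StandardScaler(),
            RandomForestClassifier(n_estimators=100, random_state=42),
        )

    def fit(self, X, y):
        self.clf.fit(X, y)

    def predict_proba(self, X):
        # Column 1 holds the probability that an event is anomalous.
        return self.clf.predict_proba(X)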

Data

A version of the HiggsML dataset (used in the Kaggle Challenge in 2014) is provided. It contains a mixture of Higgs particles decaying into tau pairs and the principal background processes. Half of the events are unchanged, while the other half have been artificially distorted or corrupted in some way. The details of these distortions will be revealed during the RAMP. The full dataset contains approximately 800k events; you are given 100k events to build your models, and the rest will be used to test them.
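
For orientation, here is a minimal sketch of loading and inspecting the public training sample. The file name public_train.csv and the label column isSkewed are hypothetical placeholders; the actual file names and column layout are defined by the starting kit.

# Sketch: load the public training sample (file and column names are assumptions).
import pandas as pd

data = pd.read_csv("public_train.csv")        # ~100k labelled events
y = data["isSkewed"].values                   # 1 = distorted event, 0 = original
X = data.drop(columns=["isSkewed"]).values

print(f"{len(data)} events, {X.shape[1]} features, "
      f"{y.mean():.1%} labelled as anomalous")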
Rules
  • Submissions will open at (UTC) 2000-01-01 00:00:00
  • When you submit, your submission is sent to be trained automatically. The jobs may wait some time in a queue before being run, so be patient.
  • Pending (untrained) and failing submissions can be resubmitted under the same name at an arbitrary frequency.
  • Once your submission is trained, it cannot be deleted or replaced.
  • After each successfully trained submission, you have to wait 900s before resubmitting.
  • The leaderboard is in "hidden" mode until (UTC) 2017-01-09 19:00:00, which means that all scores are visible but the links pointing to the participants' code are hidden. After (UTC) 2017-01-09 19:00:00, all submitted code is public, and you are encouraged to look at and reuse each other's code.