Interplanetary Coronal Mass Ejections (ICMEs) result from magnetic instabilities occurring in the Sun's atmosphere. When they interact with a planetary environment they may drive intense internal activity such as strong particle acceleration, so-called geomagnetic storms, and geomagnetically induced currents. These effects have serious consequences for space- and ground-based technologies, and understanding them is part of the discipline known as space weather.

ICME signatures measured by in-situ spacecraft appear as patterns in time series of the magnetic field, particle density, bulk velocity, temperature, etc. Although readily visible to expert eyes, these patterns have quite variable characteristics, which makes naive automation of their detection difficult.

The goal of this RAMP is to detect Interplanetary Coronal Mass Ejections (ICMEs) in the data measured by in-situ spacecraft.

ICMEs are the interplanetary counterpart of Coronal Mass Ejections (CMEs), the expulsion of large quantities of plasma and magnetic field that results from magnetic instabilities occurring in the Sun's atmosphere (Kilpua et al. (2017) and references therein). They travel at several hundred to several thousand kilometers per second and, if Earth lies in their path, can reach it in 2-4 days.

ICMEs interact with the planetary environment and may drive intense internal activity such as strong particle acceleration, so-called geomagnetic storms, and geomagnetically induced currents. These effects have serious consequences for space- and ground-based technologies, and understanding them is part of the discipline known as space weather. ICME signatures measured by in-situ spacecraft thus appear as patterns in time series of the magnetic field, particle density, bulk velocity, temperature, etc. Although readily visible to expert eyes, these patterns have quite variable characteristics, which makes naive automation of their detection difficult. To overcome this problem, Lepping et al. (2005) proposed an automatic detection method based on manually set thresholds on a set of physical parameters. However, the method detected only 60% of the ICMEs, with a high proportion of false positives (60%). Moreover, because of the subjectivity introduced by the manually set thresholds, the method struggled to produce a reproducible and consistent ICME catalog.
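For illustration, a minimal detector in this threshold-based spirit might look as follows. This is only a sketch: the choice of parameters and the threshold values are purely illustrative, not Lepping et al.'s actual criteria; the column names (`Beta`, `RmsBob`) follow the dataset described below.

```python
import pandas as pd


def threshold_detector(data, beta_max=0.3, rmsbob_max=0.3):
    """Label a sample as ICME (1) when the plasma beta and the normalized
    magnetic fluctuations are both below fixed, hand-picked thresholds.

    Illustrative only: real ICME detection needs more parameters and
    tuned thresholds, which is exactly what makes the approach brittle.
    """
    is_icme = (data["Beta"] < beta_max) & (data["RmsBob"] < rmsbob_max)
    return is_icme.astype(int)
```

The manual thresholds are the weak point: any change to `beta_max` or `rmsbob_max` produces a different catalog, which is the reproducibility problem mentioned above.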

This challenge invites you to design the best possible algorithm to detect ICMEs, drawing on the most complete ICME catalog available, which contains 657 events. Participants receive a subset of this large dataset to test and calibrate their algorithms. We provide in-situ measurements from the WIND spacecraft between 1997 and 2016, resampled to a 10-minute resolution, together with three additional features we computed that have proved useful in the visual identification of ICMEs. Using an appropriate metric, we will compare each estimation to the true solution. The goal is to produce an ICME catalog containing fewer than 10% false positives while recovering as many of the existing events as possible.
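The challenge's own scoring is defined in its `problem` module (imported later in this notebook). As a rough first check while developing, pointwise precision and recall over individual samples can be computed with scikit-learn — precision above 0.9 corresponds loosely to the "fewer than 10% false positives" goal. A sketch:

```python
from sklearn.metrics import precision_score, recall_score


def pointwise_scores(y_true, y_pred):
    """Precision and recall computed over individual time samples.

    Note: this is a crude proxy; the challenge compares catalogs of
    events (time intervals), not isolated samples.
    """
    return precision_score(y_true, y_pred), recall_score(y_true, y_pred)
```

For example, predicting only one of two true positive samples and no false positives gives precision 1.0 and recall 0.5.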

Formally, each instance consists of measurements of various physical parameters in the interplanetary medium. The training set contains measurements from 1997 to 2010 together with the beginning and ending dates, $t_{start}$ and $t_{end}$, of the 438 ICMEs observed in this period.
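Per-timestamp binary labels can be derived from these ($t_{start}$, $t_{end}$) pairs along the following lines — a sketch with a hypothetical helper; the starting kit ships ready-made labels via `get_train_data`, so you do not need to build them yourself:

```python
import pandas as pd


def intervals_to_labels(index, events):
    """Return a 0/1 Series over `index`: 1 inside any [tstart, tend] interval.

    `events` is an iterable of (tstart, tend) timestamp pairs.
    """
    labels = pd.Series(0, index=index, name="label")
    for tstart, tend in events:
        labels.loc[tstart:tend] = 1
    return labels
```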

**To download and run this notebook**: download the full starting kit, with all the necessary files.

This starting kit requires the following dependencies:

- `numpy`
- `pandas`
- `pyarrow`
- `scikit-learn`
- `matplotlib`
- `jupyter`
- `imbalanced-learn`

We recommend installing those using `conda` (e.g. via the `Anaconda` distribution).

In addition, `ramp-workflow` is needed. This can be installed from the master branch on GitHub:

`python -m pip install https://api.github.com/repos/paris-saclay-cds/ramp-workflow/zipball/master`

The public train and test data can be downloaded by running from the root of the starting kit:

`python download_data.py`

In [1]:

```
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
```

We start with inspecting the training data:

In [2]:

```
from problem import get_train_data
data_train, labels_train = get_train_data()
```

In [3]:

```
data_train.head()
```

Out[3]:

| | B | Bx | Bx_rms | By | By_rms | Bz | Bz_rms | Na_nl | Np | Np_nl | ... | Range F 8 | Range F 9 | V | Vth | Vx | Vy | Vz | Beta | Pdyn | RmsBob |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1997-10-01 00:00:00 | 6.584763 | 3.753262 | 2.303108 | 0.966140 | 2.602693 | -5.179685 | 2.668414 | 2.290824 | 23.045732 | 24.352797 | ... | 2.757919e+09 | 2.472087e+09 | 378.313934 | 80.613098 | -351.598389 | -138.521454 | 6.956387 | 7.641340 | 5.487331e-15 | 0.668473 |
| 1997-10-01 00:10:00 | 6.036456 | 0.693559 | 1.810752 | -0.904843 | 2.165570 | -1.944006 | 2.372931 | 2.119593 | 23.000492 | 20.993362 | ... | 3.365612e+09 | 3.087122e+09 | 350.421021 | 69.919327 | -331.012146 | -110.970787 | -21.269474 | 9.149856 | 4.783776e-15 | 0.753848 |
| 1997-10-01 00:20:00 | 5.653682 | -4.684786 | 0.893058 | -2.668830 | 0.768677 | 1.479302 | 1.069266 | 2.876815 | 20.676191 | 17.496399 | ... | 1.675611e+09 | 1.558640e+09 | 328.324493 | 92.194435 | -306.114899 | -117.035202 | -13.018987 | 11.924199 | 3.719768e-15 | 0.282667 |
| 1997-10-01 00:30:00 | 5.461768 | -4.672382 | 1.081638 | -2.425630 | 0.765681 | 1.203713 | 0.934445 | 2.851195 | 20.730188 | 16.747108 | ... | 1.589037e+09 | 1.439569e+09 | 319.436859 | 94.230705 | -298.460938 | -110.403969 | -20.350492 | 16.032987 | 3.525211e-15 | 0.304713 |
| 1997-10-01 00:40:00 | 6.177846 | -5.230110 | 1.046126 | -2.872561 | 0.635256 | 1.505010 | 0.850657 | 3.317076 | 20.675701 | 17.524536 | ... | 1.812308e+09 | 1.529260e+09 | 327.545929 | 89.292595 | -307.303070 | -111.865845 | -12.313167 | 10.253789 | 3.694283e-15 | 0.244203 |

5 rows × 33 columns

The data consist of 30 primary input variables: the bulk velocity and its components $V, V_{x}, V_{y}, V_{z}$, the thermal velocity $V_{th}$, the magnetic field, its components and their RMS: $B, B_{x}, B_{y}, B_{z}, \sigma_{B_x}, \sigma_{B_y}, \sigma_{B_z}$, the densities of protons and $\alpha$ particles obtained from both moment and non-linear analysis: $N_{p}, N_{p,nl}$ and $N_{a,nl}$, as well as 15 channels of proton flux between 0.3 and 10 keV.

The data are resampled to a 10-minute resolution.

In addition to the 30 input variables, we computed 3 additional features that also serve as input variables: the plasma parameter $\beta$, defined as the ratio between the thermal and the magnetic pressure, the dynamic pressure $P_{dyn} = N_{p}V^{2}$, and the normalized magnetic fluctuations $\sigma_{B} = \sqrt{\sigma_{B_x}^{2}+\sigma_{B_y}^{2}+\sigma_{B_z}^{2}}/B$.
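Under the column naming shown above, two of these derived quantities could be recomputed roughly as follows — a sketch only: physical constants and unit conversions are dropped (so the `Pdyn` values here are proportional, not equal, to the provided column), $\beta$ also involves physical constants and is omitted, and the columns already provided in the dataset should be preferred.

```python
import numpy as np
import pandas as pd


def add_derived_features(data):
    """Append Pdyn-like and RmsBob columns to a copy of `data`.

    Constants/units are deliberately dropped; this only illustrates
    the structure of the formulas given in the text.
    """
    out = data.copy()
    # dynamic pressure ~ Np * V^2 (proportionality constants omitted)
    out["Pdyn"] = out["Np"] * out["V"] ** 2
    # normalized magnetic fluctuations: |sigma_B| / B
    out["RmsBob"] = (
        np.sqrt(out["Bx_rms"] ** 2 + out["By_rms"] ** 2 + out["Bz_rms"] ** 2)
        / out["B"]
    )
    return out
```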

In [4]:

```
data_train.info()
```

In [5]:

```
labels_train.head()
```

Out[5]:

```
1997-10-01 00:00:00    0
1997-10-01 00:10:00    0
1997-10-01 00:20:00    0
1997-10-01 00:30:00    0
1997-10-01 00:40:00    0
Name: label, dtype: int64
```
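Since ICME intervals cover only a fraction of the 1997-2010 time span, the labels are imbalanced, which is one reason `imbalanced-learn` appears in the dependency list. A quick way to check the class balance (a sketch; `label_balance` is a hypothetical helper):

```python
import pandas as pd


def label_balance(labels):
    """Fraction of samples in each class (0: ambient solar wind, 1: ICME)."""
    return labels.value_counts(normalize=True)
```

Calling `label_balance(labels_train)` on the training labels shows the per-class fractions.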

Recall that ICME signatures appear as characteristic patterns in the time series of the magnetic field, particle density, bulk velocity, temperature, etc.

Let's visualize a typical event to inspect the patterns.

In [6]:

```
def plot_event(start, end, data, delta=36):
    start = pd.to_datetime(start)
    end = pd.to_datetime(end)
    subset = data[
        (start - pd.Timedelta(hours=delta)) : (end + pd.Timedelta(hours=delta))
    ]
    fig, axes = plt.subplots(nrows=4, ncols=1, figsize=(10, 15), sharex=True)
    # plot 1: magnetic field magnitude and components
    axes[0].plot(subset.index, subset["B"], color="gray", linewidth=2.5)
    axes[0].plot(subset.index, subset["Bx"])
    axes[0].plot(subset.index, subset["By"])
    axes[0].plot(subset.index, subset["Bz"])
    axes[0].legend(
        ["B", "Bx", "By", "Bz (nT)"], loc="center left", bbox_to_anchor=(1, 0.5)
    )
    axes[0].set_ylabel("Magnetic Field (nT)")
    # plot 2: plasma beta
    axes[1].plot(subset.index, subset["Beta"], color="gray")
    axes[1].set_ylim(-0.05, 1.7)
    axes[1].set_ylabel("Beta")
    # plot 3: bulk velocity
    axes[2].plot(subset.index, subset["V"], color="gray")
    axes[2].set_ylabel("V(km/s)")
    # axes[2].set_ylim(250, 500)
    # plot 4: thermal velocity
    axes[3].plot(subset.index, subset["Vth"], color="gray")
    axes[3].set_ylabel("$V_{th}$(km/s)")
    # axes[3].set_ylim(5, 60)
    # mark the event boundaries with vertical lines
    for ax in axes:
        ax.axvline(start, color="k")
        ax.axvline(end, color="k")
        ax.xaxis.grid(True, which="minor")
    return fig, axes
```

In [7]:

```
plot_event(
    pd.Timestamp("2001-10-31 22:00:00"), pd.Timestamp("2001-11-02 05:30:00"), data_train
);
```

Not all events are "text-book" examples; they don't always exhibit all of the typical characteristics.

Visualizing some more, randomly drawn events:

In [8]:

```
from problem import turn_prediction_to_event_list
```

In [9]:

```
events = turn_prediction_to_event_list(labels_train)
```

In [10]:

```
rng = np.random.RandomState(1234)
for i in rng.randint(0, len(events), 3):
    plot_event(events[i].begin, events[i].end, data_train)
```