Interplanetary Coronal Mass Ejections (ICMEs) result from magnetic instabilities occurring in the Sun's atmosphere. They interact with the planetary environment and may trigger intense internal activity such as strong particle acceleration, geomagnetic storms and geomagnetically induced currents. These effects have serious consequences for space and ground technologies, and understanding them is part of the discipline known as space weather.
ICME signatures, as measured by in-situ spacecraft, appear as patterns in the time series of the magnetic field, particle density, bulk velocity, temperature, etc. Although easily recognized by expert eyes, these patterns have quite variable characteristics, which makes naive automation of their detection difficult.
The goal of this RAMP is to detect Interplanetary Coronal Mass Ejections (ICMEs) in the data measured by in-situ spacecraft.
ICMEs are the interplanetary counterpart of Coronal Mass Ejections (CMEs), the expulsion of large quantities of plasma and magnetic field that results from magnetic instabilities occurring in the Sun's atmosphere (Kilpua et al. (2017) and references therein). They travel at several hundred to thousands of kilometers per second and, if Earth lies on their trajectory, can reach it in 2-4 days.
ICMEs interact with the planetary environment and may trigger intense internal activity such as strong particle acceleration, geomagnetic storms and geomagnetically induced currents. These effects have serious consequences for space and ground technologies, and understanding them is part of the discipline known as space weather.
ICME signatures, as measured by in-situ spacecraft, appear as patterns in the time series of the magnetic field, particle density, bulk velocity, temperature, etc. Although easily recognized by expert eyes, these patterns have quite variable characteristics, which makes naive automation of their detection difficult. To overcome this problem, Lepping et al. (2005) proposed an automatic detection method based on manually set thresholds on a set of physical parameters. However, the method detected only 60% of the ICMEs, with a high percentage of false positives (60%). Moreover, because of the subjectivity introduced by the manually set thresholds, the method struggled to produce a reproducible and consistent ICME catalog.
This challenge proposes to design the best algorithm to detect ICMEs using the most complete ICME catalog, which contains 657 events. We give users a subset of this large dataset to test and calibrate their algorithms. We provide in-situ measurements from the WIND spacecraft between 1997 and 2016, which we resampled to a 10-minute resolution and for which we computed three additional features that proved useful in the visual identification of ICMEs. Using an appropriate metric, we will compare the true solution to the estimate. The goal is to provide an ICME catalog containing less than 10% false positives while recording as many existing events as possible.
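To make the "less than 10% false positives" target concrete, here is a minimal sketch of an event-wise score. It is not the challenge's official metric, which is not specified here. The helper names `overlaps` and `event_scores`, the any-overlap matching rule, and the example intervals (plain numbers standing for hours) are all assumptions made for illustration.

```python
def overlaps(a, b):
    """Return True if intervals a=(start, end) and b=(start, end) intersect."""
    return a[0] < b[1] and b[0] < a[1]


def event_scores(true_events, pred_events):
    """Event-wise precision and recall under a simple any-overlap rule.

    Assumption: a predicted event counts as correct if it overlaps at least
    one catalog event, and a catalog event counts as detected if at least
    one predicted event overlaps it.
    """
    correct = sum(any(overlaps(p, t) for t in true_events) for p in pred_events)
    detected = sum(any(overlaps(t, p) for p in pred_events) for t in true_events)
    precision = correct / len(pred_events) if pred_events else 0.0
    recall = detected / len(true_events) if true_events else 0.0
    return precision, recall


# Illustrative intervals only (hours since an arbitrary origin), not real events.
true_events = [(10, 40), (100, 125)]
pred_events = [(8, 30), (60, 70)]  # the second prediction is a false positive

precision, recall = event_scores(true_events, pred_events)
false_positive_share = 1 - precision
print(precision, recall, false_positive_share)
```

Under this reading, the stated goal would correspond to keeping `false_positive_share` below 0.1 while pushing `recall` as high as possible.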
Formally, each instance will consist of a measurement of various physical parameters in the interplanetary medium. The training set will contain measurements from 1997 to 2010, together with the beginning and ending dates, tstart and tend, of the 438 ICMEs that were measured in this period.
To download and run this notebook: download the full starting kit, with all the necessary files.
This starting kit requires the following dependencies:
numpy
pandas
pyarrow
scikit-learn
matplotlib
jupyter
imbalanced-learn
We recommend installing these using conda (via the Anaconda distribution).
In addition, ramp-workflow is needed. It can be installed from the master branch on GitHub:
python -m pip install https://api.github.com/repos/paris-saclay-cds/ramp-workflow/zipball/master
The public train and test data can be downloaded by running from the root of the starting kit:
python download_data.py
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
We start with inspecting the training data:
from problem import get_train_data
data_train, labels_train = get_train_data()
data_train.head()
| B | Bx | Bx_rms | By | By_rms | Bz | Bz_rms | Na_nl | Np | Np_nl | ... | Range F 8 | Range F 9 | V | Vth | Vx | Vy | Vz | Beta | Pdyn | RmsBob |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1997-10-01 00:00:00 | 6.584763 | 3.753262 | 2.303108 | 0.966140 | 2.602693 | -5.179685 | 2.668414 | 2.290824 | 23.045732 | 24.352797 | ... | 2.757919e+09 | 2.472087e+09 | 378.313934 | 80.613098 | -351.598389 | -138.521454 | 6.956387 | 7.641340 | 5.487331e-15 | 0.668473 |
1997-10-01 00:10:00 | 6.036456 | 0.693559 | 1.810752 | -0.904843 | 2.165570 | -1.944006 | 2.372931 | 2.119593 | 23.000492 | 20.993362 | ... | 3.365612e+09 | 3.087122e+09 | 350.421021 | 69.919327 | -331.012146 | -110.970787 | -21.269474 | 9.149856 | 4.783776e-15 | 0.753848 |
1997-10-01 00:20:00 | 5.653682 | -4.684786 | 0.893058 | -2.668830 | 0.768677 | 1.479302 | 1.069266 | 2.876815 | 20.676191 | 17.496399 | ... | 1.675611e+09 | 1.558640e+09 | 328.324493 | 92.194435 | -306.114899 | -117.035202 | -13.018987 | 11.924199 | 3.719768e-15 | 0.282667 |
1997-10-01 00:30:00 | 5.461768 | -4.672382 | 1.081638 | -2.425630 | 0.765681 | 1.203713 | 0.934445 | 2.851195 | 20.730188 | 16.747108 | ... | 1.589037e+09 | 1.439569e+09 | 319.436859 | 94.230705 | -298.460938 | -110.403969 | -20.350492 | 16.032987 | 3.525211e-15 | 0.304713 |
1997-10-01 00:40:00 | 6.177846 | -5.230110 | 1.046126 | -2.872561 | 0.635256 | 1.505010 | 0.850657 | 3.317076 | 20.675701 | 17.524536 | ... | 1.812308e+09 | 1.529260e+09 | 327.545929 | 89.292595 | -307.303070 | -111.865845 | -12.313167 | 10.253789 | 3.694283e-15 | 0.244203 |
5 rows × 33 columns
The data consist of 30 primary input variables: the bulk velocity and its components $V, V_{x}, V_{y}, V_{z}$, the thermal velocity $V_{th}$, the magnetic field, its components and their RMS: $B, B_{x}, B_{y}, B_{z}, \sigma_{B_x}, \sigma_{B_y}, \sigma_{B_z}$, the density of protons and $\alpha$ particles obtained from both moment and non-linear analysis: $N_{p}, N_{p,nl}$ and $N_{a,nl}$, as well as 15 channels of proton flux between 0.3 and 10 keV.
The data are resampled to a 10 minute resolution.
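As an illustration of what resampling to a 10-minute resolution means, the sketch below averages a synthetic 1-minute series into 10-minute bins with pandas. The aggregation actually used to build the dataset is not stated, so the choice of the mean is an assumption, and the values are synthetic rather than real WIND measurements.

```python
import numpy as np
import pandas as pd

# Synthetic 1-minute series standing in for raw measurements (not real data).
index = pd.date_range("1997-10-01 00:00", periods=30, freq="1min")
raw = pd.DataFrame({"B": np.arange(30, dtype=float)}, index=index)

# Average each 10-minute bin; using the mean is an assumption.
resampled = raw.resample("10min").mean()
print(resampled)
```

Each output row is labelled by the start of its bin, which matches the 00:00, 00:10, 00:20 pattern of the index shown in the table above.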
In addition to the 30 input variables, we computed 3 additional features that will also serve as input variables: the plasma parameter $\beta$, defined as the ratio between the thermal and the magnetic pressure, the dynamic pressure $P_{dyn} = N_{p}V^{2}$ and the normalized magnetic fluctuations: $\sigma_{B} = \sqrt{\sigma_{B_x}^{2}+\sigma_{B_y}^{2}+\sigma_{B_z}^{2}}/B$.
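The two features with explicit formulas can be sketched directly from the column names shown in the table above. The helper `add_derived_features` is hypothetical and not part of the starting kit. It omits any constant factor (the `Pdyn` values in the table are on a very different scale from $N_{p}V^{2}$, so the stored column presumably includes one), and it leaves out $\beta$, whose computation needs quantities not spelled out here.

```python
import numpy as np
import pandas as pd


def add_derived_features(df):
    """Return a copy of df with Pdyn and RmsBob recomputed from the formulas.

    Hypothetical helper: constant factors and unit conversions are omitted.
    """
    out = df.copy()
    # Dynamic pressure, up to a constant factor: Np * V^2
    out["Pdyn"] = out["Np"] * out["V"] ** 2
    # Normalized magnetic fluctuations: sqrt(sum of squared RMS) / B
    out["RmsBob"] = (
        np.sqrt(out["Bx_rms"] ** 2 + out["By_rms"] ** 2 + out["Bz_rms"] ** 2)
        / out["B"]
    )
    return out


# Toy values chosen so the result is easy to check by hand.
sample = pd.DataFrame(
    {
        "B": [5.0],
        "Bx_rms": [1.0],
        "By_rms": [2.0],
        "Bz_rms": [2.0],
        "Np": [10.0],
        "V": [400.0],
    }
)
features = add_derived_features(sample)
print(features[["Pdyn", "RmsBob"]])
```

For the toy row, the fluctuation term is $\sqrt{1 + 4 + 4}/5 = 0.6$.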
data_train.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 509834 entries, 1997-10-01 00:00:00 to 2007-12-31 23:50:00
Data columns (total 33 columns):
 #   Column      Non-Null Count   Dtype
---  ------      --------------   -----
 0   B           509834 non-null  float32
 1   Bx          509834 non-null  float32
 2   Bx_rms      509834 non-null  float32
 3   By          509834 non-null  float32
 4   By_rms      509834 non-null  float32
 5   Bz          509834 non-null  float32
 6   Bz_rms      509834 non-null  float32
 7   Na_nl       509834 non-null  float32
 8   Np          509834 non-null  float32
 9   Np_nl       509834 non-null  float32
 10  Range F 0   509834 non-null  float32
 11  Range F 1   509834 non-null  float32
 12  Range F 10  509834 non-null  float32
 13  Range F 11  509834 non-null  float32
 14  Range F 12  509834 non-null  float32
 15  Range F 13  509834 non-null  float32
 16  Range F 14  509834 non-null  float32
 17  Range F 2   509834 non-null  float32
 18  Range F 3   509834 non-null  float32
 19  Range F 4   509834 non-null  float32
 20  Range F 5   509834 non-null  float32
 21  Range F 6   509834 non-null  float32
 22  Range F 7   509834 non-null  float32
 23  Range F 8   509834 non-null  float32
 24  Range F 9   509834 non-null  float32
 25  V           509834 non-null  float32
 26  Vth         509834 non-null  float32
 27  Vx          509834 non-null  float32
 28  Vy          509834 non-null  float32
 29  Vz          509834 non-null  float32
 30  Beta        509834 non-null  float64
 31  Pdyn        509834 non-null  float64
 32  RmsBob      509834 non-null  float32
dtypes: float32(31), float64(2)
memory usage: 72.0 MB
The target labels consist of an indicator for each time step (0 for background solar wind, 1 for solar storm, the event to detect):
labels_train.head()
1997-10-01 00:00:00    0
1997-10-01 00:10:00    0
1997-10-01 00:20:00    0
1997-10-01 00:30:00    0
1997-10-01 00:40:00    0
Name: label, dtype: int64
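To show how a catalog of (tstart, tend) dates relates to this per-time-step indicator, here is a minimal sketch that marks every 10-minute step inside an event with 1. The helper `events_to_labels` is hypothetical, and treating both tstart and tend as inclusive is an assumption. The single event uses the dates of the example plotted later in this notebook.

```python
import pandas as pd


def events_to_labels(index, events):
    """Build a 0/1 label Series on `index` from (tstart, tend) pairs.

    Hypothetical helper: both event bounds are treated as inclusive.
    """
    labels = pd.Series(0, index=index, name="label")
    for tstart, tend in events:
        # Label-based slicing on a DatetimeIndex includes both endpoints.
        labels.loc[tstart:tend] = 1
    return labels


index = pd.date_range("2001-10-31 21:00", "2001-11-02 06:00", freq="10min")
events = [(pd.Timestamp("2001-10-31 22:00"), pd.Timestamp("2001-11-02 05:30"))]

labels = events_to_labels(index, events)
print(labels.value_counts())
```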
As described above, ICME signatures measured by in-situ spacecraft appear as patterns in the time series of the magnetic field, particle density, bulk velocity, temperature, etc.
Let's visualize a typical event to inspect the patterns.
def plot_event(start, end, data, delta=36):
    """Plot B, Beta, V and Vth around an event, padded by `delta` hours."""
    start = pd.to_datetime(start)
    end = pd.to_datetime(end)
    # select the event window plus `delta` hours of context on each side
    subset = data[
        (start - pd.Timedelta(hours=delta)) : (end + pd.Timedelta(hours=delta))
    ]
    fig, axes = plt.subplots(nrows=4, ncols=1, figsize=(10, 15), sharex=True)
    # panel 1: magnetic field magnitude and components
    axes[0].plot(subset.index, subset["B"], color="gray", linewidth=2.5)
    axes[0].plot(subset.index, subset["Bx"])
    axes[0].plot(subset.index, subset["By"])
    axes[0].plot(subset.index, subset["Bz"])
    axes[0].legend(
        ["B", "Bx", "By", "Bz (nT)"], loc="center left", bbox_to_anchor=(1, 0.5)
    )
    axes[0].set_ylabel("Magnetic Field (nT)")
    # panel 2: plasma beta
    axes[1].plot(subset.index, subset["Beta"], color="gray")
    axes[1].set_ylim(-0.05, 1.7)
    axes[1].set_ylabel("Beta")
    # panel 3: bulk velocity
    axes[2].plot(subset.index, subset["V"], color="gray")
    axes[2].set_ylabel("V(km/s)")
    # axes[2].set_ylim(250, 500)
    # panel 4: thermal velocity
    axes[3].plot(subset.index, subset["Vth"], color="gray")
    axes[3].set_ylabel("$V_{th}$(km/s)")
    # axes[3].set_ylim(5, 60)
    # mark the event start and end with vertical lines on every panel
    for ax in axes:
        ax.axvline(start, color="k")
        ax.axvline(end, color="k")
        ax.xaxis.grid(True, which="minor")
    return fig, axes
plot_event(
    pd.Timestamp("2001-10-31 22:00:00"), pd.Timestamp("2001-11-02 05:30:00"), data_train
);
Not all events are "textbook" examples, and they do not always exhibit all the typical characteristics.
Let's visualize a few more randomly drawn events:
from problem import turn_prediction_to_event_list
events = turn_prediction_to_event_list(labels_train)
rng = np.random.RandomState(1234)
for i in rng.randint(0, len(events), 3):
    plot_event(events[i].begin, events[i].end, data_train)
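For intuition about how a 0/1 label series can be turned back into a list of events, the sketch below groups consecutive time steps labelled 1 into (begin, end) pairs. It is not the implementation of `turn_prediction_to_event_list` from the `problem` module, which may apply additional rules. The helper `labels_to_events` and the toy label values are assumptions made for illustration.

```python
import pandas as pd


def labels_to_events(labels):
    """Group consecutive steps labelled 1 into (begin, end) timestamp pairs.

    Hypothetical helper, not the starting kit's own conversion function.
    """
    is_event = labels.astype(bool)
    # A new run starts whenever the label differs from the previous step.
    run_id = (is_event != is_event.shift(fill_value=False)).cumsum()
    events = []
    for _, run in labels[is_event].groupby(run_id[is_event]):
        events.append((run.index[0], run.index[-1]))
    return events


# Toy labels on a 10-minute grid (not real data).
index = pd.date_range("2001-01-01 00:00", periods=12, freq="10min")
toy_labels = pd.Series([0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1], index=index)

toy_events = labels_to_events(toy_labels)
for begin, end in toy_events:
    print(begin, "->", end)
```

A run containing a single time step yields an event whose begin and end coincide, so a real pipeline would likely want a minimum-duration rule on top of this.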