Aircraft classification from radar trajectories

Paris Saclay Center for Data Science

Radar trajectories RAMP: classifying flying objects using radar trajectories

Balázs Kégl (LAL/CNRS), Akin Kazakci (Mines ParisTech), Silvère Bonnabel (Mines ParisTech), Sami Jouaber (Mines ParisTech)

Introduction

The primary goal of an air traffic control center is to provide a clear picture of the state of the air traffic. This picture is based on various information about the flying aircraft, namely radar measurements and other information provided by the planes themselves about their position, as well as some of their characteristics, through the Automatic Dependent Surveillance-Broadcast (ADS-B) system. To provide air traffic controllers with efficient decision-making tools, one should be able to automatically detect a number of features of an aircraft based solely on the radar measurements (which do not depend on whether the aircraft is cooperating). In particular, it is desirable to establish whether the aircraft is civilian or military, whether its behavior is aggressive, what kind of maneuvers it is capable of performing, and which category of airplane it is (helicopter, fighter, liner, drone, etc.). Such information may prove useful, for instance, to adapt the tracking algorithm of the radars to the type of aircraft (highly maneuvering aircraft may require more radar resources to be tracked), and more generally for decision-support tools.

Type of data used and prediction task

Radar measurements are classified since they contain information about French military operations and trips. Hence, we have collected aircraft trajectories (i.e., position over time) using the publicly available ADS-B data. These trajectories serve as a publicly available alternative to the radar position measurements. The goal of the classification task is to recognize the type of flying object. There are 19 types, labeled by 4-digit strings: '1111', '1112', '1121', '1122', '1132', '1222', '1224', '1231', '1232', '1233', '1234', '1324', '1332', '1333', '1334', '4111', '4121', '4122', '4222'. Each digit denotes an attribute:

  1. Species: Kind of aircraft
      - 1 = Airplane - Liner
      - 4 = Helicopter
  2. WTC: Wake Turbulence Category
      - 1 = Light
      - 2 = Medium
      - 3 = Heavy
  3. EngType: Kind of engine
      - 1 = Piston
      - 2 = Turboprop
      - 3 = Jet
  4. Engines: Number of engines

The models will be evaluated by cross entropy, a.k.a. negative log likelihood (NLL):

$$-\frac{1}{N} \sum_{n=1}^{N} \log \hat{p}_{n,y_n}$$

where $N$ is the number of test instances and $\hat{p}_{n,y_n}$ is the predicted probability of the true class $y_n$ of the $n$th instance, given the input (radar trajectory, see below). Besides this official score, we will also display the accuracy (the number of correctly classified instances divided by the size of the test set), which is more human-readable. The NLL score has the advantage that it incentivizes you to come up with unbiased probability estimates, which can then be aggregated across aircraft types if we change the task (e.g., to predict the number of engines).
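This NLL score can be reproduced locally with scikit-learn's log_loss. A minimal sketch with hypothetical probabilities (the labels argument fixes the column order of the probability matrix):

```python
import numpy as np
from sklearn.metrics import log_loss

# Hypothetical toy example: 4 instances, 3 of the 19 classes.
y_true = ['1111', '1112', '1111', '4222']
classes = ['1111', '1112', '4222']
y_proba = np.array([
    [0.7, 0.2, 0.1],   # true class '1111' gets probability 0.7
    [0.2, 0.6, 0.2],
    [0.5, 0.3, 0.2],
    [0.1, 0.1, 0.8],
])
# NLL = -(log 0.7 + log 0.6 + log 0.5 + log 0.8) / 4
nll = log_loss(y_true, y_proba, labels=classes)
print(round(nll, 3))  # 0.446
```

Note how confident correct predictions (0.8) contribute little to the loss, while a hesitant 0.5 contributes much more; a confident wrong prediction would dominate the score.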

Requirements

  • numpy>=1.10.0
  • matplotlib>=1.5.0
  • pandas>=0.19.0
  • scikit-learn>=0.18
  • ramp-workflow
In [1]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split

Exploratory data analysis

Let's first import the problem.py script which has data reader routines.

In [2]:
import imp
# load problem.py under a random module name to avoid caching issues
module_name = str(int(1000000000 * np.random.rand()))
problem = imp.load_source(module_name, 'problem.py')

Let's read the training data and labels from data/train.pkl into a pandas dataframe.

In [3]:
X_df, y_array = problem.get_train_data()
X_df
Out[3]:
data trajectory_id
index
0 [[142.3, 13065.2072311, -37.3482725781, 7.6629... 10
1 [[178.1, 12293.3296794, -41.8313170148, -2.782... 10
2 [[528.1, -2603.7653628, -173.591179302, 4.0816... 10
3 [[204.0, 11468.8976453, -75.9557784458, -3.710... 10
4 [[335.0, 6668.3636331, -107.36545794, 0.927818... 10
5 [[38.9, 17597.3725596, -47.7188174498, 8.40393... 10
6 [[264.2, 9455.57780146, -76.1110502916, -2.119... 10
7 [[238.6, 9664.36696588, 29.6750335695, 8.19662... 10
8 [[363.7, 5592.22368649, -125.457647118, -0.083... 10
9 [[49.2, 17759.6881089, -5.62268972554, -5.7678... 10
10 [[472.1, -42.1417064816, 7.0713644971, 1.87018... 12
11 [[291.0, 2298.736692, -0.0323523516254, 0.4857... 12
12 [[675.4, -4260.21887764, -13.2879657703, 0.545... 12
13 [[556.4, -1825.59914711, -8.14625975095, 0.900... 12
14 [[330.8, 2426.90138763, 3.87205271099, -0.1364... 12
15 [[685.9, -4395.5911207, -22.1951759656, -1.736... 12
16 [[14.9, 4911.55669731, 12.1206841511, 0.370553... 12
17 [[357.8, 2686.60159488, 21.9511101768, -0.1748... 12
18 [[132.2, 4480.04507975, -16.5359879127, -1.786... 12
19 [[659.1, -3976.53311388, -3.88350480198, 0.680... 12
20 [[106.3, 1146.38431717, -8.18094992124, 2.1219... 14
21 [[106.8, 1142.22162909, -7.19163728452, 2.7712... 14
22 [[461.8, 187.279251201, -21.6533741143, 0.8091... 14
23 [[373.2, 198.704394999, -15.093281687, 2.11122... 14
24 [[55.3, -940.90046099, 26.9039372166, -2.71493... 14
25 [[176.9, 2250.71107565, 14.3212124757, -1.0962... 14
26 [[314.5, -52.1320242234, -12.4979824227, 2.874... 14
27 [[6.4, -3425.34755694, 21.9551620197, -5.30540... 14
28 [[117.9, 1669.27922476, 86.3650682046, -3.1277... 14
29 [[378.3, 98.9739328194, -3.80143135981, 6.1554... 14
... ... ...
4530 [[148.6, 2783.40869038, -46.6481307732, -2.041... 47
4531 [[237.4, -525.434142389, -47.5379049754, 0.496... 47
4532 [[192.1, 945.589811744, -29.4108783189, 1.0554... 47
4533 [[287.9, -2341.72297706, -35.260872666, -0.594... 47
4534 [[53.8, 3533.5143775, -12.6013107342, 1.739942... 47
4535 [[209.9, 510.906686786, -24.5684797162, -1.280... 47
4536 [[76.3, 3601.35436204, 4.34863452767, 0.206743... 47
4537 [[454.6, -3171.62984681, 5.00043957379, -0.212... 47
4538 [[307.5, -3004.67800843, -26.8507987693, 0.030... 47
4539 [[240.3, -654.899098708, -43.4519449136, 1.235... 47
4540 [[386.5, -1387.77040902, -31.8813172193, 2.091... 6
4541 [[354.8, 400.292189388, -68.3890927356, -1.006... 6
4542 [[248.6, 1062.99894428, 14.3608892973, -1.2425... 6
4543 [[11.8, 2582.63360695, -8.71923928381, -2.7321... 6
4544 [[470.3, -2022.66027478, 27.1378827479, 2.3485... 6
4545 [[396.4, -1587.48940415, -12.248539134, 1.1476... 6
4546 [[322.5, 1671.31871288, -5.84098864689, -1.815... 6
4547 [[488.8, -1166.00487404, 56.2008798405, 0.3148... 6
4548 [[378.2, -1048.35005812, -48.8377903704, 1.522... 6
4549 [[372.4, -746.457891409, -53.1925133702, 1.095... 6
4550 [[98.1, 4955.26590254, -4.73412492723, 0.04181... 9
4551 [[56.4, 5127.94332756, -4.54195074342, 0.04850... 9
4552 [[229.4, 3424.07912631, -16.2357814001, -0.672... 9
4553 [[161.6, 4300.92124025, -14.1158033421, 0.0712... 9
4554 [[62.6, 5103.2111739, -3.59358707414, 0.022763... 9
4555 [[71.5, 5069.3993689, -4.30435198872, -0.02487... 9
4556 [[131.3, 4751.98564098, -13.6974594605, -0.527... 9
4557 [[142.3, 4572.96214542, -16.6859366302, 0.1635... 9
4558 [[239.3, 3219.52936855, -24.8032633276, -0.753... 9
4559 [[104.4, 4927.94947637, -4.02758534422, 0.0481... 9

4560 rows × 2 columns

In [4]:
y_array
Out[4]:
array(['1111', '1111', '1111', ..., '4222', '4222', '4222'], dtype=object)

The training data has two columns. trajectory_id is the id of the long trajectory from which we sampled 1000 consecutive trajectory points at random starting times. To have enough training data, we took 10 (possibly overlapping) samples from each long trajectory. The trajectory itself is found in the data column. Each instance is represented by a 2D numpy array where each column is a trajectory feature and each row is a time point. Let us convert a single instance into a pandas table to visualize it.

In [5]:
columns = [
    'T', 'X', 'Vx', 'Ax', 'Jx', 'Y', 'Vy','Ay', 'Jy',
    'Z', 'Vz', 'Az', 'Jz', 'U2', 'C2', 'U3', 'C3', 'T3']
i = 0
trajectory_df = pd.DataFrame(X_df['data'][i], columns=columns)
trajectory_df
Out[5]:
T X Vx Ax Jx Y Vy Ay Jy Z Vz Az Jz U2 C2 U3 C3 T3
0 142.3 13065.207231 -37.348273 7.662910 0.558759 10648.565303 -81.879076 13.799308 1.480220 278.248441 0.275643 0.042042 -0.014298 89.994869 1.708234e-06 89.995291 0.000154 -0.000016
1 142.4 13061.548323 -35.959268 7.698060 0.532016 10640.499794 -79.613038 13.917953 1.442313 278.276444 0.283573 0.040485 -0.014462 87.357340 1.929820e-06 87.357800 0.000169 -0.000015
2 142.5 13058.059018 -34.561718 7.729593 0.505136 10632.705141 -77.324428 14.031347 1.404058 278.305394 0.291147 0.038914 -0.014616 84.696987 2.190796e-06 84.697487 0.000186 -0.000014
3 142.6 13054.741148 -33.156487 7.757482 0.478135 10625.185351 -75.014413 14.139425 1.365475 278.335239 0.298362 0.037330 -0.014762 82.015333 2.499874e-06 82.015876 0.000206 -0.000014
4 142.7 13051.596381 -31.744444 7.781706 0.451030 10617.944223 -72.684172 14.242129 1.326588 278.365926 0.305219 0.035735 -0.014899 79.313924 2.868085e-06 79.314511 0.000228 -0.000013
5 142.8 13048.626221 -30.326460 7.802246 0.423839 10610.985345 -70.334893 14.339404 1.287418 278.397402 0.311716 0.034130 -0.015027 76.594330 3.309504e-06 76.594965 0.000254 -0.000012
6 142.9 13045.832000 -28.903414 7.819086 0.396578 10604.312091 -67.967776 14.431199 1.247987 278.429614 0.317856 0.032515 -0.015146 73.858147 3.842247e-06 73.858831 0.000284 -0.000012
7 143.0 13043.214882 -27.476185 7.832214 0.369263 10597.927615 -65.584024 14.517467 1.208315 278.462509 0.323637 0.030893 -0.015256 71.106997 4.489832e-06 71.107733 0.000320 -0.000011
8 143.1 13040.775853 -26.045654 7.841621 0.341912 10591.834848 -63.184851 14.598164 1.168423 278.496031 0.329061 0.029265 -0.015359 68.342531 5.283088e-06 68.343323 0.000362 -0.000010
9 143.2 13038.515727 -24.612703 7.847303 0.314539 10586.036493 -60.771474 14.673252 1.128332 278.530128 0.334128 0.027631 -0.015452 65.566434 6.262841e-06 65.567285 0.000411 -0.000010
10 143.3 13036.435139 -23.178215 7.849256 0.287160 10580.535029 -58.345115 14.742694 1.088062 278.564745 0.338842 0.025994 -0.015538 62.780428 7.483758e-06 62.781343 0.000471 -0.000009
11 143.4 13034.534547 -21.743073 7.847482 0.259792 10575.332698 -55.906999 14.806461 1.047633 278.599828 0.343202 0.024353 -0.015616 59.986280 9.019915e-06 59.987261 0.000542 -0.000009
12 143.5 13032.814229 -20.308156 7.841987 0.232449 10570.431514 -53.458353 14.864525 1.007063 278.635325 0.347213 0.022712 -0.015685 57.185808 1.097299e-05 57.186862 0.000629 -0.000008
13 143.6 13031.274284 -18.874343 7.832777 0.205147 10565.833255 -51.000405 14.916862 0.966371 278.671181 0.350875 0.021069 -0.015747 54.380899 1.348449e-05 54.382031 0.000735 -0.000008
14 143.7 13029.914631 -17.442509 7.819866 0.177900 10561.539464 -48.534384 14.963453 0.925577 278.707345 0.354192 0.019428 -0.015801 51.573516 1.675442e-05 51.574733 0.000866 -0.000008
15 143.8 13028.735006 -16.013525 7.803267 0.150724 10557.551445 -46.061519 15.004284 0.884699 278.743764 0.357168 0.017789 -0.015848 48.765731 2.107014e-05 48.767039 0.001029 -0.000007
16 143.9 13027.734968 -14.588257 7.783001 0.123634 10553.870271 -43.583035 15.039343 0.843754 278.780386 0.359805 0.016152 -0.015888 45.959745 2.685214e-05 45.961154 0.001236 -0.000007
17 144.0 13026.913896 -13.167565 7.759087 0.096643 10550.496772 -41.100156 15.068623 0.802760 278.817160 0.362107 0.014520 -0.015920 43.157938 3.472815e-05 43.159457 0.001501 -0.000006
18 144.1 13026.270988 -11.752305 7.731553 0.069766 10547.431546 -38.614101 15.092119 0.761734 278.854037 0.364079 0.012893 -0.015945 40.362922 4.565620e-05 40.364564 0.001845 -0.000006
19 144.2 13025.805267 -10.343322 7.700427 0.043017 10544.674953 -36.126085 15.109834 0.720695 278.890967 0.365724 0.011273 -0.015963 37.577631 6.113466e-05 37.579410 0.002300 -0.000006
20 144.3 13025.515575 -8.941457 7.665740 0.016410 10542.227118 -33.637318 15.121770 0.679657 278.927902 0.367047 0.009659 -0.015975 34.805442 8.357160e-05 34.807378 0.002912 -0.000006
21 144.4 13025.400582 -7.547539 7.627530 -0.010040 10540.087934 -31.149003 15.127937 0.638639 278.964794 0.368053 0.008054 -0.015980 32.050362 1.169561e-04 32.052475 0.003753 -0.000005
22 144.5 13025.458782 -6.162389 7.585836 -0.036321 10538.257057 -28.662335 15.128347 0.597656 279.001598 0.368746 0.006458 -0.015979 29.317307 1.681237e-04 29.319626 0.004934 -0.000005
23 144.6 13025.688498 -4.786818 7.540699 -0.062418 10536.733917 -26.178501 15.123016 0.556723 279.038267 0.369133 0.004872 -0.015971 26.612545 2.492355e-04 26.615105 0.006639 -0.000005
24 144.7 13026.087883 -3.421624 7.492167 -0.088318 10535.517711 -23.698680 15.111963 0.515858 279.074759 0.369217 0.003298 -0.015957 23.944414 3.828477e-04 23.947260 0.009175 -0.000005
25 144.8 13026.654922 -2.067595 7.440289 -0.114007 10534.607414 -21.224040 15.095212 0.475074 279.111031 0.369006 0.001735 -0.015938 21.324512 6.127281e-04 21.327704 0.013076 -0.000005
26 144.9 13027.387436 -0.725504 7.385118 -0.139474 10534.001771 -18.755736 15.072791 0.434388 279.147040 0.368504 0.000184 -0.015912 18.769763 1.027875e-03 18.773380 0.019305 -0.000004
27 145.0 13028.283084 0.603889 7.326710 -0.164703 10533.699312 -16.294915 15.044730 0.393815 279.182748 0.367718 -0.001352 -0.015882 16.306101 1.817243e-03 16.310247 0.029643 -0.000004
28 145.1 13047.789648 7.549537 5.934406 -0.564544 10571.103173 -2.429443 12.313257 -0.406600 278.931022 0.279053 0.017832 -0.010012 7.930807 2.714192e-02 7.935714 0.214992 0.000023
29 145.2 13049.703302 8.922351 5.836189 -0.595064 10572.926944 0.188983 12.198431 -0.460802 278.951377 0.275495 0.017113 -0.009806 8.924352 1.698451e-02 8.928603 0.151447 0.000023
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
970 239.3 9693.412351 36.407598 7.225480 -0.883832 6854.578037 36.305154 7.688828 -0.823186 329.776813 -0.251085 -0.067414 0.005963 51.415731 2.519794e-06 51.416344 0.000130 0.000112
971 239.4 9697.852807 37.297321 7.082587 -0.901375 6859.003217 37.239862 7.552525 -0.840559 329.742808 -0.259720 -0.066207 0.006159 52.705763 2.324094e-06 52.706403 0.000123 0.000110
972 239.5 9702.362928 38.169822 6.938616 -0.918740 6863.502840 38.158050 7.415149 -0.857759 329.708087 -0.268217 -0.064983 0.006356 53.971957 2.153209e-06 53.972623 0.000116 0.000107
973 239.6 9706.941762 39.025339 6.793548 -0.935947 6868.076022 39.059956 7.276681 -0.874803 329.672661 -0.276577 -0.063742 0.006552 55.214647 2.003297e-06 55.215340 0.000111 0.000104
974 239.7 9711.588388 39.864099 6.647360 -0.953010 6872.721906 39.945803 7.137099 -0.891708 329.636540 -0.284800 -0.062483 0.006747 56.434152 1.871221e-06 56.434871 0.000106 0.000102
975 239.8 9716.301914 40.686316 6.500030 -0.969949 6877.439667 40.815806 6.996378 -0.908490 329.599734 -0.292888 -0.061208 0.006943 57.630776 1.754397e-06 57.631520 0.000101 0.000099
976 239.9 9721.081473 41.492198 6.351530 -0.986779 6882.228503 41.670170 6.854492 -0.925167 329.562252 -0.300843 -0.059915 0.007138 58.804809 1.650687e-06 58.805579 0.000097 0.000096
977 240.0 9725.926229 42.281940 6.201833 -1.003517 6887.087641 42.509089 6.711412 -0.941756 329.524104 -0.308664 -0.058604 0.007334 59.956527 1.558313e-06 59.957321 0.000093 0.000094
978 240.1 9730.835369 43.055728 6.050908 -1.020180 6892.016334 43.332748 6.567109 -0.958271 329.485299 -0.316354 -0.057276 0.007530 61.086191 1.475781e-06 61.087010 0.000090 0.000091
979 240.2 9735.808108 43.813739 5.898723 -1.036782 6897.013858 44.141322 6.421548 -0.974731 329.445843 -0.323913 -0.055930 0.007727 62.194051 1.401839e-06 62.194895 0.000087 0.000089
980 240.3 9740.843687 44.556140 5.745244 -1.053341 6902.079518 44.934978 6.274696 -0.991149 329.405745 -0.331342 -0.054565 0.007923 63.280343 1.335425e-06 63.281211 0.000085 0.000086
981 240.4 9745.941372 45.283089 5.590432 -1.069872 6907.212644 45.713874 6.126515 -1.007542 329.365011 -0.338643 -0.053182 0.008121 64.345291 1.275636e-06 64.346182 0.000082 0.000084
982 240.5 9751.100457 45.994738 5.434249 -1.086390 6912.412592 46.478157 5.976965 -1.023926 329.323648 -0.345815 -0.051779 0.008319 65.389104 1.221704e-06 65.390019 0.000080 0.000081
983 240.6 9756.320264 46.691225 5.276653 -1.102911 6917.678745 47.227968 5.826005 -1.040316 329.281662 -0.352861 -0.050358 0.008518 66.411983 1.172969e-06 66.412920 0.000078 0.000079
984 240.7 9761.600141 47.372685 5.117601 -1.119450 6923.010514 47.963439 5.673591 -1.056726 329.239058 -0.359780 -0.048916 0.008718 67.414113 1.128864e-06 67.415073 0.000076 0.000076
985 240.8 9766.939466 48.039240 4.957046 -1.136022 6928.407340 48.684693 5.519677 -1.073173 329.195842 -0.366574 -0.047455 0.008919 68.395671 1.088900e-06 68.396654 0.000074 0.000074
986 240.9 9772.337647 48.691006 4.794940 -1.152643 6933.868691 49.391845 5.364214 -1.089672 329.152016 -0.373243 -0.045973 0.009121 69.356820 1.052653e-06 69.357824 0.000073 0.000072
987 241.0 9777.794121 49.328092 4.631232 -1.169327 6939.394068 50.085002 5.207149 -1.106236 329.107586 -0.379788 -0.044470 0.009325 70.297710 1.019754e-06 70.298736 0.000072 0.000070
988 241.1 9783.308358 49.950594 4.465867 -1.186090 6944.983003 50.764262 5.048431 -1.122882 329.062555 -0.386210 -0.042945 0.009530 71.218482 9.898833e-07 71.219529 0.000071 0.000068
989 241.2 9788.879862 50.558603 4.298789 -1.202946 6950.635060 51.429716 4.888000 -1.139625 329.016926 -0.392508 -0.041397 0.009737 72.119263 9.627609e-07 72.120331 0.000069 0.000066
990 241.3 9794.508168 51.152202 4.129939 -1.219911 6956.349838 52.081445 4.725799 -1.156479 328.970700 -0.398684 -0.039827 0.009945 73.000169 9.381419e-07 73.001257 0.000068 0.000063
991 241.4 9800.192849 51.731461 3.959254 -1.236999 6962.126972 52.719521 4.561764 -1.173459 328.923880 -0.404738 -0.038234 0.010155 73.861303 9.158115e-07 73.862412 0.000068 0.000062
992 241.5 9805.933513 52.296446 3.786669 -1.254226 6967.966134 53.344009 4.395830 -1.190581 328.876466 -0.410669 -0.036616 0.010367 74.702755 8.955813e-07 74.703883 0.000067 0.000060
993 241.6 9811.729806 52.847210 3.612114 -1.271606 6973.867032 53.954964 4.227927 -1.207860 328.828460 -0.416479 -0.034973 0.010582 75.524603 8.772855e-07 75.525751 0.000066 0.000058
994 241.7 9817.581412 53.383796 3.435518 -1.289155 6979.829414 54.552428 4.057984 -1.225310 328.779862 -0.422166 -0.033305 0.010798 76.326909 8.607785e-07 76.328077 0.000066 0.000056
995 241.8 9823.488056 53.906240 3.256806 -1.306888 6985.853068 55.136438 3.885925 -1.242948 328.730672 -0.427732 -0.031610 0.011017 77.109723 8.459319e-07 77.110910 0.000065 0.000054
996 241.9 9829.449502 54.414564 3.075898 -1.324821 6991.937826 55.707017 3.711670 -1.260788 328.680888 -0.433175 -0.029888 0.011239 77.873080 8.326329e-07 77.874285 0.000065 0.000053
997 242.0 9835.465559 54.908782 2.892711 -1.342969 6998.083558 56.264180 3.535137 -1.278847 328.630509 -0.438496 -0.028137 0.011463 78.616997 8.207825e-07 78.618220 0.000065 0.000051
998 242.1 9841.536079 55.388896 2.707158 -1.361348 7004.290183 56.807928 3.356238 -1.297139 328.579534 -0.443694 -0.026358 0.011690 79.341480 8.102936e-07 79.342721 0.000064 0.000049
999 242.2 9847.660952 55.854891 2.519150 -1.379973 7010.557658 57.338250 3.174882 -1.315681 328.527960 -0.448768 -0.024548 0.011921 80.046510 8.010903e-07 80.047768 0.000064 0.000048

1000 rows × 18 columns

Here is the meaning of the columns:

  • T: time
  • X, Vx, Ax, Jx: X coordinate of the position, velocity, acceleration, and jerk (i.e., first to third order derivatives) of the aircraft, respectively
  • Y, Vy, Ay, Jy: same for the Y coordinate
  • Z, Vz, Az, Jz: same for the Z coordinate
  • U2: absolute value of the velocity vector projected onto the ground plane
  • C2: curvature of the trajectory projected onto the ground plane: $\frac{||\tilde P'\times \tilde P''||}{||\tilde P'||^3}$ where $\tilde P=(X,Y,0)$
  • U3: absolute value of the 3D velocity vector
  • C3: curvature of the 3D trajectory: $\frac{||P'\times P''||}{||P'||^3}$ where $P = (X, Y, Z)$
  • T3: torsion of the 3D trajectory: $\frac{\det(P',P'',P''')}{||P'\times P''||^2}$
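As a quick sanity check of these definitions (a sketch with hypothetical velocity values; the column semantics are taken from the list above), U3 should equal the Euclidean norm of (Vx, Vy, Vz) and U2 the norm of (Vx, Vy):

```python
import numpy as np

# Hypothetical velocity components at three time points.
vx = np.array([10.0, -3.0, 0.5])
vy = np.array([4.0, 2.0, -1.0])
vz = np.array([0.3, -0.2, 0.0])

u3 = np.sqrt(vx**2 + vy**2 + vz**2)  # 3D speed, cf. the U3 column
u2 = np.sqrt(vx**2 + vy**2)          # ground-plane speed, cf. the U2 column
print((u3 >= u2).all())  # True: projecting onto the ground plane can only shorten the vector
```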

The pipeline

For submitting at the RAMP site, you will have to write two classes, saved in two different files:

  • the class FeatureExtractor, which will be used to extract features for classification from the dataset and produce a numpy array of size (number of samples $\times$ number of features).
  • a class Classifier to predict aircraft type

Feature extractor

The feature extractor implements a transform member function. It is saved in the file submissions/starting_kit/feature_extractor.py. It receives the pandas dataframe X_df defined at the beginning of the notebook and should produce a numpy array representing the extracted features, which will then be used for the classification. The following simple feature extractor computes the mean of each column, producing an 18-dimensional feature vector for each trajectory.

In [6]:
import numpy as np

class FeatureExtractor():
    def __init__(self):
        pass

    # Use this if you need to learn something at training time that depends
    # on the labels; it will not be called on the test instances.
    def fit(self, X_df, y):
        pass

    # This will be called on both the training and the test instances.
    def transform(self, X_df):
        # stack the per-instance (1000, 18) matrices into a 3D array
        data_matrix = np.asarray(list(X_df['data'].values))
        # mean of each variable, averaged over the time axis (axis 1)
        means = data_matrix.mean(axis=1)
        return means

Let's try it on the training data.

In [7]:
fe = FeatureExtractor()
fe.fit(X_df, y_array)  
X_array = fe.transform(X_df)
X_array.shape
Out[7]:
(4560, 18)

Classifier

The classifier follows a classical scikit-learn classifier template. In its simplest form it takes a scikit-learn estimator (or pipeline), assigns it to self.clf, then delegates to its fit and predict_proba functions in the corresponding member functions.

In [8]:
from sklearn.base import BaseEstimator
from sklearn.ensemble import RandomForestClassifier


class Classifier(BaseEstimator):
    def __init__(self):
        pass

    def fit(self, X, y):
        self.clf = RandomForestClassifier(
            n_estimators=2, max_leaf_nodes=2, random_state=61)
        self.clf.fit(X, y)

    def predict_proba(self, X):
        return self.clf.predict_proba(X)

Let's try it on the training data, transformed into features. The output is a probability array; each row is 19-dimensional since we have 19 classes.

In [9]:
clf = Classifier()
clf.fit(X_array, y_array)
y_proba_array = clf.predict_proba(X_array)
y_proba_array.shape
Out[9]:
(4560, 19)
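A variant (a sketch, with hypothetical hyperparameters) that wraps a full scikit-learn pipeline, scaling the features before a larger forest:

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier

class PipelineClassifier(BaseEstimator):
    def __init__(self):
        # scale each feature, then fit a (hypothetically sized) random forest
        self.clf = make_pipeline(
            StandardScaler(),
            RandomForestClassifier(n_estimators=100, random_state=61))

    def fit(self, X, y):
        self.clf.fit(X, y)
        return self

    def predict_proba(self, X):
        return self.clf.predict_proba(X)
```

Scaling is essentially a no-op for tree ensembles, but the pattern shows how to chain preprocessing steps that other models (e.g., logistic regression) would genuinely need.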

Scoring

To score, we first convert the probability array into a label array by taking the label that received the highest probability.

In [10]:
y_pred_array = [problem._prediction_label_names[y] for y in np.argmax(y_proba_array, axis=1)]

Then we can compute the accuracy using scikit-learn's accuracy_score.

In [11]:
from sklearn.metrics import accuracy_score 
accuracy_score(y_array, y_pred_array)
Out[11]:
0.10482456140350878

The following cell will evaluate the workflow on the test data.

In [12]:
X_test_df, y_test_array = problem.get_test_data()
X_test_array = fe.transform(X_test_df)
y_test_proba_array = clf.predict_proba(X_test_array)
y_test_pred_array = [problem._prediction_label_names[y] for y in np.argmax(y_test_proba_array, axis=1)]
accuracy_score(y_test_array, y_test_pred_array)
Out[12]:
0.10478468899521531

Local testing (before submission)

You can start playing with the cells above, modify the feature extractor and the classifier and evaluate the accuracy of the workflow. However, it is important that you test your submission files before submitting them. First you should save your feature extractor and classifier classes into submissions/starting_kit/feature_extractor.py and submissions/starting_kit/classifier.py, respectively. Then run the ramp_test_submission script either at the command line or here. This script executes a more robust cross validation on the training set, defined in problem.py. If this test runs without error, you can submit your feature extractor and classifier to ramp.studio.

In [13]:
!ramp_test_submission
Testing Aircraft classification from radar trajectories
Reading train and test files from ./data ...
Reading cv ...
Training ./submissions/starting_kit ...
CV fold 0
	score    nll    acc
	train  2.489  0.107
	valid  2.554  0.098
	test   2.600  0.104
CV fold 1
	score    nll    acc
	train  2.490  0.104
	valid  2.496  0.109
	test   2.547  0.104
CV fold 2
	score    nll    acc
	train  2.506  0.104
	valid  2.477  0.109
	test   2.538  0.104
CV fold 3
	score    nll    acc
	train  2.505  0.104
	valid  2.542  0.109
	test   2.694  0.104
CV fold 4
	score    nll    acc
	train  2.490  0.104
	valid  2.519  0.109
	test   2.539  0.104
CV fold 5
	score    nll    acc
	train  2.508  0.104
	valid  2.537  0.109
	test   2.670  0.104
CV fold 6
	score    nll    acc
	train  2.498  0.107
	valid  2.486  0.098
	test   2.545  0.104
CV fold 7
	score    nll    acc
	train  2.493  0.107
	valid  2.483  0.114
	test   2.541  0.111
----------------------------
Mean CV scores
----------------------------
	score             nll             acc
	train  2.497 ± 0.0074  0.105 ± 0.0014
	valid  2.512 ± 0.0282  0.107 ± 0.0054
	test   2.584 ± 0.0598  0.105 ± 0.0024
----------------------------
Bagged scores
----------------------------
	score    nll
	valid  2.515
	test   2.535

You can also edit and test other submissions saved into submissions/<submission_name>. For example, there is a submission in submissions/more_features using a bigger random forest and more features. You can test it using the following command.

In [14]:
!ramp_test_submission --submission more_features
Testing Aircraft classification from radar trajectories
Reading train and test files from ./data ...
Reading cv ...
Training ./submissions/more_features ...
CV fold 0
	score    nll    acc
	train  1.042  0.780
	valid  1.739  0.429
	test   1.707  0.450
CV fold 1
	score    nll    acc
	train  1.067  0.757
	valid  1.529  0.486
	test   1.737  0.462
CV fold 2
	score    nll    acc
	train  1.065  0.766
	valid  1.677  0.427
	test   1.693  0.447
CV fold 3
	score    nll    acc
	train  1.070  0.752
	valid  1.565  0.495
	test   1.660  0.457
CV fold 4
	score    nll    acc
	train  1.061  0.748
	valid  1.559  0.472
	test   1.646  0.462
CV fold 5
	score    nll    acc
	train  1.051  0.776
	valid  1.592  0.453
	test   1.724  0.447
CV fold 6
	score    nll    acc
	train  1.059  0.776
	valid  1.640  0.385
	test   1.711  0.444
CV fold 7
	score    nll    acc
	train  1.046  0.774
	valid  1.597  0.440
	test   1.630  0.476
----------------------------
Mean CV scores
----------------------------
	score             nll             acc
	train  1.058 ± 0.0097  0.766 ± 0.0113
	valid  1.612 ± 0.0648  0.448 ± 0.0336
	test   1.688 ± 0.0362    0.456 ± 0.01
----------------------------
Bagged scores
----------------------------
	score    nll
	valid  1.585
	test   1.607

Submitting to ramp.studio

Once you have found a good feature extractor and classifier, you can submit them to ramp.studio. First, if it is your first time using RAMP, sign up; otherwise, log in. Then find an open event on this problem, for example the event radar_trajectories for this RAMP, and sign up for the event. Both signups are controlled by RAMP administrators, so there can be a delay between asking to sign up and being able to submit.

Once your signup request is accepted, you can go to your sandbox and copy-paste (or upload) feature_extractor.py and classifier.py from submissions/starting_kit. Save them, rename them, then submit. The submission is trained and tested on our backend in the same way as ramp_test_submission does it locally. While your submission is waiting in the queue and being trained, you can find it in the "New submissions (pending training)" table in my submissions. Once it is trained, you will get an email, and your submission will show up on the public leaderboard. If there is an error (despite having tested your submission locally with ramp_test_submission), it will show up in the "Failed submissions" table in my submissions. You can click on the error to see part of the trace.

After submission, do not forget to give credits to the previous submissions you reused or integrated into your submission.

The data set we use at the backend is usually different from what you find in the starting kit, so the score may be different.

The usual way to work with RAMP is to explore solutions locally (adding feature transformations, selecting models, perhaps doing some AutoML/hyperopt) and to check them with ramp_test_submission.

More information

You can find more information in the wiki of the ramp-workflow library.

Contact

Don't hesitate to contact us.