Classification of patterns in the atmospheres of extrasolar hot Jupiters¶

Martial Mancip (Maison de la Simulation, Saclay), François Caud, Thomas Moreau (DATAIA, Univ. Paris-Saclay)

Introduction¶

In order to interpret recent observations of exoplanets, physics-based models have become increasingly complex, which increases both their computational cost and the size of their outputs. We intend to explore whether AI image recognition can alleviate this burden. DYNAMICO was used to run a series of HD 209458-like simulations with different orbital radii. The simulations fall into two regimes: those with shorter orbital radii exhibit significant global mixing that shapes the dynamics of the entire atmosphere, whereas those with longer orbital radii exhibit negligible mixing except at mid-pressures. We believe that image classification can play an important role in future computational atmospheric studies. However, special care must be paid to the data fed into the model, from the choice of colourmap to training the CNN on features with enough breadth and complexity that it can learn to detect them. With preliminary studies and prior models, this should be achievable for future exascale calculations, allowing for a significant reduction in workloads and computational resources.

Patterns detected and labeled in simulated temperature maps are:

  • day-side hot-spots in which the zonal winds have caused significant horizontal thermal advection (and whose shape is typically referred to as a butterfly in the hot Jupiter community - e.g. Figure b/c) -> butterfly
  • longitudinally homogenised and latitudinally symmetric thermal bands (e.g. Figure d) -> banded
  • day-side hot-spots in which radiative effects dominate over advective dynamics (i.e. an irradiative hot-spot which has not been significantly advected by horizontal winds - e.g. Figure e and Figure a to a lesser extent) -> locked
  • latitudinally asymmetric thermal structures (see, e.g. Figure h) -> asymetric

The simulation data set comprises examples with orbital radii between 0.012 au and 0.334 au.

The goal of this challenge is to find ML models capable of classifying those patterns on images from physics-based simulations.

patterns.png
Examples of patterns in simulations of the atmosphere of an extrasolar hot Jupiter (from [this article](https://arxiv.org/abs/2309.10640); see it for more details)

Requirements¶

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
from torchvision import transforms, models
from sklearn.metrics import accuracy_score, balanced_accuracy_score
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
from sklearn.metrics import classification_report
from sklearn.model_selection import LeaveOneGroupOut

Data exploration¶

The data consist of simulated grayscale images of temperature maps at the surface of hot Jupiter exoplanets. There are five target labels, corresponding to the four pattern classes plus a negative class (no pattern):

In [2]:
CLASSES = ["asymetric", "banded", "locked", "butterfly", "no_pattern"]

Here is the mapping between integers and categories:

In [3]:
# Mapping int to categories
int_to_cat = {
    0: "asymetric",
    1: "banded",
    2: "locked",
    3: "butterfly",
    4: "no_pattern",
}
In [4]:
list(int_to_cat)
Out[4]:
[0, 1, 2, 3, 4]

Train set¶

The training set has 620 examples:

In [5]:
X_train = np.load("./data/X_train.npy")
X_train.shape
Out[5]:
(620, 90, 180)

Images are 90 × 180 pixels.

In [6]:
y_train_df = pd.read_csv("./data/y_train.csv")
y_train_df
Out[6]:
simulation category cat_num
0 Hot_0036_Locked banded 1
1 Hot_0036_Locked banded 1
2 Hot_0021_match banded 1
3 Hot_0021_Locked banded 1
4 Hot_0012_Locked banded 1
... ... ... ...
615 Cool_0334_Locked no_pattern 4
616 Hot_0021_Locked no_pattern 4
617 Hot_0036_Locked no_pattern 4
618 Hot_0012_Locked banded 1
619 Hot_0021_Locked no_pattern 4

620 rows × 3 columns

Columns in y_train_df:

  • 'simulation': the simulation the example comes from; simulations differ in physical parameters (orbital radius, ...)
  • 'category': the target class, one of the 5 categories listed above
  • 'cat_num': the class encoded as an integer (0 to 4)

Let's plot some of the data with their label:

In [7]:
plt.imshow(X_train[200], cmap='gray');
print(f"class: {y_train_df.category.iloc[200]}")
class: asymetric
In [8]:
plt.imshow(X_train[617], cmap='gray');
print(f"class: {y_train_df.category.iloc[617]}")
class: no_pattern
In [9]:
plt.imshow(X_train[72], cmap='gray');
print(f"class: {y_train_df.category.iloc[72]}")
class: banded
In [10]:
plt.imshow(X_train[349], cmap='gray');
print(f"class: {y_train_df.category.iloc[349]}")
class: butterfly
In [11]:
plt.imshow(X_train[222], cmap='gray');
print(f"class: {y_train_df.category.iloc[222]}")
class: locked

Distribution of classes in the train set:¶

In [12]:
class_counts = y_train_df['category'].value_counts()

plt.figure(figsize=(10, 6))
class_counts.plot(kind='bar')
plt.title('Class Distribution')
plt.xlabel('Class')
plt.ylabel('Number of Examples')
plt.xticks(rotation=0)
plt.show()

The classes are imbalanced, with no_pattern dominating.
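
One way to mitigate this imbalance (not used in the baseline below; just a hedged sketch using the labels loaded above) is to weight the loss by inverse class frequency when training a network:

import torch
import torch.nn as nn

# Per-class weights inversely proportional to class frequency,
# computed from the training labels (y_train_df.cat_num).
counts = y_train_df["cat_num"].value_counts().sort_index()
class_weights = torch.tensor(
    (counts.sum() / (len(counts) * counts)).to_numpy(), dtype=torch.float32
)

# This weighted criterion could replace the plain nn.CrossEntropyLoss()
# used further down (move the weights to the same device as the model).
weighted_criterion = nn.CrossEntropyLoss(weight=class_weights)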

Pixel intensity distribution¶

Pixel intensity across all images in the train set:

In [13]:
all_pixels = X_train.flatten()

plt.figure(figsize=(10, 6))
plt.hist(all_pixels, bins=50, color='gray')
plt.title('Pixel Intensity Distribution Across All Images')
plt.xlabel('Pixel Intensity')
plt.ylabel('Frequency')
plt.show()

Pixel intensity distribution across images of each class:

In [14]:
unique_categories = y_train_df['category'].unique()
num_categories = len(unique_categories)
fig, axes = plt.subplots(num_categories, 1, figsize=(8, 6 * num_categories))

for i, category in enumerate(unique_categories):
    # Get the indices for the current category
    category_indices = y_train_df[y_train_df['category'] == category].index
    # Select the images belonging to the current category
    category_images = X_train[category_indices]
    # Flatten the images to a single array of pixel values
    category_pixels = category_images.flatten()

    axes[i].hist(category_pixels, bins=50, color='gray')
    axes[i].set_title(f'Pixel Intensity Distribution for Category: {category}')
    axes[i].set_xlabel('Pixel Intensity')
    axes[i].set_ylabel('Frequency')

plt.tight_layout()
plt.show()

Plot of the average image for each class:¶

In [15]:
unique_categories = y_train_df['category'].unique()
num_categories = len(unique_categories)
fig, axes = plt.subplots(1, num_categories, figsize=(5 * num_categories, 5))

for i, category in enumerate(unique_categories):
    # Get the indices for the current category
    category_indices = y_train_df[y_train_df['category'] == category].index
    # Select the images belonging to the current category
    category_images = X_train[category_indices]
    # Calculate the average image
    average_image = np.mean(category_images, axis=0)

    ax = axes[i]
    ax.imshow(average_image, cmap='gray')
    ax.set_title(f'Average Image for Category: {category}')
    ax.axis('off')

plt.tight_layout()
plt.show()

Test set¶

The test set has 68 examples:

In [16]:
X_test = np.load("./data/X_test.npy")
X_test.shape
Out[16]:
(68, 90, 180)
In [17]:
y_test_df = pd.read_csv("./data/y_test.csv")
y_test_df
Out[17]:
simulation category cat_num
0 Hot_0036_match banded 1
1 Cool_0192_Locked asymetric 0
2 Cool_0192_Locked asymetric 0
3 Cool_0192_Locked no_pattern 4
4 Cool_0192_Locked no_pattern 4
... ... ... ...
63 Cool_0192_Locked no_pattern 4
64 Hot_0036_match banded 1
65 Hot_0036_match no_pattern 4
66 Cool_0192_Locked no_pattern 4
67 Cool_0192_Locked no_pattern 4

68 rows × 3 columns

Distribution of classes in test set:¶

In [18]:
class_counts = y_test_df['category'].value_counts()

plt.figure(figsize=(10, 6))
class_counts.plot(kind='bar')
plt.title('Class Distribution')
plt.xlabel('Class')
plt.ylabel('Number of Examples')
plt.xticks(rotation=0)
plt.show()

Base model¶

We will use a pretrained CNN (MobileNetV2 trained on the ImageNet dataset). The model is loaded with all its weights frozen, and only the last fully-connected layer is replaced and trained.

In [19]:
def create_modified_mobilenet(num_classes):
    # Load a pretrained MobileNet model
    model = models.mobilenet_v2(weights='IMAGENET1K_V1')

    # Freeze all layers in the network
    for param in model.parameters():
        param.requires_grad = False

    # Replace the last fully connected layer
    # Parameters of newly constructed modules have requires_grad=True by default
    num_features = model.classifier[1].in_features
    model.classifier[1] = nn.Linear(num_features, num_classes)

    return model
In [20]:
# labels
y_train = y_train_df.cat_num.values
print(y_train.shape)
y_test = y_test_df.cat_num.values
print(y_test.shape)
(620,)
(68,)
In [21]:
# Reshape data to add channel dimension
X_train = X_train.reshape(-1, 1, 90, 180)
X_test = X_test.reshape(-1, 1, 90, 180)
print(X_train.shape)
print(X_test.shape)
(620, 1, 90, 180)
(68, 1, 90, 180)
In [22]:
# Conversion from numpy arrays to Pytorch tensors
X_train_tensor = torch.tensor(X_train, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train, dtype=torch.int64)
X_test_tensor = torch.tensor(X_test, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test, dtype=torch.int64)

We need to convert the images from grayscale to RGB format (1 to 3 channels) to match MobileNet's expected input. We simply repeat the grayscale channel three times.

In [23]:
class GrayscaleToRgb:
    def __call__(self, tensor):
        # tensor is a 1-channel grayscale image
        return tensor.repeat(3, 1, 1)

Furthermore, images have to be resized to (224, 224) and normalized:

In [24]:
transform = transforms.Compose([
    GrayscaleToRgb(),
    transforms.Resize((224, 224), antialias=True),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
In [25]:
X_train_transformed = torch.stack([transform(x) for x in X_train_tensor])
X_train_transformed.shape
Out[25]:
torch.Size([620, 3, 224, 224])

Here is a function to visualize transformed images:

In [26]:
def custom_imshow(img):
    # Unnormalize the image
    img = img.numpy().transpose((1, 2, 0))  # Convert from Tensor image
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    img = std * img + mean
    img = np.clip(img, 0, 1)  # Clip to ensure it's between 0 and 1

    plt.imshow(img)
    plt.axis('off')
    plt.show()
In [27]:
custom_imshow(X_train_transformed[0])
In [28]:
custom_imshow(X_train_transformed[349])

On image 349 for example, we recognize the 'butterfly' pattern.

In [29]:
# Transform on test images:
X_test_transformed = torch.stack([transform(x) for x in X_test_tensor])
X_test_transformed.shape
Out[29]:
torch.Size([68, 3, 224, 224])
In [30]:
# Labels:
print(y_train_tensor.shape)
print(y_test_tensor.shape)
torch.Size([620])
torch.Size([68])

We create DataLoader objects in order to train and predict on batches of images:

In [31]:
train_dataset = TensorDataset(X_train_transformed, y_train_tensor)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
In [32]:
test_dataset = TensorDataset(X_test_transformed, y_test_tensor)
test_loader = DataLoader(test_dataset, batch_size=32)
In [33]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = create_modified_mobilenet(num_classes=5).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
In [34]:
# Training loop
num_epochs = 10
for epoch in range(num_epochs):
    model.train()
    for inputs, labels in train_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        # Forward pass
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        # Backward pass and optimization
        loss.backward()
        optimizer.step()

    print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')
Epoch [1/10], Loss: 0.6839
Epoch [2/10], Loss: 0.7206
Epoch [3/10], Loss: 0.6356
Epoch [4/10], Loss: 0.4827
Epoch [5/10], Loss: 0.4240
Epoch [6/10], Loss: 0.3566
Epoch [7/10], Loss: 0.4369
Epoch [8/10], Loss: 0.4515
Epoch [9/10], Loss: 0.4223
Epoch [10/10], Loss: 0.8125
In [35]:
model.eval()
correct = 0
total = 0

with torch.no_grad():
    for inputs, labels in test_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        outputs = model(inputs)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

accuracy = correct / total
print(f'Accuracy on Test Data: {accuracy:.3f}')
Accuracy on Test Data: 0.735

Let's predict on the whole test dataset with the fitted model (we can do this in a single forward pass because there are only 68 examples in the test set).

In [36]:
X_test_to_predict = X_test_transformed.to(device)

# Model in evaluation mode
model.eval()

with torch.no_grad():
    # Forward pass
    outputs = model(X_test_to_predict)
    _, y_pred = torch.max(outputs, 1)

# Convert predictions to a numpy array
y_pred = y_pred.cpu().numpy()

Accuracy score:

In [37]:
accuracy_score(y_test_tensor, y_pred)
Out[37]:
0.7352941176470589

Balanced accuracy score:

In [38]:
balanced_accuracy_score(y_test_tensor, y_pred)
Out[38]:
0.5771428571428572
In [39]:
print(classification_report(y_test_tensor, y_pred))
              precision    recall  f1-score   support

           0       1.00      0.10      0.18        10
           1       1.00      0.83      0.91         6
           2       0.67      1.00      0.80         4
           3       0.00      0.00      0.00         6
           4       0.71      0.95      0.82        42

    accuracy                           0.74        68
   macro avg       0.68      0.58      0.54        68
weighted avg       0.72      0.74      0.66        68

/home/frcaud/anaconda3/envs/ramp-hotjupiter/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1471: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))

Confusion matrix:

In [40]:
disp = ConfusionMatrixDisplay.from_predictions(y_test_tensor, y_pred)
fig = disp.figure_
fig.set_figwidth(7)
fig.set_figheight(7)  

The model has difficulty predicting the 'asymetric' and 'butterfly' classes and instead tags them as 'no_pattern'. The 'banded' and 'locked' classes seem to be well predicted (on this small test dataset).

To evaluate the model more reliably, RAMP uses cross-validation. Because the data are grouped by simulation and each simulation contributes a large chunk of examples, we designed a CV strategy based on meta-groups: each meta-group gathers several simulations (group splitting) while keeping a reasonable spread of examples from all classes (stratified splitting). We will use a LeaveOneGroupOut strategy on those meta-groups.

Cross-validation evaluation:¶

In [41]:
y_train_df
Out[41]:
simulation category cat_num
0 Hot_0036_Locked banded 1
1 Hot_0036_Locked banded 1
2 Hot_0021_match banded 1
3 Hot_0021_Locked banded 1
4 Hot_0012_Locked banded 1
... ... ... ...
615 Cool_0334_Locked no_pattern 4
616 Hot_0021_Locked no_pattern 4
617 Hot_0036_Locked no_pattern 4
618 Hot_0012_Locked banded 1
619 Hot_0021_Locked no_pattern 4

620 rows × 3 columns

Groups for cross-val:¶

In [42]:
# Create meta groups as a new column in y_train_df
# Simulations to groups dictionary
sim_to_group = {
    "Cool_0060_Locked": 0,
    "Cool_0110_match": 0,
    "Hot_0012_match": 0,
    "Hot_0036_Locked": 0,
    "Cool_0334_Locked": 1,
    "Hot_0012_Locked": 1,
    "Cool_0060_match": 1,
    "Cool_0110_Locked": 2,
    "Cool_0192_match": 2,
    "Cool_0334_match": 2,
    "Hot_0021_Locked": 2,
    "Hot_0021_match": 2,
}
# Function to apply to each row in the 'simulation' column
def get_group(simulation):
    return sim_to_group.get(simulation, None)
# Create a new column 'group' based on the 'simulation' column
y_train_df["group"] = y_train_df["simulation"].apply(get_group)
# Global variable groups
global groups
groups = y_train_df.group.to_numpy()
In [43]:
print(groups.shape)
groups
(620,)
Out[43]:
array([0, 0, 2, 2, 1, 1, 0, 0, 0, 2, 2, 2, 2, 2, 0, 1, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 0, 2, 2,
       2, 2, 0, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 0, 2, 1, 0, 0, 0, 0, 0,
       0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 2, 1, 1,
       0, 0, 0, 1, 1, 1, 2, 1, 0, 2, 1, 0, 2, 2, 1, 2, 2, 2, 2, 1, 1, 1,
       1, 1, 2, 2, 2, 0, 0, 2, 2, 0, 1, 1, 1, 1, 1, 0, 1, 2, 2, 2, 1, 1,
       2, 0, 1, 0, 0, 2, 0, 1, 1, 2, 2, 0, 0, 1, 1, 1, 1, 0, 2, 2, 1, 1,
       2, 1, 0, 2, 1, 0, 2, 2, 0, 0, 2, 1, 2, 1, 0, 2, 2, 2, 2, 2, 2, 2,
       1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 0, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1,
       1, 0, 2, 2, 2, 2, 1, 1, 2, 1, 2, 1, 0, 1, 0, 0, 2, 2, 2, 1, 0, 1,
       1, 2, 2, 1, 1, 2, 0, 2, 2, 2, 2, 0, 2, 2, 2, 2, 0, 0, 0, 0, 2, 2,
       0, 1, 0, 2, 2, 0, 1, 0, 1, 1, 1, 1, 0, 2, 2, 2, 1, 2, 0, 1, 2, 2,
       2, 0, 2, 0, 2, 2, 2, 0, 1, 0, 0, 0, 0, 0, 0, 2, 0, 2, 2, 1, 1, 1,
       1, 0, 2, 1, 1, 1, 0, 2, 2, 0, 0, 0, 2, 0, 2, 2, 0, 0, 1, 2, 1, 1,
       2, 0, 1, 1, 1, 1, 0, 0, 2, 0, 0, 1, 1, 2, 1, 2, 1, 2, 1, 2, 0, 2,
       2, 1, 1, 1, 0, 2, 1, 0, 0, 1, 0, 0, 1, 1, 1, 2, 2, 2, 1, 1, 0, 2,
       2, 2, 0, 0, 2, 0, 1, 1, 2, 2, 1, 1, 1, 2, 1, 0, 0, 1, 2, 0, 2, 2,
       0, 2, 0, 0, 2, 1, 2, 0, 1, 1, 0, 1, 2, 2, 1, 0, 1, 0, 2, 1, 2, 1,
       0, 2, 0, 2, 2, 1, 2, 0, 0, 2, 2, 0, 1, 0, 2, 1, 1, 1, 0, 2, 0, 2,
       0, 1, 2, 1, 0, 2, 0, 1, 0, 0, 0, 2, 0, 1, 0, 0, 0, 2, 2, 2, 2, 2,
       2, 0, 2, 1, 2, 1, 0, 0, 2, 2, 1, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       2, 0, 1, 0, 1, 1, 0, 0, 2, 0, 0, 2, 0, 2, 1, 2, 0, 0, 0, 2, 1, 0,
       0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 0, 2, 2, 0, 1, 1, 0, 0, 1, 0, 1, 2,
       0, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 2, 0, 0, 2, 2, 2, 2, 0, 2, 2, 1, 1,
       1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 2, 0, 2, 2, 1, 0, 2, 0, 0, 1, 1, 1,
       2, 2, 2, 0, 0, 2, 2, 2, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0,
       0, 0, 0, 0, 2, 0, 1, 2, 2, 2, 2, 1, 2, 2, 1, 1, 1, 1, 2, 0, 0, 1,
       2, 0, 1, 2])
In [44]:
y_train_df
Out[44]:
simulation category cat_num group
0 Hot_0036_Locked banded 1 0
1 Hot_0036_Locked banded 1 0
2 Hot_0021_match banded 1 2
3 Hot_0021_Locked banded 1 2
4 Hot_0012_Locked banded 1 1
... ... ... ... ...
615 Cool_0334_Locked no_pattern 4 1
616 Hot_0021_Locked no_pattern 4 2
617 Hot_0036_Locked no_pattern 4 0
618 Hot_0012_Locked banded 1 1
619 Hot_0021_Locked no_pattern 4 2

620 rows × 4 columns
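
As a quick sanity check (a small sketch using y_train_df and groups defined above), we can verify that each simulation is assigned to exactly one meta-group, so LeaveOneGroupOut will never split a simulation between the training and validation folds:

# Each simulation should map to a single meta-group (no simulation leaks across folds)
sims_per_group = y_train_df.groupby("simulation")["group"].nunique()
assert (sims_per_group == 1).all(), "a simulation is split across meta-groups"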

In [45]:
# Class count by group:
group_class_counts = y_train_df.groupby('group')['cat_num'].value_counts().unstack(fill_value=0)
group_class_counts
Out[45]:
cat_num 0 1 2 3 4
group
0 41 28 16 9 109
1 31 29 18 36 73
2 23 26 22 9 150

Here we see that the class 3 ('butterfly') examples could not be spread more evenly, because 36 of them come from a single simulation:

In [46]:
y_train_df.groupby('simulation')['cat_num'].value_counts().unstack(fill_value=0)
Out[46]:
cat_num 0 1 2 3 4
simulation
Cool_0060_Locked 41 2 1 0 27
Cool_0060_match 0 1 8 0 14
Cool_0110_Locked 8 1 17 0 58
Cool_0110_match 0 0 5 0 18
Cool_0192_match 5 0 1 0 16
Cool_0334_Locked 31 0 9 0 39
Cool_0334_match 7 0 2 0 12
Hot_0012_Locked 0 28 1 36 20
Hot_0012_match 0 4 1 9 9
Hot_0021_Locked 3 18 1 0 58
Hot_0021_match 0 7 1 9 6
Hot_0036_Locked 0 22 9 0 55
In [47]:
# labels
y = y_train
print(y.shape)
(620,)
In [48]:
# Data
X = X_train
print(X.shape)
(620, 1, 90, 180)
In [49]:
# Initialize cross-validation
logo = LeaveOneGroupOut()
In [50]:
for train, test in logo.split(X, y, groups=groups):
    print("%s %s" % (len(train), len(test)))
417 203
433 187
390 230
In [51]:
X_tensor = torch.tensor(X, dtype=torch.float32)
y_tensor = torch.tensor(y, dtype=torch.int64)
In [52]:
X_transformed = torch.stack([transform(x) for x in X_tensor])
In [53]:
# Placeholder for cross-validation results
accuracy_scores = []
balanced_accuracy_scores = []

# Iterate over each train/test split
for i, (train_idx, test_idx) in enumerate(logo.split(X_transformed, y_tensor, groups)):
    X_train, X_test = X_transformed[train_idx], X_transformed[test_idx]
    y_train, y_test = y_tensor[train_idx], y_tensor[test_idx]

    # DataLoader for training and testing
    train_loader = DataLoader(TensorDataset(X_train, y_train), batch_size=32, shuffle=True)
    test_loader = DataLoader(TensorDataset(X_test, y_test), batch_size=32)

    # Initialize and train the model
    model = create_modified_mobilenet(num_classes=5).to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=0.001)

    # Training loop
    for epoch in range(10):  # Adjust the number of epochs if needed
        model.train()
        for inputs, labels in train_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

    # Evaluate the model
    model.eval()
    all_preds = []
    all_labels = []
    with torch.no_grad():
        for inputs, labels in test_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            _, predicted = torch.max(outputs.data, 1)
            all_preds.extend(predicted.cpu().numpy())
            all_labels.extend(labels.cpu().numpy())

    # Calculate metrics
    accuracy = accuracy_score(all_labels, all_preds)
    balanced_acc = balanced_accuracy_score(all_labels, all_preds)
    print(f"Fold {i}:")
    disp = ConfusionMatrixDisplay.from_predictions(all_labels, all_preds)
    plt.title(f'Confusion Matrix for Fold {i}')
    plt.show()
    #fig = disp.figure_
    #fig.set_figwidth(7)
    #fig.set_figheight(7)

    # Store results
    accuracy_scores.append(accuracy)
    balanced_accuracy_scores.append(balanced_acc)

# Calculate and print average metrics
average_accuracy = np.mean(accuracy_scores)
average_balanced_accuracy = np.mean(balanced_accuracy_scores)
print(f'Average Accuracy: {average_accuracy:.3f}')
print(f'Average Balanced Accuracy: {average_balanced_accuracy:.3f}')
Fold 0:
Fold 1:
Fold 2:
Average Accuracy: 0.701
Average Balanced Accuracy: 0.594

These are the scores you will have to (and will most certainly!) beat during this challenge.

Submitting to the online challenge: ramp.studio ¶

Once you have found a good model, you can submit it to ramp.studio to enter the online challenge. First, if it is your first time using the RAMP platform, sign up; otherwise log in. Then sign up for the event hotjupiter. Both sign-ups are controlled by RAMP administrators, so there can be a delay between requesting sign-up and being able to submit.

Once your sign-up request is accepted, you can go to your sandbox and write the code for your classifier directly in the browser. You can also create a new folder my_submission in the submissions folder containing classifier.py and upload this file directly. You can check the starting kit (classifier.py) for an example. The submission is trained and tested on our backend in a similar way to how ramp-test does it locally. While your submission is waiting in the queue and being trained, you can find it in the "New submissions (pending training)" table in my submissions. Once it is trained, your submission shows up on the public leaderboard. If there is an error (despite having tested your submission locally with ramp-test), it will show up in the "Failed submissions" table in my submissions. You can click on the error to see part of the trace.

The data set we use at the backend is usually different from what you find in the starting kit, so the score may be different.

The usual way to work with RAMP is to explore solutions locally (adding feature transformations, selecting models, etc.) and to check them with ramp-test. The script prints the mean cross-validation scores.

The official score in this RAMP (the first score column on the leaderboard) is the balanced accuracy score (bal_acc). When the score is good enough, you can submit it to the RAMP.
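
As a reminder, the balanced accuracy is the average of the per-class recalls, so the dominant no_pattern class cannot mask poor performance on the rarer classes. A minimal illustration with made-up labels (not challenge data):

from sklearn.metrics import balanced_accuracy_score, recall_score

# Toy example: class 1 is rare and never predicted.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
y_hat = [0, 0, 0, 0, 0, 0, 0, 0]

# Plain accuracy would be 6/8 = 0.75, but balanced accuracy averages the
# per-class recalls: (1.0 + 0.0) / 2 = 0.5.
print(balanced_accuracy_score(y_true, y_hat))        # 0.5
print(recall_score(y_true, y_hat, average="macro"))  # same value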

Here is the script proposed as the starting_kit:

In [54]:
import numpy as np
import random
from sklearn.base import BaseEstimator


class Classifier(BaseEstimator):
    # Dummy baseline: it ignores the data and predicts a random class
    # for each example.
    def __init__(self):
        return

    def fit(self, X, y):
        # No training needed for this random baseline.
        return

    def predict(self, X):
        return

    def predict_proba(self, X):
        # Create an array of zeros
        y_pred = np.zeros((X.shape[0], 5), dtype=int)
        # Set one random index per row to 1 (a random one-hot "probability" vector)
        for i in range(X.shape[0]):
            random_index = random.randint(0, 4)
            y_pred[i, random_index] = 1
        return np.array(y_pred)

You can test your solution locally by running the ramp-test command with the --submission option. Here is an example with the starting_kit submission:

In [55]:
!ramp-test --submission starting_kit
Testing Hot Jupiter atmospheric pattern classification
Reading train and test files from ./data/ ...
Reading cv ...
Training submissions/starting_kit ...
CV fold 0
	score  bal_acc    acc      time
	train    0.214  0.209  0.007614
	valid    0.257  0.251  0.000699
	test     0.221  0.221  0.000097
CV fold 1
	score  bal_acc    acc      time
	train    0.213  0.176  0.006831
	valid    0.211  0.182  0.000787
	test     0.215  0.206  0.000118
CV fold 2
	score  bal_acc    acc      time
	train    0.206  0.213  0.006035
	valid    0.194  0.217  0.000530
	test     0.212  0.162  0.000059
----------------------------
Mean CV scores
----------------------------
	score         bal_acc             acc       time
	train  0.211 ± 0.0034  0.199 ± 0.0167  0.0 ± 0.0
	valid   0.22 ± 0.0265  0.217 ± 0.0283  0.0 ± 0.0
	test   0.216 ± 0.0036   0.196 ± 0.025  0.0 ± 0.0
----------------------------
Bagged scores
----------------------------
	score  bal_acc    acc
	valid    0.213  0.218
	test     0.179  0.147

More information¶

See the online documentation for more details.

Questions¶

Questions related to the starting kit should be asked on the issue tracker.