Discovering exoplanets using AI

The Data

We used data from NASA's Kepler telescope, which was designed to discover Earth-like planets orbiting other stars. The dataset consists of labeled time-series data, with over 3,000 light flux values recorded for each host star. These light flux values represent the amount of light received from a star over time. Significant dips in these values can indicate the presence of an exoplanet as it passes in front of the star, temporarily blocking some of its light.
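To make the idea of a dip concrete, here is a minimal sketch that simulates a light curve and flags unusually low points. Everything in it is an assumption for illustration: the synthetic flux values, the function names `simulate_light_curve` and `find_dips`, and the threshold rule are hypothetical, and this is not the method our model uses.

```python
import random
import statistics


def simulate_light_curve(n_points=3000, period=500, duration=20,
                         depth=0.01, noise=0.001, seed=0):
    """Synthetic flux: constant brightness, periodic transit dips, Gaussian noise."""
    rng = random.Random(seed)
    flux = []
    for t in range(n_points):
        in_transit = (t % period) < duration
        base = 1.0 - depth if in_transit else 1.0
        flux.append(base + rng.gauss(0.0, noise))
    return flux


def find_dips(flux, n_sigma=5.0):
    """Return the time indices whose flux falls far below the median.

    The spread is estimated with the median absolute deviation (MAD),
    which is less affected by the dips themselves than a standard deviation.
    """
    median = statistics.median(flux)
    mad = statistics.median(abs(f - median) for f in flux)
    robust_sigma = 1.4826 * mad  # MAD-to-sigma factor for Gaussian noise
    threshold = median - n_sigma * robust_sigma
    return [t for t, f in enumerate(flux) if f < threshold]


flux = simulate_light_curve()
dips = find_dips(flux)
print(f"{len(dips)} of {len(flux)} points flagged as possible transit")
```

Real light curves are much messier than this (stellar variability, instrument drift), which is part of why a learned model is attractive compared with a fixed threshold.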

Each star in our dataset was given a label indicating whether it had at least one exoplanet in its orbit. These labels allowed us to train our machine learning model to distinguish between systems with and without exoplanets based on their light flux patterns.

The Process

Before building and training our model, we needed to make sure our data was in good condition. This involved several preprocessing steps to address potential issues:

1. Handling Missing Values:
We checked for any NaN (Not a Number) values in the dataset, as these could disrupt the training process. Luckily, there weren't any, thanks to the high quality of NASA's data.
2. Balancing the Dataset:
One significant challenge was the imbalance in our data. Exoplanets are rare, and thus the number of stars with detected exoplanets was significantly lower than those without. To prevent the model from becoming biased towards the majority class (stars without exoplanets), we used a tool to balance the data. This balancing process involved techniques such as oversampling the minority class to give both target classes equal weight in the training process.
3. Data Scaling:
Scaling the data was another essential step. Since the range of light flux values could vary widely, we normalized these values to ensure that the model could learn effectively from the data without being influenced by the magnitude differences.
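The three preprocessing steps above can be sketched roughly as follows. This is a simplified illustration in plain Python, not our actual pipeline: the helper names are hypothetical, random oversampling stands in for whichever balancing tool is used, and per-star standardization (zero mean, unit variance) is an assumed choice of normalization.

```python
import math
import random


def has_missing_values(rows):
    """True if any flux value in any row is NaN."""
    return any(math.isnan(value) for row in rows for value in row)


def oversample_minority(rows, labels, seed=0):
    """Randomly duplicate rows of smaller classes until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def standardize(row):
    """Rescale one star's flux values to zero mean and unit variance."""
    mean = sum(row) / len(row)
    variance = sum((value - mean) ** 2 for value in row) / len(row)
    std = math.sqrt(variance) or 1.0  # avoid dividing by zero for a flat series
    return [(value - mean) / std for value in row]


# Toy data: three stars without a planet (0), one with a planet (1).
rows = [[1.0, 2.0, 3.0], [2.0, 2.0, 2.0], [0.0, 1.0, 5.0], [4.0, 4.0, 1.0]]
labels = [0, 0, 0, 1]

assert not has_missing_values(rows)
balanced_rows, balanced_labels = oversample_minority(rows, labels)
scaled_rows = [standardize(row) for row in balanced_rows]
print(balanced_labels.count(0), balanced_labels.count(1))
```

One thing to keep in mind with oversampling: because it duplicates rows, applying it only to the training portion keeps copies of the same star from ending up in both the training and test sets.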

With our data preprocessed, we were ready to train our machine-learning model. The training process involved feeding the model the labeled time-series data and allowing it to learn patterns associated with the presence of exoplanets. We used a portion of the data for training and reserved a separate portion as a test set to evaluate the model's performance after we finished training.
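As an illustration of holding out a test set, the sketch below splits the data so that both portions keep the same share of each class. The stratified approach, the 20% test fraction, and the function name `stratified_split` are assumptions made for this example.

```python
import random


def stratified_split(rows, labels, test_fraction=0.2, seed=0):
    """Split rows into train and test sets, preserving the class proportions."""
    rng = random.Random(seed)
    indices_by_class = {}
    for index, label in enumerate(labels):
        indices_by_class.setdefault(label, []).append(index)

    train_idx, test_idx = [], []
    for indices in indices_by_class.values():
        rng.shuffle(indices)
        # Keep at least one example of every class in the test set.
        n_test = max(1, round(len(indices) * test_fraction))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])

    train = ([rows[i] for i in train_idx], [labels[i] for i in train_idx])
    test = ([rows[i] for i in test_idx], [labels[i] for i in test_idx])
    return train, test


# Toy data: 90 stars without a planet, 10 with one.
rows = [[float(i)] for i in range(100)]
labels = [0] * 90 + [1] * 10

(train_rows, train_labels), (test_rows, test_labels) = stratified_split(rows, labels)
print(len(train_rows), len(test_rows), test_labels.count(1))
```

Stratifying matters when one class is rare: a purely random split of a small dataset could leave the test set with very few stars that have exoplanets, or none at all.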

Finally, we tested our trained model on the test set, which it had not seen during training. This step allowed us to check that our model could generalize to new, unseen data. We were happy to see that the model performed very well, accurately predicting the presence of exoplanets on the test set.
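When one class is rare, accuracy alone can look good even for a model that never finds a planet, so it helps to look at precision and recall as well. The sketch below shows how these can be computed; the labels and predictions in it are made-up toy values, not our results, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = fp = tn = fn = 0
    for truth, guess in zip(y_true, y_pred):
        if guess == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def summarize(y_true, y_pred):
    """Accuracy, precision and recall for the 'has exoplanet' class (label 1)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Toy example: 2 stars with planets among 10; the model finds one of them
# and raises one false alarm.
y_true = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 1, 0, 0, 0]
print(summarize(y_true, y_pred))
```

Here recall answers "how many of the real exoplanet systems did we catch?", while precision answers "how many of our detections were real?".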

The Impact

Our trained model can now predict whether a star system contains an exoplanet based on over 3,000 light flux values. This is particularly exciting for the teams at NASA, as it provides a powerful tool to quickly scan through millions of star systems for potential exoplanets. Automating this initial screening process can save researchers countless hours and help them focus their efforts on the most promising candidates for further study and analysis.

By using the vast amounts of data collected by the Kepler telescope over the years and applying machine learning techniques, we are contributing to the ongoing search for exoplanets and the broader understanding of our universe. It shows that with a little help from AI, we can approach problems from a new perspective, and maybe broaden our knowledge in the process as well.
