Fire Detection from NASA Worldview Snapshots

with Python, Ray, Keras, and PIL

Dec 10, 2021

By Michael Moschitto

California: 2–22–2016

Abstract

With droughts on the rise, our world is being ravaged by fires, this year more than ever. At the beginning of this project, California was in the midst of the second largest fire in state history, with numerous others wreaking havoc. Using satellite imagery and machine learning, I set out to create an early detection system that could identify potential fires before they become uncontrollable.

Introduction

The idea for this project came in September 2021 as California was suffering through its third month of the Dixie fire. Although it was the biggest fire that year, and the second largest in state history at the time, Californians experienced further destruction at the mercy of the Caldor and Monument fires. And they too weren't the last: as of November 2021, 8,367 fires had burned over 3,083,507 acres in California alone.

And so I began to brainstorm how software could be applied to this issue, both in our state and around the world. What I came up with was a detection system to recognize potential fires and alert authorities before the fires become enormous.

Challenges

As I was researching potential solutions to this problem, I stumbled upon NASA Worldview Snapshots, a site where users can input a date, an image format, and a pair of (latitude, longitude) tuples and retrieve satellite images from the MODIS (Moderate Resolution Imaging Spectroradiometer) instruments aboard NASA's Terra and Aqua satellites. This data source presented multiple challenges:

  1. What is the ideal resolution/size of images to pass to a machine learning model?
  2. Will images passed to a model mean anything visually to humans?
  3. Web requests (for image collection) and image processing are slow. How will my pipeline address speed when gathering many images at one time?
  4. How accurately will a machine learning model be able to distinguish clouds and smoke to predict if a given image contains a fire?

Methods

Step 1: Distinguish between images that are useful to humans and images that are useful to a neural network.

In order to do detection, my program needed to cover a wide area of land (mapped out using a (lat, long) bounding box) even though the network needed to work with much smaller images. Let's call the larger area in which the detection is done the search area, and each of the smaller individual images passed to the network snapshots.

To solve this, I divided the search area into smaller 1° x 1° snapshots. In the figure below, the blue outline is the search area (defined by its bottom-left and top-right corners) and the orange squares are the snapshots.

And the code:
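A minimal sketch of that grid division, assuming pandas is used to hold the coordinates; the function and column names here are my own, illustrative choices rather than the original listing:

```python
import pandas as pd

def make_snapshot_grid(bottom_left, top_right, step=1.0):
    """Divide a (lat, lon) bounding box into step x step degree cells.

    bottom_left and top_right are (lat, lon) tuples defining the search area.
    Returns a DataFrame with one row per snapshot bounding box.
    """
    lat_min, lon_min = bottom_left
    lat_max, lon_max = top_right

    cells = []
    lat = lat_min
    while lat < lat_max:
        lon = lon_min
        while lon < lon_max:
            cells.append({
                "lat_min": lat, "lat_max": lat + step,
                "lon_min": lon, "lon_max": lon + step,
            })
            lon += step
        lat += step
    return pd.DataFrame(cells)

# Example: a search area covering much of the California coast
grid = make_snapshot_grid(bottom_left=(34.0, -124.0), top_right=(42.0, -117.0))
```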

Step 2: Get NASA Worldview Snapshot Images.

Once the search area had been divided into smaller snapshots and stored in a dataframe of coordinates, the next step was to query the NASA API to retrieve these snapshots.
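A sketch of such a request is below. The endpoint, layer name, and parameter names follow my understanding of the public Worldview Snapshots API and should be treated as assumptions to verify against NASA's current documentation:

```python
from io import BytesIO

import requests
from PIL import Image

# Assumed Worldview Snapshots endpoint; verify against NASA's documentation.
SNAPSHOT_URL = "https://wvs.earthdata.nasa.gov/api/v1/snapshot"

def fetch_snapshot(date, lat_min, lon_min, lat_max, lon_max, size=256):
    """Request a true-color MODIS (Terra) snapshot for one grid cell."""
    params = {
        "REQUEST": "GetSnapshot",
        "TIME": date,  # e.g. "2021-09-01"
        "BBOX": f"{lat_min},{lon_min},{lat_max},{lon_max}",
        "CRS": "EPSG:4326",
        "LAYERS": "MODIS_Terra_CorrectedReflectance_TrueColor",
        "FORMAT": "image/jpeg",
        "WIDTH": size,
        "HEIGHT": size,
    }
    response = requests.get(SNAPSHOT_URL, params=params, timeout=60)
    response.raise_for_status()
    return Image.open(BytesIO(response.content))
```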

Here is an example of an individual snapshot:

Snapshot

These individual images were passed to the neural network and used to detect fires in a small subset of the larger search area.

Searching through the grid and saving snapshots locally using PIL was quite slow and a bottleneck for the program. The solution was to distribute these tasks using Ray. Ray is “an open source project that makes it simple to scale any compute-intensive Python workload,” and distributing the work with it dramatically decreased execution time.

I first wrote the function to search through the grid; the only changes needed to distribute it with Ray were to initialize Ray and to call the image-retrieval function as a Ray remote task. This creates an array of futures, or references to the eventual outputs, which can then be accessed using ray.get(). A great overview of Ray, futures, and distributed Python in general can be found here.
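A minimal sketch of that change, reusing the hypothetical fetch_snapshot() and grid from the earlier snippets:

```python
import ray

ray.init()  # start a local Ray instance

# Wrapping the retrieval function with @ray.remote turns each snapshot
# download into an independent task that Ray can schedule in parallel.
@ray.remote
def fetch_snapshot_task(date, lat_min, lon_min, lat_max, lon_max):
    return fetch_snapshot(date, lat_min, lon_min, lat_max, lon_max)

# Launching one task per grid cell returns futures immediately...
futures = [
    fetch_snapshot_task.remote("2021-09-01", row.lat_min, row.lon_min,
                               row.lat_max, row.lon_max)
    for row in grid.itertuples()
]

# ...and ray.get() blocks until the actual images are available.
snapshots = ray.get(futures)
```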

Step 3: Pass snapshots to neural network to detect fires.

Once saved, the snapshots could be run through a neural network trained to identify whether a fire is present in a given image. I won’t go into all of the details of the model architecture but here are some of the layer highlights:

  • Conv2D: A convolution can be thought of as a filter that, when applied across an input, produces a feature map of activations that can be used to identify the strength and location of a feature.
  • MaxPooling2D: Pooling layers reduce the dimensionality of feature maps, effectively summarizing the features generated by convolutional layers and decreasing the number of parameters and computation needed to learn.
  • Flatten: Simply converts a multidimensional convolution output to a one-dimensional array.
  • Dense: Dense layers are fully connected, receiving input from every neuron in the previous layer; they are what is used to classify images based on the features extracted by the convolutional layers.

I’ll get into the results of this model later, but first here is the code…
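Here is a representative Keras model built from those layer types. The exact filter counts, input size, and training settings are illustrative assumptions rather than the original architecture:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_fire_model(input_shape=(256, 256, 3)):
    """A small binary classifier (fire / no fire) using the layers above."""
    model = keras.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), activation="relu"),  # low-level feature maps
        layers.MaxPooling2D((2, 2)),                   # downsample feature maps
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),                              # to a 1-D vector
        layers.Dense(64, activation="relu"),           # fully connected
        layers.Dense(1, activation="sigmoid"),         # probability of fire
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_fire_model()
# model.fit(train_images, train_labels, validation_split=0.2, epochs=10)
```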

Step 4: Stitch the snapshots into a larger human recognizable image.

Even though the snapshot above was the right size for the network, could a human have understood what it was? If you couldn't tell that it was an image of the California coast just by looking, the answer was probably no. Additionally, I reasoned that the most effective detection would be one in which a human could study a large search area and quickly identify multiple fires. Therefore the fourth step involved stitching snapshots together into a larger image.

And now, thanks to Ray, the larger images could be made of more snapshots, and at higher resolution, because the distributed processing was so much faster.

The transformation from snapshot to full image occurred in 3 steps:

  1. Obtain the original snapshot

Individual Snapshot

  2. Use longitudes to create rows of individual snapshots

Thirteen snapshots pasted together into a single row

  3. Stitch rows together by latitude to create a full image

153 snapshots -> 13 rows -> 1 full image

Below is the function to create a row from an array of individual snapshots.
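A compact sketch of that stitching logic with PIL; the helper names are my own, and the snapshots are assumed to be equally sized and ordered west-to-east:

```python
from PIL import Image

def stitch_row(snapshots):
    """Paste equally sized snapshots side by side (by longitude) into one row."""
    width, height = snapshots[0].size
    row = Image.new("RGB", (width * len(snapshots), height))
    for i, snapshot in enumerate(snapshots):
        row.paste(snapshot, (i * width, 0))
    return row

def stitch_rows(rows):
    """Stack row images top to bottom (by latitude) into the full image."""
    width, height = rows[0].size
    full = Image.new("RGB", (width, height * len(rows)))
    for i, row in enumerate(rows):
        full.paste(row, (0, i * height))
    return full
```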

Prior to distributing this task with Ray, gathering and stitching these images together was a very slow process and only feasible for a small area of land.

After implementing a distributed solution, I was able to run this same process over a much larger area, each day for an entire month, to create a neat intermediate result!

Sadly, the original quality had to be reduced to fit into an article, but the full image was made of over 450 snapshots and comprised over 1,000,000 pixels!

Results

Timing Results:

As mentioned above, using Ray to distribute the workload of retrieving and storing images drastically decreased the execution time of the project. To measure this quantitatively, I ran the code on search grids ranging from 10 to 500 images in increments of 10, in both a serial (linear) and a distributed fashion; the resulting curves are displayed below.
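A rough sketch of how such a timing comparison can be set up; fetch_all_serial and fetch_all_with_ray stand in for hypothetical wrappers around the serial and Ray-based retrieval paths:

```python
import time

def benchmark(grid, grid_sizes, fetch_all_serial, fetch_all_with_ray):
    """Time serial vs. distributed retrieval over increasingly large grids."""
    results = []
    for n in grid_sizes:
        cells = grid.head(n)  # first n snapshot bounding boxes

        start = time.perf_counter()
        fetch_all_serial(cells)
        serial_time = time.perf_counter() - start

        start = time.perf_counter()
        fetch_all_with_ray(cells)
        distributed_time = time.perf_counter() - start

        results.append((n, serial_time, distributed_time))
    return results

# timings = benchmark(grid, range(10, 501, 10), fetch_all_serial, fetch_all_with_ray)
```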

I considered this a great success for the project because it allowed for full images/GIFs that were much larger and of higher resolution (thanks to the number of snapshots used) than would have been feasible without creating them in parallel.

Model Results:

One of the other goals of the project was to create an automated system for fire detection. Unfortunately, that was less successful (at least at the time of writing), as it became apparent that the model had been trained on satellite images captured at different spectroradiometer wavelengths, which resulted in a validation accuracy of only 0.4783.

And so the model did not produce output that looked like this…

Thomas Fire December 2017

but rather like this.

The CNN thinking everything (even the ocean) is on fire…

Conclusion

Overall, I believe this project was a success. I set out to demonstrate how distributed computing could be applied to aid the problem of detecting wildfires, and I think that was accomplished. However, it was disheartening to have such poor results in automating that task.

Part of the challenge of training a model to recognize smoke in these images is the need for a large training set, extensive training, and tuning. This is planned as future work for this project: I intend to use AWS SageMaker Ground Truth to create a large dataset of labeled images to better train my CNN. Ray could again be applied here to accelerate the training workload, scale up hyperparameter tuning, and even deploy the model once trained.

Thanks for reading and have a fantastic day!
