An Introduction to Automating Image Processing with Deep Learning

An overview of Convolutional Neural Networks.

Image processing is essential in a myriad of projects: creating autonomous vehicles, analyzing bioimaging data, developing facial recognition software. With new advancements in deep learning, researchers and corporations are taking advantage of the new automated methods utilized for image processing.

Convolutional Neural Networks (CNNs)

CNNs are a category of neural networks used when applying deep learning with visual data. They require much more computational power than most other neural network frameworks due to the sheer amount of data points and the dimensionality of the input data. CNNs essentially consist of multilayer perceptrons that are regularized to prevent overfitting. CNNs can be easily implemented through TensorFlow or PyTorch. Any code referred to in this article will use TensorFlow-based Keras.

Image Credits:

CNNs Step-By-Step

CNNs are most often characterized by their use of Convolutional layers and Pooling layers.

Convolutional layers convolve the input data to develop feature maps. Multiple convolutional layers are used in CNNs, and these layers include filters (often called kernels). Convolution generates a activation map of the filter, and the network learns which filters activate when a specific feature is detected. Through the use of these filters, the CNN is able to learn to detect a certain feature at a specific spatial position in the input. In TensorFlow, convolutional layers can be developed using the following function:

You can customize the parameters for the function.

Pooling layers are utilized for non-linear downsampling of the output of convolutional layers. Both TensorFlow and PyTorch provide various pre-packaged functions used for pooling, but the most often used pooling technique is Max Pooling. It creates non-overlapping rectangles that are subsets of the input image, and then outputs the maximum value of these rectangles. Max Pooling reduces the amount of computation needed by decreasing the number of overall parameters in the model, which will already have millions of parameters. Consequently, Max Pooling also helps in preventing overfitting.

Max Pooling Example

In TensorFlow, Max Pooling can be implemented as follows:

You can customize the parameters for the function.

As you can see, it is not too difficult to implement these Convolutional and Max Pooling layers. There are other layers also used in CNNs such as Batch Normalization, ReLU activation layers, concatenation layers, and transposing layers. These layers can be used to further better your network architecture or when you are using a popular architecture such as U-Net.

Popular CNN Architectures

Now that you understand the fundamentals of CNNs, we can dive into the different available CNN architectures for image processing.

Image processing includes various tasks such as classification, thresholding, segmentation etc. Specific deep learning architectures have been developed by researchers which have gone through hundreds of trials with several distinct datasets. Since so many existing model frameworks are available, it is most logical to research more on available architectures to see if they meet your computational needs. If not, then you can consider modifying these models, or worst case, creating your own model architecture from scratch.

Below is a list the most popular CNN architectures and their most popular use cases. I have also attached research papers associated with these models. I plan to update this list in the future.

  1. VGGNet → used for large-scale image classification tasks
  2. ResNet → used for image recognition and training extremely deep networks
  3. U-Net → used for image segmentation or feature segmentation
  4. Inception Net (Inception V3) → used for image recognition + lowering computational costs of training


This article is a basic guide to learning more about automating image processing tasks with deep learning. Many jobs require tedious image processing which can take long amounts of time to perform manually. Deep learning provides new avenues for automating these tasks. I will not be going over the implementation of each architecture since these architectures are open source.

To implement any of these architectures, you must first understand the process occurring in each layer. The implementation is relatively easy and can be done by following the steps detailed in the research papers linked above. Either TensorFlow or PyTorch can be used for the implementation, and I suggest that you use the framework you are most comfortable with.

By understanding the fundamentals of CNNs, you can also develop your own architectures that you believe will best address your image processing task. Implementing each layer requires usually one specific function with a few hyperparameters. After initial testing, you can fine tune these hyperparameters to boost model accuracy.

Final Remarks

I hope you were able to gain a basic fundamental understanding of CNNs and how they can be used to automate image processing tasks. Deep learning holds lots of potential for image processing and computer vision projects and is utilized widely across the tech industry. If you have any questions, feel free to contact me at and I will try my best to help.

I’m a freshman studying CS and Statistics at UC Berkeley. Feel free to contact me at