Neural Networks: From Perceptrons to GPT

The Origins (1957–1969)

  • Frank Rosenblatt (1957) introduced the Perceptron, the first implemented artificial neural network. It could learn simple linearly separable classification tasks and generated strong initial enthusiasm.
  • Minsky and Papert (1969) published Perceptrons, where they demonstrated the mathematical limitations of single-layer perceptrons: problems such as the XOR function, or the recognition of global properties of a pattern, cannot be solved by a single linear threshold unit. This marked the beginning of the first AI winter for neural networks.
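The XOR limitation can be checked directly. The sketch below (illustrative code, not from Minsky and Papert) trains a Rosenblatt-style perceptron on the four XOR examples; because XOR is not linearly separable, no choice of weights classifies all four points correctly, so training never converges:

```python
# Illustrative sketch: a single linear threshold unit cannot learn XOR.
def predict(w, b, x):
    # Hard-threshold unit: fires iff w·x + b > 0
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

def train_perceptron(data, epochs=100, lr=0.1):
    # Classic perceptron learning rule: nudge weights toward each error.
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in data:
            error = target - predict(w, b, x)
            w[0] += lr * error * x[0]
            w[1] += lr * error * x[1]
            b += lr * error
    return w, b

XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
w, b = train_perceptron(XOR)
misclassified = sum(predict(w, b, x) != t for x, t in XOR)
# For XOR, misclassified is always at least 1, no matter how long we train.
```

Running the same loop on a linearly separable function such as AND converges within a few epochs, which is precisely the contrast the 1969 book formalized.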

Towards Multilayer Models (1970s–1980s)

  • Alexey Ivakhnenko (1965) developed the Group Method of Data Handling (GMDH), considered one of the first multilayer networks.
  • Paul Werbos (1974) introduced the error backpropagation algorithm in his doctoral dissertation, but his contribution went largely unnoticed at the time.
  • The situation changed in 1986, when David Rumelhart, Geoffrey Hinton, and Ronald Williams demonstrated that backpropagation could effectively train multilayer networks. This marked the relaunch of neural networks as a research field.
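At its core, backpropagation is the chain rule applied layer by layer, from the output back to the input. The sketch below (a hypothetical toy 2-2-1 sigmoid network; the sizes and example weights are illustrative, not from the 1986 paper) computes the gradients of a squared-error loss in exactly this backward fashion:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy 2-2-1 network; params = (hidden weights, hidden biases,
# output weights, output bias).
def forward(params, x):
    w1, b1, w2, b2 = params
    h = [sigmoid(sum(w1[j][i] * x[i] for i in range(2)) + b1[j])
         for j in range(2)]
    y = sigmoid(sum(w2[j] * h[j] for j in range(2)) + b2)
    return h, y

def backprop(params, x, target):
    """Gradients of E = (y - target)^2 / 2, propagated backwards."""
    w1, b1, w2, b2 = params
    h, y = forward(params, x)
    # Output layer: dE/dz_out through the sigmoid derivative y(1-y)
    delta_out = (y - target) * y * (1 - y)
    gw2 = [delta_out * h[j] for j in range(2)]
    gb2 = delta_out
    # Hidden layer: chain rule back through w2 and each hidden sigmoid
    delta_h = [delta_out * w2[j] * h[j] * (1 - h[j]) for j in range(2)]
    gw1 = [[delta_h[j] * x[i] for i in range(2)] for j in range(2)]
    gb1 = [delta_h[j] for j in range(2)]
    return gw1, gb1, gw2, gb2

# Example call with arbitrary illustrative weights:
params = ([[0.1, -0.2], [0.3, 0.4]], [0.0, 0.1], [0.5, -0.5], 0.2)
grads = backprop(params, (1.0, 0.0), 1.0)
```

The gradients agree with numerical finite differences, which is the standard sanity check for any backpropagation implementation.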

Deep Learning Foundations (1990s–2000s)

  • Yann LeCun (1998) developed LeNet-5, a convolutional neural network (CNN) for handwritten digit recognition. This was a crucial step in computer vision.
  • In 2006, Geoffrey Hinton and colleagues proposed methods for training deep belief networks, which contributed to the establishment of the term deep learning and opened the way to networks with many hidden layers.
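The building block of LeNet-5 and all later CNNs is the discrete 2-D convolution: one small kernel of shared weights is slid across the image, so the same feature detector is applied at every position. A minimal sketch (illustrative only: single channel, no stride, "valid" padding):

```python
# Illustrative 2-D "valid" convolution (technically cross-correlation,
# as used in most deep-learning libraries): slide the kernel over the
# image and take a weighted sum at each position.
def conv2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

# A 2x2 averaging-style kernel applied to a 3x3 image yields a 2x2 map.
feature_map = conv2d([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                     [[1, 1], [1, 1]])
```

Because the kernel weights are shared across positions, a CNN has far fewer parameters than a fully connected network on the same image, which is what made LeNet-5 trainable on 1990s hardware.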

The Revolution of the 2010s

  • AlexNet (2012), developed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, achieved a decisive breakthrough in the ImageNet competition, surpassing traditional methods and demonstrating the power of deep neural networks.
  • Ian Goodfellow (2014) introduced Generative Adversarial Networks (GANs), a new paradigm for generative modeling.
  • In 2017, Vaswani et al. proposed the Transformer architecture in the paper Attention Is All You Need. This model soon became the foundation for modern large language models (LLMs).
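The central operation of the Transformer is scaled dot-product attention, given in the paper as softmax(QK^T / sqrt(d_k)) V: each query is compared against all keys, and the resulting weights mix the values. A plain-Python sketch (single head, no masking, illustrative only):

```python
import math

def softmax(xs):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V,
    written with plain lists of rows for clarity."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        weights = softmax(scores)
        # Weighted average of the value rows
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

When all keys are identical, every value receives equal weight and the output is simply the mean of the value rows; distinct keys skew the mixture toward the values whose keys best match the query.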

Large Language Models (2018–2025)

  • OpenAI GPT series (2018–2020) scaled the Transformer architecture to unprecedented sizes. GPT-2 demonstrated surprising fluency in text generation, while GPT-3 (175B parameters) introduced few-shot learning capabilities.
  • In 2022, ChatGPT, built on GPT-3.5, brought conversational AI to the mainstream.
  • In 2023, GPT-4 introduced multimodality, enabling the model to process both text and images.
  • By 2025, the state of the art consists of multimodal LLMs, integrating text, images, audio, and external tools, with increasing applications in education, science, and society.

References for Further Study

  • Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6), 386–408.
  • Minsky, M., & Papert, S. (1969). Perceptrons: An introduction to computational geometry. MIT Press.
  • Werbos, P. J. (1974). Beyond regression: New tools for prediction and analysis in the behavioral sciences (Doctoral dissertation, Harvard University).
  • Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533–536.
  • LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.
  • Hinton, G. E., Osindero, S., & Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18(7), 1527–1554.
  • Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. NeurIPS, 25, 1097–1105.
  • Goodfellow, I., et al. (2014). Generative adversarial nets. NeurIPS, 27, 2672–2680.
  • Vaswani, A., et al. (2017). Attention is all you need. NeurIPS, 30, 5998–6008.
  • Brown, T. B., et al. (2020). Language models are few-shot learners. NeurIPS, 33, 1877–1901.