Every year, academics around the world publish thousands of AI research papers, but only a few reach a broad audience and have global influence. Listed below are five of the most influential research papers from premier AI conferences and journals in recent years. The selection is based on the number of citations.
5. Attention Is All You Need by Ashish Vaswani, Noam Shazeer in 2017, Cited by 18178
The paper Attention Is All You Need proposed the Transformer architecture. The model consists of an encoding component, a decoding component, and the connections between them. The encoding component is a stack of encoders, and the decoding component is a stack of identical decoders. The encoders are all structurally identical, and each one is divided into two sub-layers. The encoder's inputs first pass through a self-attention layer, which lets the encoder look at the other words in the input sentence while it encodes a particular word.
The outputs of the self-attention layer are fed to a feed-forward neural network. The exact same feed-forward network is applied independently to each position. The decoder has both of those layers, but between them sits an attention layer that helps the decoder focus on relevant parts of the input sentence (similar to what attention does in seq2seq models).
The Transformer has also been applied successfully to English constituency parsing with both large and limited training data, which indicates that it generalises well to other tasks.
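The encoder structure described above, a self-attention sub-layer followed by a position-wise feed-forward network, can be sketched in a few lines of plain Python. This is a minimal illustration, not the paper's implementation: it uses a single attention head and leaves out residual connections, layer normalisation, and positional encodings, and every name in it (`self_attention`, `feed_forward`, `encoder_layer`, `toy_params`) is an assumption made for the example.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over token vectors."""
    queries = [matvec(w_q, t) for t in tokens]
    keys = [matvec(w_k, t) for t in tokens]
    values = [matvec(w_v, t) for t in tokens]
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        # Score this position against every position, then normalise.
        weights = softmax([dot(q, k) / scale for k in keys])
        # Each output is a weighted sum of all value vectors.
        mixed = [sum(w * v[i] for w, v in zip(weights, values))
                 for i in range(len(values[0]))]
        outputs.append(mixed)
    return outputs


def feed_forward(x, w1, b1, w2, b2):
    """Two linear maps with a ReLU in between, applied to one position."""
    hidden = [max(0.0, h + b) for h, b in zip(matvec(w1, x), b1)]
    return [o + b for o, b in zip(matvec(w2, hidden), b2)]


def encoder_layer(tokens, p):
    """Self-attention, then the same feed-forward network at every position."""
    attended = self_attention(tokens, p["w_q"], p["w_k"], p["w_v"])
    return [feed_forward(a, p["w1"], p["b1"], p["w2"], p["b2"]) for a in attended]


# Toy weights (identity matrices) chosen only to keep the example readable.
identity = [[1.0, 0.0], [0.0, 1.0]]
toy_params = {"w_q": identity, "w_k": identity, "w_v": identity,
              "w1": identity, "b1": [0.0, 0.0],
              "w2": identity, "b2": [0.0, 0.0]}
toy_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
encoded = encoder_layer(toy_tokens, toy_params)
print(encoded)
```

Because the same `feed_forward` parameters are reused for every position, the only place where positions exchange information in this sketch is the attention step.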
4. Language Models are Few-Shot Learners (GPT-3) by Tom B. Brown, Benjamin Mann in 2020, Cited by 3796
GPT-3 is a deep-learning model for natural language processing with 175 billion parameters, more than 100 times as many as its predecessor, GPT-2. After pre-training on a text corpus of roughly half a trillion tokens, the model achieves strong, and on some benchmarks state-of-the-art, performance on a range of NLP tasks without fine-tuning. GPT-3 is built on the Transformer, which can be thought of as a routing machine: attention lets it learn dependencies between any tokens in the input, and these routing patterns are learned during pre-training rather than by a recurrent neural network. According to the OpenAI researchers, larger models make more efficient use of in-context information: the steeper "in-context learning curves" of big models show a better ability to learn from examples supplied in the context.
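In-context learning means the task is specified entirely in the prompt: a description, a handful of solved examples, and then the new input, with no change to the model's weights. The sketch below only assembles such a prompt as a string and does not call any model. The function name `build_few_shot_prompt`, the "Input:"/"Output:" layout, and the translation pairs are assumptions chosen for illustration.

```python
def build_few_shot_prompt(task_description, examples, query):
    """Assemble a few-shot prompt: description, solved examples, new input."""
    lines = [task_description, ""]
    for source, target in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")
    # The final example is left unanswered for the model to complete.
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


# Illustrative demonstrations; any task with input/output pairs would do.
demo_examples = [("sea otter", "loutre de mer"), ("cheese", "fromage")]
prompt = build_few_shot_prompt("Translate English to French.",
                               demo_examples, "peppermint")
print(prompt)
```

Passing an empty `examples` list gives the zero-shot version of the same prompt, and a single pair gives the one-shot version.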
3. ImageNet Classification with Deep Convolutional Neural Networks by Alex Krizhevsky, Ilya Sutskever in 2017, Cited by 108312
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton won the ILSVRC-2012 competition in 2012 with a deep convolutional neural network that classified 1.3 million high-resolution images into 1000 different classes. It reached a top-5 test error rate of 15.3 percent, far better than the second-place model (26.2 percent).
This feat matters far more than it may first appear. Thanks to the creators of AlexNet, deep learning research exploded, and tech companies began investing more heavily in classification technology, using AlexNet and building new models to solve practical problems such as image detection and image classification. This revolution in AI and deep learning also drove a revolution in machine performance: we now have better computers to train these models on, and GPU manufacturers no longer develop only for the gaming market. Now you know what else these incredible machines can do besides run Red Dead Redemption 2.
Optimization algorithms, regularisation methods, and activation functions all played important roles in the AlexNet design. In other words, to train a good model for a given problem, you have to try and test different options and watch how your model behaves in order to reach good accuracy and avoid overfitting.
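ILSVRC results such as these are usually reported as top-5 error: a prediction counts as correct if the true class is among the five classes the model scores highest. The sketch below shows how that metric can be computed. The function name `top_k_error` and the toy scores are assumptions for illustration, and the six-class example uses k=2 only to keep the numbers small.

```python
def top_k_error(scores_per_image, true_labels, k=5):
    """Fraction of images whose true label is not among the k top-scored classes."""
    misses = 0
    for scores, label in zip(scores_per_image, true_labels):
        # Rank class indices from highest to lowest score.
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return misses / len(true_labels)


# Toy data: two images, six classes, made-up scores.
demo_scores = [
    [0.10, 0.50, 0.20, 0.05, 0.05, 0.10],  # true class 2 is ranked second
    [0.35, 0.25, 0.15, 0.10, 0.10, 0.05],  # true class 5 is ranked last
]
demo_labels = [2, 5]
print(top_k_error(demo_scores, demo_labels, k=2))
```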
2. Adam: A Method for Stochastic Optimization by Diederik P. Kingma, Jimmy Ba in 2017, Cited by 104886
Adam is an optimization algorithm that extends stochastic gradient descent and has gained wide traction in computer vision and natural language processing applications. Diederik Kingma from OpenAI and Jimmy Ba from the University of Toronto presented Adam in their 2015 ICLR paper (poster) titled "Adam: A Method for Stochastic Optimization." Unless otherwise mentioned, the description in this section draws on their paper. Adam is the name of the algorithm: it is not an acronym, and it is not written "ADAM."
According to the authors, Adam combines the benefits of two earlier extensions of stochastic gradient descent. Specifically:
Adaptive Gradient Algorithm (AdaGrad), which maintains a per-parameter learning rate that improves performance on problems with sparse gradients (e.g. natural language and computer vision problems).
Root Mean Square Propagation (RMSProp), which also maintains per-parameter learning rates, adapted based on the average of recent gradient magnitudes for the weight (i.e. how quickly it is changing). This means the algorithm does well on online and non-stationary problems (e.g. noisy ones).
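Adam realises the benefits of both AdaGrad and RMSProp. Like RMSProp, it adapts the per-parameter learning rates using a running average of the squared gradients (the second moment, or uncentered variance). In addition, it keeps a running average of the gradients themselves (the first moment, or mean) and uses that average as the update direction.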
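The update rule can be sketched as follows. The default values for `lr`, `beta1`, `beta2`, and `eps` are the ones suggested in the Adam paper, while the function name `adam_step`, the list-based parameter layout, and the toy objective f(x) = (x - 3)^2 are assumptions made for this example.

```python
import math


def adam_step(params, grads, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update; t is the 1-based step count."""
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        # Running averages of the gradient (first moment)
        # and of the squared gradient (second moment).
        m_i = beta1 * m_i + (1 - beta1) * g
        v_i = beta2 * v_i + (1 - beta2) * g * g
        # Bias correction: both averages start at zero.
        m_hat = m_i / (1 - beta1 ** t)
        v_hat = v_i / (1 - beta2 ** t)
        # Per-parameter step, scaled by the root of the second moment.
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


# Toy problem: minimise f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
params, m, v = [0.0], [0.0], [0.0]
for t in range(1, 1001):
    grads = [2.0 * (params[0] - 3.0)]
    params, m, v = adam_step(params, grads, m, v, t, lr=0.1)
print(params[0])
```

On the very first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly `lr` regardless of how large the gradient is.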
1. Generative Adversarial Networks (GANs) by Ian J. Goodfellow, Jean Pouget-Abadie in 2014, Cited by 44363
GANs, or Generative Adversarial Networks, are an approach to generative modelling that uses deep learning methods such as convolutional neural networks. In machine learning, generative modelling is an unsupervised learning task that involves automatically discovering and learning the regularities or patterns in input data, so that the model can be used to generate new examples that could plausibly have been drawn from the original dataset.
GANs are a clever way to train a generative model by framing the problem as a supervised learning problem with two sub-models: the generator model, which we train to generate new examples, and the discriminator model, which tries to classify examples as real (from the domain) or fake (generated).
GANs achieve realistic outputs by pairing a generator, which learns to produce the desired output, with a discriminator, which learns to distinguish real data from the generator's output. The generator tries to deceive the discriminator, while the discriminator tries to avoid being deceived.
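The two competing objectives can be written down as loss functions over the discriminator's outputs, where each output is the probability it assigns to "real". The sketch below is a minimal illustration that computes only the losses, with no networks or training loop. The generator loss uses the non-saturating form (maximising log D(G(z))) that the GAN paper suggests in practice, and the function names and sample probabilities are assumptions for this example. Inputs are assumed to lie strictly between 0 and 1.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: real samples should score 1, generated ones 0."""
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return -(real_term + fake_term)


def generator_loss(d_fake):
    """Non-saturating generator loss: low when fakes are scored as real."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)


# Made-up discriminator outputs for two scenarios.
sharp_real, sharp_fake = [0.9, 0.8], [0.1, 0.2]    # discriminator is winning
fooled_real, fooled_fake = [0.6, 0.5], [0.9, 0.8]  # generator is winning

print(discriminator_loss(sharp_real, sharp_fake), generator_loss(sharp_fake))
print(discriminator_loss(fooled_real, fooled_fake), generator_loss(fooled_fake))
```

When the discriminator outputs 0.5 for everything, it cannot tell the two sources apart, and its loss equals 2 log 2.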