OpenAI released Generative Pre-trained Transformer 4 or GPT-4 on 14 March 2013. It is the fourth version of the transformer-based GPT series of large language models and has newer capabilities and functionalities that surpass GPT-3 and GPT-3.5.
Microsoft confirmed that its Bing search engine has been using GPT-4 even before its release. This large language model is now available to other developers via an API and it now powers the subscription-based version of ChatGPT.
A Look Into the Capabilities of GPT-4
The most defining characteristic of GPT-4 is that it is a multimodal large language model and a foundation model. Note that the previous versions of GPT were not multimodal models. A multimodal model combines different modalities of data to include texts and imaging data. This means that the capabilities of GPT-4 and its applications are more expansive than its predecessors.
Large language models consist of artificial neural networks. They are also trained using deep learning. It is also worth mentioning that GPT-4 was specifically trained using large datasets comprising of public data and data licensed from third-party providers. It was then fined-tuned with reinforcement learning from human feedback.
A blog post from OpenAI stated that this new model can handle more nuanced instructions than the previous GPT version. It is also more reliable and more creative. It can even read, analyze, or generate up to 25000 words of text. These improve and expand the capabilities of different generative artificial intelligence applications based on language models.
Remember that GPT-4 is a large language model for natural language processing. It can read, analyze, and generate natural language based on textual input. However, apart from its NLP capabilities, it also has some computer vision capabilities. It can recognize, analyze, and describe an image using imaging data. This sets it apart from its predecessors.
Below is the rundown of the capabilities of Generative Pre-trained Transformer 4 and its possible applications:
• Multimodal Model: It is a large language model trained using different modalities of data to include textual data and imaging data using different artificial intelligence algorithms. This text-image modeling provides this model with the capabilities to analyze and describe texts and images.
• Image Description: Another notable feature or capability of GPT-4 is that it has some degree of computer vision capabilities. It specifically has an image recognition function for image description generation. This means that its specific application can include analyzing and even describing a photograph, other graphic images, or visualized data and information such as graphs, tables, and infographics.
• Generative Applications: The main application of GPT-4 is in generative AI. The previous GPT-3 and GPT 3.5 have powered different generative AI apps and services like ChatGPT and Dall-E from OpenAI and Copilot from Microsoft. This newer model has the same applications but it expands the features and capabilities of generative AI apps and services because it is trained using a larger set of data.
• Specific Capabilities: More specific capabilities include answering questions, summarizing and generating long-form texts, describing an uploaded image, debugging and writing codes, creative writing, decoding and generating formulas, detailed data analysis, and following text-based instructions, among others.
• Foundation Model: Another notable characteristic of GPT-4 is that it is a foundation model that can be fine-tuned or adapted to create other models and more specific AI applications or use cases. The entire GPT family of LLMs are foundation models but the added multimodal capabilities of GPT-4 make it a better foundation model.
Take note that GPT-1 has around 120 million parameter counts while GPT-2 has 1.5 billion parameter counts. GPT-3 is larger with 175 billion parameter counts. Parameters are variables in AI and specific machine learning and deep learning modeling. These variables are adjusted during training to establish how inputs transform into desired outputs.
GPT-4 has a larger parameter. Some have considered it to be trillions in parameter count. However, citing the existing competitive landscape and the different security implications of large-scale models, OpenAI refrained from providing information about the size of its newer Generative Pre-trained Transformer.
Nevertheless, considering that it is a multimodal model and the aforementioned capabilities and applications, GPT-4 is arguably more powerful. Testers showed that it passed several standardized tests. Others have also used it in drug discovery research in which the model was able to generate a similar albeit modified formula of a known drug formulation.
It still has several limitations. Testers showed that it is still prone to hallucination or a phenomenon in which an AI model provides a confident response or output that is not based on its training data. It cannot also replace high-level programming or coding as it scored low in different tests related to software development and software engineering.