What is an LLM and how is it architected?

What’s an LLM?

LLM stands for Large Language Model. Think of them as well-read robots: they have consumed a huge amount of text from the internet, such as articles, encyclopaedias, and code repositories. From all that reading, they learn how words and sentences fit together to make sense. In the language of a developer, the pieces of text a model reads and writes are called tokens; a token is often a whole word, but it can also be a word fragment or a punctuation mark.
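To make the idea of tokens concrete, here is a minimal sketch of how a tokeniser might turn text into numeric IDs. The tiny vocabulary and the `toy_tokenise` function are invented purely for illustration; real LLMs use learned sub-word tokenisers such as byte-pair encoding.

```python
# A toy tokeniser: real LLMs use learned sub-word schemes (e.g. byte-pair
# encoding), but the idea is the same -- text in, integer token IDs out.
# The vocabulary below is invented purely for illustration.
vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4, ".": 5, "<unk>": 6}

def toy_tokenise(text: str) -> list[int]:
    """Split on whitespace and map each piece to its ID (or <unk>)."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]

print(toy_tokenise("the cat sat on the mat"))  # [0, 1, 2, 3, 0, 4]
```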

How Do They Work?

You type in a question, and the LLM draws on everything it absorbed during training to figure out the best way to answer; unlike a search engine, it doesn’t look documents up, it generates the answer token by token. The transformer neural network is particularly well suited to building LLMs. Transformers can read vast amounts of text, spot patterns in how words and phrases relate, and predict which tokens should come next. In a way, LLMs are similar to “autofill” engines: they don’t understand anything themselves, but they’re very good at predicting the next step in a sequence. The transformer architecture was introduced in the 2017 paper “Attention Is All You Need”; BERT (Bidirectional Encoder Representations from Transformers) was one of the first models to show how powerful the approach could be, although most of today’s chat LLMs are decoder-only models in the style of GPT.
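To illustrate the “autofill” idea, here is a minimal sketch of greedy next-token prediction. The `fake_model` function is a stand-in invented for this example: any function that returns one score per vocabulary token given the tokens so far; a real LLM computes those scores with a transformer.

```python
import math

# A stand-in "model": given the tokens so far, return one score (logit) per
# vocabulary entry. A real LLM computes these scores with a transformer.
def fake_model(tokens: list[int], vocab_size: int = 7) -> list[float]:
    return [float((t + sum(tokens)) % 5) for t in range(vocab_size)]  # made-up scores

def softmax(logits: list[float]) -> list[float]:
    """Turn raw scores into a probability distribution."""
    exps = [math.exp(x - max(logits)) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def generate(prompt_tokens: list[int], steps: int) -> list[int]:
    """Greedy decoding: repeatedly append the most probable next token."""
    tokens = list(prompt_tokens)
    for _ in range(steps):
        probs = softmax(fake_model(tokens))
        tokens.append(probs.index(max(probs)))  # pick the argmax token
    return tokens

print(generate([0, 1, 2], steps=4))
```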

Some widely used LLM-powered products (and the companies behind them):

a) ChatGPT (OpenAI)

b) Gemini (Google)

c) Claude (Anthropic)

d) Perplexity AI

Architecture of a Simple LLM (built on deep learning; a minimal code sketch follows the list)

  1. Embedding Layer:

    • Converts input tokens into high-dimensional vectors (embeddings).
  2. Positional Encoding:

    • Adds positional information to the embeddings, helping the model understand the order of words in a sequence.
  3. Transformer Encoder Blocks:

    • The core building blocks of the architecture.

    • Composed of multiple layers (commonly 12 or more).

    • Each layer has two sub-layers:

      • Multi-Head Self-Attention Mechanism: Allows the model to weigh different parts of the input sequence differently when making predictions.
      • Position-wise Fully Connected Feed-Forward Network: Applies non-linear transformations to the output of the attention mechanism.
  4. Layer Normalisation and Residual Connections:

    • Used after each sub-layer to stabilise and speed up training.
    • Residual connections allow information to flow through the network more efficiently.
  5. Decoder Blocks:

    • In decoder-style (generative) models, masked self-attention layers generate the output sequence one token at a time.
  6. Final Output Layer:

    • Produces the model’s prediction for the next word or token in the sequence.
  7. Training Loop:

    • The model is trained using a large dataset with a task-specific objective, such as predicting the next word in a sentence or filling in masked words.
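Putting steps 1 to 7 together, here is a minimal sketch of a tiny decoder-style transformer language model in PyTorch. Every size here (vocab_size, d_model, n_heads, n_layers, the random training batch) is an invented toy value for illustration; a real LLM has the same shape, just vastly larger, with a real corpus, dropout, and a full data pipeline.

```python
import torch
import torch.nn as nn

class MiniTransformerLM(nn.Module):
    """A tiny decoder-style language model: embeddings + positional
    encodings + transformer blocks + an output head (steps 1-6 above)."""

    def __init__(self, vocab_size=1000, d_model=64, n_heads=4,
                 n_layers=2, max_len=128):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)   # 1. embedding layer
        self.pos_emb = nn.Embedding(max_len, d_model)        # 2. learned positions
        layer = nn.TransformerEncoderLayer(                  # 3. attention + FFN;
            d_model, n_heads, dim_feedforward=4 * d_model,   # 4. layer norm and
            batch_first=True)                                #    residuals built in
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.out = nn.Linear(d_model, vocab_size)            # 6. next-token scores

    def forward(self, tokens):
        positions = torch.arange(tokens.size(1), device=tokens.device)
        x = self.token_emb(tokens) + self.pos_emb(positions)
        # 5. Causal mask so each position only attends to earlier tokens.
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        x = self.blocks(x, mask=mask)
        return self.out(x)  # one logit per vocabulary token, per position

# 7. One illustrative training step on random data (a real run would loop
# over a large corpus): predict token t+1 from the tokens up to t.
model = MiniTransformerLM()
batch = torch.randint(0, 1000, (8, 32))        # 8 sequences of 32 tokens
logits = model(batch[:, :-1])                  # scores for the next token
loss = nn.functional.cross_entropy(
    logits.reshape(-1, 1000), batch[:, 1:].reshape(-1))
loss.backward()
torch.optim.AdamW(model.parameters(), lr=3e-4).step()
print(f"training loss: {loss.item():.3f}")
```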

Real Example of an LLM input and output

User Input:

Generate an engaging introduction paragraph for an article about the impact of technology on modern society.

LLM output:

The digital era has ushered in an unprecedented transformation, weaving technology into the very fabric of our daily lives. From revolutionising communication to reshaping industries, its impact resonates across every corner of modern society. In this article, we delve into the profound effects technology has on our world, exploring both the opportunities it unfolds and the challenges it presents.
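Getting a completion like this out of a hosted LLM is typically a single API call. Below is a minimal sketch using the OpenAI Python client; the model name is only an example, and other providers’ SDKs follow the same request/response pattern.

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-4o-mini",  # example model name; any available chat model works
    messages=[{
        "role": "user",
        "content": ("Generate an engaging introduction paragraph for an "
                    "article about the impact of technology on modern society."),
    }],
)
print(response.choices[0].message.content)
```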

Ethics behind LLMs

A study by University of Oxford researchers and international experts discusses the ethical concerns tied to large language models (LLMs). Its key points:

  1. The focus is on responsibility for outputs from LLMs like ChatGPT, moving beyond conventional AI discussions on harm.
  2. The study suggests updating our idea of responsibility due to the emergence of LLMs.
  3. Users of LLMs are not fully credited for positive results but are held responsible for harmful outputs.
  4. This creates an “achievement gap” where users can’t get full recognition for positive outcomes.
  5. Guidelines on authorship, disclosure, and intellectual property are deemed essential by the study’s senior author, Julian Savulescu.
  6. The interdisciplinary team recommends transparency norms to track responsibility and assign praise or blame.
  7. The impact of LLMs on education, publishing, intellectual property, and misinformation is explored.
  8. In education, guidelines for LLM use and responsibility are urgently needed, including disclosure statements in article submissions.
  9. Rights in generated text, like intellectual property and human rights, require swift adaptation to protect creators and users in the fast-evolving LLM landscape.

LLMs in Social Media Content Generation

Large Language Models (LLMs) are well suited to writing engaging social media captions because they are trained on extensive datasets, including articles and newspapers. By analysing successful captions across diverse platforms, they can generate text that is contextually relevant and appealing, balancing brevity, humour, and adherence to brand voice so that it strengthens the visual content it accompanies. This helps convert passive scrollers into actively engaged followers and advocates, improving a brand’s social media presence. And because their training data is so varied, LLMs can create content that resonates with diverse audiences, lifting overall engagement.

How AdVentur.ai employs LLMs effectively

AdVentur.ai uses Large Language Models (LLMs) to analyse images and generate content along with relevant hashtags. The process draws on the models’ ability to understand the visual elements within an image and then create contextually appropriate, engaging text. By integrating these language models into our platform, AdVentur.ai fuses the visual and textual components of a post, improving the overall quality and relevance of the generated content. This approach streamlines content creation and optimises hashtag selection, giving our small-business users a more effective and targeted social media strategy.
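As a rough sketch of how such a pipeline could look (an illustrative assumption, not AdVentur.ai’s actual implementation, which is not public), the snippet below sends an image to a vision-capable chat model and asks for a caption plus hashtags. The model name, prompt, and file name are all placeholders.

```python
import base64
from openai import OpenAI

client = OpenAI()

def caption_with_hashtags(image_path: str) -> str:
    """Hypothetical image -> caption + hashtags step, for illustration only."""
    image_b64 = base64.b64encode(open(image_path, "rb").read()).decode()
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # example vision-capable model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Write a short social media caption for this image, "
                         "then suggest five relevant hashtags."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

print(caption_with_hashtags("product_photo.jpg"))  # placeholder file name
```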

For more info about how LLMs work and their utility in AdVentur.ai, feel free to email us at: contact@adventur.ai