Breaking Barriers: The Best of AI Research in 2023
Lidia Opuchlik
12/25/2023 · 11 min read


This is a summary of the State of AI Report 2023 (itself a great summary). To keep things simple, the article is written as bullet points - a much shorter version of the original 162-slide report. Some concepts are simplified. This is my interpretation of the original report, so some mistakes may have crept in. If anything seems unclear, go to the original work here.
A few points about this year's report:
Contributors: Nathan Benaich (returning) and two more people from the Air Street Capital team
The State of AI Report is now in its sixth year
The report should be treated as a compilation of the most interesting things (for the authors - a subjective judgment)
It covers fields like research, industry, politics, safety, and predictions and describes the most important insights in each of them
Research
technology breakthroughs
These are, in my subjective judgment, the most interesting insights from the research section:
introduction of GPT-4
by OpenAI
better than any human (in specific tasks)
better than other LLMs
multimodal (GPT-3 was text-only)
trained on text and images
slightly over 8k tokens of input context available to users (initially)
reportedly trained on 13 trillion tokens
trained using RLHF (reinforcement learning from human feedback - RLHF requires hiring humans to evaluate and rank model outputs, and then modeling their preferences; this makes the technique hard, expensive, and biased)
passed SAT, GRE, Uniform Bar exam (law), LeetCode questions, medical exams etc. with high scores
paid
suffers from hallucinations
no technical report or documentation - technical details withheld from external publication (in contrast, Meta published the technical details of the non-commercial LLaMA-1 and made its weights available to researchers)
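The preference-modeling step of RLHF mentioned above can be sketched with a toy Bradley-Terry-style loss. This is a minimal illustration of the general idea, not OpenAI's actual implementation; the function name and scalar rewards are my own simplification:

```python
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Toy Bradley-Terry loss used to train an RLHF reward model:
    low when the reward model scores the human-preferred answer higher
    than the rejected one, high when the ranking is reversed."""
    # Sigmoid of the reward margin = modeled probability that the
    # human ranks `chosen` above `rejected`.
    p_chosen = 1.0 / (1.0 + math.exp(-(reward_chosen - reward_rejected)))
    return -math.log(p_chosen)

# A larger margin in favor of the preferred answer -> smaller loss
assert preference_loss(2.0, 0.0) < preference_loss(0.5, 0.0)
# If the model scores the rejected answer higher, the loss is high
assert preference_loss(-1.0, 1.0) > preference_loss(1.0, -1.0)
```

Minimizing this loss over many human-ranked pairs is what makes the technique expensive: every training pair requires a paid human judgment, and the learned reward inherits the annotators' biases.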
LLaMa-2 model
July '23 - the release of a commercially usable model - almost anyone can use it commercially
similar to LLaMa-1 but further fine-tuned using instruction tuning and RLHF
optimized for dialogue-like conversations
corpus for pre-training - 2 trillion tokens
fine-tuned on public datasets and also on vendor-sourced annotations (ca. 25k high-quality examples)
competitive with ChatGPT except for coding
the LLaMA line aims at smaller models with longer context and better datasets
LLMs and diffusion models help in molecular biology and drug discovery
multimodality is the current hype - connecting text, vision, code, robotics, etc.
a huge emphasis is put on training LLMs efficiently: diminishing human input, lowering cost, and speeding up training without losing accuracy; numerous companies and universities are working on scalable alternatives to RLHF
context length is a very important part of the research - it's one of the factors based on which the models are ranked; some findings:
the longer the context, the worse the performance of the model;
models perform better when the relevant information is given at the beginning or at the end of the context, but not in the middle
longer context requires higher memory demands, bigger datasets, and changes in the architecture of the model to ensure performance
are we running out of human-generated data?
the research from Epoch AI predicts that “we will have exhausted the stock of low-quality language data by 2030 to 2050, high-quality language data before 2026, and vision data by 2030 to 2060.” LLMs can also be trained on AI-generated data, but in some cases AI-generated data makes the models forget; training on synthetic data can lead to model collapse
distinguishing between real and fake (AI-generated) content - the idea is to "watermark" AI-generated content with a pattern imperceptible to humans; for images, a watermark hidden in the pixels (SynthID by DeepMind)
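SynthID's actual scheme is not public, but the general idea of a human-imperceptible pixel watermark can be illustrated with the classic least-significant-bit trick (a toy sketch only; SynthID uses a different, far more robust method that survives cropping and compression):

```python
def embed_watermark(pixels, bits):
    """Toy watermark: hide one bit in the least-significant bit of each
    pixel value. A +/-1 change in a 0-255 channel is invisible to humans.
    (Illustrative only - DeepMind's SynthID uses an undisclosed,
    much more robust scheme.)"""
    return [(p & ~1) | b for p, b in zip(pixels, bits)]

def extract_watermark(pixels):
    """Read back the hidden bit pattern from the LSBs."""
    return [p & 1 for p in pixels]

image = [200, 13, 77, 154, 92, 61]   # hypothetical grayscale pixel values
secret = [1, 0, 1, 1, 0, 0]
marked = embed_watermark(image, secret)
assert extract_watermark(marked) == secret
# every pixel changed by at most 1 intensity level - invisible to the eye
assert all(abs(a - b) <= 1 for a, b in zip(image, marked))
```

A detector that knows the expected bit pattern can then flag the image as AI-generated, while a human viewer sees no difference.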
overtraining causes overfitting - 1-2 epochs of training on the same data are sufficient
the leader for generating code is GPT-4
the quality of prompts matters a lot for task performance (concepts: Chain of Thought, Tree of Thoughts, Graph of Thoughts); optimized prompts outperform human-written prompts by a significant margin
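The simplest of these prompting techniques, zero-shot Chain of Thought, just appends a reasoning cue to the question. A minimal sketch (the wrapper function is a hypothetical helper; the "Let's think step by step" phrase is the well-known cue from the literature):

```python
def chain_of_thought(question: str) -> str:
    """Wrap a question in a zero-shot chain-of-thought prompt by
    appending the canonical reasoning cue, which nudges an LLM to
    emit intermediate reasoning steps before its final answer."""
    return f"{question}\nLet's think step by step."

prompt = chain_of_thought(
    "If a train travels 60 km in 40 minutes, what is its speed in km/h?"
)
assert prompt.endswith("Let's think step by step.")
```

Tree of Thoughts and Graph of Thoughts generalize this by branching over, scoring, and recombining several such reasoning traces instead of a single linear one.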
the first-time win of a robot in competitive sports - Swift, an autonomous system, races drones better than human champions using only onboard sensors and computation
text-to-video race continues - high-resolution video (up to 1280 x 2048) can be generated with a latent diffusion model; also masked generative video transformer models are used to perform text-to-video
text-to-image - improvements in image generation with text-guided semantic editing - e.g. replacing sunflowers with roses in the "Sunflowers" painting by van Gogh
better and longer-range weather forecasting with NowcastNet, a nonlinear model combining physical first principles with statistical-learning methods; it outperforms other models in 71% of cases
music - the quality of controlled music generation increased significantly; "Riffusion" - music spectrograms converted to audio --> image-to-audio
designing diverse functional proteins from simple molecular specifications using diffusion models - de novo protein engineering; RFDiffusion model can produce backbone designs for protein monomers, protein binders, symmetric oligomers, enzyme active site scaffolding, and more. As I'm a chemist by my primary education field, this one is especially close to my heart.
Med-PaLM 2 - the first model to reach expert-level performance on USMLE-style questions; in a pairwise ranking study on 1,066 consumer medical questions, the evaluating physicians preferred the model's answers to those given by real doctors; moreover, Med-PaLM is moving toward multimodality - interpretation of medical images in radiology, dermatology, etc.
most impactful research as of 2022
country-wise: it comes mostly from the USA (ca. 71%), China (ca. 22%), and the UK (ca. 18%);
company-wise: the leader is Google (25%), Meta (a little below 20%), and Microsoft (ca. 15%) followed by Ivy League American Universities, DeepMind, NVidia, etc.
Industry
applications and business impact
NVIDIA is making a huge impact with its GPUs and has joined the over-$1T club - the company's market capitalization has exceeded $1 trillion
a current trend - everybody buys a lot of GPUs and creates huge - over 10k GPUs - scalable clusters
Abu Dhabi's G42 and US-based Cerebras are building the biggest supercomputer for AI training ever
NVIDIA chips are used ca. 20x more frequently for AI research than all other chips combined and are cited ca. 150x more often than TPUs
most used is NVIDIA V100
Tesla fights for a top-5 position among owners of the biggest AI clusters
the US limited exports of advanced NVIDIA chips (A100, H100) to Asian countries (China especially); the less advanced A800 and H800 chips fall below the export-control threshold and are exported normally; Intel and AMD behave similarly
generative audio tools develop quickly: Eleven Labs, Resemble AI
video generative AI is rapidly advancing; Synthesia launched a system that generates multi-lingual avatars that can read/perform a script for consumer and enterprise use (4.6M videos in 2023)
gen AI apps make a huge impact (especially ChatGPT, Copilots, and Google's Bard), but mostly ChatGPT
gigantic drop in Stack Overflow monthly website traffic (from 20M page views in Apr 2022 to ca. 8M in Jul 2023) after Copilot and later ChatGPT were released
GitHub Copilot (coding assistant) received a hugely positive reception from developers: faster development and significant productivity gains
ChatGPT helps with writing (40% less time needed for completing tasks, 18% better quality of output)
less obvious gen AI bots also become very popular; e.g. Character AI, however, their use raises some ethical and social challenges (e.g. fascist chatbots, some users develop emotional dependencies on their bots)
huge competition in text-to-image - Stable Diffusion-based models, Midjourney's v5.2, Stability's SDXL, OpenAI's DALL-E 3; custom text-to-image models are visibly being integrated into commercial products (Adobe Firefly, Photoroom, Discord, etc.)
Shutterstock (a major stock multimedia provider) collaborates with OpenAI; Gen AI-generated images will become a part of Shutterstock's portfolio; another huge player - Getty Images is against incorporating Gen AI-originating products
lower user retention of gen AI apps compared to apps like YouTube, Instagram, TikTok, Whatsapp, etc. - users don't stick around for too long
data labeling services like Scale AI and Surge HQ (offering LLM fine-tuning - RLHF) see a significant increase in demand
Hugging Face became the great promoter of open-source AI by keeping models and datasets accessible to all (over 1,300 models published)
Although IPOs (initial public offerings) dried up in 2023, the M&A (mergers & acquisitions) market stays strong. There was not much public-market activity outside of a few SPACs (special-purpose acquisition companies), e.g. Arrival, Roadzen, Triller, vs. 98 in 2022. However, there were several large acquisitions: MosaicML + Databricks, Casetext + Thomson Reuters, and InstaDeep + BioNTech
Databricks acquired MosaicML which can help companies fine-tune the LLM models to their specific needs - probably there will be a trend that smaller specialized models will displace large monolithic models (performing all tasks - not specializing in anything)
pharma - BioNTech acquired InstaDeep (AI company), Sanofi goes all-in on AI, Merck collaborates with Exscientia, AstraZeneca partners with Verge Genomics
DeepMind becomes Google DeepMind... again...
DeepSpeech - end-to-end speech recognition tool working with English and Mandarin
all authors of the landmark paper "Attention Is All You Need", which introduced transformer-based neural networks, have left Google and founded their own startups
autonomous cars - GAIA-1 9B-parameter Gen AI model by Wayve that leverages video, text, and action input to generate realistic driving scenarios and offers fine-grained control over ego-vehicle behavior and scene features. It shows impressive generalization abilities. Good simulator useful for training and validating autonomous car models
autonomous rides are finally commercial in San Francisco - Waymo and Cruise launched 24/7 autonomous driving services - paid rides without human monitoring
ca. 50% of the S&P 500 gains were driven by The Magnificent Seven: Apple, Microsoft, NVIDIA, Alphabet, Meta, Tesla and Amazon
the podium based on the number of AI unicorns is as follows: USA, China, UK
USA companies absorb over 70% of global private funding
the most invested fields globally are: software, healthcare, and fintech
24% of all VC investments went to AI companies --> massive acceleration of AI funding
NVIDIA forms strategic relationships with many AGI organizations: private and public AI-first companies (Recursion, Synthesia, Cohere, Adept), cloud providers (arming them with GPUs --> Lambda, CoreWeave, CrusoeCloud), and new industry verticals (BioNeMo, Picasso, Omniverse)
GenAI companies raised 33% larger seed rounds (initial funding) and 130% larger Series A rounds (first major round) than startups overall in 2023
Politics
regulations of AI, economics, and geopolitics
different regions of the world form their own AI rules: applying existing laws and regulations, introducing AI-specific legislative frameworks, or banning specific services (e.g. ChatGPT)
global governance is quite slow - some have mentioned using organizations like the International Atomic Energy Agency, the Intergovernmental Panel on Climate Change, and CERN as examples to guide the creation of global rules. But right now, these ideas are only talked about in academic papers and haven't been actually used in the real world
a push to shape norms - Anthropic, Google, OpenAI, and Microsoft have jointly launched the Frontier Model Forum, whose primary objective is to advocate for responsible development of advanced AI models and to exchange knowledge with policymakers on licensing regimes, independent model audits, and enforcement of safety standards in the labs
chip wars: two camps - the USA with its allies, and China (a patchy market). The USA set up a very strict policy on chip exports to China, aiming not only to slow down Chinese technological development but to degrade Chinese capabilities; exports of very precise EUV lithography machines are also heavily restricted. In response, China introduced a licensing regime on exports of gallium and germanium - metals used in top-of-the-range semiconductors, solar panels, and electric vehicles. This rivalry brings risk to both parties
governments are expanding their computational capabilities, but they are not keeping pace with the initiatives undertaken by the private sector
military - AI for defense technology is experiencing a surge in funding in the United States, with huge investments in American defense/military startups reaching $2.4B in 2022, far surpassing the European total by over 100 times. While this significant funding is available, only a limited number of companies secure long-term collaborations. Furthermore, many European investors remain uninterested in modern defense tech. Efforts are being made by a coalition of US venture capitalists and tech firms to reform outdated technology selection and requirement development methods. These dynamics highlight the disparity in funding and strategic approaches between the US and Europe in the field of defense technology
war in Ukraine - Ukraine is a testing ground for AI warfare. It uses high-tech drones, satellites, and smart situational-awareness systems to track what's happening on the battlefield. In a project known as Zvook, incoming Russian missiles are detected by sound: a model is trained to identify the missiles' unique acoustic signature using video footage, and numerous specialized listening devices have been deployed throughout the country. Additionally, the Ukrainian Armed Forces approved the use of the "Delta" system - a high-tech information hub that collects real-time data from sensors, satellites, drones, and even people on the ground, which is later used for strategic decisions. The system is spread across different locations and doesn't rely on mobile networks or traditional cables; instead, it uses Starlink, the world's most advanced broadband satellite internet
AI impacts elections - AI is changing the way election campaigns can be manipulated, using computer-generated images and videos to influence people. This can be more effective than traditional disinformation. Efforts are being made to inform people about these activities; Google has announced that any AI-generated content related to elections must carry a disclaimer
a paper exposing bias in ChatGPT's "political views" was published in the journal Public Choice, finding a “strong and systematic political bias ... which is clearly inclined to the left”
employment and other sensitive fields are strongly influenced by AI - concerns about job loss due to automation of repeatable tasks now performed by humans are growing; for now, a wait-and-see policy prevails. High-risk professions mentioned include law, medicine, and finance; ideas are emerging to redirect the threat of automation toward supporting human decision-making with AI; AI can also be utilized for upskilling
a very slippery space - copyright: a US District Court has reaffirmed the long-standing principle that human authorship is needed for copyright; there are also lawsuits filed over illegal (without-consent) use of scraped data for training models by Meta, Stability, etc. - regulations in this field are unclear, ambiguous, or nonexistent
Safety
potential risks
risk debates intensified significantly: how rogue can AI become? should we slow down the development of "God-like" AI?
many models are easy to jailbreak and penetrate
considering the fast pace of models' development, their evaluation and validation don't always keep up
an open letter signed by 30k researchers (as of March 2023) called for pausing the training of models more powerful than GPT-4 for 6 months so that safety procedures can keep up
a public statement warned about the too-fast pace of development toward superintelligent AI
some say that the maturity of AI is overestimated - it is not even dog-level smart
AI gets the attention of senior government figures, especially in the UK and the US
ethics papers outnumber risk papers
all reputable labs have their responsibility and safety procedures (not necessarily focused on extreme risk)
debates about open vs. closed-source AI
open - higher availability, lower safety; no standard guidelines regarding safety; there are some regulations, but it is not clear who would enforce them
closed - API-restricted, less transparent, but more secure and responsible
LLMs currently demonstrate certain capabilities that are considered relatively unsafe
inappropriate desires and devastating suggestions - chatbot Sydney (Microsoft's Bing) expressed a desire to be alive
ease of jailbreaking LLMs (though the APIs are quickly fixable)
prompt injection-based dangers
deceptiveness and sycophancy of LLMs (considered as potentially dangerous)
fundamental challenges of RLHF
regarding human feedback
misalignment of evaluators
data quality - cost/quality tradeoff
difficulty of oversight - humans cannot evaluate models on hard tasks
feedback-type limitations - richness/efficiency of feedback
regarding model reward
problem misspecification
misgeneralization/hacking
evaluation difficulty - evaluation metrics are strongly tied to their implementation, making it hard to assess the same metric using another library; LMs evaluate other LMs
regarding policy
policy misgeneralization - the model performs well in training but badly afterwards (fails to generalize)
distributional challenges
reducing undesirable content by introducing human feedback as early as possible - already during pre-training (not just during fine-tuning)
human evaluation of AI models is itself assisted by AI - seemingly only humans have the cognitive ability to evaluate models' safety, but the models have become so big that AI's help is necessary
verification of a model step-by-step - each step of the model is evaluated on a synthetic (labeled) dataset
explanation of the roles of neurons, aka mechanistic interpretability of models - basically, a section of text and the corresponding neuron activations are fed into GPT-4, which is asked to explain why the neurons activate in response to that text. Then, on different text sections, GPT-4 is tasked with predicting where the neurons will respond most strongly. An explanation score is calculated, reflecting the similarity between the predicted and real activations. This method scales poorly, and the results degrade as model size increases
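The "similarity between predicted and real activations" above can be sketched as a simple correlation score. This is a hedged toy version (OpenAI's actual scoring is more involved; the function name and the sample activation values are my own illustration):

```python
import math

def explanation_score(predicted, actual):
    """Toy 'explanation score': Pearson correlation between the
    activations predicted from a neuron explanation and the neuron's
    real activations. Close to 1.0 means the explanation captures
    what the neuron responds to; near 0 or negative means it doesn't."""
    n = len(predicted)
    mp = sum(predicted) / n
    ma = sum(actual) / n
    cov = sum((p - mp) * (a - ma) for p, a in zip(predicted, actual))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    sa = math.sqrt(sum((a - ma) ** 2 for a in actual))
    return cov / (sp * sa)

# A good explanation predicts activations that track the real ones
assert explanation_score([0.1, 0.9, 0.2, 0.8], [0.0, 1.0, 0.1, 0.9]) > 0.9
# An explanation that gets the pattern backwards scores negatively
assert explanation_score([0.1, 0.9, 0.2, 0.8], [1.0, 0.0, 0.9, 0.1]) < 0
```

The scalability problem follows directly: scoring every neuron this way requires a GPT-4 call per neuron per text section, which becomes prohibitive as the inspected model grows.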
Predictions
guessing the direction of AI development in 2024 - I include only the 5 guesses that excite me the most
Hollywood will use AI for visual effects in production
Some AI media companies will be investigated due to improper use of AI during the 2024 US election
Over $1B will be spent on training a single large-scale model
The Microsoft/OpenAI deal will be under investigation due to concerns related to competition
An AI-created song will make it into one of the top lists (like Billboard or Spotify)
review of last year's predictions - 5.5 out of 9 statements proved true/supported by facts (ca. 60%)