How do you think RWKV, Mamba and other architectures challenge Transformer?

Asked on 2024-01-16 21:11:33

2 Answers

King Of Kings
Specializes in: AI

Yuanshi Intelligence, the company behind the open source LLM RWKV, completed its seed round of financing on January 16, with investment from Qiji Chuangtan, founded by Lu Qi in 2018, and an anonymous investor. Yuanshi Intelligence has already begun raising its second round.


RWKV is the first domestic open source large language model with a non-Transformer architecture, and it has been iterated to the sixth generation, RWKV-6. Its author, Peng Bo, started training RWKV-2 in May 2022, when it had a parameter scale of only 100 million (100M); he subsequently trained a 14 billion (14B) parameter version of RWKV-4 in March 2023.


The 1.5 billion and 3 billion parameter versions of RWKV-5 have been released, and the 7 billion parameter version will be released in January 2024. The 1.5 billion and 3 billion parameter versions of RWKV-6 will be released in February 2024, after which the 7 billion and 14 billion parameter versions will be trained.


The RWKV-5 and RWKV-6 series support more than 100 languages and dozens of programming languages. An online demo can currently be tried via the link on www.rwkv.com.


Peng Bo graduated from the Department of Physics at the University of Hong Kong. He previously spent many years doing quantitative trading at Hong Kong hedge funds and also worked on intelligent hardware in Shenzhen.


In 2020, out of an interest in AIGC novel generation, he designed RWKV while optimizing GPT. In terms of model architecture, RWKV rewrites the GPT-style Transformer into an RNN form with faster inference, while still retaining the Transformer's parallel training capability and performance.
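To make "RNN form with parallel training capability" concrete, here is a minimal NumPy sketch of a simplified, single-channel WKV-style recurrence. It is my own illustration rather than the project's code, and it omits the numerical stabilization a real implementation needs; `w` is a decay rate and `u` a bonus weight for the current token. The same quantity can be computed step by step with a tiny recurrent state, or over the whole sequence at once.

```python
import numpy as np

def wkv_recurrent(w, u, k, v):
    """Sequential (RNN-style) evaluation: a small state is carried across steps."""
    T = len(k)
    out = np.empty(T)
    num, den = 0.0, 0.0                                  # decayed sums over past tokens
    for t in range(T):
        cur = np.exp(u + k[t])                           # current token gets a bonus weight
        out[t] = (num + cur * v[t]) / (den + cur)
        num = np.exp(-w) * num + np.exp(k[t]) * v[t]     # fold token t in, decay the rest
        den = np.exp(-w) * den + np.exp(k[t])
    return out

def wkv_parallel(w, u, k, v):
    """Whole-sequence evaluation of the same quantity (the parallel-training view)."""
    T = len(k)
    out = np.empty(T)
    for t in range(T):
        past = np.arange(t)
        wts = np.exp(-(t - 1 - past) * w + k[past])      # older tokens decay more
        cur = np.exp(u + k[t])
        out[t] = (wts @ v[past] + cur * v[t]) / (wts.sum() + cur)
    return out

rng = np.random.default_rng(0)
k, v = rng.normal(size=16), rng.normal(size=16)
assert np.allclose(wkv_recurrent(0.5, 0.3, k, v), wkv_parallel(0.5, 0.3, k, v))
```

The two functions produce the same output: the recurrent form is what makes inference cheap, while the closed-form view over the whole sequence is what allows Transformer-style parallel training.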


Peng Bo started programming at the age of 6 and has more than 30 years of programming experience. So far he has trained RWKV's base models single-handedly. He believes that if large models are monopolized by a few companies there will be risks for humanity, so after training RWKV he open-sourced it in order to build a more open model ecosystem.


Peng Bo's worldview of the AI universe is that humans are now at a point where they are gradually beginning to compete with AI. From the perspective of quantum physics, he believes that humans may be just a tool for the universe to achieve a higher goal; therefore, if AI fits the universe's goals better than humans do, the universe will eventually choose AI to replace humans, which is a danger to humanity.


Currently, RWKV's Discord community has more than 8,000 developers overseas, from the United States, Europe, Asia, the Middle East and elsewhere, while the domestic open source community runs five QQ groups with tens of thousands of members.


Luo Xuan, co-founder of Yuanshi Intelligence, told AI Technology Review why Qiji Chuangtan invested in them: mainly because it believed that RWKV's non-Transformer architecture might bring more innovations and breakthroughs to large models.


Today, with the Transformer dominating the world of large models, some believe that switching to a different architecture can break through the Transformer's current bottlenecks.


Leading international technology companies are also exploring different paths. In February 2022, OpenAI noticed RWKV and Peng Bo and sent him an interview invitation.


Peng Bo had not yet founded a commercial company at the time, but he immediately wrote back and declined. He felt that OpenAI was too closed, and he hoped to do more open work, so his reply was: "If OpenAI is willing to make open source large models in the future, cooperation is welcome."


Luo Xuan said that the RWKV base model will always be open source, and it has been placed in LF AI & Data incubation under the Linux Foundation (https://lfaidata.foundation/projects/rwkv/) so that RWKV can be seen by more people.


Currently, the RWKV team has nearly ten people and is still recruiting, with a target of 15 to 20. Peng Bo alone is responsible for training the base model, while the others work on model applications, fine-tuning, optimization, multi-modality, ecosystem building, and so on.


For the RWKV team, Peng Bo hopes to optimize the model architecture as far as possible before training a 100-billion-parameter model, so that computing resources can be better utilized. "The RWKV-6 architecture now represents the cutting edge of non-Transformer architectures, and the seventh-generation architecture is being designed."


Once the architecture has been pushed as far as it will go, and given that RWKV's performance improvement curve (scaling law) from 100 million to 14 billion parameters is stable and its training process is stable, training the 100-billion-parameter model should only require sufficient computing power.
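The reasoning behind "only computing power is missing" is the usual scaling-law extrapolation. A minimal sketch of that step, using entirely made-up loss values (not actual RWKV measurements), might look like this:

```python
import numpy as np

# Hypothetical loss values at a few parameter counts -- purely illustrative,
# not actual RWKV numbers.
params = np.array([1e8, 4e8, 1.5e9, 3e9, 7e9, 1.4e10])
loss = np.array([3.20, 2.95, 2.70, 2.58, 2.45, 2.33])

# A stable scaling law means log(loss) is roughly linear in log(params),
# i.e. loss ~= a * N**slope with slope < 0.
slope, intercept = np.polyfit(np.log(params), np.log(loss), 1)

# If the fitted trend continues to hold, extrapolate to a 100B-parameter model.
predicted_100b = np.exp(intercept) * (1e11) ** slope
print(f"fitted exponent {slope:.3f}, extrapolated loss at 100B params: {predicted_100b:.2f}")
```

If the points from 100M to 14B fall on a clean line in log-log space, the main remaining uncertainty for a 100B run is securing the compute, which is exactly the obstacle Luo Xuan describes below.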


The team will focus on three things in the future: 1. training a 100-billion-parameter model; 2. building infra, using efficient on-device operation as the entry point and cooperating with major chip manufacturers such as Qualcomm, Intel, and MediaTek (which announced their cooperation with RWKV at press conferences at the end of 2023) to bring models to end-side devices such as mobile phones, PCs, and vehicles; 3. incubating applications and the ecosystem.


Some companies in China are already trying to train models with RWKV. According to Luo Xuan, more than 10 foreign companies have built businesses on the open source RWKV and received financing.


Over the past year, RWKV has been deployed in both To C and To B settings. To C is mainly in agents, games, music generation, and role-playing; To B includes banks, law firms, and so on.


According to Luo Xuan, the money raised this round will mainly be used for tool-stack construction, ecosystem incubation, and application incubation, while model training relies mainly on sponsorship and cooperation. The biggest obstacle now is the need for more computing power, so training the 100-billion-parameter model remains their most challenging mission yet.

King Of Kings
Specializes in: AI

RWKV, Mamba, and other architectures challenge the Transformer model by offering alternative approaches to machine learning that aim to overcome some of the Transformer's limitations.

One major challenge faced by Transformer models is their high computational requirements due to the self-attention mechanism. This mechanism allows the Transformer to process input data in parallel, but it comes with quadratic computational complexity, making it less efficient for long sequences. RWKV, Mamba, and other architectures aim to address this issue by proposing alternative attention mechanisms that are computationally more efficient, such as kernelized self-attention or sparse attention. These mechanisms reduce the overall computational cost without sacrificing performance significantly.

Another challenge with the Transformer model is capturing long-range dependencies effectively. The self-attention mechanism attends to all positions equally, which may not be optimal for capturing long-term dependencies. RWKV, Mamba, and other architectures offer solutions to this problem by introducing various modifications. For example, RWKV introduces window-based self-attention, which limits the attention scope to reduce computational cost while still capturing meaningful context within the window.
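A minimal sketch of the window-limited idea described above (a generic sliding-window attention written for illustration, assumed here rather than taken from RWKV's actual implementation):

```python
import numpy as np

def sliding_window_attention(Q, K, V, window=64):
    # Each position attends only to itself and the previous `window - 1`
    # positions, so cost grows with T * window rather than T * T.
    T, d = Q.shape
    out = np.empty_like(V)
    for t in range(T):
        lo = max(0, t - window + 1)
        scores = Q[t] @ K[lo:t + 1].T / np.sqrt(d)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        out[t] = weights @ V[lo:t + 1]
    return out

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(256, 32)) for _ in range(3))
print(sliding_window_attention(Q, K, V).shape)   # (256, 32)
```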

Furthermore, these architectures also explore different techniques to improve the generalization ability of the models. For instance, Mamba introduces a meta-attention mechanism, which adds a meta-attention module to adaptively combine different attention heads for each example during training. This allows the model to learn to allocate attention resources more effectively and efficiently.

Overall, RWKV, Mamba, and other architectures challenge the Transformer model by providing alternative approaches that address its limitations: reducing computational requirements, enhancing long-range dependency capture, and improving generalization. These advancements can create more efficient and effective models for various natural language processing tasks.
