GuMorming

[Paper Reading] A Survey on Efficient Inference for Large Language Models

Posted on 2024-07-19 | Paper Reading | LLM • Survey • Inference

Paper Source: A Survey on Efficient Inference for Large Language Models

PRELIMINARIES

Transformer-Style LLMs

Mainstream LLMs are built on the Transformer architecture. A typical Transformer model consists of several stacked Transformer Blocks.

Transformer Block:
- Attention-Linear (generates the matrices Q, K, V)
- Multi-Head Self-Attention (MHSA)
- Feed Forward Network (FFN)
- Layer Norm

Each Transformer Block takes the output features of the previous Transformer Block as its input, passes them serially through each of its submodules, and emits the result. Before the first Transformer Block, a tokenizer converts the input sentence (the prompt) into a token sequence, followed by word embedding and p ...

[Paper Reading] LoongServe Efficiently Serving Long-context Large Language Models with Elastic Sequence Parallelism

Posted on 2024-07-11 | Paper Reading | LLM • Distributed

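Paper Source: LoongServe Efficiently Serving Long-context Large Language Models with Elastic Sequence Parallelism

Background

The LLM inference process[^1]:
1. Load the model parameters onto the GPU.
2. Tokenize the prompt on the CPU and transfer the token tensor to the GPU.
3. Feed the input tokens into the Transformer to generate the first token.
4. Append the generated token to the input token sequence and use the result as the new input for generating the second token. Repeat this until an end-of-sequence (EOS) token is generated or the maximum sequence length is reached.
5. Copy the completed tokens from the GPU to the CPU and detokenize them to obtain the generated text. ("Detokenize" means converting the generated token sequence back into text. This may include removing spaces between tokens, adding punctuation, and restoring abbreviations, so that the output reads as natural language.)

Transformer LL ...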
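The generation loop in the inference steps above (generate a token, append it, repeat until EOS or the length limit) can be sketched as follows. This is a toy illustration, not the paper's implementation: `EOS`, `toy_next_token`, and `generate` are hypothetical names, and the "model" is a stand-in function rather than a real Transformer. Tokenization, detokenization, and the CPU/GPU transfers are left out.

```python
EOS = 0  # hypothetical end-of-sequence token id


def toy_next_token(tokens):
    """Stand-in for a Transformer forward pass: counts down from the last token to EOS."""
    return max(tokens[-1] - 1, EOS)


def generate(prompt_tokens, next_token_fn, max_len=16):
    """Autoregressive decoding: each generated token is appended to the next input."""
    tokens = list(prompt_tokens)
    while len(tokens) < max_len:        # stop at the maximum sequence length
        nxt = next_token_fn(tokens)     # forward pass over the whole current sequence
        tokens.append(nxt)              # the new token becomes part of the next input
        if nxt == EOS:                  # stop once the stop token has been produced
            break
    return tokens


print(generate([5, 3], toy_next_token))
```

Note that the loop re-feeds the entire sequence at every step. Avoiding that recomputation is what a KV cache is for.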

[Paper Reading] Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KV Cache

Posted on 2024-07-02 | Paper Reading | LLM • KV Cache

Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KV Cache

This paper studies KV Cache management for long-context LLM tasks in cloud/distributed environments.

Challenges to LLM serving on Cloud (Motivation)

Challenge 1: significant disparities in memory demands obstruct efficient model parallelism

See [^Table 1].

[^Table 1]: LLaMA2-13B, KV Cache size with context length

| Context length | 10k | 100k | 500k | 1000k |
| --- | --- | --- | --- | --- |
| KV Cache size | 8.19GB | 81.9GB | 409.6GB | 819.2GB |
| Misc size | 26GB | 26GB | 26GB | 26GB |

To hold the large KV Cache that long-context tasks require, one must increase ...
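The KV Cache row of Table 1 can be reproduced with back-of-the-envelope arithmetic. The sketch below makes assumptions the excerpt does not state: LLaMA2-13B's published configuration (40 layers, hidden size 5120), FP16 storage (2 bytes per value), a single sequence, and 1 GB = 10^9 bytes. The function name `kv_cache_bytes` is illustrative.

```python
def kv_cache_bytes(context_len, num_layers=40, hidden_size=5120, bytes_per_value=2):
    """Estimate KV Cache size for one sequence (assumed LLaMA2-13B config, FP16)."""
    # Each layer stores one K and one V vector of hidden_size values per token.
    per_token = 2 * num_layers * hidden_size * bytes_per_value
    return context_len * per_token


for n in (10_000, 100_000, 500_000, 1_000_000):
    print(f"{n // 1000}k tokens: {kv_cache_bytes(n) / 1e9:.2f} GB")
```

Under these assumptions each token costs 819,200 bytes, so 10k tokens need about 8.19 GB and the size grows linearly with context length, matching the table up to rounding.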

How to Use Global/JP Server Character Art on the BlueArchive CN Server (Android)

Posted on 2024-06-27 | GameBlueArchive | Censorship • Character Art

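Download MT Manager. Do not get it from your device's built-in app store, since the copies there are usually pirated. The official site is MT管理器 (mt2.cn). Allow file access permission when using it.

Edit LocalizeConfig.txt. File path: Android/data/com.RoamingStar.BlueArchive/files/LocalConfig.txt. Accept the permission requests while navigating to this path. Change the file content to:

    Env=dev
    IsLocalize=false
    ResUrls=http://mx.jvav.net.cn/asdf;http://mx.jvav.net.cn/asdf;http://mx.jvav.net.cn/asdf

The MuMu emulator works too: drag the MT Manager APK file into the emulator window and it installs automatically.

Result after the change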

Learn&Record: Why Multitasking Is Bad for You

Posted on 2024-06-21 | Learn & Record | Time • English Reading

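From: TIME, Apr 20, 2017

The Chinese text is machine translated and does not correspond line by line; it is for reference only. Reposted from: LearnAndRecord: Should you go all in on anything?

Recently, the topic #人最好不要all in做任何事情# ("It is best not to go all in on anything") sparked heated discussion. Do you agree?

For nearly all people, in nearly all situations, multitasking is impossible. When we think we’re multitasking, most often we aren’t really doing two things at once – but instead, individual actions in rapid succession.

(Translation of the accompanying Chinese text: For the vast majority of people, multitasking is almost impossible. When we think we are multitasking, in most cases we are not really completing two tasks at the same time, but only switching quickly between single tasks.)

multitask /ˌmʌl.tiˈtɑːsk/ means "to do several things at the same time"; the English definition is "to do more than one thing at a time" ...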