AshSpace

Inference

Let's look at how to run inference with a model and at the optimization techniques that make it faster.

vLLM : A high-throughput and memory-efficient inference and serving engine for LLMs
- https://github.com/vllm-project/vllm
FlashAttention : Fast and memory-efficient exact attention
- https://github.com/Dao-AILab/flash-attention
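
FlashAttention is described above as "exact" attention: it changes how the result is computed, not the result itself. The core idea is to process keys and values in blocks while keeping a running softmax maximum and sum, so the full score matrix never has to be held in memory. The following is a minimal pure-Python sketch of that idea for a single query vector. The function names `naive_attention` and `blockwise_attention` and the `block_size` parameter are illustrative assumptions, not part of the flash-attention library, whose real implementation is a fused GPU kernel.

```python
import math


def _scaled_scores(q, keys):
    """Dot product of q with each key, scaled by 1/sqrt(d)."""
    scale = 1.0 / math.sqrt(len(q))
    return [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]


def naive_attention(q, keys, values):
    """Reference attention: all scores are computed and stored at once."""
    scores = _scaled_scores(q, keys)
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim_v = len(values[0])
    return [
        sum(w * v[j] for w, v in zip(weights, values)) / total
        for j in range(dim_v)
    ]


def blockwise_attention(q, keys, values, block_size=2):
    """Same result, but keys/values are visited one block at a time.

    Only a running max, a running sum, and an output accumulator are kept
    (hypothetical helper illustrating the online-softmax trick).
    """
    dim_v = len(values[0])
    running_max = float("-inf")
    running_sum = 0.0
    acc = [0.0] * dim_v

    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = _scaled_scores(q, k_blk)

        # Rescale what was accumulated so far to the new maximum.
        new_max = max(running_max, max(scores))
        rescale = math.exp(running_max - new_max)  # 0.0 on the first block
        running_sum *= rescale
        acc = [a * rescale for a in acc]

        # Fold the current block into the running statistics.
        for s, v in zip(scores, v_blk):
            w = math.exp(s - new_max)
            running_sum += w
            for j in range(dim_v):
                acc[j] += w * v[j]

        running_max = new_max

    return [a / running_sum for a in acc]


# Toy inputs (made-up numbers) to compare both paths.
q = [0.1, 0.2, 0.3]
keys = [[1.0, 0.0, 0.5], [0.2, 0.9, 0.1], [0.3, 0.3, 0.3],
        [2.0, 1.0, 0.0], [0.0, 0.5, 1.5]]
values = [[1.0, 2.0], [0.0, 1.0], [3.0, 0.5], [0.5, 0.5], [2.0, 2.0]]

reference = naive_attention(q, keys, values)
blocked = blockwise_attention(q, keys, values, block_size=2)
print(max(abs(a - b) for a, b in zip(reference, blocked)))
```

The two functions agree up to floating-point rounding, which is what "exact" means here. The memory saving comes from never storing more than one block of scores at a time.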


Last updated 3 months ago
