TRELLIS: 대규모 3D 생성 AI 모델

Kamil Lee

Uploaded image

문서 목표

TRELLIS의 개념과 구조를 이해한다.
설치 및 사용법을 단계별로 따라 한다.
사전학습 모델, 데이터셋, 트레이닝 방법을 확인한다.
실전 활용 팁과 참고자료를 얻는다.

TRELLIS란?

TRELLIS는 Microsoft가 개발한 대규모 3D 생성 AI 모델입니다. 텍스트 또는 이미지 프롬프트를 입력받아 다양한 3D 에셋(Radiance Fields, 3D Gaussians, Meshes 등)을 생성할 수 있습니다.

SLAT(Structured LATent): 다양한 3D 형식으로 디코딩 가능한 통합 구조적 잠재 표현
Rectified Flow Transformer: 강력한 3D 생성 백본
2B 파라미터, 50만 개 대규모 3D 데이터셋으로 학습
유연한 출력 포맷, 로컬 3D 편집 지원

주요 특징

고품질: 복잡한 형태와 텍스처의 다양한 3D 에셋 생성
다양성: 텍스트/이미지 입력, Radiance Fields, 3D Gaussians, Meshes 등 다양한 출력
유연한 편집: 생성된 3D 에셋의 변형, 부분 편집 가능

설치 방법

사전 준비

OS: Linux 권장 (Windows는 미완전 지원)
GPU: NVIDIA 16GB 이상 (A100, A6000, RTX 4090 등)
Python: 3.8 이상(3.10 권장)
CUDA Toolkit: 11.8 또는 12.2(Windows 기준 12.4 실행 확인): CUDA Toolkit 12.4
Build Tools for Visual Studio 2022: Build Tools for Visual Studio 2022

설치 단계

레포지토리 클론

git clone --recurse-submodules https://github.com/microsoft/TRELLIS.git
cd TRELLIS

파이썬 가상환경 생성

python -m venv "trellis-venv"
.\trellis-venv\Scripts\activate

pip, wheel, setuptools 업데이트

python -m pip install --upgrade pip wheel
python -m pip install setuptools=75.8.2

의존성 1 생성 (requirements_1.txt)

--extra-index-url https://download.pytorch.org/whl/cu124
torch==2.5.1+cu124
torchvision==0.20.1+cu124
xformers==0.0.28.post3

의존성 2 생성 (requirements_2.txt)

pillow==10.4.0
imageio==2.37.0
imageio-ffmpeg==0.6.0
tqdm==4.67.1
easydict==1.13
opencv-python-headless==4.11.0.86
scipy==1.15.2
ninja==1.11.1.4
rembg==2.0.65
onnxruntime==1.21.0
trimesh==4.6.6
xatlas==0.0.10
pyvista==0.44.2
pymeshfix==0.17.0
igraph==0.11.8
transformers==4.50.3
open3d==0.19.0

git+https://github.com/EasternJournalist/utils3d.git@9a4eb15e4021b67b12c460c7057d642626897ec8

https://github.com/bdashore3/flash-attention/releases/download/v2.7.1.post1/flash_attn-2.7.1.post1+cu124torch2.5.1cxx11abiFALSE-cp310-cp310-win_amd64.whl; sys_platform == 'win32' and python_version == '3.10'
https://github.com/bdashore3/flash-attention/releases/download/v2.7.1.post1/flash_attn-2.7.1.post1+cu124torch2.5.1cxx11abiFALSE-cp311-cp311-win_amd64.whl; sys_platform == 'win32' and python_version == '3.11'
https://github.com/bdashore3/flash-attention/releases/download/v2.7.1.post1/flash_attn-2.7.1.post1+cu124torch2.5.1cxx11abiFALSE-cp312-cp312-win_amd64.whl; sys_platform == 'win32' and python_version == '3.12'
flash-attn; sys_platform != 'win32'

-e extensions/vox2seq

--find-links https://nvidia-kaolin.s3.us-east-2.amazonaws.com/torch-2.5.1_cu124.html
kaolin

git+https://github.com/NVlabs/nvdiffrast.git
git+https://github.com/JeffreyXiang/diffoctreerast.git
git+https://github.com/autonomousvision/mip-splatting.git#subdirectory=submodules/diff-gaussian-rasterization/

spconv-cu120==2.3.6

gradio==4.44.1
gradio_litmodel3d==0.0.1
pydantic==2.10.6

의존성 설치

python -m pip install -r requirements_1.txt
python -m pip install -r requirements_2.txt

사용 중인 GPU에 따라 환경변수 set TORCH_CUDA_ARCH_LIST=8.9의 값이 달라집니다. CUDE GPU Compute Capability를 참조하세요.

설치가 오래 걸릴 수 있습니다. 에러 발생 시 의존성을 하나씩 설치하거나, CUDE GPU Compute Capability에서 해결 방법을 확인하세요.

사전학습 모델

모델	설명	파라미터 수	다운로드
TRELLIS-image-large	이미지→3D 대형 모델	1.2B	Download
TRELLIS-text-base	텍스트→3D 기본 모델	342M	Download
TRELLIS-text-large	텍스트→3D 대형 모델	1.1B	Download
TRELLIS-text-xlarge	텍스트→3D 초대형 모델	2.0B	Download

이미지 조건 모델이 일반적으로 더 좋은 품질을 제공합니다.

모델 로딩 예시

from trellis.pipelines import TrellisImageTo3DPipeline

pipeline = TrellisImageTo3DPipeline.from_pretrained("JeffreyXiang/TRELLIS-image-large")
pipeline.cuda()

사용법 (최소 예제)

from PIL import Image
from trellis.pipelines import TrellisImageTo3DPipeline

pipeline = TrellisImageTo3DPipeline.from_pretrained("JeffreyXiang/TRELLIS-image-large")
pipeline.cuda()
image = Image.open("assets/example_image/T.png")
outputs = pipeline.run(image, seed=1)
# outputs['gaussian'], outputs['radiance_field'], outputs['mesh'] 등 다양한 3D 결과

결과: 3D Gaussian, Radiance Field, Mesh 비디오(mp4) 및 파일(glb, ply) 생성
자세한 예제: example.py

웹 데모 실행

Gradio 기반 웹 데모 제공:

python .\app.py

Hugging Face에서 온라인 데모도 체험 가능: Live Demo

데이터셋: TRELLIS-500K

Objaverse(XL), ABO, 3D-FUTURE, HSSD, Toys4k 등에서 50만 개 3D 오브젝트 수집
미적 품질 기반 필터링, 다양한 3D 형식 포함
상세 정보: DATASET.md

트레이닝

train.py로 트레이닝 실행, 구성 파일(configs/)로 하이퍼파라미터 설정

싱글/멀티 노드, 체크포인트 재시작, 프로파일링 등 다양한 옵션 지원

트레이닝 명령 예시

python train.py \
  --config configs/vae/slat_vae_dec_mesh_swin8_B_64l8_fp16.json \
  --output_dir outputs/slat_vae_dec_mesh_swin8_B_64l8_fp16_1node \
  --data_dir /path/to/your/dataset1,/path/to/your/dataset2

분산 트레이닝, 체크포인트 재시작, dry-run 등 다양한 옵션은 공식 문서참고

라이선스

코드 및 모델 대부분: MIT License
일부 서브모듈: 별도 라이선스 (diffoctreerast, Flexicubes 등)
상세: LICENSE

인용

이 프로젝트가 도움이 되었다면 논문을 인용해 주세요.


@article{xiang2024structured,
    title   = {Structured 3D Latents for Scalable and Versatile 3D Generation},
    author  = {Xiang, Jianfeng and Lv, Zelong and Xu, Sicheng and Deng, Yu and Wang, Ruicheng and Zhang, Bowen and Chen, Dong and Tong, Xin and Yang, Jiaolong},
    journal = {arXiv preprint arXiv:2412.01506},
    year    = {2024}
}