Getting to Know Hugging Face (3): Text Generator, Image Generator, Audio Generator

Introduction

In the previous post, we used simple Hugging Face pipelines for Question Answering, Text Summarization, and Translation. If you want to read that post, follow the link below. In this post, we will look at how to generate text, images, and audio.

Getting to Know Hugging Face (2): Question Answering, Text Summarization, Translation

quiseol.com

Text Generator

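A text generator does exactly what the name says: it generates text. As in the code below, you give it the beginning of a sentence and it writes the rest. A completed sentence might read: "If there's one thing I want you to remember about using HuggingFace pipelines, it's how easy and powerful they are for rapid prototyping and deployment of NLP models." The exact continuation can differ between runs and models.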

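from transformers import pipeline

# Uses the pipeline's default model; device="cuda" needs a GPU (use "cpu" otherwise).
generator = pipeline("text-generation", device="cuda")
result = generator("If there's one thing I want you to remember about using HuggingFace pipelines, it's")
print(result[0]['generated_text'])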
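
The pipeline returns a list with one dictionary per generated sequence, and by default `generated_text` contains the prompt followed by the continuation. Below is a minimal sketch for separating the two. It uses a hand-written stand-in for the pipeline output so that it runs without downloading a model; the helper name `split_continuation` and the sample continuation are hypothetical.

```python
def split_continuation(prompt: str, result: list[dict]) -> str:
    """Return only the newly generated text from a text-generation result."""
    full_text = result[0]["generated_text"]
    # By default the pipeline echoes the prompt at the start of the output.
    if full_text.startswith(prompt):
        return full_text[len(prompt):].strip()
    return full_text.strip()


prompt = "If there's one thing I want you to remember about using HuggingFace pipelines, it's"
# Stand-in for `generator(prompt)`; a real call returns the same list-of-dicts shape.
fake_result = [{"generated_text": prompt + " how easy they are to use."}]
continuation = split_continuation(prompt, fake_result)
print(continuation)
```

If you only ever need the continuation, the text-generation pipeline also accepts `return_full_text=False` at call time, which makes a helper like this unnecessary.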

Image Generator

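import torch
from diffusers import DiffusionPipeline

# Load Stable Diffusion 2 in half precision (fp16) and move it to the GPU.
image_gen = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2",
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16"
).to("cuda")

text = "A class of Data Scientists learning about AI, in the surreal style of Salvador Dali"
image = image_gen(prompt=text).images[0]
image  # displays the image in a notebook; in a script, call image.save("output.png")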

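An image generator, as the name suggests, generates images. You enter text and it produces an image that matches: whatever you put in text is what the image is built from. As an aside, asking for the Salvador Dali style gives rather grotesque results, so I would not recommend it. And since these are images, generation naturally costs more money and takes more time than text.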
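
Part of that cost is simply holding the model weights in GPU memory, which is why the code above loads them as torch.float16. Here is a rough back-of-the-envelope sketch. The parameter count is a made-up round number for illustration, not the actual size of stable-diffusion-2, and the estimate ignores activations and other runtime overhead.

```python
# Bytes used to store one parameter in each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Approximate memory needed just to store the weights, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / (1024 ** 3)


# Hypothetical parameter count, chosen only to keep the arithmetic easy to follow.
num_params = 1_000_000_000

for dtype in ("float32", "float16"):
    print(f"{dtype}: {weight_memory_gib(num_params, dtype):.2f} GiB")
```

Whatever the real parameter count is, half precision halves the memory needed for the weights.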

Audio Generator

import soundfile as sf
import torch
from datasets import load_dataset
from transformers import pipeline

synthesiser = pipeline("text-to-speech", "microsoft/speecht5_tts", device='cuda')

# Speaker embedding (x-vector) that determines the voice of the generated speech.
embeddings_dataset = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
speaker_embedding = torch.tensor(embeddings_dataset[7306]["xvector"]).unsqueeze(0)

speech = synthesiser("Hi to an artificial intelligence engineer, on the way to mastery!",
                     forward_params={"speaker_embeddings": speaker_embedding})

# Save the generated waveform as a WAV file.
sf.write("speech.wav", speech["audio"], samplerate=speech["sampling_rate"])

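Passing "text-to-speech" to pipeline gives you exactly what it says: a system that turns text into audio. When I did TTS on my own in the past, I used an API from Google, and I had no idea it could also be done this way. This seems much simpler. Compared with everything that came attached when I used the API, this is far shorter.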
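
For reference, here is a minimal sketch of what the final save step amounts to, written with only Python's standard library instead of soundfile. It assumes the audio is a one-dimensional sequence of floats in the range [-1, 1], and it uses a synthetic sine tone at 16000 Hz as a stand-in for `speech["audio"]` and `speech["sampling_rate"]`. The helper name `write_wav` and the file name `tone.wav` are hypothetical.

```python
import math
import struct
import wave


def write_wav(path: str, samples, sampling_rate: int) -> None:
    """Write mono float samples in [-1, 1] as a 16-bit PCM WAV file."""
    frames = bytearray()
    for s in samples:
        # Clip to the valid range, then scale to a signed 16-bit integer.
        clipped = max(-1.0, min(1.0, float(s)))
        frames += struct.pack("<h", int(clipped * 32767))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)      # mono
        wav_file.setsampwidth(2)      # 2 bytes = 16 bits per sample
        wav_file.setframerate(sampling_rate)
        wav_file.writeframes(bytes(frames))


# One second of a 440 Hz tone, standing in for the synthesised speech.
sampling_rate = 16000
tone = [0.3 * math.sin(2 * math.pi * 440 * n / sampling_rate) for n in range(sampling_rate)]
write_wav("tone.wav", tone, sampling_rate)
```

In practice `sf.write` is the shorter option, but the sketch shows that the output is just a sequence of samples plus a sampling rate.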

Closing

Hugging Face offers many other pipelines besides these. If you want to learn about them, I recommend the following link: https://huggingface.co/docs/transformers/main_classes/pipelines

If you want to work with diffusion models rather than Transformers, see the following link: https://huggingface.co/docs/diffusers/en/api/pipelines/overview
