
[Error] Resource punkt_tab not found. Please use the NLTK Downloader to obtain the resource:

독립성이 강한 ISFP 2024. 11. 22. 12:24

While studying text preprocessing, I ran into an error when trying to run the tokenizer locally.

sent_text = sent_tokenize(content_text)

{
	"name": "LookupError",
	"message": "
**********************************************************************
  Resource punkt_tab not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt_tab')
  
  For more information see: https://www.nltk.org/data.html

  Attempted to load tokenizers/punkt_tab/english/

  Searched in:
    - '/Users/song/nltk_data'
    - '/Users/song/opt/anaconda3/envs/song38/nltk_data'
    - '/Users/song/opt/anaconda3/envs/song38/share/nltk_data'
    - '/Users/song/opt/anaconda3/envs/song38/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
**********************************************************************
",
	"stack": "---------------------------------------------------------------------------
LookupError                               Traceback (most recent call last)
Cell In[8], line 2
      1 # Perform sentence tokenization on the input corpus with NLTK.
----> 2 sent_text = sent_tokenize(content_text)
      4 # # Strip punctuation from each sentence and lowercase it.
      5 # normalized_text = []
      6 # for string in sent_text:
   (...)
     10 # # Perform word tokenization on each sentence with NLTK.
     11 # result = [word_tokenize(sentence) for sentence in normalized_text]

File ~/opt/anaconda3/envs/song38/lib/python3.8/site-packages/nltk/tokenize/__init__.py:119, in sent_tokenize(text, language)
    109 def sent_tokenize(text, language=\"english\"):
    110     \"\"\"
    111     Return a sentence-tokenized copy of *text*,
    112     using NLTK's recommended sentence tokenizer
   (...)
    117     :param language: the model name in the Punkt corpus
    118     \"\"\"
--> 119     tokenizer = _get_punkt_tokenizer(language)
    120     return tokenizer.tokenize(text)

File ~/opt/anaconda3/envs/song38/lib/python3.8/site-packages/nltk/tokenize/__init__.py:105, in _get_punkt_tokenizer(language)
     96 @functools.lru_cache
     97 def _get_punkt_tokenizer(language=\"english\"):
     98     \"\"\"
     99     A constructor for the PunktTokenizer that utilizes
    100     a lru cache for performance.
   (...)
    103     :type language: str
    104     \"\"\"
--> 105     return PunktTokenizer(language)

File ~/opt/anaconda3/envs/song38/lib/python3.8/site-packages/nltk/tokenize/punkt.py:1744, in PunktTokenizer.__init__(self, lang)
   1742 def __init__(self, lang=\"english\"):
   1743     PunktSentenceTokenizer.__init__(self)
-> 1744     self.load_lang(lang)

File ~/opt/anaconda3/envs/song38/lib/python3.8/site-packages/nltk/tokenize/punkt.py:1749, in PunktTokenizer.load_lang(self, lang)
   1746 def load_lang(self, lang=\"english\"):
   1747     from nltk.data import find
-> 1749     lang_dir = find(f\"tokenizers/punkt_tab/{lang}/\")
   1750     self._params = load_punkt_params(lang_dir)
   1751     self._lang = lang

File ~/opt/anaconda3/envs/song38/lib/python3.8/site-packages/nltk/data.py:579, in find(resource_name, paths)
    577 sep = \"*\" * 70
    578 resource_not_found = f\"\
{sep}\
{msg}\
{sep}\
\"
--> 579 raise LookupError(resource_not_found)

LookupError: 
**********************************************************************
  Resource punkt_tab not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt_tab')
  
  For more information see: https://www.nltk.org/data.html

  Attempted to load tokenizers/punkt_tab/english/

  Searched in:
    - '/Users/song/nltk_data'
    - '/Users/song/opt/anaconda3/envs/song38/nltk_data'
    - '/Users/song/opt/anaconda3/envs/song38/share/nltk_data'
    - '/Users/song/opt/anaconda3/envs/song38/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
**********************************************************************
"
}

 

The error message shows that the problem is the nltk library failing to find the punkt_tab resource.
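The "Searched in:" list in the traceback is NLTK's data search path. You can inspect it, or prepend a directory of your own so NLTK looks there first (the `/tmp/my_nltk_data` path below is just an example; adjust it to your machine):

```python
import nltk

# NLTK searches these directories, in order, for resources like punkt_tab.
for p in nltk.data.path:
    print(p)

# Prepending a directory of your own makes NLTK check it first.
# (Example path; you can then download resources into it with
# nltk.download("punkt_tab", download_dir="/tmp/my_nltk_data").)
nltk.data.path.insert(0, "/tmp/my_nltk_data")
```

This is handy on servers where the default per-user `~/nltk_data` directory is not writable.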

punkt_tab is the data NLTK's sentence tokenizer needs (newer NLTK releases load punkt_tab in place of the older punkt resource), so it has to be installed with the nltk.download() function.

 

Solution

import nltk

# Download the punkt_tab data, exactly as the error message instructs
nltk.download('punkt_tab')

 

 
