[python] Useful code / functions for data analysis


Strongly Independent ISFP 2022. 2. 20. 03:04


from IPython.core.interactiveshell import InteractiveShell  # show the output of every expression in a cell, not just the last one
InteractiveShell.ast_node_interactivity = 'all'

import warnings  # hide warning messages (e.g., deprecation warnings caused by library version differences)
warnings.filterwarnings("ignore")
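`warnings.filterwarnings("ignore")` silences every warning for the rest of the session, including ones that point at real problems. A narrower option is to suppress a single warning category only inside a `with warnings.catch_warnings():` block. In this sketch, `noisy_step` is a hypothetical stand-in for a library call that emits a `DeprecationWarning`.

```python
import warnings


def noisy_step():
    """Hypothetical stand-in for a library call that emits a deprecation warning."""
    warnings.warn("this API is deprecated", DeprecationWarning)
    return 42


# Ignore DeprecationWarning only inside this block; the previous filter
# settings are restored automatically when the block exits.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=DeprecationWarning)
    result = noisy_step()

print(result)  # 42
```

Warnings of other categories, and deprecation warnings raised outside the block, are still shown.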

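# Render inline plots at higher resolution so the text inside figures looks sharp.
# IPython.display.set_matplotlib_formats is deprecated since IPython 7.23; the matplotlib_inline version replaces it.
from matplotlib_inline.backend_inline import set_matplotlib_formats
set_matplotlib_formats("retina")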

import pandas as pd
# Maximum number of rows and columns shown when a DataFrame is displayed
pd.options.display.max_rows = 100
pd.options.display.max_columns = 100
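Setting `pd.options.display.*` changes the display limits globally. If you only want to see more rows for a single output, `pd.option_context` applies the settings temporarily and restores the previous values afterwards. The DataFrame below is made-up example data.

```python
import pandas as pd

# Made-up example data: 150 rows, 3 columns
df = pd.DataFrame({"a": range(150), "b": range(150), "c": range(150)})

default_max_rows = pd.get_option("display.max_rows")

# Inside the with-block up to 200 rows are printed in full;
# the previous settings are restored when the block exits.
with pd.option_context("display.max_rows", 200, "display.max_columns", 100):
    inside_max_rows = pd.get_option("display.max_rows")
    print(df)

print(pd.get_option("display.max_rows") == default_max_rows)  # True
```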

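import matplotlib.pyplot as plt
plt.rc("font", family="Malgun Gothic")  # Korean font for plot text (Malgun Gothic ships with Windows)
plt.rc("axes", unicode_minus=False)  # draw minus signs as ASCII '-' so they don't break with Korean fonts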

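from matplotlib import font_manager, rc  # alternative way to prevent broken Korean text in matplotlib: load a font file directly
font_path = "C:/Windows/Fonts/NGULIM.TTF"  # Windows font path; change it on other systems
font = font_manager.FontProperties(fname=font_path).get_name()
rc('font', family=font)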
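Both font settings above assume Windows: `Malgun Gothic` and `C:/Windows/Fonts/NGULIM.TTF` are not present on macOS or Linux. A minimal sketch, assuming at least one of the listed fonts is installed, picks the first available Korean font from matplotlib's font list. The candidate names (`AppleGothic`, which ships with macOS, and `NanumGothic`, which is commonly installed separately on Linux) are assumptions to adjust for your system.

```python
import matplotlib.pyplot as plt
from matplotlib import font_manager

# Candidate Korean fonts in order of preference (assumed names; adjust as needed)
CANDIDATES = ["Malgun Gothic", "AppleGothic", "NanumGothic"]


def pick_korean_font(candidates=CANDIDATES):
    """Return the first candidate font family that matplotlib knows about, or None."""
    installed = {f.name for f in font_manager.fontManager.ttflist}
    for name in candidates:
        if name in installed:
            return name
    return None


font_name = pick_korean_font()
if font_name is not None:
    plt.rc("font", family=font_name)
else:
    print("No Korean font found; Korean labels may render as empty boxes.")
plt.rc("axes", unicode_minus=False)  # use ASCII '-' for minus signs
```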

##### Reading CSV files: read_csv #####

import pandas as pd
train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')
submission = pd.read_csv('sample_submission.csv')
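In a typical competition layout, `test.csv` has the same feature columns as `train.csv` but no target column, and `sample_submission.csv` shows the expected output format. The sketch below uses small in-memory CSV strings (made-up data with a hypothetical target column `target`) in place of the real files to show a quick first check after loading.

```python
import io

import pandas as pd

# Made-up CSV contents standing in for train.csv and test.csv
train_csv = io.StringIO("id,feature1,feature2,target\n1,0.5,A,10\n2,1.5,B,20\n3,2.5,A,30\n")
test_csv = io.StringIO("id,feature1,feature2\n4,3.5,B\n5,4.5,A\n")

train = pd.read_csv(train_csv)
test = pd.read_csv(test_csv)

print(train.shape, test.shape)  # (3, 4) (2, 3)

# Columns present in train but missing from test are usually the target(s)
target_cols = [c for c in train.columns if c not in test.columns]
print(target_cols)  # ['target']
```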

##### train_test_split #####

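from sklearn.model_selection import train_test_split
# X: feature matrix, y: target (both must be defined beforehand).
# Hold out 20% of the rows as the test set; a fixed random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=11)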
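A self-contained sketch with synthetic data (100 samples, made up for illustration) shows the resulting split sizes. Passing `stratify=y` is an optional addition not in the snippet above: for classification it keeps the class proportions the same in both splits.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data: 100 samples, 3 features, binary target (70 zeros, 30 ones)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.array([0] * 70 + [1] * 30)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=11, stratify=y
)

print(X_train.shape, X_test.shape)  # (80, 3) (20, 3)
print(y_test.mean())  # 0.3, the same class ratio as the full data
```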

##### OneHotEncoder #####

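# sparse_output=False returns a regular (dense) array instead of a sparse matrix.
# (Before scikit-learn 1.2 this parameter was called sparse; the old name was removed in 1.4.)
# handle_unknown='ignore' prevents an error when transform() sees a category that was not in the
# training data: such a value is encoded as all zeros across the one-hot columns.

from sklearn.preprocessing import OneHotEncoder

ohe = OneHotEncoder(sparse_output=False, handle_unknown='ignore')
onehot_train = ohe.fit_transform(train_result[['Bid_class']])
# Reuse train_result's index so that pd.concat lines the rows up correctly
onehot_frame = pd.DataFrame(onehot_train, columns=ohe.categories_[0], index=train_result.index)
train_result = pd.concat([train_result, onehot_frame], axis=1)
# Drop the original 'Bid_class' column and the '일괄' dummy column (presumably to remove one redundant dummy)
train_result = train_result.drop(['Bid_class', '일괄'], axis=1)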

##### LabelEncoding #####

from sklearn.preprocessing import LabelEncoder
import numpy as np

label_col = ['Auction_class', 'addr_do', 'addr_san', 'Share_auction_YorN', 'Auction_results_left', 'Apartment_usage']

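for col in label_col:
    le = LabelEncoder()
    train_result[col] = le.fit_transform(train_result[col])

    # transform() raises a ValueError for labels it did not see during fit, so append
    # test-only labels to classes_; they get new codes after the training codes.
    # This relies on string (object) columns: for numeric columns LabelEncoder
    # expects classes_ to stay sorted, so appended values can be mis-encoded.
    for label in np.unique(test_result[col]):
        if label not in le.classes_:
            le.classes_ = np.append(le.classes_, label)
    test_result[col] = le.transform(test_result[col])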
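Appending to `le.classes_` works, but it edits a fitted attribute by hand, and scikit-learn documents `LabelEncoder` as intended for target values rather than input features. As an alternative (a different technique from the loop above), `OrdinalEncoder` encodes several feature columns at once and can map categories unseen during `fit` to a fixed code such as `-1`. The column names come from `label_col`, but the data is made up.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Made-up data using two of the column names from label_col
train_result = pd.DataFrame({
    "addr_do": ["Seoul", "Busan", "Seoul"],
    "Apartment_usage": ["mixed-use", "apartment", "apartment"],
})
test_result = pd.DataFrame({
    "addr_do": ["Busan", "Daegu"],  # 'Daegu' does not appear in the training data
    "Apartment_usage": ["apartment", "mixed-use"],
})

cols = ["addr_do", "Apartment_usage"]
enc = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)

# Fit on train only, then apply the same mapping to test
train_result[cols] = enc.fit_transform(train_result[cols])
test_result[cols] = enc.transform(test_result[cols])

print(test_result)  # 'Daegu' is encoded as -1
```

Categories are sorted per column before codes are assigned, so `Busan` gets 0 and `Seoul` gets 1.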
    
##### Scaling ##### 

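from sklearn.preprocessing import StandardScaler
# Standardize each column to mean 0 and standard deviation 1 (all columns must be numeric).
# Fit on the training data only, then apply the same transformation to the test data
# so that no test-set statistics leak into training.
scaler = StandardScaler()
train = scaler.fit_transform(train)  # returns a NumPy array, not a DataFrame
test = scaler.transform(test)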
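Because `fit_transform` returns a NumPy array, the column names are lost. A minimal sketch on made-up numeric data keeps them by wrapping the results back into DataFrames, and checks that the training columns end up with mean 0 and standard deviation 1 (`StandardScaler` uses the population standard deviation, i.e. `ddof=0`). The test data is scaled with the training statistics, so its mean and standard deviation are generally not exactly 0 and 1.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up numeric data
train = pd.DataFrame({"area": [50.0, 80.0, 110.0], "floor": [1.0, 5.0, 9.0]})
test = pd.DataFrame({"area": [65.0, 140.0], "floor": [3.0, 12.0]})

scaler = StandardScaler()
# Fit on train only; wrap the arrays back into DataFrames to keep column names
train_scaled = pd.DataFrame(scaler.fit_transform(train), columns=train.columns, index=train.index)
test_scaled = pd.DataFrame(scaler.transform(test), columns=test.columns, index=test.index)

print(train_scaled.mean().round(6).tolist())       # [0.0, 0.0]
print(train_scaled.std(ddof=0).round(6).tolist())  # [1.0, 1.0]
print(scaler.mean_)  # per-column means learned from train
```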
