feature engineering
非数值类型转成数值类型。使用sklearn中的LabelEncoder,Encode labels with value between 0 and n_classes-1.
注意先fit训练(输入所有字符串),然后再传入要转换的数据结构进行transform,得到最终结果。
>>> le = preprocessing.LabelEncoder() >>> le.fit(["paris", "paris", "tokyo", "amsterdam"]) LabelEncoder() >>> list(le.classes_) ['amsterdam', 'paris', 'tokyo'] >>> le.transform(["tokyo", "tokyo", "paris"]) array([2, 2, 1]...)
数值类型转成二进制数字,消除潜在的邻近性。使用sklearn中的OneHotEncoder, Encode categorical integer features using a one-hot aka one-of-K scheme.
onehot = OneHotEncoder() X_teams = onehot.fit_transform(X_teams).todense()
Last updated
Was this helpful?