1-SVC

Score:0.90929

Top:91% 1671/1825

Exploring the data:

  • Train on 5000 samples.
  • Feature scaling: set every pixel value greater than 0 to 1.
  • Use SVC; the submission scores 0.90929.
In [1]:
import pandas as pd

train_df=pd.read_csv('train.csv')
train_df.head()
Out[1]:
label pixel0 pixel1 pixel2 pixel3 pixel4 pixel5 pixel6 pixel7 pixel8 ... pixel774 pixel775 pixel776 pixel777 pixel778 pixel779 pixel780 pixel781 pixel782 pixel783
0 1 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
2 1 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
3 4 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0

5 rows × 785 columns

In [2]:
test_df=pd.read_csv('test.csv')
test_df.head()
Out[2]:
pixel0 pixel1 pixel2 pixel3 pixel4 pixel5 pixel6 pixel7 pixel8 pixel9 ... pixel774 pixel775 pixel776 pixel777 pixel778 pixel779 pixel780 pixel781 pixel782 pixel783
0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0

5 rows × 784 columns

In [3]:
train_df.shape
Out[3]:
(42000, 785)
In [4]:
# Training on the full dataset is slow, so only the first 5000 rows are used here.
labels=train_df['label'][:5000]
images=train_df.drop(['label'],axis=1)[:5000]
len(labels),images.shape
Out[4]:
(5000, (5000, 784))
In [5]:
from sklearn.model_selection import train_test_split

train_images,test_images,train_labels,test_labels=train_test_split(images,labels,train_size=0.8)
train_images.shape,test_images.shape,train_labels.shape,test_labels.shape
Out[5]:
((4000, 784), (1000, 784), (4000,), (1000,))
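The split above is random, so the validation score changes from run to run. The following is a minimal sketch of a reproducible, class-balanced split on synthetic data. The `random_state=0` and `stratify` arguments and the names `X_demo` and `y_demo` are assumptions for illustration and are not part of the original notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the images and labels: 50 rows, 5 balanced classes.
X_demo = np.arange(100).reshape(50, 2)
y_demo = np.repeat(np.arange(5), 10)

# Fixing random_state makes the shuffle repeatable; stratify keeps the
# class proportions the same in the training and validation parts.
split_a = train_test_split(X_demo, y_demo, train_size=0.8,
                           random_state=0, stratify=y_demo)
split_b = train_test_split(X_demo, y_demo, train_size=0.8,
                           random_state=0, stratify=y_demo)

X_train, X_val, y_train, y_val = split_a
print(X_train.shape, X_val.shape)
print(np.bincount(y_val))  # number of validation samples per class
```

With a fixed seed, two score values can be compared knowing that they were measured on the same validation rows.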
In [6]:
from sklearn import svm
clf=svm.SVC()
clf.fit(train_images,train_labels)
clf.score(test_images,test_labels)
Out[6]:
0.10000000000000001
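A score of 0.10 is about chance level for ten digit classes. One plausible explanation, assuming this scikit-learn version used an RBF kernel with gamma = 1/n_features (the `gamma='auto'` default before release 0.22), is that raw pixel values in the 0 to 255 range make the squared distance between any two images so large that the kernel value collapses to zero. The sketch below illustrates this with NumPy on two synthetic images; the helper `rbf_kernel_value` and the synthetic data are hypothetical and only serve the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features = 784
gamma = 1.0 / n_features  # assumed default: gamma='auto'

def rbf_kernel_value(x, y, gamma):
    """RBF kernel exp(-gamma * ||x - y||^2) between two vectors."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

def synthetic_image():
    # Roughly 20% of the pixels are "ink" with intensity 1..255, the rest are 0.
    ink = rng.random(n_features) < 0.2
    return ink * rng.integers(1, 256, size=n_features).astype(float)

a_raw, b_raw = synthetic_image(), synthetic_image()
a_bin, b_bin = (a_raw > 0).astype(float), (b_raw > 0).astype(float)

k_raw = rbf_kernel_value(a_raw, b_raw, gamma)
k_bin = rbf_kernel_value(a_bin, b_bin, gamma)
print("raw pixels:      ", k_raw)
print("binarized pixels:", k_bin)
```

After binarization the squared distance is at most 784, so the kernel value cannot fall below exp(-1). On the raw pixels it is numerically zero for essentially every pair of distinct images, which leaves the classifier with nothing to separate the classes by.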
In [7]:
# Feature scaling: binarize the pixels (any value > 0 becomes 1)
train_images[train_images>0]=1
test_images[test_images>0]=1

clf.fit(train_images,train_labels)
clf.score(test_images,test_labels)
/usr/local/lib/python2.7/site-packages/ipykernel/__main__.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  if __name__ == '__main__':
/usr/local/lib/python2.7/site-packages/pandas/core/frame.py:2454: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self._where(-key, value, inplace=True)
/usr/local/lib/python2.7/site-packages/ipykernel/__main__.py:2: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  from ipykernel import kernelapp as app
Out[7]:
0.90800000000000003
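The `SettingWithCopyWarning` messages above appear because the boolean-mask assignment writes into DataFrames that were themselves derived from slices of another DataFrame. A minimal sketch of one way to avoid this is to build a new binarized frame instead of assigning in place. The names `demo_df`, `subset` and `binary_subset` are hypothetical, and the data is synthetic.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-in for a few pixel columns with values in 0..255.
pixels = rng.integers(0, 256, size=(6, 4)) * rng.integers(0, 2, size=(6, 4))
demo_df = pd.DataFrame(pixels, columns=["pixel0", "pixel1", "pixel2", "pixel3"])
original = demo_df.copy()

# A slice of the frame, analogous to taking the first 5000 rows.
subset = demo_df[:4]

# Comparing yields a boolean frame; astype(int) turns it into 0/1 values.
# Nothing is written back into subset or demo_df, so no warning is raised.
binary_subset = (subset > 0).astype(int)
print(binary_subset)
```

The same expression can be applied to the training, validation and test frames, which keeps the preprocessing identical across all three.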
In [8]:
# Apply the same binarization to the Kaggle test set before predicting
test_df[test_df>0]=1
results=clf.predict(test_df)
In [13]:
results[:5]
Out[13]:
array([2, 0, 9, 9, 2])
In [16]:
submissions=pd.DataFrame({"ImageId": list(range(1,len(results)+1)),
                         "Label": results})
submissions.to_csv('submission_1.csv',index=False)
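Before uploading, it can help to check that the file has the expected layout. The following is a small sketch that writes a submission to an in-memory buffer and reads it back, using the five predictions shown above as sample data. The names `demo_results`, `demo_submission` and `round_trip` are hypothetical, and the checks assume digit labels 0 to 9 and an `ImageId` column starting at 1, as in the cell above.

```python
import io

import numpy as np
import pandas as pd

# Sample predictions (the first five values printed earlier).
demo_results = np.array([2, 0, 9, 9, 2])

demo_submission = pd.DataFrame({
    "ImageId": np.arange(1, len(demo_results) + 1),
    "Label": demo_results,
})

# Write to an in-memory buffer instead of a file, then read it back.
buffer = io.StringIO()
demo_submission.to_csv(buffer, index=False)
buffer.seek(0)
round_trip = pd.read_csv(buffer)

header_ok = list(round_trip.columns) == ["ImageId", "Label"]
ids_ok = round_trip["ImageId"].tolist() == list(range(1, len(demo_results) + 1))
labels_ok = bool(round_trip["Label"].between(0, 9).all())
print(header_ok, ids_ok, labels_ok)
```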