Comparing Two Numbers with a Neural Network
December 21, 2021
Which of two numbers is larger
def f1(a, b):
    if a > b:
        return 1
    else:
        return 0
Goal: reach an accuracy of 0.99 within 20 epochs.
Following the experience from "STEP-BY-STEP MODELING A NEURAL NETWORK FOR CLASSIFICATION", the network uses a 2-4-1 layer structure. (A 2-2-1 structure reached an accuracy of 0.98.)
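To make the 2-4-1 notation concrete, here is a small sketch that counts the trainable parameters of a stack of fully connected layers fed with two input numbers. It assumes every layer has a weight matrix plus a bias vector, which is the default for `tf.keras.layers.Dense` (`use_bias=True`); the helper name `count_dense_params` is hypothetical.

```python
def count_dense_params(n_inputs, layer_sizes):
    """Count weights + biases in a stack of fully connected layers.

    Assumes every layer has a bias vector (the Keras Dense default).
    """
    total = 0
    fan_in = n_inputs
    for units in layer_sizes:
        total += fan_in * units + units  # weight matrix + bias vector
        fan_in = units
    return total


# 2-4-1: Dense(2) -> Dense(4) -> Dense(1) on two inputs (a, b).
print(count_dense_params(2, [2, 4, 1]))  # 23
# 2-2-1 alternative: 6 + 6 + 3 parameters.
print(count_dense_params(2, [2, 2, 1]))  # 15
```

The first number should agree with the "Total params" line that `model.summary()` prints for the 2-4-1 model in the code below.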
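relu: sensitive to the initial values (random seed). It can produce dead neurons, i.e. one or more neurons that stop contributing, so that the training accuracy never changes. For example: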
Epoch 1/20
2021-12-22 12:21:24.605430: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
80/80 [==============================] - 1s 2ms/step - loss: 0.2506 - accuracy: 0.4859 - val_loss: 0.2500 - val_accuracy: 0.4920
Epoch 2/20
80/80 [==============================] - 0s 783us/step - loss: 0.2500 - accuracy: 0.5015 - val_loss: 0.2500 - val_accuracy: 0.4920
Epoch 3/20
80/80 [==============================] - 0s 757us/step - loss: 0.2500 - accuracy: 0.5015 - val_loss: 0.2500 - val_accuracy: 0.4920
Epoch 4/20
80/80 [==============================] - 0s 770us/step - loss: 0.2500 - accuracy: 0.5015 - val_loss: 0.2500 - val_accuracy: 0.4920
Epoch 5/20
80/80 [==============================] - 0s 783us/step - loss: 0.2500 - accuracy: 0.5015 - val_loss: 0.2500 - val_accuracy: 0.4920
Therefore I chose an activation function similar to relu: elu.[1]
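The difference between the two activations is easiest to see in their gradients. A minimal pure-Python sketch using the standard ELU definition (alpha defaults to 1.0, as in Keras' `elu`): a relu neuron whose pre-activation is negative for every training input receives zero gradient and stops updating, while elu still passes a small positive gradient for negative inputs.

```python
import math


def relu(x):
    return max(0.0, x)


def relu_grad(x):
    # Gradient is exactly 0 for every negative pre-activation ("dead" region).
    return 1.0 if x > 0 else 0.0


def elu(x, alpha=1.0):
    # Identity for x > 0, smooth saturation towards -alpha for x <= 0.
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def elu_grad(x, alpha=1.0):
    # Derivative of alpha * (exp(x) - 1) is alpha * exp(x): positive for alpha > 0.
    return 1.0 if x > 0 else alpha * math.exp(x)


for z in (-3.0, -0.5, 0.5, 2.0):
    print(f"z={z:+.1f}  relu'={relu_grad(z):.3f}  elu'={elu_grad(z):.3f}")
```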
Because the labels are 0 and 1, the network's output should also lie in [0, 1], so the last layer uses an activation whose output range is (0, 1): sigmoid.
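The loss function is mse (mean squared error), because both the labels and the predictions lie in [0, 1]: [latex]\operatorname{MSE}=\frac{1}{n}\sum_{i=1}^{n}(Y_i-\hat{Y}_i)^2[/latex], where [latex]Y_i[/latex] is the label and [latex]\hat{Y}_i[/latex] the prediction for the i-th example.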
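A short worked example of the formula in plain Python. It also explains the plateau in the relu log above: with 0/1 labels, predicting 0.5 for every sample gives exactly (0.5)^2 = 0.25 per sample, which is the loss value (0.2500) where that run got stuck.

```python
def mse(y_true, y_pred):
    """Mean squared error: (1/n) * sum_i (Y_i - Yhat_i)^2."""
    n = len(y_true)
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n


labels = [0, 1, 1, 0, 1, 0]
# A network that outputs 0.5 for every input has learned nothing ...
constant = [0.5] * len(labels)
print(mse(labels, constant))  # 0.25
# ... while confident, correct predictions drive the loss towards 0.
good = [0.05, 0.9, 0.95, 0.1, 0.85, 0.02]
print(mse(labels, good))
```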
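The metric is binary_accuracy. Writing accuracy would also work, because TensorFlow converts the string 'accuracy' into a concrete metric (here binary_accuracy) based on the loss function and the model's output shape.[2]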
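A plain-Python sketch of what the metric computes, assuming Keras' default threshold of 0.5 for `binary_accuracy`: each sigmoid output is turned into a hard 0/1 prediction (class 1 only if it is strictly greater than the threshold) and then compared with the label. The helper below is illustrative, not the Keras implementation.

```python
def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of samples whose thresholded prediction equals the label."""
    hits = sum(int(y == (1 if p > threshold else 0)) for y, p in zip(y_true, y_pred))
    return hits / len(y_true)


labels = [0, 1, 1, 0]
print(binary_accuracy(labels, [0.2, 0.7, 0.4, 0.1]))  # 0.75
# A constant output of 0.5 is never > 0.5, so every sample is predicted as 0
# and the score equals the share of 0-labels.
print(binary_accuracy(labels, [0.5] * 4))  # 0.5
```

The second line is consistent with the stalled runs hovering around an accuracy of 0.5.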
import tensorflow as tf
if __name__ == '__main__':
    # 10000 random pairs (a, b), each value drawn uniformly from [0.1, 0.5).
    size = 10000
    x = tf.random.uniform((size, 2), 0.1, 0.5)
    # Label is 1 when a > b, otherwise 0.
    label = tf.cast(x[:, 0] > x[:, 1], tf.int32)
    # The first 2000 examples are for validation, the remaining 8000 for training.
    validationSize = 2000
    trainingSet = tf.data.Dataset.from_tensor_slices((x[validationSize:], label[validationSize:]))
    validationSet = tf.data.Dataset.from_tensor_slices((x[:validationSize], label[:validationSize]))
    # 2-4-1 network: elu hidden layers, sigmoid output in (0, 1).
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Dense(2, input_shape=(2,), activation='elu'))
    model.add(tf.keras.layers.Dense(4, activation='elu'))
    model.add(tf.keras.layers.Dense(1, activation='sigmoid'))
    model.compile(optimizer='adam', loss='mse', metrics=['binary_accuracy'])
    # Sanity checks on the dataset format expected by model.fit.
    assert len(trainingSet.element_spec) == 2, 'fit stipulates that if x is a dataset, its element must have two elements, feature and label, though it is ok to have None as label.'
    assert len(trainingSet.element_spec[0].shape) > 0, 'In each example, the first element itself must be a list.'
    # 8000 training examples / batch size 100 = 80 steps per epoch.
    model.fit(trainingSet.batch(100), epochs=20, validation_data=validationSet.batch(100))
Epoch 20/20
80/80 [==============================] - 0s 884us/step - loss: 0.0351 - accuracy: 0.9952 - val_loss: 0.0343 - val_accuracy: 0.9915
Changing the input range
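The original input range was [0.1, 0.5). After widening it to [0.1, 50), [0.1, 500) and [0.1, 5000), the model became sensitive to the initial values (random seed). For example, with the range [0.1, 5000) the loss decreases only slightly and then stops moving: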
Epoch 1/20
2021-12-22 13:03:47.884242: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
80/80 [==============================] - 0s 2ms/step - loss: 0.2514 - binary_accuracy: 0.4849 - val_loss: 0.2498 - val_binary_accuracy: 0.5150
Epoch 2/20
80/80 [==============================] - 0s 745us/step - loss: 0.2500 - binary_accuracy: 0.4966 - val_loss: 0.2503 - val_binary_accuracy: 0.4850
Epoch 3/20
80/80 [==============================] - 0s 732us/step - loss: 0.2498 - binary_accuracy: 0.5151 - val_loss: 0.2505 - val_binary_accuracy: 0.4850
Epoch 4/20
80/80 [==============================] - 0s 745us/step - loss: 0.2498 - binary_accuracy: 0.5151 - val_loss: 0.2506 - val_binary_accuracy: 0.4850
Epoch 5/20
80/80 [==============================] - 0s 732us/step - loss: 0.2498 - binary_accuracy: 0.5151 - val_loss: 0.2506 - val_binary_accuracy: 0.4850
Epoch 6/20
80/80 [==============================] - 0s 745us/step - loss: 0.2498 - binary_accuracy: 0.5151 - val_loss: 0.2506 - val_binary_accuracy: 0.4850
Epoch 7/20
80/80 [==============================] - 0s 720us/step - loss: 0.2498 - binary_accuracy: 0.5151 - val_loss: 0.2506 - val_binary_accuracy: 0.4850
Epoch 8/20
80/80 [==============================] - 0s 732us/step - loss: 0.2498 - binary_accuracy: 0.5151 - val_loss: 0.2506 - val_binary_accuracy: 0.4850
Epoch 9/20
80/80 [==============================] - 0s 770us/step - loss: 0.2498 - binary_accuracy: 0.5151 - val_loss: 0.2506 - val_binary_accuracy: 0.4850
Epoch 10/20
80/80 [==============================] - 0s 757us/step - loss: 0.2498 - binary_accuracy: 0.5151 - val_loss: 0.2506 - val_binary_accuracy: 0.4850
Changing elu's alpha to 2.0 or 3.0 does not fix this problem.
import tensorflow as tf
if __name__ == '__main__':
    size = 10000
    # Same setup as before, but the inputs are now drawn from [0.1, 5000).
    x = tf.random.uniform((size, 2), 0.1, 5000)
    label = tf.cast(x[:, 0] > x[:, 1], tf.int32)
    validationSize = 2000
    trainingSet = tf.data.Dataset.from_tensor_slices((x[validationSize:], label[validationSize:]))
    validationSet = tf.data.Dataset.from_tensor_slices((x[:validationSize], label[:validationSize]))
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Dense(2, input_shape=(2,)))
    # ELU as a separate layer so that alpha can be set; one instance per position.
    model.add(tf.keras.layers.ELU(alpha=2.0))
    model.add(tf.keras.layers.Dense(4))
    model.add(tf.keras.layers.ELU(alpha=2.0))
    model.add(tf.keras.layers.Dense(1, activation='sigmoid'))
    model.compile(optimizer='adam', loss='mse', metrics=['binary_accuracy'])
    # Sanity checks on the dataset format expected by model.fit.
    assert len(trainingSet.element_spec) == 2, 'fit stipulates that if x is a dataset, its element must have two elements, feature and label, though it is ok to have None as label.'
    assert len(trainingSet.element_spec[0].shape) > 0, 'In each example, the first element itself must be a list.'
    model.fit(trainingSet.batch(100), epochs=20, validation_data=validationSet.batch(100))
The conclusion: with a large input range, the model probably needs more examples, or the input range should be reduced.
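The second remedy, shrinking the input range, can be as simple as rescaling both inputs with the same bounds. A minimal sketch, assuming the sampling range [0.1, 5000) used above; `scale` and `label` are hypothetical helper names. Because the same increasing map is applied to a and b, their order, and therefore the label, is preserved. In the TensorFlow code the equivalent would be something like `(x - 0.1) / (5000 - 0.1)` applied to `x` before building the datasets (an untested suggestion, not something the post tried).

```python
import random


def scale(pair, low=0.1, high=5000.0):
    """Min-max scale both numbers of a pair into [0, 1] with the same bounds."""
    span = high - low
    return tuple((v - low) / span for v in pair)


def label(pair):
    a, b = pair
    return 1 if a > b else 0


random.seed(0)
samples = [(random.uniform(0.1, 5000), random.uniform(0.1, 5000)) for _ in range(1000)]
scaled = [scale(p) for p in samples]
# Order (and label) is unchanged; only the magnitude of the inputs shrinks.
print(sum(label(p) == label(s) for p, s in zip(samples, scaled)), "of", len(samples), "labels unchanged")
```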
Feature engineering: dividing the numbers
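Originally x is a 10000×2 tensor (see figure); now the first column is divided by the second, and the quotient is used as the input. Since the second column b is positive, a > b holds exactly when a/b > 1, so the network only has to learn a threshold at 1 on a single feature.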
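This trick requires the second column to be non-zero; note that our inputs start at 0.1. In the extreme case where the raw inputs lie in [0.1, 5000], the quotient can still be large, even larger than the inputs themselves (up to 5000/0.1 = 50000), but in most cases it shrinks the input. In the code, the division is implemented as a Layer that can be placed directly inside the network, but that made training slightly slower, so the code pulls Division out and applies it as a preprocessing step. With this, the accuracy reaches 0.99 in almost every run.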
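The author's Division layer (used later as `nn.Division`) is not shown, so the following is only a hypothetical pure-Python sketch of the same preprocessing idea: divide one column by each of the other columns. The helper name `divide_columns` and its arguments are assumptions, not the signature of `nn.Division`. The check confirms that thresholding the quotient at 1 reproduces the original labels.

```python
import random


def divide_columns(row, numerator=0, denominators=(1,)):
    """Hypothetical preprocessing: return row[numerator] / row[j] for each j.

    Every denominator must be non-zero; inputs sampled from [0.1, 5000)
    guarantee that here.
    """
    return [row[numerator] / row[j] for j in denominators]


random.seed(1)
rows = [(random.uniform(0.1, 5000), random.uniform(0.1, 5000)) for _ in range(1000)]
quotients = [divide_columns(r)[0] for r in rows]

# For b > 0, a > b is equivalent to a / b > 1: the label becomes a threshold at 1.
labels_raw = [1 if a > b else 0 for a, b in rows]
labels_quot = [1 if q > 1 else 0 for q in quotients]
print(labels_raw == labels_quot)
print(min(quotients), max(quotients))
```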
Comparing three numbers
def f2(a, b, c):
    if a > b:
        return 1
    elif a > c:
        return 2
    else:
        return 0
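As the code shows, to learn the new pattern the network's output is changed to 3 units with a categorical output. The first layer grows from 2 to 4 neurons and the hidden layers from 4 to 12 neurons, while the number of hidden layers goes from 1 to 4. I tried limiting the hidden layers to 8 neurons, but then 0.99 accuracy was not reached reliably.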
This suggests that each additional comparison needs roughly 2 to 3 times as many neurons, as well as additional layers.
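If the three-number model also receives the quotients a/b and a/c (the commented-out `nn.Division((1, 3), 0)` line in the distractor code suggests this, but that is an assumption), the three-class label of `f2` can be read off two thresholds at 1. A small pure-Python check of that equivalence; `label_from_quotients` is a hypothetical name.

```python
import itertools


def f2(a, b, c):
    # Label function from the three-number section above.
    if a > b:
        return 1
    elif a > c:
        return 2
    else:
        return 0


def label_from_quotients(q_ab, q_ac):
    """Same decision as f2, written in terms of q_ab = a/b and q_ac = a/c.

    Valid only when b > 0 and c > 0 (the inputs start at 0.1 in this post).
    """
    if q_ab > 1:
        return 1
    elif q_ac > 1:
        return 2
    else:
        return 0


values = [0.5, 1.0, 2.0, 3.0]
for a, b, c in itertools.product(values, repeat=3):
    assert f2(a, b, c) == label_from_quotients(a / b, a / c)
print("f2 agrees with the quotient form on all", len(values) ** 3, "triples")
```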
Adding distractors
import tensorflow as tf
import numpy as np
import nn  # the author's own module providing the Division layer (not shown in this post)
def g(t):
    # Same rule as f2; the fourth value d is a distractor and is never used.
    (a, b, c, d) = t
    if a > b:
        return 1
    elif a > c:
        return 2
    else:
        return 0
if __name__ == '__main__':
    size = 10000
    x = np.random.uniform(0.1, 5000, (size, 4))
    label = np.apply_along_axis(g, 1, x)
    x = tf.constant(x)
    label = tf.constant(label)
    # The division is applied once as preprocessing instead of inside the model.
    division = nn.Division((1, 4), 0)
    x = division(x)
    validationSize = 2000
    trainingSet = tf.data.Dataset.from_tensor_slices((x[validationSize:], label[validationSize:]))
    validationSet = tf.data.Dataset.from_tensor_slices((x[:validationSize], label[:validationSize]))
    # Load the saved three-number model and freeze its weights (transfer learning).
    three = tf.keras.models.load_model('three')
    three.trainable = False
    model = tf.keras.Sequential()
    # model.add(nn.Division((1, 3), 0))
    # Front end that filters out the distractor; Dense(2) matches the input of three.
    model.add(tf.keras.layers.Dense(8, activation='elu'))
    model.add(tf.keras.layers.Dense(8, activation='elu'))
    model.add(tf.keras.layers.Dense(2, activation='elu'))
    model.add(three)
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['sparse_categorical_accuracy'])
    # Sanity checks on the dataset format expected by model.fit.
    assert len(trainingSet.element_spec) == 2, 'fit stipulates that if x is a dataset, its element must have two elements, feature and label, though it is ok to have None as label.'
    assert len(trainingSet.element_spec[0].shape) > 0, 'In each example, the first element itself must be a list.'
    model.fit(trainingSet.batch(100), epochs=20, validation_data=validationSet.batch(100))
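The code for comparing three numbers saves its model at the end. Here that model is loaded for transfer learning, and a new model is built around it: it first filters out the distractor and then computes the result with the saved model. Because of transfer learning, the accuracy can hardly exceed the original 0.99, so only 0.97 is required here.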
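Here 8-8 is already the minimum that works. The final layer has 2 units so that its output matches the input expected by the saved model three.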
With two distractors, the first two layers have to grow to 10-10.
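Why a front end is needed at all: the label function `g` ignores `d` entirely, so whatever feature the distractor contributes (for example a/d, if `nn.Division((1, 4), 0)` divides the first column by the other three, which is an assumption) carries no information about the label, and the 8-8-2 layers must learn to discard it. A quick pure-Python check that the label is invariant to d:

```python
import random


def g(t):
    # Label function from the distractor code above; d is never used.
    (a, b, c, d) = t
    if a > b:
        return 1
    elif a > c:
        return 2
    else:
        return 0


random.seed(2)
for _ in range(1000):
    a, b, c = (random.uniform(0.1, 5000) for _ in range(3))
    d1, d2 = random.uniform(0.1, 5000), random.uniform(0.1, 5000)
    # Changing only the distractor never changes the label.
    assert g((a, b, c, d1)) == g((a, b, c, d2))
print("label is independent of d")
```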
Larger by a percentage
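Building on "Comparing three numbers", the label function is now changed so that the label is 1 only if a is greater than 1.1 times b, and 2 only if a is greater than 1.2 times c.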
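The modified label function is not reproduced in this section. A hypothetical reconstruction from the description above, keeping the branch order of `f2` (the name `f3` is an assumption):

```python
def f3(a, b, c):
    """Hypothetical label function for the percentage variant.

    1 if a exceeds b by more than 10 %, otherwise 2 if a exceeds c by more
    than 20 %, otherwise 0 (same branch order as f2).
    """
    if a > 1.1 * b:
        return 1
    elif a > 1.2 * c:
        return 2
    else:
        return 0


print(f3(2.0, 1.0, 5.0))   # 1
print(f3(1.05, 1.0, 0.5))  # 2
print(f3(1.05, 1.0, 1.0))  # 0, whereas f2(1.05, 1.0, 1.0) would give 1
```

The last call shows the new difficulty: a is larger than b, but not by the required margin, so the network has to learn a scaled comparison rather than a plain one.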
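According to Andreas Madsen and Alexander Rosenberg Johansen, "Neural Arithmetic Units", ICLR 2020 [accessed 2021-12-24], the four basic arithmetic operations (addition, subtraction, multiplication, division) are hard for neural networks to learn.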
Even after adding three Dense layers of 12 neurons each, 0.99 accuracy was still not reached reliably.
References
- [1] Pragati Baheti. 12 Types of Neural Network Activation Functions: How to Choose? 2021-11-29 [accessed 2021-12-22].
- [2] tf.keras.Model. TensorFlow API documentation [accessed 2021-12-22].