
Multilayer feedforward net fails to train in TensorFlow

I started with the TensorFlow tutorial to classify the images in the MNIST data set using a single-layer feedforward neural net. That works OK; I get 80+ percent on the test set. Then I tried to modify it into a multilayer network by adding a new layer in between. After this modification, all my attempts to train the network fail. For the first couple of iterations the network gets a bit better, but then it stagnates at 11.35% accuracy.

First twenty iterations using 1 hidden layer:

Train set: 0.124, test set: 0.098
Train set: 0.102, test set: 0.098
Train set: 0.112, test set: 0.101
Train set: 0.104, test set: 0.101
Train set: 0.092, test set: 0.101
Train set: 0.128, test set: 0.1135
Train set: 0.12, test set: 0.1135
Train set: 0.114, test set: 0.1135
Train set: 0.108, test set: 0.1135
Train set: 0.1, test set: 0.1135
Train set: 0.114, test set: 0.1135
Train set: 0.11, test set: 0.1135
Train set: 0.122, test set: 0.1135
Train set: 0.102, test set: 0.1135
Train set: 0.12, test set: 0.1135
Train set: 0.106, test set: 0.1135
Train set: 0.102, test set: 0.1135
Train set: 0.116, test set: 0.1135
Train set: 0.11, test set: 0.1135
Train set: 0.124, test set: 0.1135

It does not matter how long I train it; it stays stuck there. I have tried switching from rectified linear units to softmax in the hidden layer; both yield the same result. I have also tried changing the loss function to e = (y_true - y)^2, as sketched below. Same result.
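For reference, this is roughly the squared-error variant I tried (a sketch from memory, reusing the y and y_true tensors defined in the full code below, not the exact lines):

# Squared-error loss instead of cross entropy; y and y_true come from
# the network definition shown further down.
squared_error = tf.reduce_mean(
    tf.reduce_sum(tf.square(y_true - y), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(squared_error)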

First twenty iterations using no hidden layers:

Train set: 0.124, test set: 0.098
Train set: 0.374, test set: 0.3841
Train set: 0.532, test set: 0.5148
Train set: 0.7, test set: 0.6469
Train set: 0.746, test set: 0.7732
Train set: 0.786, test set: 0.8
Train set: 0.788, test set: 0.7887
Train set: 0.752, test set: 0.7882
Train set: 0.84, test set: 0.8138
Train set: 0.85, test set: 0.8347
Train set: 0.806, test set: 0.8084
Train set: 0.818, test set: 0.7917
Train set: 0.85, test set: 0.8063
Train set: 0.792, test set: 0.8268
Train set: 0.812, test set: 0.8259
Train set: 0.774, test set: 0.8053
Train set: 0.788, test set: 0.8522
Train set: 0.812, test set: 0.8131
Train set: 0.814, test set: 0.8638
Train set: 0.778, test set: 0.8604

Here is my code:

import numpy as np
import tensorflow as tf

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# Parameters
batch_size = 500

# Create the network structure
# ----------------------------

# First layer
x = tf.placeholder(tf.float32, [None, 784])
W_1 = tf.Variable(tf.zeros([784,10]))
b_1 = tf.Variable(tf.zeros([10]))
y_1 = tf.nn.relu(tf.matmul(x,W_1) + b_1)

# Second layer
W_2 = tf.Variable(tf.zeros([10,10]))
b_2 = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(y_1,W_2) + b_2)

# Loss function
y_true = tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_true * tf.log(y), reduction_indices=[1]))

# Training method
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_true,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# Train network
# -------------
sess = tf.Session()
sess.run(tf.initialize_all_variables())
batch, batch_labels = mnist.train.next_batch(batch_size)
for i in range(20):
    print("Train set: " + str(sess.run(accuracy, feed_dict={x: batch, y_true: batch_labels}))
            + ", test set: " + str(sess.run(accuracy, feed_dict={x: mnist.test.images, y_true: mnist.test.labels}))) 
    sess.run(train_step, feed_dict={x: batch, y_true: batch_labels})
    batch, batch_labels = mnist.train.next_batch(batch_size)

So this code does not work, but if I change from

y = tf.nn.softmax(tf.matmul(y_1,W_2) + b_2)

to

y = tf.nn.softmax(tf.matmul(x,W_1) + b_1)

then it works. What have I missed?

Edit: Now I have it working. Two changes were needed. First, initializing the weights to random values instead of zeros (yes, it was actually the weights that needed to be non-zero; keeping the biases at zero was fine despite the ReLU). The second change is strange to me: if I remove the softmax from the output layer and, instead of applying the cross-entropy formula manually, use the softmax_cross_entropy_with_logits function, then it works. As I understand it, that should be the same... And previously I also tried the sum of squared errors, which didn't work either. Anyway, with both changes the network trains: after 10k iterations it reaches 93.59% accuracy on the test set, so not optimal in any way, but better than the version with no hidden layer, and after only 20 iterations it is already up to 65%.
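My best guess for why the two cross-entropy formulations behave differently is numerical rather than mathematical (this is my own reading, not something from the docs): if any softmax output underflows to exactly zero, tf.log(y) returns -inf and the loss and gradients turn into inf/NaN, while softmax_cross_entropy_with_logits works on the raw logits via the log-sum-exp trick and never evaluates log(0). A minimal plain-numpy sketch of the failure mode:

import numpy as np

# Two-class sketch: one very confident logit makes the softmax output
# for the true class underflow to exactly 0.0.
logits = np.array([1000.0, 0.0])
y_true = np.array([0.0, 1.0])

p = np.exp(logits - logits.max())
p /= p.sum()                          # p == [1.0, 0.0] after underflow
print(-np.sum(y_true * np.log(p)))    # manual formula: log(0) -> inf

# Stable version: log(softmax) computed straight from the logits via
# log-sum-exp, so log(0) is never evaluated.
log_softmax = logits - (logits.max() + np.log(np.exp(logits - logits.max()).sum()))
print(-np.sum(y_true * log_softmax))  # 1000.0, finite

(TensorFlow runs in float32, where I believe the underflow already kicks in for logit gaps of roughly 90, so this is easy to hit once the weights start moving.) Here is the working code (quite ugly, but working):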

import numpy as np
import tensorflow as tf

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# Parameters
batch_size = 500

# Create the network structure
# ----------------------------

# First layer
x = tf.placeholder(tf.float32, [None, 784])
W_1 = tf.Variable(tf.truncated_normal([784,10], stddev=0.1))
b_1 = tf.Variable(tf.truncated_normal([10], stddev=0.1))
y_1 = tf.nn.relu(tf.matmul(x,W_1) + b_1)

# Second layer
W_2 = tf.Variable(tf.truncated_normal([10,10], stddev=0.1))
b_2 = tf.Variable(tf.truncated_normal([10], stddev=0.1))
y = tf.matmul(y_1,W_2) + b_2  # raw logits; the softmax now lives inside the loss

# Loss function
y_true = tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=y, labels=y_true))

# Training method
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_true,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# Train network
# -------------
sess = tf.Session()
sess.run(tf.initialize_all_variables())
batch, batch_labels = mnist.train.next_batch(batch_size)
for i in range(10000):
    if i % 100 == 0:
        print("Train set: " + str(sess.run(accuracy, feed_dict={x: batch, y_true: batch_labels}))
                + ", test set: " + str(sess.run(accuracy, feed_dict={x: mnist.test.images, y_true: mnist.test.labels}))) 
    sess.run(train_step, feed_dict={x: batch, y_true: batch_labels})
    batch, batch_labels = mnist.train.next_batch(batch_size)
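
For anyone else hitting this: as far as I can tell, the all-zeros initialization freezes the two-layer net (but not the single-layer one) because of the ReLU. With W_1 = 0 and b_1 = 0 every hidden pre-activation is exactly zero, the ReLU outputs zero, and TensorFlow takes the ReLU gradient at 0 to be 0, so no gradient ever reaches W_1; the gradient of W_2 is weighted by y_1 and is therefore zero as well. Only b_2 can move, which reduces the net to a constant prediction; 11.35% is, I believe, exactly the share of the digit 1 in the MNIST test set. A minimal plain-numpy sketch:

import numpy as np

# Sketch (plain numpy, not TensorFlow) of the dead hidden layer under
# all-zero initialization.
W_1 = np.zeros((784, 10))
b_1 = np.zeros(10)
x = np.random.rand(500, 784)        # dummy batch of 500 "images"
pre = x.dot(W_1) + b_1              # exactly zero everywhere
y_1 = np.maximum(pre, 0)            # relu output: all zeros
print(y_1.any())                    # False: the hidden layer is dead

# Backprop multiplies the upstream gradient by relu'(pre) = (pre > 0),
# which is all zeros, so W_1 gets a zero gradient; the W_2 gradient is
# y_1.T.dot(delta), also zero. Only b_2 receives a non-zero gradient.
print((pre > 0).any())              # False: nothing flows back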
