
Multilayer feedforward net fails to train in TensorFlow

I started with the TensorFlow tutorial to classify the images in the MNIST data set using a single-layer feedforward neural net. That works OK; I get 80+ percent on the test set. Then I tried to modify it into a multilayer network by adding a new layer in between. After this modification, all my attempts to train the network fail. For the first couple of iterations the network gets a bit better, but then it stagnates at 11.35% accuracy.

First twenty iterations using 1 hidden layer:

Train set: 0.124, test set: 0.098
Train set: 0.102, test set: 0.098
Train set: 0.112, test set: 0.101
Train set: 0.104, test set: 0.101
Train set: 0.092, test set: 0.101
Train set: 0.128, test set: 0.1135
Train set: 0.12, test set: 0.1135
Train set: 0.114, test set: 0.1135
Train set: 0.108, test set: 0.1135
Train set: 0.1, test set: 0.1135
Train set: 0.114, test set: 0.1135
Train set: 0.11, test set: 0.1135
Train set: 0.122, test set: 0.1135
Train set: 0.102, test set: 0.1135
Train set: 0.12, test set: 0.1135
Train set: 0.106, test set: 0.1135
Train set: 0.102, test set: 0.1135
Train set: 0.116, test set: 0.1135
Train set: 0.11, test set: 0.1135
Train set: 0.124, test set: 0.1135

It does not matter how long I train it; it stays stuck there. I have tried switching from rectified linear units to softmax in the hidden layer; both yield the same result. I have also tried changing the loss function to e = (y_true - y)^2, as sketched below. Same result.
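For reference, this is roughly the squared-error variant I tried (a sketch from memory, reusing the y and y_true tensors defined in the full code below, not the exact lines):

# Squared-error loss instead of cross entropy; y and y_true come from
# the network definition shown further down.
squared_error = tf.reduce_mean(
    tf.reduce_sum(tf.square(y_true - y), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(squared_error)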

First twenty iterations using no hidden layers:

Train set: 0.124, test set: 0.098
Train set: 0.374, test set: 0.3841
Train set: 0.532, test set: 0.5148
Train set: 0.7, test set: 0.6469
Train set: 0.746, test set: 0.7732
Train set: 0.786, test set: 0.8
Train set: 0.788, test set: 0.7887
Train set: 0.752, test set: 0.7882
Train set: 0.84, test set: 0.8138
Train set: 0.85, test set: 0.8347
Train set: 0.806, test set: 0.8084
Train set: 0.818, test set: 0.7917
Train set: 0.85, test set: 0.8063
Train set: 0.792, test set: 0.8268
Train set: 0.812, test set: 0.8259
Train set: 0.774, test set: 0.8053
Train set: 0.788, test set: 0.8522
Train set: 0.812, test set: 0.8131
Train set: 0.814, test set: 0.8638
Train set: 0.778, test set: 0.8604

Here is my code:

import numpy as np
import tensorflow as tf

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# Parameters
batch_size = 500

# Create the network structure
# ----------------------------

# First layer
x = tf.placeholder(tf.float32, [None, 784])
W_1 = tf.Variable(tf.zeros([784,10]))
b_1 = tf.Variable(tf.zeros([10]))
y_1 = tf.nn.relu(tf.matmul(x,W_1) + b_1)

# Second layer
W_2 = tf.Variable(tf.zeros([10,10]))
b_2 = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(y_1,W_2) + b_2)

# Loss function
y_true = tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_true * tf.log(y), reduction_indices=[1]))

# Training method
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_true,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# Train network
# -------------
sess = tf.Session()
sess.run(tf.initialize_all_variables())
batch, batch_labels = mnist.train.next_batch(batch_size)
for i in range(20):
    print("Train set: " + str(sess.run(accuracy, feed_dict={x: batch, y_true: batch_labels}))
            + ", test set: " + str(sess.run(accuracy, feed_dict={x: mnist.test.images, y_true: mnist.test.labels}))) 
    sess.run(train_step, feed_dict={x: batch, y_true: batch_labels})
    batch, batch_labels = mnist.train.next_batch(batch_size)

So this code does not work, but if I change from

y = tf.nn.softmax(tf.matmul(y_1,W_2) + b_2)

to

y = tf.nn.softmax(tf.matmul(x,W_1) + b_1)

then it works. What have I missed?

Edit: Now I have it working. Two changes were needed. First, initializing the weights to random values instead of zeros (yes, it was actually the weights that needed to be non-zero; keeping the biases at zero was fine despite the ReLU). The second change is strange to me: if I remove the softmax from the output layer and, instead of applying the cross-entropy formula manually, use the softmax_cross_entropy_with_logits function, then it works. As I understand it, that should be the same... And previously I also tried the sum of squared errors, which didn't work either. Anyway, with both changes the network trains: after 10k iterations it reaches 93.59% accuracy on the test set, so not optimal in any way, but better than the version with no hidden layer, and after only 20 iterations it is already up to 65%.
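My best guess for why the two cross-entropy formulations behave differently is numerical rather than mathematical (this is my own reading, not something from the docs): if any softmax output underflows to exactly zero, tf.log(y) returns -inf and the loss and gradients turn into inf/NaN, while softmax_cross_entropy_with_logits works on the raw logits via the log-sum-exp trick and never evaluates log(0). A minimal plain-numpy sketch of the failure mode:

import numpy as np

# Two-class sketch: one very confident logit makes the softmax output
# for the true class underflow to exactly 0.0.
logits = np.array([1000.0, 0.0])
y_true = np.array([0.0, 1.0])

p = np.exp(logits - logits.max())
p /= p.sum()                          # p == [1.0, 0.0] after underflow
print(-np.sum(y_true * np.log(p)))    # manual formula: log(0) -> inf

# Stable version: log(softmax) computed straight from the logits via
# log-sum-exp, so log(0) is never evaluated.
log_softmax = logits - (logits.max() + np.log(np.exp(logits - logits.max()).sum()))
print(-np.sum(y_true * log_softmax))  # 1000.0, finite

(TensorFlow runs in float32, where I believe the underflow already kicks in for logit gaps of roughly 90, so this is easy to hit once the weights start moving.) Here is the working code (quite ugly, but working):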

import numpy as np
import tensorflow as tf

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# Parameters
batch_size = 500

# Create the network structure
# ----------------------------

# First layer
x = tf.placeholder(tf.float32, [None, 784])
W_1 = tf.Variable(tf.truncated_normal([784,10], stddev=0.1))
b_1 = tf.Variable(tf.truncated_normal([10], stddev=0.1))
y_1 = tf.nn.relu(tf.matmul(x,W_1) + b_1)

# Second layer
W_2 = tf.Variable(tf.truncated_normal([10,10], stddev=0.1))
b_2 = tf.Variable(tf.truncated_normal([10], stddev=0.1))
y = tf.matmul(y_1,W_2) + b_2  # raw logits; the softmax now lives inside the loss

# Loss function
y_true = tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=y, labels=y_true))

# Training method
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_true,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# Train network
# -------------
sess = tf.Session()
sess.run(tf.initialize_all_variables())
batch, batch_labels = mnist.train.next_batch(batch_size)
for i in range(10000):
    if i % 100 == 0:
        print("Train set: " + str(sess.run(accuracy, feed_dict={x: batch, y_true: batch_labels}))
                + ", test set: " + str(sess.run(accuracy, feed_dict={x: mnist.test.images, y_true: mnist.test.labels}))) 
    sess.run(train_step, feed_dict={x: batch, y_true: batch_labels})
    batch, batch_labels = mnist.train.next_batch(batch_size)
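
For anyone else hitting this: as far as I can tell, the all-zeros initialization freezes the two-layer net (but not the single-layer one) because of the ReLU. With W_1 = 0 and b_1 = 0 every hidden pre-activation is exactly zero, the ReLU outputs zero, and TensorFlow takes the ReLU gradient at 0 to be 0, so no gradient ever reaches W_1; the gradient of W_2 is weighted by y_1 and is therefore zero as well. Only b_2 can move, which reduces the net to a constant prediction; 11.35% is, I believe, exactly the share of the digit 1 in the MNIST test set. A minimal plain-numpy sketch:

import numpy as np

# Sketch (plain numpy, not TensorFlow) of the dead hidden layer under
# all-zero initialization.
W_1 = np.zeros((784, 10))
b_1 = np.zeros(10)
x = np.random.rand(500, 784)        # dummy batch of 500 "images"
pre = x.dot(W_1) + b_1              # exactly zero everywhere
y_1 = np.maximum(pre, 0)            # relu output: all zeros
print(y_1.any())                    # False: the hidden layer is dead

# Backprop multiplies the upstream gradient by relu'(pre) = (pre > 0),
# which is all zeros, so W_1 gets a zero gradient; the W_2 gradient is
# y_1.T.dot(delta), also zero. Only b_2 receives a non-zero gradient.
print((pre > 0).any())              # False: nothing flows back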
