
TensorFlow Installation Notes

617035918 / 2400 reads


Preface

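Recently I took several open courses on deep learning, but they still didn't feel like enough; I wanted to get hands-on with a real framework. So which one: caffe, tensorflow, torch, or something else? After some comparison I chose tensorflow. First, it is a more general-purpose framework and has the best Python support. Second, it is backed by Google and open source. I believe it will become popular in both academia and industry.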

Installation: a live record

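First I set up a dual-boot system on my computer (win10). I didn't use a virtual machine because VMs make poor use of resources such as the GPU. I went with ubuntu (version 14.10); I won't describe how to install it, since there are plenty of guides online. Next comes installing tensorflow. The official site offers five installation methods: pip, docker, Anaconda, Virtualenv, and building from source. Anaconda bundles many scientific computing libraries that should come in handy for future work, so I chose the Anaconda-based installation.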

1. First, choose the appropriate Anaconda version here and download it.

2. Go to the download directory and run: bash Anaconda2-4.1.1-Linux-x86_64.sh

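Then follow the prompts to install; it will ask for the installation directory and so on. You can also see that it automatically installs python2.7.12, beautifulsoup, ipython, and more: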

installing: python-2.7.12-1 ...
installing: _nb_ext_conf-0.2.0-py27_0 ...
installing: alabaster-0.7.8-py27_0 ...
installing: anaconda-client-1.4.0-py27_0 ...
installing: anaconda-navigator-1.2.1-py27_0 ...
installing: argcomplete-1.0.0-py27_1 ...
installing: astropy-1.2.1-np111py27_0 ...
installing: babel-2.3.3-py27_0 ...
installing: backports-1.0-py27_0 ...
installing: backports_abc-0.4-py27_0 ...
installing: beautifulsoup4-4.4.1-py27_0 ...

Note that at the end this prompt appears:

Do you wish the installer to prepend the Anaconda2 install location
to PATH in your /root/.bashrc ? [yes|no]

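Answer yes here to add anaconda to the PATH environment variable so it can be used. Otherwise you will have to configure PATH by hand later. Because the environment variable has changed, open a new terminal to test the installation. Typing python in the new terminal shows: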

Python 2.7.12 |Anaconda 4.1.1 (64-bit)| (default, Jul  2 2016, 17:42:40) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org

So the installation did succeed.
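Why does answering yes matter? The shell finds a command such as python by scanning the PATH entries from left to right and running the first match. Prepending Anaconda's bin directory therefore makes its python win over the system one. The Python sketch below imitates that lookup. The directory layout and the resolve_command helper are hypothetical. A real shell also checks execute permission, aliases, and its command hash table.

```python
import os
import posixpath
from typing import Callable, Optional


def resolve_command(name: str, path_value: str,
                    exists: Callable[[str], bool] = os.path.isfile) -> Optional[str]:
    """Return the first PATH entry holding `name`, scanning left to right like a shell."""
    for directory in path_value.split(":"):
        # An empty entry (e.g. from a leading ":") conventionally means the current directory.
        candidate = posixpath.join(directory or ".", name)
        if exists(candidate):
            return candidate
    return None


# Hypothetical layout: both the system and Anaconda ship a `python` executable.
installed = {"/usr/bin/python", "/root/anaconda2/bin/python"}
system_path = "/usr/local/bin:/usr/bin:/bin"

print(resolve_command("python", system_path, installed.__contains__))
# prints /usr/bin/python
print(resolve_command("python", "/root/anaconda2/bin:" + system_path, installed.__contains__))
# prints /root/anaconda2/bin/python
```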

3. Create a conda environment: conda create -n tensorflow python=2.7

4. Activate the environment: source activate tensorflow

5. Install the GPU-enabled tensorflow: pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.8.0rc0-cp27-none-linux_x86_64.whl

Note: GPU support requires installing the Cuda Toolkit and the CUDNN Toolkit first (register on the official site first).

6. After a successful install, open python and run:

import tensorflow as tf

It threw a pile of errors:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/__init__.py", line 23, in <module>
    from tensorflow.python import *
  File "/root/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/__init__.py", line 45, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/root/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow.py", line 28, in <module>
    _pywrap_tensorflow = swig_import_helper()
  File "/root/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow.py", line 24, in swig_import_helper
    _mod = imp.load_module("_pywrap_tensorflow", fp, pathname, description)
ImportError: libcudart.so.7.5: cannot open shared object file: No such file or directory
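This ImportError comes from the dynamic loader rather than from TensorFlow's Python code. The compiled _pywrap_tensorflow module needs libcudart.so.7.5, and no file with that exact name was found in the loader's search path (LD_LIBRARY_PATH, the ldconfig cache, and the default system directories). Below is a minimal sketch for testing a library name before importing tensorflow, using only ctypes. can_load is a hypothetical helper, and the library names are taken from this post.

```python
import ctypes


def can_load(soname: str) -> bool:
    """Try to dlopen a shared library by name; return False if the loader can't find it."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    # The first name is the one the traceback asks for; the second is what CUDA 7.0 provides.
    for name in ("libcudart.so.7.5", "libcudart.so.7.0"):
        print(name, "found" if can_load(name) else "NOT found")
```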

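It looks like this happened because I hadn't installed cuda yet. Downloading the Cuda Toolkit mentioned in step 5 was too slow (it would have taken 10 hours), so I switched to the network install instead: first download this, and then run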

dpkg -i cuda-repo-ubuntu1410_7.0-28_amd64.deb 
apt-get update
apt-get install cuda 

This only takes about 20 minutes. Once installed, cuda should be under /usr/local/. Next, install the CUDNN Toolkit: go to its download directory and run

tar xvzf cudnn-7.0-linux-x64-v3.0-prod.tgz
cp cuda/include/cudnn.h  /usr/local/cuda/include
cp cuda/lib64/libcudnn* /usr/local/cuda/lib64

Then set the LD_LIBRARY_PATH and CUDA_HOME environment variables. You can add the following lines to ~/.bashrc so they take effect automatically every time you log in:

export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64"
export CUDA_HOME=/usr/local/cuda
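The export line appends /usr/local/cuda/lib64 to whatever LD_LIBRARY_PATH already holds. If the variable was empty, the result starts with a colon. The loader generally treats that empty entry as the current directory, which is rarely intended. The Python sketch below shows the same append logic while skipping empty and duplicate entries. append_path is a hypothetical helper, not part of any CUDA tooling.

```python
def append_path(current: str, *extra: str) -> str:
    """Append directories to a colon-separated search path, skipping empty and duplicate entries."""
    entries = []
    for entry in current.split(":") + list(extra):
        if entry and entry not in entries:
            entries.append(entry)
    return ":".join(entries)


# The shell expands "$LD_LIBRARY_PATH:/usr/local/cuda/lib64" literally, so an unset
# variable yields a leading colon (an empty entry):
naive = ":".join(["", "/usr/local/cuda/lib64"])
print(repr(naive))                                     # ':/usr/local/cuda/lib64'
print(repr(append_path("", "/usr/local/cuda/lib64")))  # '/usr/local/cuda/lib64'
```

In bash itself, export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}/usr/local/cuda/lib64" achieves the same thing. The ${VAR:+word} expansion only emits the old value and its colon when the variable is non-empty.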

7. Test

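Testing still produced the same error: libcudart.so.7.5 could not be found. I searched the disk for it with locate libcudart.so.7.5, and sure enough it wasn't there, so my cuda version was probably too old. In /usr/local/cuda/lib64 I found libcudart.so.7.0.28 instead of libcudart.so.7.5.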
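The loader looks for the exact soname libcudart.so.7.5, so a 7.0 runtime does not satisfy it even though it is the same library. The sketch below does the same check as locate plus ls, in Python. It collects the version suffixes of libcudart.so.* files from a directory listing. The listing is hypothetical, modelled on what the author found; in practice you would pass os.listdir("/usr/local/cuda/lib64").

```python
import re
from typing import Iterable, List

# Matches e.g. libcudart.so.7.0 or libcudart.so.7.0.28 and captures the version suffix.
_CUDART = re.compile(r"^libcudart\.so\.(\d+(?:\.\d+)*)$")


def cudart_versions(filenames: Iterable[str]) -> List[str]:
    """Return the version suffixes of all versioned libcudart.so.* files."""
    return sorted(m.group(1) for m in map(_CUDART.match, filenames) if m)


def provides(filenames: Iterable[str], wanted: str) -> bool:
    """True if a file named exactly libcudart.so.<wanted> is present."""
    return wanted in cudart_versions(filenames)


# Hypothetical listing of /usr/local/cuda/lib64 with CUDA 7.0 installed.
listing = ["libcudart.so", "libcudart.so.7.0", "libcudart.so.7.0.28", "libcudart_static.a"]
print(cudart_versions(listing))  # ['7.0', '7.0.28']
print(provides(listing, "7.5"))  # False
```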

8. Reinstall the Cuda Toolkit

apt-get remove cuda
apt-get autoremove
# Download http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1404/x86_64/cuda-repo-ubuntu1404_7.5-18_amd64.deb
apt-get remove cuda-repo-ubuntu1410
dpkg -i cuda-repo-ubuntu1404_7.5-18_amd64.deb  # it tries to overwrite /etc/apt/sources.list.d/cuda.list, which also belongs to package cuda-repo-ubuntu1410 7.0-28, hence the previous step
apt-get update
sudo apt-get install cuda
# Error: cuda : Depends: cuda-7-5 (= 7.5-18) but it is not going to be installed
# E: Unable to correct problems, you have held broken packages.
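The repository .deb names encode the Ubuntu release they target (ubuntu1410 and ubuntu1404 above), while this machine was running 14.10. Mixing packages meant for different releases is one plausible source of unresolvable dependencies, though the log alone doesn't prove it. A quick sanity check, sketched in Python: read VERSION_ID from /etc/os-release and compare it with the tag in the package name. The helper names are hypothetical. The naming pattern is inferred only from the file names in this post.

```python
import re
from typing import Optional


def os_release_version(text: str) -> Optional[str]:
    """Return VERSION_ID (e.g. "14.04") from the contents of /etc/os-release."""
    match = re.search(r'^VERSION_ID="?([0-9.]+)"?', text, re.MULTILINE)
    return match.group(1) if match else None


def repo_matches_release(deb_name: str, version_id: str) -> bool:
    """Compare the ubuntuXXXX tag in a cuda-repo .deb name with an Ubuntu VERSION_ID."""
    match = re.match(r"cuda-repo-ubuntu(\d{4})", deb_name)
    return bool(match) and match.group(1) == version_id.replace(".", "")


# The situation above: Ubuntu 14.10 with a repository package built for 14.04.
sample = 'NAME="Ubuntu"\nVERSION_ID="14.10"\n'
print(repo_matches_release("cuda-repo-ubuntu1404_7.5-18_amd64.deb", os_release_version(sample)))
# prints False
```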

Too messy. Time to start over from scratch:

Same as above.

Same as above.

conda create -n tensor python=2.7

source activate tensor

Install the Cuda Toolkit: download it first, then go to its directory:

dpkg -i cuda-repo-ubuntu1404_7.5-18_amd64.deb
apt-get update
apt-get install cuda
# Error: cuda : Depends: cuda-7-5 (= 7.5-18) but it is not going to be installed
# E: Unable to correct problems, you have held broken packages.
# Unbelievable.

Installing the wrong version is a real hassle. Time to clean up the system:

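apt-get --purge remove 'nvidia-*'  # completely uninstall the nvidia packages
rm -rf anaconda2
# In .bashrc, delete the line that adds anaconda to PATH
# Still no good, same error: cuda : Depends: cuda-7-5 (= 7.5-18) but it is not going to be installed
# E: Unable to correct problems, you have held broken packages.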

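I couldn't sort it out, so I tried the local (offline) installers instead and downloaded cuda and cudnn. Oddly, downloading was very slow on ubuntu but much faster on windows, so I downloaded them on windows and copied them over to ubuntu.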

Installation: the bug-free version

1.

Since the package dependency problem couldn't be solved, I reinstalled the system with Ubuntu 14.04.5.

2.

Download cuda and cudnn, then go to the download directory:

dpkg -i cuda-repo-ubuntu1404-7-5-local_7.5-18_amd64.deb
sudo apt-get update
sudo apt-get install cuda
# Wait a moment, then set up cudnn
tar xvzf cudnn-7.5-linux-x64-v5.0-ga.tgz
cp cuda/include/cudnn.h /usr/local/cuda/include
cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
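The cp and chmod lines above copy the cuDNN header and libraries into the CUDA tree and make them readable by every user. For reference, here is the same procedure as a Python sketch using shutil. install_cudnn is a hypothetical helper, and extracted_root is assumed to be the cuda/ folder that tar creates. Writing under /usr/local/cuda still requires root.

```python
import glob
import os
import shutil
import stat
from typing import List


def install_cudnn(extracted_root: str, cuda_home: str = "/usr/local/cuda") -> List[str]:
    """Copy cudnn.h and libcudnn* into cuda_home and make them world-readable.

    Mirrors the cp / chmod a+r commands above; returns the paths that were written.
    """
    pairs = [(os.path.join(extracted_root, "include", "cudnn.h"), os.path.join(cuda_home, "include"))]
    pairs += [(lib, os.path.join(cuda_home, "lib64"))
              for lib in glob.glob(os.path.join(extracted_root, "lib64", "libcudnn*"))]
    copied = []
    for src, dest_dir in pairs:
        os.makedirs(dest_dir, exist_ok=True)  # the real CUDA install already has these dirs
        # shutil.copy2 follows symlinks by default, like cp does for its file arguments without -R.
        dest = shutil.copy2(src, dest_dir)
        # Equivalent of `chmod a+r`: add read permission for user, group and others.
        mode = stat.S_IMODE(os.stat(dest).st_mode)
        os.chmod(dest, mode | stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
        copied.append(dest)
    return copied
```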
3.

Edit .bashrc and add:

export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64"
export CUDA_HOME=/usr/local/cuda
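Because ~/.bashrc is only read when a new shell starts, it is worth confirming in a fresh terminal that both variables actually arrived. Below is a small Python check, sketched under the assumption that CUDA lives in /usr/local/cuda as above. check_cuda_env is a hypothetical helper.

```python
import os
from typing import List, Mapping


def check_cuda_env(env: Mapping[str, str], cuda_home: str = "/usr/local/cuda") -> List[str]:
    """Return a list of problems with the CUDA variables that .bashrc is supposed to set."""
    problems = []
    if env.get("CUDA_HOME") != cuda_home:
        problems.append("CUDA_HOME is %r, expected %r" % (env.get("CUDA_HOME"), cuda_home))
    ld_entries = env.get("LD_LIBRARY_PATH", "").split(":")
    for needed in (cuda_home + "/lib64", cuda_home + "/extras/CUPTI/lib64"):
        if needed not in ld_entries:
            problems.append("LD_LIBRARY_PATH is missing " + needed)
    return problems


if __name__ == "__main__":
    # Run this in a *new* terminal so that the edited .bashrc has been read.
    for line in check_cuda_env(os.environ) or ["CUDA environment looks fine"]:
        print(line)
```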
4.

Download Anaconda, then go to the download directory:

bash Anaconda2-4.1.1-Linux-x86_64.sh
Pay attention to the configuration prompts, and change the install directory to suit your preference.
5.

Open a new terminal:

conda create -n tfgpu python=2.7
source activate tfgpu
pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.10.0rc0-cp27-none-linux_x86_64.whl
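The wheel's file name says which interpreter it was built for. Under the wheel naming convention (PEP 427), the name is name-version-pythontag-abitag-platform.whl, and cp27 means CPython 2.7. That is why the environment is created with python=2.7. Below is a sketch that splits the name and roughly compares the python tag with the running interpreter. parse_wheel_filename and matches_interpreter are hypothetical helpers, and the comparison ignores compound tags such as py2.py3.

```python
import sys
from typing import NamedTuple


class WheelTags(NamedTuple):
    name: str
    version: str
    python: str
    abi: str
    platform: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel filename per PEP 427: name-version[-build]-python-abi-platform.whl."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: " + filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel filename: " + filename)
    # With 6 parts, parts[2] is the optional build tag, which is ignored here.
    return WheelTags(parts[0], parts[1], parts[-3], parts[-2], parts[-1])


def matches_interpreter(tags: WheelTags) -> bool:
    """Rough check that a CPython wheel's python tag (e.g. cp27) matches the running interpreter."""
    return tags.python == "cp%d%d" % sys.version_info[:2]


tags = parse_wheel_filename("tensorflow-0.10.0rc0-cp27-none-linux_x86_64.whl")
print(tags.python, tags.platform)  # cp27 linux_x86_64
```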
6.

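After installation I rebooted and got a black screen. This is probably a dual-GPU issue. Setting that aside for now, I first switched to a tty to check whether tensorflow was installed correctly.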

Ctrl+Alt+F2  # switch to tty2 and log in
root@mageek-ThinkPad-T550:~# source activate tfgpu
(tfgpu) root@mageek-ThinkPad-T550:~# python
Python 2.7.12 |Continuum Analytics, Inc.| (default, Jul  2 2016, 17:42:40) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
>>> import tensorflow as tf
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
>>> sess = tf.Session()
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties: 
name: GeForce 940M
major: 5 minor: 0 memoryClockRate (GHz) 1.124
pciBusID 0000:08:00.0
Total memory: 1023.88MiB
Free memory: 997.54MiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:839] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce 940M, pci bus id: 0000:08:00.0)
>>> 
(tfgpu) root@mageek-ThinkPad-T550:~# source deactivate
So the installation succeeded.
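The Creating TensorFlow device (/gpu:0) line is the signal that matters here. It appears only when TensorFlow actually found and initialized the GPU. If you want to script this check, below is a sketch that pulls the device names out of captured log text. TensorFlow typically writes these messages to stderr, so capture them with 2>&1. The regular expression is based only on the 0.10 output shown above and may not match other versions. gpus_from_log is a hypothetical helper.

```python
import re
from typing import List, Tuple

# Matches the "Creating TensorFlow device (/gpu:N) -> (device: N, name: ..., ...)" log line.
_DEVICE_LINE = re.compile(r"Creating TensorFlow device \((/gpu:\d+)\) -> \(device: \d+, name: ([^,]+),")


def gpus_from_log(log_text: str) -> List[Tuple[str, str]]:
    """Return (device, name) pairs for every GPU device TensorFlow reported creating."""
    return _DEVICE_LINE.findall(log_text)


line = ("I tensorflow/core/common_runtime/gpu/gpu_device.cc:839] Creating TensorFlow device "
        "(/gpu:0) -> (device: 0, name: GeForce 940M, pci bus id: 0000:08:00.0)")
print(gpus_from_log(line))  # [('/gpu:0', 'GeForce 940M')]
```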
7. Fix the black screen
vim /etc/modprobe.d/blacklist.conf
# Add the following lines to blacklist these kernel modules
blacklist amd76x_edac
blacklist vga16fb
blacklist nouveau
blacklist rivafb
blacklist nvidiafb
blacklist rivatv
# Save and quit
sudo prime-select intel  # prefer the Intel integrated GPU
reboot  # after rebooting, the graphical desktop comes up
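Editing blacklist.conf by hand works, but repeating the steps would leave duplicate lines. Below is a sketch of an idempotent version in Python that only appends the blacklist lines that are missing. add_blacklist is a hypothetical helper and the module list is the one above. Writing the result back to /etc/modprobe.d/blacklist.conf requires root.

```python
from typing import Iterable


def add_blacklist(conf_text: str, modules: Iterable[str]) -> str:
    """Return blacklist.conf contents with a `blacklist <module>` line added for each missing module."""
    present = set()
    for line in conf_text.splitlines():
        words = line.split()
        if len(words) >= 2 and words[0] == "blacklist":
            present.add(words[1])
    lines = conf_text.rstrip("\n").split("\n") if conf_text.strip() else []
    for module in modules:
        if module not in present:
            lines.append("blacklist " + module)
            present.add(module)
    return "\n".join(lines) + "\n"


MODULES = ["amd76x_edac", "vga16fb", "nouveau", "rivafb", "nvidiafb", "rivatv"]
print(add_blacklist("# existing file\nblacklist nouveau\n", MODULES), end="")
```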
8. IPython

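At this point ipython does start, but import tensorflow fails. You first need to run conda install ipython (inside the activated tfgpu environment) and then start ipython again, and it works. Only that command installs ipython into the tfgpu environment, and ipython can find tensorflow only when it runs from the same environment.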
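A quick way to confirm which environment a python or ipython session belongs to is to look at sys.prefix. Inside the tfgpu environment it should end in envs/tfgpu. Below is a sketch, assuming the default Anaconda layout <anaconda root>/envs/<name>; active_conda_env is a hypothetical helper. Run it inside IPython. An empty result means you started the base Anaconda ipython, which cannot see the tensorflow installed in tfgpu.

```python
import os
import sys


def active_conda_env(prefix: str = sys.prefix) -> str:
    """Return the conda environment name for an interpreter prefix, or "" for the base install.

    Assumes the usual layout <anaconda root>/envs/<name>.
    """
    parent, name = os.path.split(os.path.normpath(prefix))
    return name if os.path.basename(parent) == "envs" else ""


print(repr(active_conda_env()))
```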

9. IDE

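IPython is much better than the plain python shell, but typing the same commands every time (import tensorflow as tf, for example) is still tedious, so an IDE is worth setting up. I recommend Komodo Edit. Download and extract it, go into the directory, run ./install.sh, and follow the prompts to change the install directory (make sure you have write permission). Mine, for example, is /usr/local/Komodo-Edit-10/. Then add it to PATH. Now you can open a new terminal and run komodo to start the IDE. Configure a few basic options such as indentation and color scheme, and it is ready to use.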

Create a new file tf1.py:

import tensorflow as tf
import numpy as np

# Create 100 phony x, y data points in NumPy, y = x * 0.1 + 0.3
x_data = np.random.rand(100).astype(np.float32)
y_data = x_data * 0.1 + 0.3

# Try to find values for W and b that compute y_data = W * x_data + b
# (We know that W should be 0.1 and b 0.3, but TensorFlow will
# figure that out for us.)
W = tf.Variable(tf.random_uniform([1], -1.0, 1.0))
b = tf.Variable(tf.zeros([1]))
y = W * x_data + b

# Minimize the mean squared errors.
loss = tf.reduce_mean(tf.square(y - y_data))
optimizer = tf.train.GradientDescentOptimizer(0.5)
train = optimizer.minimize(loss)

# Before starting, initialize the variables.  We will "run" this first.
init = tf.initialize_all_variables()

# Launch the graph.
sess = tf.Session()
sess.run(init)

# Fit the line.
for step in range(201):
    sess.run(train)
    if step % 20 == 0:
        print(step, sess.run(W), sess.run(b))

# Learns best fit is W: [0.1], b: [0.3]

Run:

# cd into the directory containing the file
source activate tfgpu
python tf1.py

Result:

I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties: 
name: GeForce 940M
major: 5 minor: 0 memoryClockRate (GHz) 1.124
pciBusID 0000:08:00.0
Total memory: 1023.88MiB
Free memory: 997.54MiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:839] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce 940M, pci bus id: 0000:08:00.0)
(0, array([-0.09839484], dtype=float32), array([ 0.5272761], dtype=float32))
(20, array([ 0.02831561], dtype=float32), array([ 0.33592272], dtype=float32))
(40, array([ 0.07941294], dtype=float32), array([ 0.31031665], dtype=float32))
(60, array([ 0.09408762], dtype=float32), array([ 0.30296284], dtype=float32))
(80, array([ 0.09830203], dtype=float32), array([ 0.3008509], dtype=float32))
(100, array([ 0.09951238], dtype=float32), array([ 0.30024436], dtype=float32))
(120, array([ 0.09985995], dtype=float32), array([ 0.3000702], dtype=float32))
(140, array([ 0.09995978], dtype=float32), array([ 0.30002016], dtype=float32))
(160, array([ 0.09998845], dtype=float32), array([ 0.30000579], dtype=float32))
(180, array([ 0.09999669], dtype=float32), array([ 0.30000168], dtype=float32))
(200, array([ 0.09999905], dtype=float32), array([ 0.30000049], dtype=float32))
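To see what GradientDescentOptimizer(0.5) is doing in tf1.py, here is the same fit written with hand-derived gradients in plain Python. It uses the standard library only, so it runs anywhere. This is a sketch for intuition, not how TensorFlow computes gradients internally (it differentiates the graph automatically). The update rule is the same, though: W -= 0.5 * dL/dW and b -= 0.5 * dL/db on the mean squared error. The random data differ from the run above, so the printed numbers will not match exactly. W and b should still approach 0.1 and 0.3.

```python
import random


def fit_line(steps=201, lr=0.5, n=100, seed=0):
    """Fit y = W*x + b to synthetic data by plain gradient descent on the mean squared error.

    Mirrors tf1.py: the data follow y = 0.1*x + 0.3, W starts uniform in [-1, 1], b starts at 0.
    """
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    ys = [0.1 * x + 0.3 for x in xs]
    W, b = rng.uniform(-1.0, 1.0), 0.0
    for step in range(steps):
        errs = [W * x + b - y for x, y in zip(xs, ys)]
        # Gradients of mean(err**2): dL/dW = 2*mean(err*x), dL/db = 2*mean(err)
        grad_W = 2.0 * sum(e * x for e, x in zip(errs, xs)) / n
        grad_b = 2.0 * sum(errs) / n
        W -= lr * grad_W
        b -= lr * grad_b
        if step % 20 == 0:
            print(step, W, b)
    return W, b


if __name__ == "__main__":
    fit_line()
```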
10. NN
# Find the tensorflow install directory
python -c "import os; import inspect; import tensorflow; print(os.path.dirname(inspect.getfile(tensorflow)))"
# /root/anaconda2/envs/tfgpu/lib/python2.7/site-packages/tensorflow
cd /root/anaconda2/envs/tfgpu/lib/python2.7/site-packages/tensorflow/models/image/mnist/  # enter the directory
python convolutional.py
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
Extracting data/train-images-idx3-ubyte.gz
Traceback (most recent call last):
  File "convolutional.py", line 326, in <module>
    tf.app.run()
  File "/root/anaconda2/envs/tfgpu/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 30, in run
    sys.exit(main(sys.argv))
  File "convolutional.py", line 138, in main
    train_data = extract_data(train_data_filename, 60000)
  File "convolutional.py", line 85, in extract_data
    buf = bytestream.read(IMAGE_SIZE * IMAGE_SIZE * num_images * NUM_CHANNELS)
  File "/root/anaconda2/envs/tfgpu/lib/python2.7/gzip.py", line 268, in read
    self._read(readsize)
  File "/root/anaconda2/envs/tfgpu/lib/python2.7/gzip.py", line 315, in _read
    self._read_eof()
  File "/root/anaconda2/envs/tfgpu/lib/python2.7/gzip.py", line 354, in _read_eof
    hex(self.crc)))
IOError: CRC check failed 0x4b01c89e != 0xd2b9b600L

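So the CRC check failed. I downloaded the files directly from the official site instead and copied them into the data directory; the download URL is in convolutional.py. Comparing the files the program had downloaded into data with the ones from the official site shows that the program's downloads were broken. They were noticeably smaller, presumably truncated during the download.
Run it again: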
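Python's gzip module only notices a damaged archive when it reaches the end of the stream (the _read_eof CRC check in the traceback), so the script can fail partway through extraction. Below is a sketch for checking the downloaded .gz files up front, before running convolutional.py. gzip_is_intact is a hypothetical helper; you would call it on files such as data/train-images-idx3-ubyte.gz.

```python
import gzip
import zlib


def gzip_is_intact(path: str) -> bool:
    """Decompress a .gz file end to end; False if it is truncated, corrupt, or fails its CRC check."""
    try:
        with gzip.open(path, "rb") as f:
            while f.read(1 << 20):  # read in 1 MiB chunks until EOF
                pass
    except (OSError, EOFError, zlib.error):
        # OSError covers bad headers and CRC mismatches (gzip.BadGzipFile on Python 3.8+);
        # EOFError covers files that end before the end-of-stream marker.
        return False
    return True
```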

I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties: 
name: GeForce 940M
major: 5 minor: 0 memoryClockRate (GHz) 1.124
pciBusID 0000:08:00.0
Total memory: 1023.88MiB
Free memory: 997.54MiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:839] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce 940M, pci bus id: 0000:08:00.0)
Initialized!
E tensorflow/stream_executor/cuda/cuda_dnn.cc:347] Loaded runtime CuDNN library: 5005 (compatibility version 5000) but source was compiled with 4007 (compatibility version 4000).  If using a binary install, upgrade your CuDNN library to match.  If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.
F tensorflow/core/kernels/conv_ops.cc:457] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms) 
Aborted (core dumped)

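In other words, I had installed cuDNN v5 (5005), but this TensorFlow binary was compiled against cuDNN v4 (4007). So I downloaded v4 and reconfigured it following step 2: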

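# This overwrites cuDNN v5, so back v5 up in case it's needed later; I renamed the previously extracted cuda folder to cudnn5005
cd /usr/local/cuda/lib64
rm -f libcudnn*  # delete cuDNN v5
# Then go to the cuDNN v4 download directory
tar xvzf cudnn-7.0-linux-x64-v4.0-prod.tgz
cp cuda/include/cudnn.h /usr/local/cuda/include  # overwrite the v5 header with v4
cp cuda/lib64/libcudnn* /usr/local/cuda/lib64  # add the v4 libraries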
chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
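The numbers in the error message use cuDNN's version encoding from cudnn.h: CUDNN_VERSION = CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL. So 5005 is cuDNN 5.0.5 and 4007 is 4.0.7. Below is a sketch that decodes these values and reads them from the header you are about to install. The helpers are hypothetical. The "same major version" rule in same_major is an assumption inferred from the compatibility versions 5000 and 4000 in the log.

```python
import re
from typing import Dict, Tuple


def decode_cudnn_version(version: int) -> Tuple[int, int, int]:
    """Split a CUDNN_VERSION integer (MAJOR*1000 + MINOR*100 + PATCHLEVEL) into its parts."""
    return version // 1000, (version % 1000) // 100, version % 100


def header_version(cudnn_h: str) -> Tuple[int, int, int]:
    """Read CUDNN_MAJOR / CUDNN_MINOR / CUDNN_PATCHLEVEL from the text of cudnn.h."""
    found: Dict[str, int] = {}
    for key in ("MAJOR", "MINOR", "PATCHLEVEL"):
        match = re.search(r"#define\s+CUDNN_%s\s+(\d+)" % key, cudnn_h)
        if match is None:
            raise ValueError("CUDNN_%s not found in header" % key)
        found[key] = int(match.group(1))
    return found["MAJOR"], found["MINOR"], found["PATCHLEVEL"]


def same_major(runtime: int, compiled: int) -> bool:
    """Assumed compatibility rule: runtime and build-time cuDNN must share a major version."""
    return runtime // 1000 == compiled // 1000


print(decode_cudnn_version(5005), decode_cudnn_version(4007))  # (5, 0, 5) (4, 0, 7)
print(same_major(5005, 4007))                                  # False
```

At runtime, the library's own cudnnGetVersion() function returns the same encoded number. One way to call it is ctypes.CDLL("libcudnn.so").cudnnGetVersion().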

Run it again:

cd /root/anaconda2/envs/tfgpu/lib/python2.7/site-packages/tensorflow/models/image/mnist/  # enter the directory
python convolutional.py

Result:

I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
E tensorflow/stream_executor/cuda/cuda_driver.cc:491] failed call to cuInit: CUDA_ERROR_NO_DEVICE
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:153] retrieving CUDA diagnostic information for host: mageek-ThinkPad-T550
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:160] hostname: mageek-ThinkPad-T550
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:185] libcuda reported version is: 352.63.0
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:356] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module  352.63  Sat Nov  7 21:25:42 PST 2015
GCC version:  gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.3) 
"""
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] kernel reported version is: 352.63.0
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:293] kernel version seems to match DSO: 352.63.0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:81] No GPU devices available on machine.
Initialized!
Step 0 (epoch 0.00), 5.4 ms
Minibatch loss: 12.054, learning rate: 0.010000
Minibatch error: 90.6%
Validation error: 84.6%
Step 100 (epoch 0.12), 280.2 ms
Minibatch loss: 3.287, learning rate: 0.010000
Minibatch error: 6.2%
Validation error: 7.0%
Step 200 (epoch 0.23), 281.0 ms
Minibatch loss: 3.491, learning rate: 0.010000
Minibatch error: 12.5%
Validation error: 3.6%
Step 300 (epoch 0.35), 281.0 ms
Minibatch loss: 3.265, learning rate: 0.010000
Minibatch error: 10.9%
Validation error: 3.2%
Step 400 (epoch 0.47), 293.0 ms
Minibatch loss: 3.221, learning rate: 0.010000
Minibatch error: 7.8%
Validation error: 2.7%
Step 500 (epoch 0.58), 289.0 ms
Minibatch loss: 3.292, learning rate: 0.010000
Minibatch error: 7.8%
Validation error: 2.7%
Step 600 (epoch 0.70), 287.4 ms
Minibatch loss: 3.227, learning rate: 0.010000
Minibatch error: 7.8%
Validation error: 2.6%
Step 700 (epoch 0.81), 287.0 ms
Minibatch loss: 3.015, learning rate: 0.010000
Minibatch error: 3.1%
Validation error: 2.4%
Step 800 (epoch 0.93), 287.0 ms
Minibatch loss: 3.152, learning rate: 0.010000
Minibatch error: 6.2%
Validation error: 2.0%
Step 900 (epoch 1.05), 287.7 ms
Minibatch loss: 2.938, learning rate: 0.009500
Minibatch error: 3.1%
Validation error: 1.6%
Step 1000 (epoch 1.16), 287.4 ms
Minibatch loss: 2.862, learning rate: 0.009500
Minibatch error: 1.6%
Validation error: 1.7%
.
.
.

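So the program runs, but it didn't find the GPU. Note the CUDA_ERROR_NO_DEVICE and "No GPU devices available on machine" lines, and the roughly 280 ms per step. I rebooted and tried again: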

reboot
#.....
source activate tfgpu
cd /root/anaconda2/envs/tfgpu/lib/python2.7/site-packages/tensorflow/models/image/mnist/  # enter the directory
python convolutional.py

Result:

I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties: 
name: GeForce 940M
major: 5 minor: 0 memoryClockRate (GHz) 1.124
pciBusID 0000:08:00.0
Total memory: 1023.88MiB
Free memory: 997.54MiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:839] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce 940M, pci bus id: 0000:08:00.0)
Initialized!
Step 0 (epoch 0.00), 81.3 ms
Minibatch loss: 12.054, learning rate: 0.010000
Minibatch error: 90.6%
Validation error: 84.6%
Step 100 (epoch 0.12), 44.4 ms
Minibatch loss: 3.291, learning rate: 0.010000
Minibatch error: 6.2%
Validation error: 7.1%
Step 200 (epoch 0.23), 44.4 ms
Minibatch loss: 3.462, learning rate: 0.010000
Minibatch error: 12.5%
Validation error: 3.6%
Step 300 (epoch 0.35), 44.0 ms
Minibatch loss: 3.188, learning rate: 0.010000
Minibatch error: 4.7%
Validation error: 3.2%
Step 400 (epoch 0.47), 44.3 ms
Minibatch loss: 3.253, learning rate: 0.010000
Minibatch error: 9.4%
Validation error: 2.8%
Step 500 (epoch 0.58), 44.3 ms
Minibatch loss: 3.288, learning rate: 0.010000
Minibatch error: 9.4%
Validation error: 2.5%
Step 600 (epoch 0.70), 43.9 ms
Minibatch loss: 3.180, learning rate: 0.010000
Minibatch error: 6.2%
Validation error: 2.8%
Step 700 (epoch 0.81), 44.2 ms
Minibatch loss: 3.033, learning rate: 0.010000
Minibatch error: 3.1%
Validation error: 2.4%
Step 800 (epoch 0.93), 44.0 ms
Minibatch loss: 3.149, learning rate: 0.010000
Minibatch error: 6.2%
Validation error: 2.0%
Step 900 (epoch 1.05), 44.0 ms
Minibatch loss: 2.919, learning rate: 0.009500
Minibatch error: 3.1%
Validation error: 1.6%
Step 1000 (epoch 1.16), 43.8 ms
Minibatch loss: 2.849, learning rate: 0.009500
Minibatch error: 0.0%
Validation error: 1.7%
Step 1100 (epoch 1.28), 43.6 ms
Minibatch loss: 2.822, learning rate: 0.009500
Minibatch error: 0.0%
Validation error: 1.6%
Step 1200 (epoch 1.40), 43.6 ms
Minibatch loss: 2.979, learning rate: 0.009500
Minibatch error: 7.8%
Validation error: 1.5%
Step 1300 (epoch 1.51), 43.6 ms
Minibatch loss: 2.763, learning rate: 0.009500
Minibatch error: 0.0%
Validation error: 1.9%
Step 1400 (epoch 1.63), 43.6 ms
Minibatch loss: 2.781, learning rate: 0.009500
Minibatch error: 3.1%
Validation error: 1.5%
Step 1500 (epoch 1.75), 43.6 ms
Minibatch loss: 2.861, learning rate: 0.009500
Minibatch error: 6.2%
Validation error: 1.4%
Step 1600 (epoch 1.86), 43.8 ms
Minibatch loss: 2.698, learning rate: 0.009500
Minibatch error: 1.6%
Validation error: 1.3%
Step 1700 (epoch 1.98), 43.9 ms
Minibatch loss: 2.650, learning rate: 0.009500
Minibatch error: 0.0%
Validation error: 1.3%
Step 1800 (epoch 2.09), 44.1 ms
Minibatch loss: 2.652, learning rate: 0.009025
Minibatch error: 1.6%
Validation error: 1.3%
Step 1900 (epoch 2.21), 44.1 ms
Minibatch loss: 2.655, learning rate: 0.009025
Minibatch error: 1.6%
Validation error: 1.3%
Step 2000 (epoch 2.33), 43.9 ms
Minibatch loss: 2.640, learning rate: 0.009025
Minibatch error: 3.1%
Validation error: 1.2%
Step 2100 (epoch 2.44), 44.0 ms
Minibatch loss: 2.568, learning rate: 0.009025
Minibatch error: 0.0%
Validation error: 1.1%
Step 2200 (epoch 2.56), 44.0 ms
Minibatch loss: 2.564, learning rate: 0.009025
Minibatch error: 0.0%
Validation error: 1.1%
Step 2300 (epoch 2.68), 44.2 ms
Minibatch loss: 2.561, learning rate: 0.009025
Minibatch error: 1.6%
Validation error: 1.2%
Step 2400 (epoch 2.79), 44.2 ms
Minibatch loss: 2.500, learning rate: 0.009025
Minibatch error: 0.0%
Validation error: 1.3%
Step 2500 (epoch 2.91), 44.0 ms
Minibatch loss: 2.471, learning rate: 0.009025
Minibatch error: 0.0%
Validation error: 1.2%
Step 2600 (epoch 3.03), 43.8 ms
Minibatch loss: 2.451, learning rate: 0.008574
Minibatch error: 0.0%
Validation error: 1.2%
Step 2700 (epoch 3.14), 43.6 ms
Minibatch loss: 2.483, learning rate: 0.008574
Minibatch error: 1.6%
Validation error: 1.1%
Step 2800 (epoch 3.26), 43.7 ms
Minibatch loss: 2.426, learning rate: 0.008574
Minibatch error: 1.6%
Validation error: 1.1%
Step 2900 (epoch 3.37), 44.3 ms
Minibatch loss: 2.449, learning rate: 0.008574
Minibatch error: 3.1%
Validation error: 1.1%
Step 3000 (epoch 3.49), 43.9 ms
Minibatch loss: 2.395, learning rate: 0.008574
Minibatch error: 1.6%
Validation error: 1.0%
Step 3100 (epoch 3.61), 44.1 ms
Minibatch loss: 2.390, learning rate: 0.008574
Minibatch error: 3.1%
Validation error: 1.0%
Step 3200 (epoch 3.72), 43.6 ms
Minibatch loss: 2.330, learning rate: 0.008574
Minibatch error: 0.0%
Validation error: 1.1%
Step 3300 (epoch 3.84), 43.8 ms
Minibatch loss: 2.319, learning rate: 0.008574
Minibatch error: 1.6%
Validation error: 1.1%
Step 3400 (epoch 3.96), 44.4 ms
Minibatch loss: 2.296, learning rate: 0.008574
Minibatch error: 0.0%
Validation error: 1.0%
Step 3500 (epoch 4.07), 44.4 ms
Minibatch loss: 2.273, learning rate: 0.008145
Minibatch error: 0.0%
Validation error: 1.0%
Step 3600 (epoch 4.19), 44.2 ms
Minibatch loss: 2.253, learning rate: 0.008145
Minibatch error: 0.0%
Validation error: 0.9%
Step 3700 (epoch 4.31), 44.4 ms
Minibatch loss: 2.237, learning rate: 0.008145
Minibatch error: 0.0%
Validation error: 1.0%
Step 3800 (epoch 4.42), 43.8 ms
Minibatch loss: 2.234, learning rate: 0.008145
Minibatch error: 1.6%
Validation error: 0.9%
Step 3900 (epoch 4.54), 43.9 ms
Minibatch loss: 2.325, learning rate: 0.008145
Minibatch error: 3.1%
Validation error: 0.9%
Step 4000 (epoch 4.65), 43.6 ms
Minibatch loss: 2.215, learning rate: 0.008145
Minibatch error: 0.0%
Validation error: 1.1%
Step 4100 (epoch 4.77), 43.6 ms
Minibatch loss: 2.209, learning rate: 0.008145
Minibatch error: 1.6%
Validation error: 1.0%
Step 4200 (epoch 4.89), 43.6 ms
Minibatch loss: 2.242, learning rate: 0.008145
Minibatch error: 1.6%
Validation error: 1.0%
Step 4300 (epoch 5.00), 43.5 ms
Minibatch loss: 2.188, learning rate: 0.007738
Minibatch error: 1.6%
Validation error: 0.9%
Step 4400 (epoch 5.12), 43.5 ms
Minibatch loss: 2.155, learning rate: 0.007738
Minibatch error: 3.1%
Validation error: 1.0%
Step 4500 (epoch 5.24), 43.5 ms
Minibatch loss: 2.164, learning rate: 0.007738
Minibatch error: 4.7%
Validation error: 0.9%
Step 4600 (epoch 5.35), 43.5 ms
Minibatch loss: 2.095, learning rate: 0.007738
Minibatch error: 0.0%
Validation error: 0.9%
Step 4700 (epoch 5.47), 43.6 ms
Minibatch loss: 2.062, learning rate: 0.007738
Minibatch error: 0.0%
Validation error: 0.9%
Step 4800 (epoch 5.59), 43.6 ms
Minibatch loss: 2.068, learning rate: 0.007738
Minibatch error: 1.6%
Validation error: 1.0%
Step 4900 (epoch 5.70), 43.6 ms
Minibatch loss: 2.062, learning rate: 0.007738
Minibatch error: 1.6%
Validation error: 1.0%
Step 5000 (epoch 5.82), 43.5 ms
Minibatch loss: 2.148, learning rate: 0.007738
Minibatch error: 3.1%
Validation error: 1.0%
Step 5100 (epoch 5.93), 43.5 ms
Minibatch loss: 2.017, learning rate: 0.007738
Minibatch error: 1.6%
Validation error: 0.9%
Step 5200 (epoch 6.05), 43.5 ms
Minibatch loss: 2.074, learning rate: 0.007351
Minibatch error: 3.1%
Validation error: 1.0%
Step 5300 (epoch 6.17), 43.6 ms
Minibatch loss: 1.983, learning rate: 0.007351
Minibatch error: 0.0%
Validation error: 1.1%
Step 5400 (epoch 6.28), 43.6 ms
Minibatch loss: 1.957, learning rate: 0.007351
Minibatch error: 0.0%
Validation error: 0.8%
Step 5500 (epoch 6.40), 43.5 ms
Minibatch loss: 1.955, learning rate: 0.007351
Minibatch error: 0.0%
Validation error: 0.9%
Step 5600 (epoch 6.52), 43.5 ms
Minibatch loss: 1.926, learning rate: 0.007351
Minibatch error: 0.0%
Validation error: 0.8%
Step 5700 (epoch 6.63), 43.5 ms
Minibatch loss: 1.914, learning rate: 0.007351
Minibatch error: 0.0%
Validation error: 1.0%
Step 5800 (epoch 6.75), 43.6 ms
Minibatch loss: 1.897, learning rate: 0.007351
Minibatch error: 0.0%
Validation error: 0.9%
Step 5900 (epoch 6.87), 43.5 ms
Minibatch loss: 1.887, learning rate: 0.007351
Minibatch error: 0.0%
Validation error: 0.8%
Step 6000 (epoch 6.98), 43.6 ms
Minibatch loss: 1.878, learning rate: 0.007351
Minibatch error: 0.0%
Validation error: 1.0%
Step 6100 (epoch 7.10), 43.5 ms
Minibatch loss: 1.859, learning rate: 0.006983
Minibatch error: 0.0%
Validation error: 0.8%
Step 6200 (epoch 7.21), 43.6 ms
Minibatch loss: 1.844, learning rate: 0.006983
Minibatch error: 0.0%
Validation error: 0.8%
Step 6300 (epoch 7.33), 43.6 ms
Minibatch loss: 1.850, learning rate: 0.006983
Minibatch error: 1.6%
Validation error: 0.9%
Step 6400 (epoch 7.45), 43.6 ms
Minibatch loss: 1.916, learning rate: 0.006983
Minibatch error: 3.1%
Validation error: 0.8%
Step 6500 (epoch 7.56), 43.6 ms
Minibatch loss: 1.808, learning rate: 0.006983
Minibatch error: 0.0%
Validation error: 0.8%
Step 6600 (epoch 7.68), 43.5 ms
Minibatch loss: 1.839, learning rate: 0.006983
Minibatch error: 1.6%
Validation error: 0.9%
Step 6700 (epoch 7.80), 43.6 ms
Minibatch loss: 1.781, learning rate: 0.006983
Minibatch error: 0.0%
Validation error: 0.8%
Step 6800 (epoch 7.91), 43.6 ms
Minibatch loss: 1.773, learning rate: 0.006983
Minibatch error: 0.0%
Validation error: 0.8%
Step 6900 (epoch 8.03), 43.5 ms
Minibatch loss: 1.762, learning rate: 0.006634
Minibatch error: 0.0%
Validation error: 0.9%
Step 7000 (epoch 8.15), 43.5 ms
Minibatch loss: 1.797, learning rate: 0.006634
Minibatch error: 1.6%
Validation error: 0.9%
Step 7100 (epoch 8.26), 43.5 ms
Minibatch loss: 1.741, learning rate: 0.006634
Minibatch error: 0.0%
Validation error: 0.8%
Step 7200 (epoch 8.38), 43.5 ms
Minibatch loss: 1.744, learning rate: 0.006634
Minibatch error: 0.0%
Validation error: 0.9%
Step 7300 (epoch 8.49), 43.6 ms
Minibatch loss: 1.726, learning rate: 0.006634
Minibatch error: 1.6%
Validation error: 0.8%
Step 7400 (epoch 8.61), 43.5 ms
Minibatch loss: 1.704, learning rate: 0.006634
Minibatch error: 0.0%
Validation error: 0.8%
Step 7500 (epoch 8.73), 43.6 ms
Minibatch loss: 1.695, learning rate: 0.006634
Minibatch error: 0.0%
Validation error: 0.8%
Step 7600 (epoch 8.84), 43.5 ms
Minibatch loss: 1.808, learning rate: 0.006634
Minibatch error: 3.1%
Validation error: 0.8%
Step 7700 (epoch 8.96), 43.6 ms
Minibatch loss: 1.667, learning rate: 0.006634
Minibatch error: 0.0%
Validation error: 0.9%
Step 7800 (epoch 9.08), 43.5 ms
Minibatch loss: 1.660, learning rate: 0.006302
Minibatch error: 0.0%
Validation error: 0.9%
Step 7900 (epoch 9.19), 43.6 ms
Minibatch loss: 1.649, learning rate: 0.006302
Minibatch error: 0.0%
Validation error: 0.9%
Step 8000 (epoch 9.31), 43.5 ms
Minibatch loss: 1.666, learning rate: 0.006302
Minibatch error: 0.0%
Validation error: 0.8%
Step 8100 (epoch 9.43), 43.6 ms
Minibatch loss: 1.626, learning rate: 0.006302
Minibatch error: 0.0%
Validation error: 0.8%
Step 8200 (epoch 9.54), 43.5 ms
Minibatch loss: 1.633, learning rate: 0.006302
Minibatch error: 1.6%
Validation error: 0.8%
Step 8300 (epoch 9.66), 43.6 ms
Minibatch loss: 1.616, learning rate: 0.006302
Minibatch error: 0.0%
Validation error: 0.8%
Step 8400 (epoch 9.77), 43.6 ms
Minibatch loss: 1.597, learning rate: 0.006302
Minibatch error: 0.0%
Validation error: 0.8%
Step 8500 (epoch 9.89), 43.5 ms
Minibatch loss: 1.612, learning rate: 0.006302
Minibatch error: 1.6%
Validation error: 0.8%
Test error: 0.8%

Finally done!!!
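The two runs make the GPU's effect easy to quantify. With CUDA_ERROR_NO_DEVICE, TensorFlow fell back to the CPU at roughly 280 to 290 ms per step, while the GPU run settles around 44 ms. Below is a small sketch that extracts the step times from convolutional.py output and computes the speed-up. step_times is a hypothetical helper, and the two log excerpts are copied from the runs above.

```python
import re
from statistics import mean
from typing import List

# One "Step N (epoch E), T ms" line as printed by convolutional.py.
_STEP = re.compile(r"^Step (\d+) \(epoch [\d.]+\), ([\d.]+) ms$", re.MULTILINE)


def step_times(log_text: str, skip_first: bool = True) -> List[float]:
    """Per-step times in ms; step 0 is dropped by default because it is not representative."""
    return [float(ms) for step, ms in _STEP.findall(log_text)
            if not (skip_first and step == "0")]


# Excerpts from the two runs above: CPU fallback first, then the GPU run.
cpu_log = "Step 0 (epoch 0.00), 5.4 ms\nStep 100 (epoch 0.12), 280.2 ms\nStep 200 (epoch 0.23), 281.0 ms\n"
gpu_log = "Step 0 (epoch 0.00), 81.3 ms\nStep 100 (epoch 0.12), 44.4 ms\nStep 200 (epoch 0.23), 44.4 ms\n"
print("speed-up: %.1fx" % (mean(step_times(cpu_log)) / mean(step_times(gpu_log))))
# prints speed-up: 6.3x
```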

Summary

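All this back and forth took 4 days. The lesson: follow the official instructions step by step. Different versions are not compatible with each other, so don't casually download other versions. Also analyze each error message carefully before deciding on the next step.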

Feel free to visit my homepage (http://mageek.cn/)

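Copyright belongs to the author. Do not repost without permission. If this article violates any rules, you can contact the administrator to have it removed.

When reposting, please cite this article's URL: https://www.ucloud.cn/yun/18145.html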
