Nov 30, 2016
I have added multi-GPU support to nnForge! Both training and inference can now be done on multiple GPUs, though only on a single node. Training is parallelized with a data-parallel approach, where the mini-batch is split across multiple GPUs.
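To illustrate the approach (a minimal conceptual sketch only, not nnForge's actual multi-GPU code; the function name is made up for this example): each GPU computes gradients on its own slice of the mini-batch, and the per-GPU gradients are averaged before a single weight update is applied.

    // Conceptual sketch of the data-parallel step, not nnForge's actual code:
    // each device gets an equal slice of the mini-batch, computes gradients on
    // that slice, and the per-device gradients are averaged before the update.
    #include <cstddef>
    #include <vector>

    std::vector<float> average_gradients(
        const std::vector<std::vector<float>>& per_device_gradients)
    {
        const std::size_t param_count = per_device_gradients.front().size();
        std::vector<float> averaged(param_count, 0.0F);
        for (const auto& grads : per_device_gradients)
            for (std::size_t i = 0; i < param_count; ++i)
                averaged[i] += grads[i];
        for (float& val : averaged)
            val /= static_cast<float>(per_device_gradients.size());
        return averaged;
    }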
The framework has moved to C++11: you will need gcc 4.7 or newer to build the library, and MS VS 2013 on Windows.
Jul 5, 2016
nnForge v2.2.0
Hi, nnForge v2.2.0 is published!
- Convolutional layer:
  - strides added
  - without-bias option added
- check_gradient command added
- ImageNet: reproduced the ResNet-50 result (7.5% Top-5, single crop)
- Average subsampling layer allows specifying output size instead of subsampling window sizes
- Added profiling to CUDA backend
- Max subsampling layer:
  - round_up mode added
  - strides added
- Step learning rate decay policy added (see the sketch after this list)
- Added update_bn_weights action (though calculating mean and invsigma during training already works well)
- Spatial Transformer:
  - affine_grid_generator_layer added
  - linear_sampler layer added
- Utilizing cudnnFindConvolution*AlgorithmEx functions to get maximum perf (cuDNN v5 is required for that)
- Added strides to sparse convolution layer
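A quick note on the step decay policy mentioned above (an illustrative sketch; the parameter names here are not nnForge's actual option names): the learning rate is multiplied by a constant factor once every fixed number of epochs.

    // Illustrative step learning-rate decay, not nnForge's actual code:
    // the base rate is scaled by 'decay' once per 'step_size' epochs.
    #include <cmath>

    float step_decayed_rate(float base_rate, float decay, int step_size, int epoch)
    {
        // Integer division gives the number of completed decay steps.
        return base_rate * std::pow(decay, static_cast<float>(epoch / step_size));
    }

For example, with a base rate of 0.1, decay 0.1, and step_size 30, epochs 0-29 train at 0.1, epochs 30-59 at 0.01, and so on.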
Feb 21, 2016
nnForge v2.1.0
Two months have passed since the last release, and this one is pretty big. A number of layers have been added, and the functionality of existing layers has been extended. Here is the full list of changes in nnForge v2.1.0:
- New layers added: Concat, Reshape, CDFMax, PrefixSum, Upsampling, Add (element-wise), CDF2PDF, EntryConvolution
- Average and Max subsampling layers are now capable of subsampling in feature map and entry directions
- MSE layer reworked into a generic LError layer (L2 by default; see the sketch after this list)
- Max subsampling can do MIN as well
- Optional scale parameter for AverageSubsampling layer added
- Detailed info on the layers in the schema is now dumped
- Dumping graph with layer configs in debug mode
- Added dumping data in CSV format
- Runtime layer replacement with data layers
- Bug fixes
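Regarding the LError layer mentioned above, here is one common convention for an Lp error (a hedged sketch only; nnForge's exact normalization may differ): sum the p-th power of the absolute differences between prediction and target, with p = 2 giving the familiar squared error.

    // Illustrative Lp error between prediction and target (p = 2 by default);
    // this follows one common convention and is not nnForge's exact formula.
    #include <cmath>
    #include <cstddef>
    #include <vector>

    float l_error(const std::vector<float>& predicted,
                  const std::vector<float>& target,
                  float p = 2.0F)
    {
        float sum = 0.0F;
        for (std::size_t i = 0; i < predicted.size(); ++i)
            sum += std::pow(std::fabs(predicted[i] - target[i]), p);
        return sum;
    }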
Dec 20, 2015
nnForge v2.0.2
A small release, nnForge v2.0.2, is here:
- Gradient modifier layer added
- Structured_data_constant_reader added
- Error function layers accept an optional third input layer - a mask
- ADAM training algorithm implemented; use "--momentum_type adam". The learning rate should generally be much smaller than for other methods (see the sketch after this list)
- Changed default value for cuda_fixed_working_buffers_ratio to 0.4
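For context, here is the standard Adam update rule from the original paper (a sketch, not nnForge's internal implementation; the beta1/beta2/epsilon defaults are the commonly used values):

    // One Adam update step in its standard formulation (not nnForge's code).
    // m and v hold the running first and second moment estimates; t is the
    // 1-based step counter used for bias correction.
    #include <cmath>
    #include <cstddef>
    #include <vector>

    void adam_update(std::vector<float>& weights, const std::vector<float>& gradients,
                     std::vector<float>& m, std::vector<float>& v,
                     float rate, int t,
                     float beta1 = 0.9F, float beta2 = 0.999F, float epsilon = 1.0e-8F)
    {
        for (std::size_t i = 0; i < weights.size(); ++i)
        {
            m[i] = beta1 * m[i] + (1.0F - beta1) * gradients[i];
            v[i] = beta2 * v[i] + (1.0F - beta2) * gradients[i] * gradients[i];
            const float m_hat = m[i] / (1.0F - std::pow(beta1, t));
            const float v_hat = v[i] / (1.0F - std::pow(beta2, t));
            weights[i] -= rate * m_hat / (std::sqrt(v_hat) + epsilon);
        }
    }

Because each step is normalized by sqrt(v_hat), the update magnitude is roughly on the scale of the rate itself, which is one reason the rate typically needs to be much smaller than for plain SGD with momentum.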
I get a very nice 5.4 TFLOPS on the whole model when training VGG-A with cuDNN v4 RC.
Nov 24, 2015
nnForge v2.0.1

I significantly improved the performance of the CUDA backend recently in nnForge v2.0.1:
- Multiple improvements to reduce total buffer sizes, allowing larger chunks to be run (3x for ImageNet):
  - Taking buffer sizes into account when coloring the graph
  - Maxout, ReLU, and MaxSubsampling layers consume much less memory in the CUDA backend
- The action graph is optimized to exclude unnecessary concurrency, taking device width into account
- Migrated to cuDNN v3
- Reusing CUDA streams
- Allocating a chunk of memory for fixed working buffers - improves performance
- A few bug fixes
See the buffer graph coloring for the optimized action graph of a VGG-A-like schema to the right. You can get this and other interesting graphs by specifying the "--debug_mode 1" option.
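The idea behind the buffer coloring, in simplified form (a generic sketch of the technique, not nnForge's actual algorithm): build an interference graph where two buffers are connected if they are live at the same time, then greedily give each buffer the lowest color not used by its neighbours; buffers that share a color can share one memory allocation.

    // Greedy coloring of a buffer interference graph (illustrative only, not
    // nnForge's implementation). interference[i] lists the buffers that are
    // live at the same time as buffer i and therefore cannot share memory
    // with it; buffers that receive the same color can reuse one allocation.
    #include <cstddef>
    #include <vector>

    std::vector<int> color_buffers(const std::vector<std::vector<std::size_t>>& interference)
    {
        std::vector<int> color(interference.size(), -1);
        for (std::size_t i = 0; i < interference.size(); ++i)
        {
            std::vector<bool> used;
            for (std::size_t neighbour : interference[i])
            {
                int c = color[neighbour];
                if (c >= 0)
                {
                    if (used.size() <= static_cast<std::size_t>(c))
                        used.resize(c + 1, false);
                    used[c] = true;
                }
            }
            int chosen = 0;
            while (static_cast<std::size_t>(chosen) < used.size() && used[chosen])
                ++chosen;
            color[i] = chosen;
        }
        return color;
    }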
Nov 7, 2015
nnForge v2.0.0
Hi all,
Six months have passed since the last nnForge release, and there is a good reason for that: I have been working on a major framework redesign, and now it is out! Here is nnForge v2.0.0:
- The model is now arbitrary DAG (directed acyclic graph)
- Running independent actions in multiple streams in the CUDA backend
- Memory buffers are heavily reused
The changes are so radical that I had to drop support for the old trained-data storage format. Unfortunately, this means you will have to re-train your models from scratch.
Expect more goodies in the near future!
Apr 30, 2015
nnForge v1.2.0
Hi, this is a pretty big release of nnForge. The most important improvement is that model schemas are now stored in Protobuf format. You now define the schema via a plain text file. Use the convert_schema action to convert from the old binary format to the new one. I also implemented Overfeat functionality - this allows running inference on large input data with fine-grained results efficiently.
All the changes are:
- Schema:
  - The model schema is now stored in Protobuf format. Use convert_schema to convert schemas in the old binary format to the new one
  - Input and output data normalizers are stored in Protobuf format now. Use convert_input_normalizer and convert_output_normalizer to convert existing binary normalizers to the new format
  - Schema and data are now compatible if the non-empty layers match; empty-data layers don't matter anymore
- Training data:
  - Improvements in supervised_image_stream_reader
  - embed_data_transformer added
- Training:
  - Nesterov momentum added (see the --momentum_type option)
  - uniform_intensity_data_transformer added
  - Momentum data is kept between epochs (it is saved and restored as well)
  - ROC result now outputs accuracy, precision, recall, and F-score (in addition to AUC; see the sketch after this list)
- Visualization:
  - snapshot_invalid now saves images, including the binary classifier case
- Inference:
  - Overfeat functionality added (see the tiling option of the max subsampling layer, and the untile layer)
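For reference, the metrics added to the ROC output are computed from the confusion counts in the usual way (a generic sketch, not nnForge's code): precision = TP / (TP + FP), recall = TP / (TP + FN), and the F-score is their harmonic mean.

    // Standard binary classification metrics from confusion counts
    // (generic sketch, not taken from nnForge's source).
    struct classification_metrics
    {
        float accuracy;
        float precision;
        float recall;
        float f_score;
    };

    classification_metrics compute_metrics(float tp, float fp, float tn, float fn)
    {
        classification_metrics res;
        res.accuracy = (tp + tn) / (tp + fp + tn + fn);
        res.precision = tp / (tp + fp);
        res.recall = tp / (tp + fn);
        res.f_score = 2.0F * res.precision * res.recall / (res.precision + res.recall);
        return res;
    }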