Nov 30, 2016
I have added multi-GPU support to nnForge! Both training and inference can now be done on multiple GPUs, though only on a single node. Training is parallelized with a data-parallel approach, where the mini-batch is split across multiple GPUs.
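To illustrate the approach (a minimal conceptual sketch only, not nnForge's actual multi-GPU code; the function name is made up for this example): each GPU computes gradients on its own slice of the mini-batch, and the per-GPU gradients are averaged before a single weight update is applied.

    // Conceptual sketch of the data-parallel step, not nnForge's actual code:
    // each device gets an equal slice of the mini-batch, computes gradients on
    // that slice, and the per-device gradients are averaged before the update.
    #include <cstddef>
    #include <vector>

    std::vector<float> average_gradients(
        const std::vector<std::vector<float>>& per_device_gradients)
    {
        const std::size_t param_count = per_device_gradients.front().size();
        std::vector<float> averaged(param_count, 0.0F);
        for (const auto& grads : per_device_gradients)
            for (std::size_t i = 0; i < param_count; ++i)
                averaged[i] += grads[i];
        for (float& val : averaged)
            val /= static_cast<float>(per_device_gradients.size());
        return averaged;
    }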
The framework has moved to C++11: you will need gcc 4.7 or newer to build the library, and MS VS 2013 on Windows.
Jul 5, 2016
nnForge v2.2.0
Hi, nnForge v2.2.0 is published!
- Convolutional layer:
  - strides added
  - without-bias option added
- check_gradient command added
- ImageNet: reproduced the ResNet-50 result (7.5% Top-5, single crop)
- Average subsampling layer allows specifying output size instead of subsampling window sizes
- Added profiling to CUDA backend
- Max subsampling layer:
  - round_up mode added
  - strides added
- Step learning rate decay policy added (see the sketch after this list)
- Added update_bn_weights action (though calculating mean and invsigma during training already works well)
- Spatial Transformer:
  - affine_grid_generator_layer added
  - linear_sampler layer added
- Utilizing cudnnFindConvolution*AlgorithmEx functions to get maximum perf (cuDNN v5 is required for that)
- Added strides to sparse convolution layer
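A quick note on the step decay policy mentioned above (an illustrative sketch; the parameter names here are not nnForge's actual option names): the learning rate is multiplied by a constant factor once every fixed number of epochs.

    // Illustrative step learning-rate decay, not nnForge's actual code:
    // the base rate is scaled by 'decay' once per 'step_size' epochs.
    #include <cmath>

    float step_decayed_rate(float base_rate, float decay, int step_size, int epoch)
    {
        // Integer division gives the number of completed decay steps.
        return base_rate * std::pow(decay, static_cast<float>(epoch / step_size));
    }

For example, with a base rate of 0.1, decay 0.1, and step_size 30, epochs 0-29 train at 0.1, epochs 30-59 at 0.01, and so on.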
Feb 21, 2016
nnForge v2.1.0
Two months have passed since the last release, and this one is pretty big. A number of layers have been added, and the functionality of existing layers has been extended. Here is the full list of changes in nnForge v2.1.0:
- New layers added: Concat, Reshape, CDFMax, PrefixSum, Upsampling, Add (element-wise), CDF2PDF, EntryConvolution
- Average and Max subsampling layers are now capable of subsampling in feature map and entry directions
- MSE layer reworked into a generic LError layer (L2 by default; see the sketch after this list)
- Max subsampling can do MIN as well
- Optional scale parameter for AverageSubsampling layer added
- Detailed info on the layers in the schema is now dumped
- Dumping graph with layer configs in debug mode
- Added dumping data in CSV format
- Runtime layer replacement with data layers
- Bug fixes
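Regarding the LError layer mentioned above, here is one common convention for an Lp error (a hedged sketch only; nnForge's exact normalization may differ): sum the p-th power of the absolute differences between prediction and target, with p = 2 giving the familiar squared error.

    // Illustrative Lp error between prediction and target (p = 2 by default);
    // this follows one common convention and is not nnForge's exact formula.
    #include <cmath>
    #include <cstddef>
    #include <vector>

    float l_error(const std::vector<float>& predicted,
                  const std::vector<float>& target,
                  float p = 2.0F)
    {
        float sum = 0.0F;
        for (std::size_t i = 0; i < predicted.size(); ++i)
            sum += std::pow(std::fabs(predicted[i] - target[i]), p);
        return sum;
    }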
Dec 20, 2015
nnForge v2.0.2
A small release, nnForge v2.0.2, is here:
- Gradient modifier layer added
- Structured_data_constant_reader added
- Error function layers accept an optional third input layer - a mask
- ADAM training algorithm implemented; use "--momentum_type adam". The learning rate should generally be much smaller than for other methods (see the sketch after this list)
- Changed default value for cuda_fixed_working_buffers_ratio to 0.4
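For context, here is the standard Adam update rule from the original paper (a sketch, not nnForge's internal implementation; the beta1/beta2/epsilon defaults are the commonly used values):

    // One Adam update step in its standard formulation (not nnForge's code).
    // m and v hold the running first and second moment estimates; t is the
    // 1-based step counter used for bias correction.
    #include <cmath>
    #include <cstddef>
    #include <vector>

    void adam_update(std::vector<float>& weights, const std::vector<float>& gradients,
                     std::vector<float>& m, std::vector<float>& v,
                     float rate, int t,
                     float beta1 = 0.9F, float beta2 = 0.999F, float epsilon = 1.0e-8F)
    {
        for (std::size_t i = 0; i < weights.size(); ++i)
        {
            m[i] = beta1 * m[i] + (1.0F - beta1) * gradients[i];
            v[i] = beta2 * v[i] + (1.0F - beta2) * gradients[i] * gradients[i];
            const float m_hat = m[i] / (1.0F - std::pow(beta1, t));
            const float v_hat = v[i] / (1.0F - std::pow(beta2, t));
            weights[i] -= rate * m_hat / (std::sqrt(v_hat) + epsilon);
        }
    }

Because each step is normalized by sqrt(v_hat), the update magnitude is roughly on the scale of the rate itself, which is one reason the rate typically needs to be much smaller than for plain SGD with momentum.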
I get a very nice 5.4 TFLOPS on the whole model when training VGG-A with cuDNN v4 RC.
Nov 24, 2015
nnForge v2.0.1

I significantly improved the performance of the CUDA backend recently in nnForge v2.0.1:
- Multiple improvements to reduce total buffer sizes, allowing larger chunks to be run (3x for ImageNet):
  - Taking buffer sizes into account when coloring the graph
  - Maxout, ReLU, and MaxSubsampling layers consume much less memory in the CUDA backend
- The action graph is optimized to exclude unnecessary concurrency, taking device width into account
- Migrated to cuDNN v3
- Reusing CUDA streams
- Allocating a chunk of memory for fixed working buffers - improves performance
- A few bug fixes
See the buffer graph coloring for the optimized action graph of a VGG-A-like schema to the right. You can get this and other interesting graphs by specifying the "--debug_mode 1" option.
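The idea behind the buffer coloring, in simplified form (a generic sketch of the technique, not nnForge's actual algorithm): build an interference graph where two buffers are connected if they are live at the same time, then greedily give each buffer the lowest color not used by its neighbours; buffers that share a color can share one memory allocation.

    // Greedy coloring of a buffer interference graph (illustrative only, not
    // nnForge's implementation). interference[i] lists the buffers that are
    // live at the same time as buffer i and therefore cannot share memory
    // with it; buffers that receive the same color can reuse one allocation.
    #include <cstddef>
    #include <vector>

    std::vector<int> color_buffers(const std::vector<std::vector<std::size_t>>& interference)
    {
        std::vector<int> color(interference.size(), -1);
        for (std::size_t i = 0; i < interference.size(); ++i)
        {
            std::vector<bool> used;
            for (std::size_t neighbour : interference[i])
            {
                int c = color[neighbour];
                if (c >= 0)
                {
                    if (used.size() <= static_cast<std::size_t>(c))
                        used.resize(c + 1, false);
                    used[c] = true;
                }
            }
            int chosen = 0;
            while (static_cast<std::size_t>(chosen) < used.size() && used[chosen])
                ++chosen;
            color[i] = chosen;
        }
        return color;
    }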
Nov 7, 2015
nnForge v2.0.0
Hi all,
Six months have passed since the last nnForge release, and there is a good reason for that: I have been working on a major framework redesign, and now it is out! Here is nnForge v2.0.0:
- The model is now arbitrary DAG (directed acyclic graph)
- Running independent actions in multiple streams in the CUDA backend
- Memory buffers are heavily reused
The changes are so radical that I had to drop support for the old trained-data storage format. Unfortunately, this means you will have to re-train your models from scratch.
Expect more goodies in the near future!
Apr 30, 2015
nnForge v1.2.0
Hi, this is a pretty big release of nnForge. The most important improvement is that model schemas are now stored in Protobuf format. You now define the schema via a plain text file. Use the convert_schema action to convert from the old binary format to the new one. I also implemented Overfeat functionality - this allows running inference on large input data with fine-grained results efficiently.
All the changes are:
- Schema:
  - The model schema is now stored in Protobuf format. Use convert_schema to convert schemas in the old binary format to the new one
  - Input and output data normalizers are stored in Protobuf format now. Use convert_input_normalizer and convert_output_normalizer to convert existing binary normalizers to the new format
  - Schema and data are now compatible if the non-empty layers match; empty-data layers don't matter anymore
- Training data:
  - Improvements in supervised_image_stream_reader
  - embed_data_transformer added
- Training:
  - Nesterov momentum added (see the --momentum_type option)
  - uniform_intensity_data_transformer added
  - Momentum data is kept between epochs (it is saved and restored as well)
  - ROC result now outputs accuracy, precision, recall, and F-score (in addition to AUC; see the sketch after this list)
- Visualization:
  - snapshot_invalid now saves images, including the binary classifier case
- Inference:
  - Overfeat functionality added (see the tiling option of the max subsampling layer, and the untile layer)
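For reference, the metrics added to the ROC output are computed from the confusion counts in the usual way (a generic sketch, not nnForge's code): precision = TP / (TP + FP), recall = TP / (TP + FN), and the F-score is their harmonic mean.

    // Standard binary classification metrics from confusion counts
    // (generic sketch, not taken from nnForge's source).
    struct classification_metrics
    {
        float accuracy;
        float precision;
        float recall;
        float f_score;
    };

    classification_metrics compute_metrics(float tp, float fp, float tn, float fn)
    {
        classification_metrics res;
        res.accuracy = (tp + tn) / (tp + fp + tn + fn);
        res.precision = tp / (tp + fp);
        res.recall = tp / (tp + fn);
        res.f_score = 2.0F * res.precision * res.recall / (res.precision + res.recall);
        return res;
    }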