Typically for a CNN architecture, in a single filter as described by your number_of_filters parameter, there is one 2D kernel per input channel. There are input_channels * number_of_filters sets of weights, each of which describe a convolution kernel. So the diagrams showing one set of weights per input channel for each filter are correct. More @Wikipedia
Hover over any link to get a description of the article. Please note that search keywords are sometimes hidden within the full article and don't appear in the description or title.