Anyone want to discuss my new idea for human-level image recognition? It's a small algorithm I plan to code in the coming months, but it's faster to discuss what could go wrong first, because any code takes time to implement, especially for me since I build from scratch.

Let's take a base example. We have a short song, or an image of a stop sign, to recognize, and we only see one example of it, so there is very little data to train on. The task: we are given many dummy candidates and must identify which one is the song or the stop sign. The one we want, though, is not identical. The stop sign or song is stretched longer/wider, it is brighter/louder, it is higher in pitch/color, and it may also be shifted in location, noisy/blurred, rotated, flipped backwards, or even missing parts. A human can still recognize Jingle Bell Rock if it is played slower, higher-pitched, and louder.

My solution to this advanced recognition ability is very simple. We take the pixels of the image (or the notes of the song) we were shown, and compare them to a candidate input. Say a given pixel is similar to the one we expect but much brighter, while the other pixels are off from their expected brightness by the same or a similar amount: they are all brighter together, so the image is still a very close match. If instead each pixel were off by a different amount, some brighter and some darker, the candidate could be a frog and still come out no brighter when summed up globally, which is why the comparison has to be relative, not absolute. Location works the same way: two pixels in the input sit a bit farther apart than expected, but so do all the other pairs, so it is not that bad, it is just a stretched stop sign or a slowed-down song. Same for color. We do this for each layer in the toy hierarchy, which stores the exact image pixel by pixel.
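To make the brightness part concrete, here is a minimal sketch of the relative comparison I mean. Everything here is hypothetical illustration, not my actual hierarchy code: the function name `relative_match_score` and the toy 3x3 "stop sign" arrays are made up, and I'm assuming grayscale images as NumPy arrays of equal shape.

```python
import numpy as np

def relative_match_score(template, candidate):
    """Lower is better. Score by how *consistent* the per-pixel error
    is, not by how large it is: a uniformly brighter copy of the
    template scores perfectly, while a different image with the same
    average brightness shift does not."""
    offsets = candidate.astype(float) - template.astype(float)
    # A shared offset (global brightening/darkening) is forgiven;
    # only the spread of offsets around their mean counts as
    # real distortion of the object.
    return float(np.std(offsets))

# Toy 3x3 "stop sign" template, a copy uniformly 40 units brighter,
# and a scrambled image (the "frog" case) for comparison.
template = np.array([[10, 200, 10],
                     [200, 200, 200],
                     [10, 200, 10]])
brighter = template + 40
scrambled = np.array([[200, 10, 200],
                      [10, 10, 10],
                      [200, 10, 200]])
```

Here the uniformly brighter copy gets a score of 0 even though every single pixel is "wrong" by 40, while the scrambled image scores badly even though its global brightness is in the same ballpark.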
The idea, in conclusion, is that there is lots of distortion yet the object is still very recognizable, because relatively all pixels or groups are very similar to what was expected; there is really little distortion of the object itself. A line may be rotated 90 degrees, but so are all the others, so each one has an error in expectation, but so does every other. So we don't sanction the match once per line, only once for the shared error.
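The "sanction once, not per element" point can be shown with the song case too. This is a hedged sketch, not my real scoring function: the names `naive_penalty` and `relative_penalty` are mine, and the pitch numbers are made-up MIDI-style values, not the actual melody of Jingle Bell Rock.

```python
def naive_penalty(expected, observed):
    # Punish every element separately for deviating from expectation.
    return sum(abs(o - e) for e, o in zip(expected, observed))

def relative_penalty(expected, observed):
    # Punish the shared deviation only once: factor out the common
    # offset, then charge only for what is left over.
    diffs = [o - e for e, o in zip(expected, observed)]
    common = sum(diffs) / len(diffs)
    return sum(abs(d - common) for d in diffs)

# A melody transposed up 5 semitones: every note is "wrong" by the
# same amount, so the relative penalty is zero while the naive
# penalty grows with every note.
melody = [64, 64, 64, 62, 60]       # hypothetical pitch values
transposed = [p + 5 for p in melody]
```

The naive score here is 25 (5 notes, each off by 5), while the relative score is 0, which matches the intuition that a transposed song is still the same song.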
The way I got to this idea: in my original, much larger design for AGI, for text, my method of matching worked like this (see my image of the toy hierarchy above). Say the hierarchy stores the word "hello"; if the input "lleh" enters, the node can't activate 100%! The 'o' is missing, and the order is only similar, so the node is upset and not activated as much as it could be. What I did not realize, until I looked at image recognition, was the other secret described above: the relationship between the parts' errors in expectation. So if the input is "hzzzezzzlzzzlzzzo" (hello), the letters are indeed not as close together in time as was stored, but the error is only sanctioned once, because the wait of 3 letters is repeated across the whole word. The new trick for text, then, is: don't sanction it so badly, relieve it, since the error may be the same across the word. I.e. after the first three z's (zzz), the node is upset, but with each further zzz it doesn't get much more upset; it is used to it.

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/T759eb6f9d5c84273-Mba148c9b893eecd11554c7a1
Delivery options: https://agi.topicbox.com/groups/agi/subscription
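P.S. A minimal sketch of the text trick, since it's easy to test. The names `gap_positions` and `stretch_penalty` are hypothetical, and the greedy left-to-right matching is just the simplest choice, not necessarily what the full hierarchy would do:

```python
def gap_positions(word, stream):
    """Greedily find word's letters in order inside stream,
    returning the index of each match (None if word is absent)."""
    positions, start = [], 0
    for ch in word:
        i = stream.find(ch, start)
        if i == -1:
            return None
        positions.append(i)
        start = i + 1
    return positions

def stretch_penalty(word, stream):
    """Charge for *uneven* spacing between matched letters, not for
    spacing itself: a uniformly stretched word costs nothing extra
    after the first sanction."""
    pos = gap_positions(word, stream)
    if pos is None:
        return float("inf")        # a letter is missing entirely
    gaps = [b - a for a, b in zip(pos, pos[1:])]
    mean = sum(gaps) / len(gaps)
    # Only deviation from the common gap is punished; the shared
    # "wait 3 letters" error is absorbed into the mean.
    return sum(abs(g - mean) for g in gaps)
```

With this scoring, "hzzzezzzlzzzlzzzo" matches "hello" with zero penalty, because every gap is the same 3-letter wait, while a version with uneven gaps gets punished, and a version missing the 'o' fails outright.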
