Anyone want to discuss my new idea for human-level image recognition? It's a 
small algorithm I'll code in the coming months, but it's still faster to 
discuss first what could go wrong, because any code takes time to implement, 
especially for me since I build everything from scratch too. So let's take a 
base example. We have a short song, or an image of a stop sign, to recognize. 
We only see one example of it, so there is very little data to train on. Now 
the task: we are given many dummy candidates and must identify which one is 
the song or the stop sign. The one we want, though, is not identical. The stop 
sign or the music is stretched longer/wider, it is brighter/louder, it is 
higher in pitch/color, and on top of that there may be noise/blur, rotation, 
flipping, and even missing parts. A human can still recognize Jingle Bell Rock 
if it is stretched slower, higher in pitch, and louder.

My solution to this advanced recognition ability is very simple. We take the 
pixels of the image (or the notes of the music) we were shown to search for, 
and compare them to a candidate input. Say a given pixel is similar in 
brightness to the one we want, but much brighter, while all the other pixels 
are off from their expected brightness by the same or a similar amount: they 
are all brighter by roughly the same shift, so the image is actually very 
similar. If instead each pixel were off by its own unrelated amount, some 
brighter, some darker, the image could be a frog and still come out the same 
when brightness is summed up globally, which shows that relative comparison is 
what works. Location is the same: two pixels in the input sit a bit farther 
apart than expected, but so do all the other pairs, so it is not that bad, it 
is just a stretched stop sign or song. Same for color. We do this for each 
layer of the toy hierarchy, which stores the exact image pixel by pixel. The 
idea, in conclusion, is that there can be lots of distortion and the object is 
still very recognizable, because, relatively, all pixels or groups are just as 
expected: there really is little distortion of the object itself. A line may 
be rotated 90 degrees, but so are all the others, so each one has an error 
against expectation, but they all share it. So we don't sanction the match 
once per line, only once in total.
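To make the brightness case concrete, here is a minimal Python sketch of the "sanction it once" scoring. The function name, the cost formula, and the pixel values are all just illustrative choices of mine, not the final algorithm:

```python
# A minimal sketch of the "sanction it once" idea for brightness, assuming
# the stored template and the candidate are flat lists of pixel values.
# The exact cost formula is just illustrative.
import numpy as np

def relative_match_cost(template, candidate):
    """Penalize a brightness shift shared by ALL pixels only once,
    instead of once per pixel."""
    diffs = np.asarray(candidate, float) - np.asarray(template, float)
    shared_shift = diffs.mean()        # the offset every pixel agrees on
    residual = diffs - shared_shift    # what is left after removing it
    # one penalty for the global shift, plus per-pixel penalties only
    # for the part of the error that is NOT shared across pixels
    return abs(shared_shift) + np.abs(residual).sum()

template = [10, 50, 30, 80]
brighter = [40, 80, 60, 110]   # every pixel +30: same image, just brighter
frog     = [40, 20, 60, 50]    # same total |error|, but inconsistent

print(relative_match_cost(template, brighter))  # → 30.0 (sanctioned once)
print(relative_match_cost(template, frog))      # → 120.0
```

Location stretch would work the same way: divide out the one scale factor that all the pixel-pair distances agree on, then only penalize distances that deviate from it.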

The way I got to this idea was that in my original huge design for AGI, for 
text, my method of matching was e.g. (see my image of my toy hierarchy above): 
it stores the word 'hello', let's say, and if the input 'lleh' enters, it 
can't activate 100%! The 'o' is missing, and the order is only similar, so the 
node is upset, not activated as much as it could be. I did not realize, 
though, until I looked at image recognition, that there was another secret, as 
said above: the relationship between the parts' errors in expectation. So if 
we have hzzzezzzlzzzlzzzo (hello), it is indeed not as close in time as was 
stored, but the error should only be sanctioned once, because the error, 
waiting 3 letters each time, is repeated across the whole word 'hello'.
So the new trick for text is: don't sanction it so badly, relieve it, because 
the error may be the same all across it.
I.e. after the first 3 z's (zzz), it is upset, but with the next zzz and so on 
it doesn't get much extra upset; it is used to it...
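Here is the same sketch for the text case. The function name, and the use of the median gap as the "expected" stretch, are just illustrative choices; a real version would need to handle reordering like 'lleh' too:

```python
# A minimal sketch of the same idea for text, assuming we scan the noisy
# input left to right for the stored letters in order.
from statistics import median

def gap_match_cost(stored, noisy):
    """Penalize a gap that is CONSISTENT between all letters only once,
    instead of once per letter."""
    positions, start = [], 0
    for ch in stored:
        idx = noisy.find(ch, start)
        if idx == -1:
            return float("inf")        # a letter is missing entirely
        positions.append(idx)
        start = idx + 1
    gaps = [b - a - 1 for a, b in zip(positions, positions[1:])]
    if not gaps:
        return 0
    typical = median(gaps)             # the stretch the whole word agrees on
    # one penalty for the overall stretch, plus extra only where a gap
    # deviates from what the rest of the word expects
    return typical + sum(abs(g - typical) for g in gaps)

print(gap_match_cost("hello", "hello"))              # → 0.0 (exact match)
print(gap_match_cost("hello", "hzzzezzzlzzzlzzzo"))  # → 3.0 (sanctioned once)
print(gap_match_cost("hello", "hezzzzzzzzzzzzllo"))  # → 12.0 (inconsistent)
```

So the stretched word scores barely worse than the exact word, while an input with the same number of inserted z's but bunched inconsistently scores much worse.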
------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T759eb6f9d5c84273-Mba148c9b893eecd11554c7a1