First, let me explain what you are trying to do. You are given a single image that is really two images placed side by side, each showing the same scene from a slightly different perspective. This mimics how we see everyday objects: each eye views them from a slightly different angle.
With this type of image, crossing your eyes makes the two halves blend together. If you do it right, a third image will appear between them. This third image will look 3D (technically, it is stereoscopic: two flat images fused to give an impression of depth).
Here is an example image I made (practice on this):

1. Look at the image. Notice how it is divided into two.
2. Look at the image straight on.
3. Cross your eyes slightly, until the two images blur together to make a third image.
4. At first this third image will be pretty blurry, but if you keep trying, it will eventually become crystal clear.
To make the third image clearer, imagine that it is popping off the screen: try to focus on something that is a couple of centimeters in front of the screen.
Once you can properly do this, try an image like this:

After a lot of practice, I can bring almost any image into perfect focus in about half a second. See if you can get that good.