The purpose of the vision system in this project is to obtain the 3D ball position in the robot frame. The pipeline for extracting the ball position from the cameras is shown in the diagram below.
Accessing RealSense Camera Stream:
We use the realsense2_camera package, which publishes the RGB and depth video streams on two separate ROS topics.
With a video stream at 30 FPS, we obtain new ball coordinates at the same rate, i.e. roughly 0.033 s between updates.
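As a minimal sketch, the two streams can be read with rospy and cv_bridge. The topic names below are the usual realsense2_camera defaults and are an assumption here; they may differ depending on the launch configuration:

    #!/usr/bin/env python
    import rospy
    from sensor_msgs.msg import Image
    from cv_bridge import CvBridge

    bridge = CvBridge()
    latest = {"rgb": None, "depth": None}

    def rgb_cb(msg):
        # Convert the ROS Image message into an OpenCV BGR array.
        latest["rgb"] = bridge.imgmsg_to_cv2(msg, desired_encoding="bgr8")

    def depth_cb(msg):
        # Depth frames arrive as 16-bit integer values (typically millimeters).
        latest["depth"] = bridge.imgmsg_to_cv2(msg, desired_encoding="16UC1")

    rospy.init_node("ball_tracker")
    # Topic names are the common realsense2_camera defaults (assumption).
    rospy.Subscriber("/camera/color/image_raw", Image, rgb_cb, queue_size=1)
    rospy.Subscriber("/camera/aligned_depth_to_color/image_raw", Image,
                     depth_cb, queue_size=1)
    rospy.spin()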
HSV Thresholding:
We first obtain suitable HSV thresholds by manual tuning, adjusting them until only the pixels belonging to the ball
survive the thresholding step. At runtime, we convert the RGB image from the RealSense to HSV, apply these thresholds so that only
ball pixels remain, and derive the 2D ball coordinate in the image by averaging the coordinates of those pixels.
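A sketch of this step with OpenCV is below; the HSV bounds shown are placeholders, since the actual values came from manual tuning:

    import cv2
    import numpy as np

    # Placeholder bounds (e.g. for an orange ball); real values come from tuning.
    LOWER_HSV = np.array([5, 120, 120])
    UPPER_HSV = np.array([15, 255, 255])

    def ball_points_and_centroid(bgr):
        """Return the (N, 2) array of ball-colored pixels and their mean (u, v)."""
        hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
        mask = cv2.inRange(hsv, LOWER_HSV, UPPER_HSV)  # 255 where the color matches
        vs, us = np.nonzero(mask)                      # row (v) and column (u) indices
        if len(us) == 0:
            return None, None
        points = np.stack([us, vs], axis=1)
        return points, points.mean(axis=0)             # centroid of surviving pixels

In the full pipeline, the clustering step described next filters these points before the centroid is taken.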
DBSCAN Clustering:
Averaging is highly susceptible to outliers, so before taking the mean we run DBSCAN over all the points that passed
HSV thresholding, locate the largest cluster, and discard any points outside it. This makes the detector considerably
more robust to background noise in the image.
Depth Thresholding:
Since the RealSense camera we used is only accurate up to about 3 meters, we further suppress background noise by discarding any points with a depth greater than
3 meters. This removes the influence of objects farther in the background that are not useful for ball detection.
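A sketch of the depth cutoff, assuming the raw depth image uses the typical RealSense scale of one millimeter per unit:

    import numpy as np

    MAX_DEPTH_M = 3.0  # depth readings degrade beyond roughly 3 m for this camera

    def filter_by_depth(points, depth_image, depth_scale=0.001):
        """Drop candidate pixels whose depth exceeds 3 m (or is missing)."""
        us, vs = points[:, 0], points[:, 1]
        depths_m = depth_image[vs, us] * depth_scale
        keep = (depths_m > 0) & (depths_m <= MAX_DEPTH_M)  # 0 means no reading
        return points[keep]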
3D Ball Coordinate:
Using the 2D ball coordinate derived above, we look up the ball's depth in the depth image and back-project the pixel through the
camera intrinsics to obtain the full 3D ball coordinate in the camera frame.
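The back-projection follows the standard pinhole camera model; the intrinsics fx, fy, cx, cy can be read from the sensor_msgs/CameraInfo message that realsense2_camera publishes alongside the image stream:

    def deproject(u, v, depth_m, fx, fy, cx, cy):
        """Back-project a pixel and its depth into a 3D point in the camera frame."""
        x = (u - cx) * depth_m / fx
        y = (v - cy) * depth_m / fy
        return x, y, depth_m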
We attach an AR tag to the back of the RealSense camera, within view of the Sawyer robot's head camera. We use the
intera_examples package from Rethink Robotics to turn on the Sawyer head camera, which publishes its video stream on a topic.
We then use the ROS ar_track_alvar package to automatically detect AR tags in the head camera's view and
publish the 3D pose of each tag on a topic. With this setup, we can obtain the position of the RealSense camera
in the robot frame.
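As a sketch, the detected tag poses can be read from ar_track_alvar's default /ar_pose_marker topic:

    import rospy
    from ar_track_alvar_msgs.msg import AlvarMarkers

    def marker_cb(msg):
        # Each marker pose is expressed in the head camera's frame.
        for m in msg.markers:
            p = m.pose.pose.position
            rospy.loginfo("tag %d at (%.2f, %.2f, %.2f)", m.id, p.x, p.y, p.z)

    rospy.init_node("ar_listener")
    rospy.Subscriber("/ar_pose_marker", AlvarMarkers, marker_cb, queue_size=1)
    rospy.spin()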
To transform the 3D ball coordinate from the camera frame to the robot frame, we use the ROS tf package. We first launch a static transform publisher to link the camera frames to the Sawyer frames, which otherwise live in two separate tf trees. We then use a tf.TransformListener to transform the ball coordinate on every frame.
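A minimal sketch of the listener side is below. The frame names are assumptions for illustration and should match the actual tf tree; the static transform publisher itself is typically started separately via rosrun tf static_transform_publisher with the measured offset and the parent and child frame names:

    import rospy
    import tf
    from geometry_msgs.msg import PointStamped

    rospy.init_node("ball_to_base")
    listener = tf.TransformListener()

    def to_robot_frame(x, y, z):
        """Transform a ball point from the camera frame into the robot base frame.

        Frame names ("camera_color_optical_frame", "base") are illustrative only.
        """
        pt = PointStamped()
        pt.header.frame_id = "camera_color_optical_frame"
        pt.header.stamp = rospy.Time(0)  # Time(0) requests the latest transform
        pt.point.x, pt.point.y, pt.point.z = x, y, z
        listener.waitForTransform("base", pt.header.frame_id,
                                  rospy.Time(0), rospy.Duration(1.0))
        return listener.transformPoint("base", pt)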