Thomas Yokota

APTOS 2019 Blindness Detection

Wrap up: APTOS 2019 Blindness Detection


Problem Definition

Aravind Eye Hospital, the host of APTOS 2019 Blindness Detection, asked participants to develop a machine learning model to speed up diabetic retinopathy detection. Participants were provided with fundus photographs labeled by clinicians for the severity of diabetic retinopathy. Submissions were scored with the quadratic weighted kappa.
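For readers unfamiliar with the metric, here is a minimal pure-Python sketch of the quadratic weighted kappa. It compares the observed confusion matrix O with the matrix E expected from the two label histograms, using weights w_ij = (i - j)^2 / (N - 1)^2, and returns 1 - sum(w * O) / sum(w * E). The function name and the default of five grades (0 to 4) are assumptions made for illustration; this is not the competition's scoring code.

```python
def quadratic_weighted_kappa(y_true, y_pred, n_classes=5):
    """Quadratic weighted kappa for integer grades in [0, n_classes)."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    # Observed confusion matrix: rows are true grades, columns are predictions.
    observed = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1

    # Marginal histograms of each rater.
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(n_classes))
                 for j in range(n_classes)]

    numerator = 0.0
    denominator = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            weight = (i - j) ** 2 / (n_classes - 1) ** 2
            expected = hist_true[i] * hist_pred[j] / n
            numerator += weight * observed[i][j]
            denominator += weight * expected

    # Both raters put everything in the same single grade: treat as agreement.
    if denominator == 0:
        return 1.0
    return 1.0 - numerator / denominator
```

Because the weights grow with the squared distance between grades, predicting grade 4 for a grade 0 eye costs far more than predicting grade 1.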


My approach was to borrow heavily from existing methods, as this was my first serious computer vision competition. I gathered information and code from the competition’s kernels and discussion pages, arXiv, and past computer vision competitions. I used PyTorch for this competition. My baseline was borrowed from a shared notebook by seefun. Matt cleaned up the code so that we could quickly iterate through ideas. His adaptation served as the foundation for our team’s solution.

Transfer Learning

Transfer learning was a popular topic among participants early in the competition. As a result, several notebooks were shared, including the following three, which I studied and brought into our pipeline:

public notebooks using transfer learning

In addition to torchvision, packages such as EfficientNet-Pytorch and Pretrained models for Pytorch made this task effortless. I tried variants of ResNet, ResNeXt, SE-ResNeXt, and EfficientNet early in the competition; however, Matt and I eventually focused our efforts on EfficientNet-b5 prior to the team merge. After the merge, we trained EfficientNet-b7 models once it became available for PyTorch, about a month into the competition. In the last couple of weeks, both Philipp and I trained models using SE-ResNeXt 50 and SE-ResNeXt 101. In the end, we dropped SE-ResNeXt 101, so our solution consisted of predictions from both EfficientNet-b7 and SE-ResNeXt 50.

Our team was unable to reach a consensus on pretrained models and image size; some of us saw performance increase with larger images while others did not. After the competition, Qishen Ha explained the relationship between image size, data size, and performance in his solution write-up.


Figure 1: local average color removal with [left] and without [right] scaling.

Initial preprocessing was motivated by a method outlined in Ben Graham’s 2015 winning solution. Local average color removal results in an embossing that can be seen at the edges of the retina. Graham both scaled the image and applied a circle crop to remove the embossed edge. In figure 1, we can see the embossed edge bleeding into the retina on a small image where scaling was not applied. After scaling, however, the embossed edge is “controlled”. Preprocessing code used in our earlier models is shown below.

Some additional modifications were needed, including a negative space crop that could pass through asynchronous kernel submission, and padding for non-square images (figure 2). Matt’s padding modification boosted our public LB score significantly, into the “desirable to team merge early” range. Our final submissions more or less stuck to this preprocessing scheme.

Figure 2: preprocessed examples.

Winning solutions used minimal preprocessing and put more work into augmentation. Our solution was clearly the opposite. Towards the end of the competition, we admittedly wasted a lot of time and effort on preprocessing methods, even though we all doubted that they were helping.

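import albumentations
import cv2
import numpy as np
from PIL import Image, ImageFilter


def preprocess_dataset(id_code, image_path, save_path, square_pad=True, circle_crop=True, blur=True, image_size=320):

    # load image (OpenCV reads BGR, so convert to RGB)
    image = cv2.imread(f"{image_path}/{id_code}.png")
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

    # remove extra black space
    if image.ndim == 2:
        mask = image > 7
        image = image[np.ix_(mask.any(1), mask.any(0))]
    elif image.ndim == 3:
        mask = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY) > 7
        check_shape = image[:,:,0][np.ix_(mask.any(1), mask.any(0))].shape[0]
        # only crop when something brighter than the threshold remains
        if check_shape != 0:
            image1 = image[:,:,0][np.ix_(mask.any(1), mask.any(0))]
            image2 = image[:,:,1][np.ix_(mask.any(1), mask.any(0))]
            image3 = image[:,:,2][np.ix_(mask.any(1), mask.any(0))]
            image = np.stack([image1,image2,image3],axis=-1)

    # scale radius: estimate the retina radius from the middle row
    scale = image_size
    x = image[int(image.shape[0]/2),:,:].sum(1)
    r = (x>x.mean() / 10).sum() / 2
    s = scale * 1.0/r
    image = cv2.resize(image, (0,0), fx=s, fy=s)

    # blur: subtract the local average color, then re-center on gray (128)
    if blur:
        image = Image.fromarray(image.astype(np.uint8))
        image = np.array(image, dtype=int) - \
                np.array(image.filter(ImageFilter.GaussianBlur(radius=scale/30)), dtype=int)
        image = np.clip(image*4+128, 0, 255).astype(np.uint8)

    # pad to a square with the same gray background
    if square_pad:
        max_dim = max(image.shape[0], image.shape[1])
        padder = albumentations.PadIfNeeded(
            min_height=max_dim,
            min_width=max_dim,
            border_mode=cv2.BORDER_CONSTANT,
            value=(128, 128, 128),
        )
        image = padder(image=image)["image"]

    # circle crop: keep a disc of radius 0.9 * scale, gray elsewhere
    if circle_crop:
        b = np.zeros(image.shape, dtype=np.uint8)
        cv2.circle(b, (image.shape[1]//2, image.shape[0]//2), int(scale*0.9), (1,1,1), -1, 8, 0)
        image = image*b + 128*(1-b)

    # save as .npy (assumed: the save call is truncated in the source,
    # and the target path carries no file extension)
    image = cv2.resize(image.astype("uint8"), (image_size, image_size))
    np.save(f"{save_path}/{id_code}", image.astype(np.uint8))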


The albumentations package was used for augmentation. I made a list of possible augmentations based on the past diabetic retinopathy competition, shared notebooks, and literature found on both Google Scholar and arXiv. Deciding which augmentations to keep was based on local validation scores and, sparingly, on the public LB. I settled on rotation, horizontal/vertical flipping, and either CLAHE or brightness/contrast. Matt fine-tuned rotation a bit and found that a range of (-45, 45) worked best. One thing to note about albumentations: after removing the local average color, the negative space is RGB (128,128,128), or gray. When using rotations, we can retain the background color by setting both border_mode=cv2.BORDER_CONSTANT and value=(128,128,128).


We did not see any significant improvement immediately after merging. Our first significant boost came after pretraining on the 2015 data. The 2015 and 2019 datasets were prepared by APTOS MVP Benjamin Warner. We pretrained on the 2015 data with the same pipeline we used for the 2019 data, and settled on 15-30 epochs after a few submissions.


We iterated through many shared ideas, which explains our ridiculously high submission count. Our final solution, however, was always going to be a blend, as no single model performed well enough to stand on its own. Although we all agreed that the public test set was unlike both the train and private test sets, we were less confident about how to blend our predictions. Philipp and Dmitry handled the blending skillfully and helped the team avoid a heartbreaking slide. Interestingly, our blends never surpassed 0.839 on the public leaderboard until the very end, when a marginal lift took us to 0.840; we basically sat on this score for the last month of the competition. At the end of the competition, our team secured 9th place, a gold-medal standing.
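To make "blend" concrete, here is one common way to combine model outputs, shown as a generic sketch. It assumes each model emits a continuous severity score per image and that fixed cut points turn the averaged score into a grade; the function names, weights, and thresholds are all hypothetical, and this is not a description of the blending Philipp and Dmitry actually did.

```python
def blend_predictions(model_outputs, weights=None):
    """Weighted average of continuous scores; one list of scores per model."""
    if not model_outputs:
        raise ValueError("need at least one model's predictions")
    n_models = len(model_outputs)
    if weights is None:
        weights = [1.0] * n_models  # plain average by default
    if len(weights) != n_models:
        raise ValueError("one weight per model is required")
    n_samples = len(model_outputs[0])
    if any(len(scores) != n_samples for scores in model_outputs):
        raise ValueError("all models must score the same images")
    total = float(sum(weights))
    return [
        sum(w * scores[k] for w, scores in zip(weights, model_outputs)) / total
        for k in range(n_samples)
    ]


def to_grades(scores, thresholds=(0.5, 1.5, 2.5, 3.5)):
    """Map continuous scores to grades 0-4 by counting thresholds exceeded."""
    return [sum(score > t for t in thresholds) for score in scores]


# Hypothetical scores from two models on three images.
model_a = [0.2, 1.8, 3.9]
model_b = [0.6, 2.0, 3.3]
grades = to_grades(blend_predictions([model_a, model_b]))
```

Averaging before thresholding lets a confident model pull a borderline score across a cut point, which is one reason a blend can be steadier than any single model when the public and private test sets differ.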

