Completed Research

This project focuses on visual attention as an approach to image captioning in computer vision. We have studied the effect of different hyperparameter configurations on a state-of-the-art visual attention architecture composed of a pre-trained residual neural network encoder and a long short-term memory decoder. Results show that the choice of both the cost function and the gradient-based optimizer has a significant impact on captioning quality. Our system considers the cross-entropy, Kullback-Leibler divergence, mean squared error, and negative log-likelihood loss functions, as well as the Adam (adaptive moment estimation), AdamW, RMSprop, stochastic gradient descent, and Adadelta optimizers. Based on the performance metrics, the combination of cross-entropy with Adam is identified as the best alternative, returning a Top-5 accuracy of 73.092 and a BLEU-4 score of 0.201. With cross-entropy fixed, the first two optimizers (Adam and AdamW) perform best, both reaching a BLEU-4 score of 0.201; in terms of inference loss, Adam outperforms AdamW with 3.413 versus 3.418, and in Top-5 accuracy with 73.092 versus 72.989.
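Two of the loss functions compared above are closely related: cross-entropy computed from raw logits equals the negative log-likelihood of the log-softmax of those logits. A minimal NumPy illustration of this identity (the logits and target index below are toy values, not taken from the project):

```python
import numpy as np

def log_softmax(logits):
    # numerically stable log-softmax over the vocabulary axis
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def cross_entropy(logits, target):
    # cross-entropy from raw logits for a single target word index
    return -log_softmax(logits)[target]

def nll(log_probs, target):
    # negative log-likelihood expects log-probabilities as input
    return -log_probs[target]

logits = np.array([2.0, 0.5, -1.0, 0.1])  # scores over a tiny 4-word vocabulary
target = 0
ce = cross_entropy(logits, target)
nl = nll(log_softmax(logits), target)
# the two formulations give the same loss value
```

This is why deep learning frameworks treat the two losses as interchangeable as long as the inputs are logits in one case and log-probabilities in the other.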

The team:

Results presented on:

Source code:

Slides:

Computer vision to identify the incorrect use of face masks for COVID-19 awareness

Face mask detection has become a major challenge in computer vision, demanding that technology be combined with COVID-19 awareness. Researchers have proposed deep learning models to detect the use of face masks. However, the incorrect use of a face mask can be as harmful as not wearing any protection at all. In this thesis, we propose a compound convolutional neural network (CNN) architecture based on two computer vision tasks: object localization to find faces in images/videos, followed by an image classification CNN that categorizes each face as wearing a mask correctly, wearing it incorrectly, or not wearing a mask at all. The first CNN is built upon RetinaFace, a model to detect faces in images, whereas the second CNN uses a ResNet-152 architecture as its classification backbone. Our model enables accurate identification of people who are not correctly following the COVID-19 healthcare recommendations on face mask use. We have released our proposed computer vision model to the public and optimized it for embedded systems deployment, enabling global use of our technology.
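The two-stage design above can be sketched as a small pipeline: a detector proposes face boxes, and a classifier labels each cropped face. The functions below are placeholders standing in for RetinaFace and the ResNet-152 head, so the structure of the compound architecture is visible without the model weights:

```python
# Sketch of the compound pipeline: detector -> per-face classifier.
# detect_faces and classify_face are hypothetical stand-ins for
# RetinaFace and the ResNet-152 classification backbone.

LABELS = ["mask_correct", "mask_incorrect", "no_mask"]

def detect_faces(image):
    # placeholder for RetinaFace: returns (x, y, w, h) face boxes
    return [(10, 10, 50, 50), (80, 20, 40, 40)]

def classify_face(face_crop):
    # placeholder for the ResNet-152 head: returns one of LABELS
    return "mask_correct"

def crop(image, box):
    # extract the face region from a row-major image
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]

def mask_pipeline(image):
    # run detection, then classify every detected face
    return [(box, classify_face(crop(image, box))) for box in detect_faces(image)]

image = [[0] * 160 for _ in range(120)]  # dummy grayscale frame
results = mask_pipeline(image)
```

In the real system each placeholder is replaced by a trained network, but the control flow (localize first, classify second) is exactly this.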

The team: 

Results presented on:

Source code:

Slides:

The Future of Agriculture: Detecting Tomato Maturity Levels with YOLOv8

The team:

Results presented on:

Source code:

Slides:

A web application to learn sign language with deep learning

Deep learning and computer vision are used to create applications that facilitate better interaction between humans and machines. In the educational domain, obtaining information about sign language is simple, but finding a platform that allows for intuitive interaction is quite challenging. A web app has been developed to address this issue by employing deep learning to assist users in learning sign language. In this study, two models for hand-gesture recognition, AlexNet and GoogLeNet, were tested using 20,800 images. The overfitting problem encountered in convolutional neural networks was considered while training these models, and several techniques to minimize overfitting and improve overall accuracy were employed. AlexNet achieved an 87% accuracy rate when interpreting hand gestures, whereas GoogLeNet achieved an 85% accuracy rate. These results were incorporated into the web app, which aims to teach the American Sign Language alphabet intuitively.
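One standard technique for reducing overfitting in CNNs like AlexNet is dropout. A minimal NumPy sketch of inverted dropout (illustrative only; the study does not specify which regularization techniques were used):

```python
import numpy as np

def inverted_dropout(activations, p_drop, rng, training=True):
    # inverted dropout: zero each unit with probability p_drop during
    # training, and scale survivors by 1/(1 - p_drop) so the expected
    # activation is unchanged; at inference time it is the identity
    if not training or p_drop == 0.0:
        return activations
    keep = 1.0 - p_drop
    mask = rng.random(activations.shape) < keep
    return activations * mask / keep

rng = np.random.default_rng(0)
a = np.ones(1000)
out = inverted_dropout(a, p_drop=0.5, rng=rng)
# surviving units are scaled to 2.0, dropped units are 0.0,
# so the mean stays close to the original 1.0
```

Because the scaling happens during training, no correction is needed at inference time, which keeps the deployed web app's forward pass simple.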

The team:

Results presented on:

Source code:

Slides:

Deep reinforcement learning for efficient nucleus cell location in digital pap smears

In August 2020, the World Health Assembly (WHA) defined three main global targets for eliminating cervical cancer; if these targets are met, cervical cancer could be eliminated by 2030. One of those targets is "70% coverage of screening". Meeting it requires experienced professionals to analyze digital Papanicolaou images. Pathologists carry out this analysis, which takes around 30 minutes per image, and the shortage of pathologists slows progress toward the WHA target. This thesis focuses on a deep reinforcement learning agent that learns by itself, from rewards, penalties, and past experiences, to move toward a cell nucleus in digital images along an optimal path. In principle, this yields an agent whose input is raw pixels and whose output is nucleus coordinates. This information will be used by future specialized agents to detect abnormal cells. The final goal is to build a highly efficient automatic Papanicolaou analysis system.
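The navigation idea can be illustrated on a toy grid: the agent receives a positive reward for steps that bring it closer to the nucleus position and a penalty otherwise. Here a greedy one-step policy stands in for the learned policy (a real agent would learn this behavior from raw pixels, not from known coordinates):

```python
# Toy sketch of reward-driven navigation toward a target (nucleus) cell.
# The greedy policy below is a hypothetical stand-in for the trained agent.

ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

def reward(pos, new_pos, target):
    # +1 for getting closer to the nucleus (Manhattan distance), -1 otherwise
    def dist(p):
        return abs(p[0] - target[0]) + abs(p[1] - target[1])
    return 1 if dist(new_pos) < dist(pos) else -1

def greedy_step(pos, target):
    # pick the action with the highest immediate reward
    def move(a):
        return (pos[0] + ACTIONS[a][0], pos[1] + ACTIONS[a][1])
    return max(ACTIONS, key=lambda a: reward(pos, move(a), target))

def navigate(start, target, max_steps=50):
    pos, path = start, [start]
    for _ in range(max_steps):
        if pos == target:
            break
        dx, dy = ACTIONS[greedy_step(pos, target)]
        pos = (pos[0] + dx, pos[1] + dy)
        path.append(pos)
    return path

path = navigate(start=(0, 0), target=(3, 4))
```

A deep RL agent replaces the hand-written greedy rule with a learned value or policy network, but the reward structure (closer is better) is the same.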

The team: 

Results presented on:

Source code:

Slides:

Septorhinoplasty prediction with Denoising Diffusion Probabilistic Models

Septorhinoplasty is a complex surgical procedure aimed at improving the functional and aesthetic aspects of the nose. In this study, we propose a novel approach for predicting post-septorhinoplasty outcomes using Denoising Diffusion Probabilistic Models (DDPM). The methodology involves generating synthetic post-surgery images from preoperative patient photos through the application of DDPM-induced noise. Subsequently, a U-Net architecture is employed to denoise the synthesized images and predict the potential postoperative appearance.

The first step of our framework involves training the DDPM on a large dataset of pre-septorhinoplasty patient images to learn the underlying distribution of natural nose variations. This trained model is then used to synthesize a diverse set of postoperative nose images by introducing diffusion-based noise to the original preoperative images. The introduction of noise helps capture the inherent uncertainty and variability present in real-world surgical outcomes.

Next, we employ a U-Net, a popular architecture for image denoising, to process the synthetic postoperative images generated by the DDPM. The U-Net is trained on paired data consisting of the original preoperative images and their corresponding noise-introduced counterparts. This supervised training process enables the U-Net to effectively remove the induced noise while preserving the essential features of the post-surgery nose appearance.
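The forward (noising) step described above has a convenient closed form: x_t can be sampled directly from x_0 as sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps with eps drawn from a standard normal. A minimal NumPy sketch, using an illustrative linear noise schedule (the schedule values are assumptions, not the project's actual configuration):

```python
import numpy as np

# DDPM forward process: sample x_t directly from x_0 via the
# closed-form marginal q(x_t | x_0). The U-Net is then trained
# to predict the noise eps from x_t.

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # illustrative linear noise schedule
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)       # cumulative product, one value per step t

def q_sample(x0, t, rng):
    # x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps, eps

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 8))     # toy stand-in for a preoperative image
x_early, _ = q_sample(x0, 10, rng)   # small t: mostly original signal
x_late, _ = q_sample(x0, T - 1, rng) # large t: almost pure noise
```

The training pairs for the U-Net are exactly these (x_t, eps) samples: given the noised image and the timestep, the network learns to recover the injected noise.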


Technical requirements:

The student is expected to use a dataset of images from rhinoplasty surgeries and generate an image based on CGANs (Conditional Generative Adversarial Networks). 

The new image will be generated from a photo of a patient before the surgery. 

The team:

Results presented on:

Source code:

Slides:

Ongoing Research

Decentralized-based blockchain system for enhancing the management of scientific publications

The peer review process that scientific research must go through before being published or rejected has several positive and negative aspects. On the positive side, scientists evaluate the quality of other scientists' work, which guarantees rigor and consistency, and reviewer identities are kept hidden to reduce bias. However, the review process can delay publication by more than a year, and in some cases authors must pay article processing charges. This proposal aims to improve the system by creating a decentralized, blockchain-based alternative in which the scientific community manages reviews through smart contracts. The new system provides stronger anonymity guarantees, and every step of the review process will be open and transparent. Reviewers will be rewarded with system tokens, as will the authors of the manuscripts with the highest impact. In this way, the review process is expected to become faster, fairer, safer, and more transparent. The goal of this graduation project is to write a white paper documenting the proposed system.

The team: 

Results presented on:

Source code:

Slides:

Knowledge Curation in the Digital Age: A Deep Learning-based Natural Language Processing Approach

The team:

Hypothesis:

Resources:

Results:

Source code:

Slides:

Boosting Image Captioning Using ConvNeXt Deep Neural Networks

The team:

Results:

Source code:

Slides:

Deep reinforcement learning for dynamic electromagnetic spectrum access

Required knowledge:

The team:

Results presented on:

Source code:

Slides:

Deep learning hyperparameter optimization via genetic algorithms

The team:

Results presented on:

Source code:

Slides:

Development of a Generative Language Model-Based Supplementary Tool for High School and College Students Using META Language Models

The EDUAI chatbot, powered by Llama-2-7b-chat-hf (a model developed by Meta), uses Pinecone as its vector database and employs cosine similarity to identify and provide the most appropriate answers to user queries. The chatbot stands out for its educational focus: it draws from an extensive database that includes exercises, audiovisual material, formulas, and theorems essential for university and college students. Its ability to contextualize answers within this database allows users to efficiently access valuable educational resources, making it a powerful tool for learning and teaching.
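The retrieval step can be sketched with cosine similarity over embedding vectors: the query embedding is compared against stored resource embeddings and the best match provides the context for the model's answer. The resources and 3-dimensional embeddings below are toy stand-ins for the Pinecone index (real embeddings have hundreds of dimensions):

```python
import numpy as np

# Minimal sketch of cosine-similarity retrieval over a toy "index".
# Resource names and embeddings are illustrative, not from EDUAI's database.

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

resources = {
    "quadratic formula":  np.array([0.9, 0.1, 0.0]),
    "derivative rules":   np.array([0.1, 0.9, 0.2]),
    "integration video":  np.array([0.0, 0.2, 0.9]),
}

def retrieve(query_embedding, top_k=1):
    # rank resources by similarity to the query and return the top_k names
    ranked = sorted(resources,
                    key=lambda name: cosine_similarity(query_embedding, resources[name]),
                    reverse=True)
    return ranked[:top_k]

best = retrieve(np.array([1.0, 0.0, 0.1]))  # a query "about" the first topic
```

In production, the embedding comes from a text encoder and Pinecone performs this ranking server-side; the retrieved passages are then injected into the Llama-2 prompt.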

The team:

Results presented on:

Source code:

Slides:

End-to-end Sign Language Translation with Stochastic and Natural Language Processing Transformers (BERT)

The team:

Results presented on:

Source code:

Slides:

Artificial intelligence-based eye-tracking algorithm as a neuro-scientific instrument

Technical requirements:

Potential applications:

The team:

Results presented on:

Source code:

Slides:

Open Research Calls for Yachay Students

If you are a Final Graduation Project student, you can apply for supervision on the following topics. The students will be the main researchers, supervised by one of the DeepARC teams.

Plague pattern recognition based on inspection data collected in Galápagos

Description:

The plan:

Technical Requirements:

The team:

Results presented on:

Source code:

Artificial intelligence-based model to help teachers grade academic writing essays

Functional requirements:

Non-functional requirements:

Notes:

The team:

Results presented on:

Source code:

An Evolutionary Approach for Deep Neural Network Parametrization

The team:

Reading requisites:

The dataset:

Results presented on:

Source code:

Machine learning as an enabler of next-generation WiFi networks

Technical requirements:

The team:

Results presented on:

Source code:

Online vs. offline computing: comparison, applications and use-cases

Required knowledge:

Technical requirements:

The team:

Results presented on:

Source code:

Social media data to empower an AI-based sentiment analysis model

Technical requirements:

Potential applications:

The team:

Results presented on:

Source code:

AI-based mass movement inventory model for early warning systems

Technical requirements:

Potential applications:

Available data:

The team:

Results presented on:

Source code:

Feel free to contact us if you are interested in being part of any of our teams. If you are an external professor/researcher, we are happy to collaborate. You can bring your own ideas and we will build a tailored research team for you to work with. If you are a Yachay Tech student, you are required to have approved a related lecture with 8/10 or higher. The max. number of supervised students per semester is 4 (not including co-supervision). Student applications are evaluated until the first week of the semester (as long as the max. number of students has not been reached).