Trinity College Dublin


FYP Projects 2016/17 and Proposals for 2017/18

1. Introduction

The information on this page applies to students taking final year projects, Year 5 dissertations, and M.Sc. dissertations in the School of Computer Science and Statistics under the following programmes:

  • BA (Mod) in Computer Science
  • BA (Mod) in Computer Science, Linguistics and a Language
  • BA (Mod) in Computer Science and Business or Business and Computing
  • BAI in Computer Engineering (D Stream)
  • BA Management Science and Information System Studies (MSISS)
  • BAI in Computer Engineering and Microelectronics (CD stream)
  • BA (Mod) Mathematics
  • Master in Computer Science (MCS)
  • Master in Computer Science (M.Sc.)
  • MAI in Computer Engineering

2. Guidelines for students

Important dates and deadlines for academic year 2016/17

Course / Activity / Date

Integrated Computer Science (Yr 4) - Final Year Project
  Project Selection: Fri Oct 13, 2017
  Project Demonstration Period: April 2-6, 2018
  Project Presentation Material and Poster Submission: Mon Apr 9, 2018
  Project Presentation and Poster Session: Thu Apr 12, 2018
  Project Report Due: Fri May 4, 2018

Integrated Computer Science (Yr 4) - Internship
  Internship Details Form Submission: Fri Dec 8, 2017
  Internship Goals Document Submission: Fri Feb 9, 2018
  Poster Submission: Mon Apr 9, 2018
  Poster Presentation: Thu Apr 19, 2018
  Mid-Point Submission of Reflective Diary: Fri Apr 27, 2018
  Technical Report Submission: Mon Aug 6, 2018
  Final Submission of Reflective Diary: Mon Aug 6, 2018

Master in Computer Science (Integrated, Yr 5)
  Project Demonstration Period: Tue May 1 & Thu May 3, 2018
  Poster Submission: Mon May 7, 2018
  Project Presentation and Poster Session: Thu May 10, 2018
  Dissertation Submission: Thu May 17, 2018

Master in Computer Science (M.Sc.)
  Project Demonstration Period: TBD (Summer 2018)
  Dissertation Submission: Fri Aug 31, 2018

Computer Engineering (Yr 4) - Final Year Project
  Project Demonstration Period: April 2-6, 2018
  Project Report Due: Fri May 4, 2018

Computer Engineering (Yr 4) - Internship
  Internship Details Form Submission: Fri Dec 8, 2017
  Internship Goals Document Submission: Fri Feb 9, 2018
  Poster Submission: Mon Apr 9, 2018
  Poster Presentation: Thu Apr 19, 2018
  Mid-Point Submission of Reflective Diary: Fri Apr 27, 2018
  Technical Report Submission: Mon Aug 6, 2018
  Final Submission of Reflective Diary: Mon Aug 6, 2018

Master in Computer Engineering (MAI)
  Project Demonstration Period: Wed March 21 - Fri March 23, 2018
  Project Report Due: Thu May 17, 2018

Management Science and Information Systems Studies (MSISS)
  Interim Presentations: Mon Dec 5 - Fri Dec 9, 2016
  Project Report Due: Fri March 24, 2017

Computer Science & Business / Business and Computing
  Project Demonstration Period: April 2-6, 2018
  Project Report Due: Fri April 20, 2018

Computer Science, Linguistics and a Language
  Project Demonstration Period: March 12-16, 2018
  Project Report Due: Tue Apr 3, 2018


* Due to scheduling constraints it may be necessary to hold some demonstrations later in the week.

When to choose a project

An initial list of project proposals (from lecturing staff) will be released on the Thursday of the last week of Semester 2 in your Junior Sophister year. Supervisors will not accept supervision requests before this time. Further project proposals may be added to this list by lecturing staff over the summer vacation.

Students should select a final year project before the end of the third week of Semester 1. Where students have not selected a project by the deadline, a project supervisor will be allocated to them, in consultation with the relevant course director, from among the supervisors who have not yet reached their supervision limits. The chosen supervisor will assign the student a project or help them to specify a project in an area selected by the supervisor.

How to choose a project

Students may either

  • select a project from the list of project proposals put forward by the lecturing staff, or
  • alternatively propose their own project. Students who have a project proposal of their own but are having trouble finding an appropriate supervisor should contact their course director.

In either case students must get the agreement of a supervisor before they will be considered as having selected a project. Supervisors may require a meeting with the student to discuss the project before accepting a supervision request. Once a supervisor agrees to supervise a project, details of the project assignment will be recorded centrally by the supervisor.

Students may only select a single project, but they may change their minds and select an alternative project before the end of the third week of Semester 1. However, if a student selects a new project, they must notify both the old and new supervisors that their previously chosen project is to be cancelled.

Choosing a project supervisor

Students should note that each supervisor will only take a limited number of students. If you find the information is incorrect please send details to

Students should also note that there are only a limited number of supervisors in any area. Hence students are not guaranteed a project in their area of choice.

Project demonstrations and reports

See the following documents:


3. Supervisors' project areas

The following table indicates the broad areas within which projects are generally supervised, together with the potential supervisors in these areas. Each name is linked to a list of projects proposed by that lecturer.

Subject Area: Supervisors willing to supervise projects in this area
Artificial Intelligence Michael Brady, Vincent Wade, Martin Emms, Tim Fernando, Rozenn Dahyot, Carl Vogel, Khurshid Ahmad, Joeran Beel
Computational Linguistics Martin Emms, Tim Fernando, Carl Vogel, Khurshid Ahmad
Computer Architecture Jeremy Jones, David Gregg, Michael Manzke, John Waldron, Jonathan Dukes
Computer Vision Kenneth Dawson-Howe, Gerard Lacey
Distributed Systems Vinny Cahill, Stefan Weber, Mads Haahr, Dave Lewis, Jonathan Dukes, Melanie Bouroche, Siobhan Clarke
Foundations and Methods Hugh Gibbons, Andrew Butterfield, Glenn Strong, Tim Fernando, Vasileios Koutavas
Graphics, Vision and Visualisation Kenneth Dawson-Howe, Fergal Shevlin, Gerard Lacey, Michael Manzke, John Dingliana, Carol O'Sullivan, Rozenn Dahyot, Khurshid Ahmad, Rachel McDonnell, Aljosa Smolic
Health Informatics Lucy Hederman, Gaye Stephens, Mary Sharp, Joeran Beel
Information Systems Mary Sharp, Joeran Beel
Instructional Technology Brendan Tangney, Mary Sharp, Glenn Strong, Richard Millwood
Interaction, Simulation and Graphics John Dingliana
Knowledge and Data Engineering Vincent Wade, Lucy Hederman, Mary Sharp, Declan O'Sullivan, Dave Lewis, Owen Conlan, Khurshid Ahmad, Rob Brennan, Seamus Lawless, Kris McGlinn, Kevin Koidl, Alex O'Connor, Joeran Beel
Networks and Telecommunications Donal O'Mahony, Hitesh Tewari, Stefan Weber, Eamonn O'Nuallain, Meriel Huggard, Ciaran McGoldrick, Jonathan Dukes, Stephen Farrell, Melanie Bouroche, Marco Ruffini, Douglas Leith, Lory Kehoe, Georgios Iosifidis
Other David Abrahamson, Michael Brady, Stephen Barrett, Khurshid Ahmad, Melanie Bouroche, Marco Ruffini, Vasileios Koutavas, Douglas Leith, Joeran Beel
Statistics Mary Sharp, Rozenn Dahyot, John Haslett, Simon Wilson, Brett Houlding, Jason Wyse, Arthur White, Douglas Leith, Bernardo Nipoti

4. Project proposals for the academic year 2017/18

The following is a list of suggested projects for final year BA (CS), BA (CSLL), BA (CS&B /B&C), BAI, MAI, MCS, M.Sc., and MSISS students for the current year. Note that this list is subject to continuous update. If you are interested in a particular project you should contact the member of staff under whose name it appears.

This is not an exhaustive list and many of the projects proposed can be adapted to suit individual students.

Dr. Arthur White

Using Social Media Analytics to Facilitate Hepatitis C Immunology Study - TAKEN


In 1977-79, hundreds of Irish women fell victim to hepatitis C virus (HCV) infection when they were given virus-contaminated anti-D. Usually anti-D is a blood product of great benefit given to women with blood groups incompatible with their new-born baby. However, in 1977-79 this normally beneficial product was unknowingly contaminated with HCV, which can invade and gradually destroy the liver. Until recently, researchers believed that receiving HCV-contaminated blood products, where high viral loads directly enter the blood stream, would inevitably lead to infection.

In the aftermath of the outbreak, it was discovered that, when screened for HCV, almost half of the women who clearly had contact with the virus showed no signs of infection. Researchers in the School of Biochemistry and Immunology are interested in comparing the genetic profiles of naturally resistant people with those who were unable to resist infection, with a view to uncovering the mechanism behind the mystery of natural HCV-resistance. Exploiting this knowledge could lead to new ways to make vaccines and anti-viral drugs.

To facilitate this study, the research team seeks volunteers who were exposed to HCV via contaminated anti-D in 1977-79. A publicity campaign is currently being launched (see video here), which will incorporate both traditional and more modern social media elements; for example, a Facebook page has been launched. We are interested in understanding how effective these elements will be in identifying the cohort of exposed women from the general public.

Anticipated Outcomes

This project will be principally concerned with the social media elements of the media campaign. In particular, a Facebook page and Twitter profile will be created in order to share information related to the project. The student will be required to:

  • Conduct a literature review of similar previous social media campaigns.
  • Construct datasets of interest based on the campaign's social media interactions.
  • Profile the audience demographics using data analytics and social network analysis tools.

For further information, I can be reached by email or phone (+1062). I am based in room 144, Lloyd Institute.

Dr. Joeran Beel

Position: Ussher Assistant Professor
Affiliation: ADAPT Centre & Intelligent Systems Discipline / Knowledge and Data Engineering Group (KDEG)
Contact: If you want to do one of the projects, or have your own idea, please read about how to continue in my WIKI (you need to register to get access).
Last update: 2017-09-26

The following projects are only suggestions, and I am open to your own ideas in the areas of:

  • Recommender Systems
  • Machine Learning
  • User Modelling
  • Information Retrieval
  • Artificial Intelligence
  • Information Extraction
  • Natural Language Processing
  • Text Mining
  • Citation Analysis
  • Bibliometrics
  • Altmetrics
  • Scientometrics
  • Plagiarism Detection
  • Blockchain
  • Digital Libraries
  • Digital Humanities
  • Finance (FinTech)
  • Legal
  • Tourism
  • Healthcare
  • Business start-ups

Please note that I am not always sure that the following ideas are novel and feasible. It is your responsibility to do some research before you start the project to find out if the idea is novel and if you are capable of completing the project in the given time frame.

Many of the projects are suitable for business start-ups. If you are interested in doing a business start-up based on one of the project ideas (as part of your FYP or in some other context), contact me to discuss the details.

Improving Research-Paper Recommendations with One of Various Methods (Machine Translation, Machine Learning, Natural Language Processing, ...)

One of my main projects is Mr. DLib, a recommender system as-a-service that delivers several million research-paper recommendations per month via an API to partners such as JabRef, Sowiport, MediaTUM and soon also TCD's TARA. In the context of Mr. DLib there are many projects you could do. The advantage of participating in Mr. DLib is that you will work with a real-world system that is used by real users, i.e. you can evaluate your work with thousands of users instead of with a small user study of maybe ten participants, as you would in many other projects. In addition, you will work closely with the Mr. DLib team, i.e. you are involved in an active, ongoing project instead of sitting alone at your desk pursuing your FYP. To work with Mr. DLib you need good Java programming skills and basic Linux knowledge. Knowledge of APIs and (REST) web services, Python, and web standards (XML, JSON, HTTP, ...) is helpful, though not a requirement. A few project ideas are outlined in the following.

Machine-Translation-Based Recommendations: improve our content-based recommendations with machine translation

Problem/Background: In Mr. DLib, we have millions of documents in various languages (English, German, French, Russian, ...). This leads to a problem when a user looks at a German document and Mr. DLib should recommend related documents. Assuming that every researcher speaks English, it would make sense to recommend English documents even when the user is currently looking at a German document. However, Mr. DLib's current recommendation approach only recommends documents that contain the same words as the input document. Consequently, multi-lingual recommendations are not possible.

Solution/Goal of the Project: You apply different machine-translation frameworks to translate the titles and abstracts of all non-English documents into English. This way, all documents are available in the same language in our database, and we can recommend e.g. English documents even when a user is looking at a German document.

Predict Demographics based on data such as names, location, or emails to improve recommender systems

Problem/Background: There are many recommendation approaches that utilize users' demographics (age, gender, nationality, ...). For instance, similar users might be identified based on demographics, or certain stereotype recommendations may be made based on demographics (e.g. recommend perfume to female users, and cars to male users). However, demographic data is not always available.

Solution/Goal of the Project: There are many ways to predict users' demographics, e.g. based on their names. However, other information can also be used: for instance, a user with a German email domain is probably from Germany, and an email address from certain long-established providers suggests the user is probably 30+ years old. Your task would be to improve a recommender system with such predicted demographic data. This means you take a demographic recommendation approach and a dataset that contains no demographic data but does contain users' names, emails, etc., and then use the inferred demographics to improve the recommender system.
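As a toy illustration of the kind of inference involved, here is a minimal sketch. The domain table is hypothetical and hand-picked; a real system would learn from, or look up, far more signals than the email's top-level domain.

```python
import re

# Hypothetical mapping from country-code TLDs to countries (illustrative only).
TLD_COUNTRY = {"de": "Germany", "fr": "France", "ie": "Ireland"}

def guess_country(email):
    """Guess a user's country from the two-letter top-level domain of their email."""
    match = re.search(r"\.([a-z]{2})$", email.lower())
    return TLD_COUNTRY.get(match.group(1)) if match else None

print(guess_country("anna@uni-bonn.de"))   # Germany
print(guess_country("bob@example.com"))    # None (no two-letter TLD)
```

The predicted attribute would then feed into the demographic recommendation approach in place of the missing profile field.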

Personalized Research-Paper Recommendations for JabRef Users

Problem/Background: We have already integrated Mr. DLib into the reference-management software JabRef. However, so far, users can only receive non-personalized related-article recommendations. This means a user looks at one article and receives a list of similar articles. The problem with such recommendations is that they do not take into account which articles the user has previously looked at.

Solution/Goal of the Project: You extend the integration of Mr. DLib in JabRef so that more comprehensive data about users is transferred to Mr. DLib's servers, where personalized recommendations are generated. You will work with REST web services, Java, MySQL, and recommendation frameworks during this project.

Taken: "Nobel Prize or Not?" (Academic Career/Performance Prediction)

Problem / Background: Researchers' performance needs to be assessed in many situations - when they apply for a new position (e.g. a professorship), when they apply for research grants, or when they are considered for an award (e.g. the Nobel Prize). To assess a researcher's performance, the number of publications, the reputation of the journals and conferences in which they were published, and the citations the researcher has received are often used. However, up to now, these numbers have focused on the past. For instance, knowing that a researcher has published 50 papers and accumulated 2,000 citations says little about how that researcher will perform in the future.

Solution / Goal of the project: Your goal is to develop a tool (website or desktop application) that predicts how well a researcher will perform in the future, i.e. in which venues the researcher will publish, how many citations s/he will receive, etc. Ideally, a user could enter a name in the tool, and a chart or table would be shown with the predicted citation counts etc. In addition, the tool might predict which university the researcher could work at next.

Methodology (how to achieve the goal): One way to achieve the goal is to:

  1. Create a dataset that contains historic data about many researchers. The data should include the researchers' publication lists, citation counts, and ideally a work history (at which universities the researcher worked, and when) and the reputation of the venues the researcher has published in. Such data could be obtained from Google Scholar, SemanticScholar, LinkedIn, ResearchGate, and Scimago.
  2. Train a machine-learning algorithm on the collected data so that it learns how well researchers perform.
  3. Apply the machine-learning algorithm to predict a researcher's future performance.

"Pimp That Voice" (Eliminate Annoying Accents from Audio/Video or Live Talks)

Problem / Background: You watch a video on YouTube or Coursera, and the person talking in the video has a horrible voice. For instance, the person might have a terrible accent (e.g. German), a super-high-pitched voice that hurts your ears, or they might start every sentence with 'so', end every sentence with 'right?', and use the word 'like' twice in every sentence.

Solution / Goal of the project: You develop either a software library or an entire application that takes a video (or audio) file as input and returns the file with the “pimped” audio. The “pimping” could focus on:

  1. Removing the speaker’s accent or simply replacing the voice with a completely different one. This means, from a user’s perspective, a user could either select to “make the original voice more pleasant” or “replace the original voice with a completely new one”.
  2. Improving the speaker’s grammar, i.e. remove unnecessary words such as ‘so’ or unnecessary uses of ‘like’.

Ideally, the tool works in real time, i.e. it could process video streams, e.g. from YouTube, while you watch, or even the speech of a public speaker talking into a microphone. However, for this project it is also OK to work with normal video files, and for the processing to take a while.

You could also slightly shift the focus of the application from improving the voice to helping people become better speakers. This means a user would talk into a microphone in private, and whenever the user started a sentence with e.g. "So, " or ended a sentence with a rhetorical "right?", a beep would remind the speaker not to do that. A very simple version of this project could be that you simply count how often a user says some of the "prohibited" words and display some statistics at the end of the talk.

Other variations include changing the problem from video/audio to phone calls (e.g. with customer-support centres). Your goal would then be to develop a tool that allows a company running e.g. a call centre in India to give all employees a British accent, or the same voice, when they talk to customers. There is huge business potential in this. See also
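The "very simple version" described above, counting prohibited words, fits in a few lines. This sketch assumes a plain-text transcript of the talk is already available, i.e. speech recognition happens elsewhere.

```python
import re
from collections import Counter

FILLERS = {"so", "like", "right"}  # the "prohibited" words

def filler_stats(transcript):
    """Count how often each prohibited filler word occurs in a talk transcript."""
    words = re.findall(r"[a-z']+", transcript.lower())
    return Counter(w for w in words if w in FILLERS)

talk = "So, like, this is, like, a great result, right? So let's move on."
print(filler_stats(talk))  # so: 2, like: 2, right: 1
```

Displaying these counts at the end of a practice talk is the end-of-talk statistics feature in its most basic form.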

Mass Job-Application Detector

Problem/Background: Every professor and every HR person knows the problem: you get an email from someone asking for a job and, among other things, you need to assess how much the applicant really wants to work with you, or whether the applicant has sent the same application to dozens of other companies/professors.

Solution/Goal of the Project: Develop a tool that detects mass job applications. The tool could be, for instance, a Gmail add-on that warns the current user when an incoming job application is (probably) a mass application. I see two options for realizing such a tool (but there might be more):

  • Provide a probability score that the email is a mass application. This score could be machine-learned based on the email's text and maybe other features. This task would be similar to detecting spam mails.
  • Check how many other users have received the same (or a very similar) email. The more users received the email, the more likely it is a mass application. For professors, it would probably be enough if just a few other colleagues at the department also used the add-on to get reliable results.
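The second option, comparing an incoming email against mails colleagues have already received, can be approximated with a plain string-similarity measure. A minimal sketch using Python's difflib follows; the threshold value is an invented starting point, not a tuned one.

```python
from difflib import SequenceMatcher

def is_mass_application(new_email, seen_emails, threshold=0.8):
    """Flag the email as a likely mass application if it is near-identical to
    an application that a colleague has already received."""
    return any(SequenceMatcher(None, new_email, old).ratio() >= threshold
               for old in seen_emails)

seen = ["Dear Professor Smith, I would love to join your group ..."]
print(is_mass_application("Dear Professor Jones, I would love to join your group ...", seen))  # True
print(is_mass_application("Hello, a completely different question about exams.", seen))       # False
```

A deployed version would compare hashes or embeddings rather than raw text, both for speed and so colleagues do not have to share the full content of applications.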

Detect Drones with Crowdsourced Distributed Cameras

Problem/Background: More and more drones are flying around, and the risk of e.g. terror attacks through drones is increasing. Consequently, the need to detect drones as early as possible is also increasing.

Solution/Goal of the Project: Imagine a distributed network of house owners who all install cameras (video or photo) on the roofs of their houses. The cameras could communicate over the Internet and together detect whether an object in the air is a drone or a bird, cloud, airplane, ... Recognized drones would be shown on a map, and if they entered no-go zones, the police would be informed. Many extensions are possible. For instance, cameras could be mounted on rotatable installations: once one camera detects a suspicious object, other cameras could rotate in that direction and take additional photos.

The Automatic Plagiarism Creator

Problem/Background: None :-)

Solution/Goal of the Project: Develop a tool that takes a text as input (optionally a text with additional data such as citations, figures, tables, ...) and then rephrases the text and/or translates it, replaces (some) references, and redraws the figures and tables. All this should be done with machine learning. The goal of this project would be to demonstrate how good machine learning is nowadays and how difficult it is to detect such plagiarism. This project would be for demonstration purposes only, and/or to eventually support the detection of plagiarism. You would have to think of a way to minimize the risk of your tool being abused for actually creating plagiarism (one solution could be to make all submitted texts and returned outputs publicly available).

The Kaggle Machine-Learning Competition Solver

Problem/Background: Kaggle is a platform for machine learning, and many companies offer competitions on Kaggle, with the winner receiving significant prizes. Participating in these competitions is not really difficult in theory, as it usually comes down to cleaning the data, trying different machine-learning algorithms, and calculating whatever evaluation metrics the company wants. However, this is a time-consuming process.

Solution/Goal of the Project: You write a tool that automates the process of participating in Kaggle competitions as much as possible. One potential solution would be to use "automated machine learning" frameworks, which are capable, to some extent, of automatically running a large number of algorithms on a given dataset and finding the most effective one. In the long run, such a tool might also be used by Kaggle as a baseline, i.e. other participants would need to beat the automated machine-learning libraries.

"Photo2Location" (Guide a User to the Location a Photo was taken)

Problem / Background: When people visit a nice restaurant or sightseeing spot, they usually take photos. However, when they look at the photos some days, weeks, or even months later, they often do not remember exactly where a photo was taken, or how to get back there to e.g. eat again in that nice restaurant.

Solution / Goal of the project: You develop an app for iOS and/or Android that allows a photo to be opened in the app; the app then displays the location of the photo on a map and guides you there. This way, users can easily find e.g. the nice restaurant where they ate some weeks ago.
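If the photos carry EXIF metadata, the GPS position is typically stored as degrees/minutes/seconds plus a hemisphere reference. Converting that to the decimal degrees a map API expects is simple arithmetic; reading the EXIF tags themselves is assumed to happen via an image library and is not shown here.

```python
def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert EXIF-style GPS degrees/minutes/seconds and a hemisphere
    reference ('N'/'S'/'E'/'W') to signed decimal degrees."""
    decimal = degrees + minutes / 60 + seconds / 3600
    return -decimal if ref in ("S", "W") else decimal

# Trinity College Dublin is at roughly 53° 20' 39" N, 6° 15' 27" W.
print(round(dms_to_decimal(53, 20, 39, "N"), 4))  # 53.3442
print(round(dms_to_decimal(6, 15, 27, "W"), 4))   # -6.2575
```

Photos without GPS data are the interesting research case: there the app would have to fall back on visual place recognition, which is where the project scope could be extended.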

The project as described here is probably not comprehensive and novel enough to justify a Final Year Project, so you would have to come up with some additions to extend the project's scope and make it a bit more original (maybe add a recommender system?).

Time-Normalized TF-IDF Term Weighting for Enhanced Search and Document Similarity Calculations

Problem/Background: TF-IDF is one of the most common term-weighting schemes for calculating how relevant a document is to a given search query. 'TF' stands for 'term frequency', and the rationale is that the more often a query term appears in a document, the more relevant that document is to the query. 'IDF' stands for 'inverse document frequency', and the rationale is that the fewer documents in the corpus contain a query term, the more relevant a document containing that term is. The problem with this approach is that it does not consider how long a term has been in use. For instance, the term 'bitcoin' has appeared in documents only for a few years, because Bitcoin was introduced in 2009. In contrast, terms like "search engines" have been in use for decades, so many more documents can be expected to contain them.

Solution/Goal of the Project: You modify the traditional TF-IDF formula (and/or other term-weighting schemes) to take into account how long the query term has been in use, and then you evaluate whether retrieval performance improves.


Problem/Background (1): You get a text message from your dearest and wish she/he had send that message as voice or video message

Problem/Background (2): You voice or video talk with your dearest and the quality is too low to really hear or see the other person

Solution/Goal of the Project: You develop a tool, e.g. a WhatsApp add-on, that is capable to play a text message in the voice of the person who sent the message. This means, for instance, when your mother sends you a text message, you can press a button "Read out loud" and then the message is played in your mother's voice. To accomplish this, the voice of your mother would have to been learned e.g. from previous WhatsApp voice calls with her. Or, even cooler, the text message would not be only read out loud but you could see a video of your mother. Similarly, to address the second problem, you could develop a tool, e.g. a WhatsApp add-on, that transforms the speach of user 1 to text, sends the text user 2, and on user 2's phone the text is then played as voice or video.

There are variations possible. For instance, it would not necessarily have to be Text2Voice or Text2Video. It could also be Voice2Video.

Sketcha: Captchas based on Sketches

Problem/Background: Captchas are common tools to detect spammers and bots and there is a constant battle between the two groups. There are audio captchas, voice captchas, text captchas and so on and so forth. Also sketches have been used to detect spammers and bots However, Google recently released a huge dataset with sketches To the best of our knowledge, this data has not been used to implement a novel captcha method.

Solution/Goal of the Project: Be creative and find a way to develop a novel captcha method based on sketches (either the ones of Google, or some other method).

A Generic Machine-Learning Based Website Parser/Scraper (for research articles)

Problem/Background: In many situations it is necessary to parse a website and identify e.g. the title, authors or abstract of the text being displayed on a web page. This is important for web crawling but also to e.g. import research articles from the web to your reference manager (see e.g. ). As far as I know (please let me know if I am wrong), current parsers use heuristics and templates to identify e.g. the title of a website. With the huge advances in machine learning this seems not appropriate any more.

Solution/Goal of the Project: Develop a web page scraper/parser that identifies certain elements automatically from a web page. The parser should be trained with machine learning, and compared against the state-of-the art parsing tools. I am particularly interested in parsers for academic content, similar to but am also open for other disciplines (e.g. parsing news websites).

Extending Word-Embeddings with Citation Context

Problem/Background: When it comes to indexing documents, each term that appears in the documents typically presents one dimension in a vector space. Consequently, a large document corpus can easily have thousands or even millions of dimensions. Word embeddings have changed this. With word embeddings created by machine learning, the vector space is reduced to a few hundred dimensions. Hence indexes are much smaller, and often retrieval performance is increased. However, some document types such as research articles do not only contain text but additional data such as citations. So far, this data is ignored with traditional word embeddings.

Solution/Goal of the Project: The idea is to replace citations with e.g. the titles of the cited documents. So, while normally a text being used for learning word embeddings would look like this:

One of the most common recommendation approaches is content based filtering. Beel et al. (2015) found that 53% of all research-paper recommender systems use content-based filtering.

... the "extended" approach would add the title of the cited document to that text and use this extended text for learning. Hence, the text would change to

One of the most common recommendation approaches is content based filtering. Research Paper Recommender Systems: A Literature Survey found that 53% of all research-paper recommender systems use content-based filtering.

Citation-Embeddings: Applying the Idea of Machine-Learned Word-Embeddings to Citations in the Context of Research-Paper Recommender Systems

Problem/Background: When it comes to indexing documents, each term that appears in the documents typically presents one dimension in a vector space. Consequently, a large document corpus can easily have thousands or even millions of dimensions. Word embeddings have changed this. With word embeddings created by machine learning, the vector space is reduced to a few hundred dimensions. Hence indexes are much smaller, and often retrieval performance is increased. However, some document types such as research articles do not only contain text but additional data such as citations. So far, this data is ignored with traditional word embeddings.

Solution/Goal of the Project: Instead of terms, citations are used for the embedding. The approach would either use citations only, or a hybrid approach of citations and terms. Citation embeddings can rather easily be created when for each citation a unique document ID is given. For instance, if two documents both cite the same document ...

One of the most common recommendation approaches is content based filtering. Beel et al. (2015) found that 53% of all research-paper recommender systems use content-based filtering.

It was found that many research-paper recommender systems use content-based filtering [4].

these texts would be converted to

One of the most common recommendation approaches is content based filtering. unique_document_id-4854564 found that 53% of all research-paper recommender systems use content-based filtering.

It was found that many research-paper recommender systems use content-based filtering unique_document_id-4854564.

A Citation-Proximity-Based Ground-Truth to Train Text-Based Recommender Systems / Learning to Predict Citation Proximity based on Terms

Problem/Background: Many recommender systems (e.g. for news, web pages, research articles, ...) need to be able to identify related documents for a given input document. For instance, if a user reads a news article, the website might want to recommend related news articles to keep the user reading. Calculating document relatedness is not a trivial task. Often, text similarity is used (e.g. cosine similarity), or relatedness is learned from some ground truth. For instance, a machine-learning-based recommender system for research articles might learn that articles published in the same journal or written by the same author are somewhat related. However, this approach often does not achieve satisfying results.

Solution/Goal of the Project: A solution could be to learn text-based document similarity from citation proximity analysis (CPA). You would need to find a document corpus that contains the full text of research articles and their in-text references. You would then train a machine learning algorithm on the terms and citation proximities, i.e. the algorithm should learn how closely two documents will be cited, given their text as input. Given a new, not-yet-cited document, you could then predict from its text which other documents would likely be cited in close proximity to it.
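The citation-proximity ground truth itself can be computed directly once in-text citation positions are extracted. A minimal sketch, assuming each citing paper yields (cited_document_id, character_offset) pairs; the linear decay and the 5000-character window are illustrative choices, not taken from the CPA literature:

```python
from itertools import combinations

def citation_proximity_scores(citations, max_distance=5000):
    """Given in-text citations as (cited_doc_id, char_offset) pairs from one
    citing paper, score each cited pair by how closely together the citations
    appear: 1.0 for adjacent markers, decaying linearly to 0 at max_distance."""
    scores = {}
    for (doc_a, pos_a), (doc_b, pos_b) in combinations(citations, 2):
        if doc_a == doc_b:
            continue
        distance = abs(pos_a - pos_b)
        score = max(0.0, 1.0 - distance / max_distance)
        pair = tuple(sorted((doc_a, doc_b)))
        # Keep the closest co-citation of each document pair.
        scores[pair] = max(score, scores.get(pair, 0.0))
    return scores

# Two documents cited 50 characters apart, a third far away.
scores = citation_proximity_scores([("d1", 100), ("d2", 150), ("d3", 6000)])
```

These pairwise scores, aggregated over a whole corpus, are the training targets the text-based model would learn to predict.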

The 1-billion Citation Dataset for Machine-learning Citation Styles and Entity Extraction from Citation Strings

Problem/Background: Effective citation parsing is crucial for academic search engines, patent databases and many other applications in academia, law, and intellectual property protection. It helps to identify related documents or to calculate the impact of researchers and journals (e.g. the h-index). "Citation parsing" refers to identifying and extracting a reference like [4] in the full text, and author names, journals, publication year etc. from the bibliography. For instance, in the following example the citation parser would have to identify the citation markers [1], [2], [3], and [4], and then extract from the bibliography that, for the first entry, "K. Balog", "N. Takhirov" etc. are the authors.

1 Introduction
Retrieving a list of ‘related documents’ for a given source document – e.g. a web page, patent, or research article – is a common feature of many applications, including recommender systems and search engines (Figure 1). Document relatedness is typically calculated based on documents’ text (title, abstract, full-text) [1] and metadata (authors, journal, …) [2], or based on citations/hyperlinks [3], [4].

6 Bibliography
[1]    K. Balog, N. Takhirov, H. Ramampiaro, and K. Nørvåg, “Multi-step Classification Approaches to Cumulative Citation Recommendation,” in Proceedings of the OAIR’13, 2013.
[2]    D. Aumueller, “Retrieving metadata for your local scholarly papers,” 2009.
[3]    B. Gipp and J. Beel, “Citation Proximity Analysis (CPA) - A new approach for identifying related work based on Co-Citation Analysis,” in Proceedings of the 12th International Conference on Scientometrics and Informetrics (ISSI’09), 2009, vol. 2, pp. 571–575.
[4]    S. Liu and C. Chen, “The Effects of Co-citation Proximity on Co-citation Analysis,” in Proceedings of the Conference of the International Society for Scientometrics and Informetrics, 2011.

Over the years many approaches to reference parsing have been proposed, including regular expressions, knowledge-based approaches and supervised machine learning. Machine learning-based solutions, in particular those falling into the category of supervised sequence tagging, are considered a state-of-the-art technique for reference parsing. Unfortunately, they still suffer from two issues: the lack of sufficiently big and diverse data sets and problems with generalization to unseen reference formats. Especially for deep learning, much larger datasets would be needed than exist today.

Solution/Goal of the Project: Your goal would be 1) to create a massive citation dataset, and 2) to use that dataset to train (deep) machine learning approaches to parse citation strings.

Methodology: To achieve this goal, you could do the following:

  1. Download/parse millions of structured metadata records of academic publications, e.g. from the ACM Digital Library, IEEE, PubMed, ... (they all offer their metadata as BibTeX, EndNote, ...). For instance, you would have millions of entries like this:

    @inproceedings{Beel2017,
    author = {Beel, Joeran and Aizawa, Akiko and Breitinger, Corinna and Gipp, Bela},
    title = {Mr. DLib: Recommendations-as-a-Service (RaaS) for Academia},
    booktitle = {Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL)},
    year = {2017}
    }
  2. Use CSL (the Citation Style Language) and citeproc-java to create millions or even billions of citation strings based on the parsed metadata. CSL is a collection of thousands of citation styles, and citeproc-java is a framework to convert e.g. BibTeX into one of those thousands of styles, i.e. you could output the previously parsed metadata in thousands of different citation styles, e.g.
    • J. Beel, A. Aizawa, C. Breitinger, and B. Gipp, “Mr. DLib: Recommendations-as-a-Service (RaaS) for Academia,” in Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL), 2017.
    • Beel, J., Aizawa, A., Breitinger, C. & Gipp, B. Mr. DLib: Recommendations-as-a-Service (RaaS) for Academia. Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL) (2017).
    • Beel, Joeran et al. “Mr. DLib: Recommendations-as-a-Service (RaaS) for Academia.” In Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL), 2017.
    • ...
  3. In addition, it might make sense to create further artificial citations with a knowledge base. For instance, you could download a list of journal names and person names (first name, last name), create random page numbers etc., and then create billions of new citation strings.
  4. It could also make sense to create the citation strings, generate a Word or LaTeX document that contains a bibliography with e.g. 5-30 citation strings, create a PDF from it, parse the PDF with one of the many PDF parsing tools to identify the bibliography and citation strings, and then use that data for learning. This would be a more realistic scenario, because the PDF creation and parsing would probably introduce some errors/noise into the citation strings.
  5. Use machine learning frameworks like scikit-learn or TensorFlow to learn the elements of a citation string.
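Steps 2 and 5 meet in the training data: each generated citation string needs per-token labels for a sequence tagger to learn from. A toy sketch of what such labeled data could look like (the two styles here are hand-written stand-ins, not real CSL or citeproc-java output):

```python
def labeled_citation(meta, style):
    """Render one metadata record in a citation style and record, for each
    token, the field it came from -- exactly the (input, label) pairs a
    sequence tagger needs. Only two toy styles are shown; the real project
    would render thousands of CSL styles instead."""
    if style == "ieee":
        parts = [(meta["authors"], "AUTHOR"),
                 (f'"{meta["title"]},"', "TITLE"),
                 (f'in {meta["booktitle"]},', "VENUE"),
                 (f'{meta["year"]}.', "YEAR")]
    else:  # a Harvard-like toy style
        parts = [(meta["authors"], "AUTHOR"),
                 (f'({meta["year"]}).', "YEAR"),
                 (f'{meta["title"]}.', "TITLE"),
                 (f'{meta["booktitle"]}.', "VENUE")]
    tokens, labels = [], []
    for text, label in parts:
        for token in text.split():
            tokens.append(token)
            labels.append(label)
    return tokens, labels

meta = {"authors": "J. Beel", "title": "Mr. DLib",
        "booktitle": "Proc. of JCDL", "year": "2017"}
tokens, labels = labeled_citation(meta, "ieee")
```

Because the labels come for free from the structured metadata, the dataset can be scaled to billions of examples without any manual annotation.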

"Unshredd Me" (Reconstruct Shredded Documents)

Problem/Background: Criminal investigators and others often face the problem that suspects have shredded documents, i.e. destroyed evidence. Hence, the investigators need to restore the shredded documents, which is a lot of work and sometimes impossible.

Solution/Goal of the Project: You develop an "unshredder" tool (website or desktop application) that takes as input a photo of a shredded document and returns the reconstructed document. To accomplish the project, you will probably have to create a dataset of photos showing shredded documents together with the original, unshredded versions. With this dataset, you can train a machine learning algorithm and then evaluate how well your algorithm works. When doing the project, you should decide whether you want to focus on machine-shredded documents (probably easier) or documents that were torn apart.
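For machine-shredded documents with straight vertical cuts, even a non-learned baseline is instructive: order the strips so that adjacent edge pixel columns match as closely as possible. A minimal sketch (the strip representation is an assumption; a real tool would work on photos and also handle rotation, scale and lighting):

```python
def edge_difference(right_edge, left_edge):
    """Sum of absolute pixel differences between two strip edges."""
    return sum(abs(a - b) for a, b in zip(right_edge, left_edge))

def order_strips(strips):
    """Greedily chain vertical strips by edge similarity. Each strip is
    {'left': [...], 'right': [...]} -- one grayscale pixel column per edge.
    Starts from strip 0; a real system would also search for the best
    starting strip and consider flipped pieces."""
    remaining = list(range(1, len(strips)))
    order = [0]
    while remaining:
        last = strips[order[-1]]["right"]
        best = min(remaining,
                   key=lambda i: edge_difference(last, strips[i]["left"]))
        order.append(best)
        remaining.remove(best)
    return order

# Three strips cut from a simple horizontal gradient, given out of order:
strips = [{"left": [0, 0],   "right": [10, 10]},   # original strip 0
          {"left": [40, 40], "right": [50, 50]},   # original strip 2
          {"left": [20, 20], "right": [30, 30]}]   # original strip 1
order = order_strips(strips)
```

Where such a greedy baseline fails (noisy edges, blank regions, torn rather than cut paper) is exactly where the learned approach proposed above should earn its keep.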

"ASEO Me" (Optimize Research Articles for Academic Search Engines)

Problem / Background: Researchers have an interest in having their research articles indexed by academic search engines such as Google Scholar and Semantic Scholar, as this increases the articles' visibility in the academic community. In addition, researchers should be interested in how well their articles rank for certain keyword searches. Some years ago, I published an article about "academic search engine optimization" (ASEO) that gave advice on how to optimize research articles to make them easily indexable by academic search engines. Nowadays, ASEO is being used by many publishers. However, many researchers are not yet aware of the importance of ASEO and/or they do not have the skills to perform ASEO.

Solution / Goal of the project: Your goal is to develop a website on which researchers can upload a research article; the website analyses the article and 1) predicts how well the article will rank for certain keywords, 2) makes suggestions on how to improve the article's ranking, and 3) (optionally) modifies the article automatically to make it more readable/indexable by academic search engines.

ASEO Reloaded (Academic Search Engine Optimization 2)

Problem / Background: As described in the previous project, I published an article about "academic search engine optimization" that gives advice on how to optimize research articles to make them better indexable by academic search engines. The article was published some years ago; hence, not all of the advice may still be sensible, or, due to advances in search engine ranking, some additional aspects might need to be considered.

Solution / Goal of the project: Your goal is to find out how to optimize research articles for academic search engines. In contrast to the previous project, the focus here is on research, i.e. you will run experiments to find new ways of optimizing research articles, while the previous project focuses more on the application (i.e. enabling a user to upload a paper and get advice).

Extraction of Phrases Answering the Five Journalistic W-Questions using ML

Problem / Background: News articles typically answer the five journalistic W-questions (5W) within the first few sentences, i.e., who did what, when, where, and why. The 5W phrases answering these questions describe the main event of an article. Thus, the 5W phrases are used for various applications in automated analysis and processing of news, including news clustering (finding related articles), news aggregation (such as Google News), and summarization.

Solution / Goal of the project: Your goal is to find suitable features that can be used by machine learning (ML) and deep learning methods to extract from a given article, for each 5W question, the one phrase that best describes the main event. You can start by experimenting with the features and methods that we implemented in a previous, non-ML-based 5W extractor. After you have devised your method, you should compare the results to our previous system using a gold-standard dataset.

Prof. David Gregg



027 Lloyd Institute


Note: I will not be supervising projects in the 2017/18 academic year. Sorry about that.

However, if you are...

  • a strong C, C++, OpenCL/OpenGL or assembly programmer
  • interested in program optimization, parallel programming, algorithm design
  • interested in contributing to my research group's work on accelerating deep neural networks (DNNs) on embedded systems

...then we may be able to find you a suitable final-year project in my group and in collaboration with my colleague Dr Andrew Anderson.

Possible projects include methods for implementing DNNs using low-precision arithmetic; efficient algorithms for DNNs with sparse weight data; fast DNNs on embedded targets such as ARM Mali GPUs; program generation for creating fast DNN libraries; and GPU, vector and/or multicore implementations of key deep learning algorithms. Those on the Computer Science degree can find more details on a few specific ideas here.

Note: This work is mostly about program optimization, code generation, parallel programming, algorithm design, and understanding computer architecture to improve software performance. Although the target application area is machine learning, the work is about software performance rather than machine learning techniques.

Dr. John Dingliana



02-014 Stack B


Some general info

I am a member of the graphics, vision and visualisation research group interested in the areas of:

  • 3D Visualisation
  • Computer Graphics and Virtual Reality
  • Graphical aspects of Mixed and Augmented Reality
  • Stylised Rendering / Non-photorealistic Rendering
  • Physically-based Animation

Suggested Projects:

  1. Remote telemetry using head-mounted Virtual Reality/Augmented Reality displays: video and 3D information acquired from a remote sensor will be transmitted live to a user wearing an HMD (an Oculus Rift or Meta AR head-mounted display will be available for use). Instead of the user merely seeing through "the eyes of the camera", the spatial information gathered will be displayed asynchronously in order to minimize motion-sickness effects and allow the user to explore the data independently. Potential challenges include the following (each of which could be the focus of a project):
    • reducing latency between acquisition, processing and display of 3D environments on AR/VR displays
    • adaptive level of detail in interactive AR/VR
    • blending different modalities (e.g. fusion of different sensors, or seamless integration of real and virtual)
    • ensuring accuracy/fidelity of the visualization
  2. Engaging with characters in VR/AR: This project is concerned with how immersive, high-detail, stereo displays change how we react to virtual characters in Virtual Reality. For instance, are characters that are displayed within our personal space more engaging or more discomforting? The project will require competence in using and exploiting off-the-shelf graphics engines (Unity, Unreal, etc.) and datasets; however, the main focus is on quickly engineering a number of test cases for user studies. A pilot study will be conducted as part of the project, but it is intended that the system developed should be usable in more detailed follow-up user experiments.
  3. Spatial perception in AR: This project will explore how users perceive relative distances of objects (e.g. real vs virtual) in mixed environments. Can users reliably judge which object or feature is closer, do users have an accurate sense of scale, and can users be convinced that a real and a virtual object are collocated/connected? In particular, there is limited work on up-and-coming "see-through AR" devices such as the Microsoft HoloLens.
    The effort in the project will be in using one or more AR displays to render experimental 3D graphical scenes wherein virtual objects are embedded in the real world; implementing a number of strategies (mostly from the existing literature) to improve spatial perception in such scenes; implementing a testing scenario to compare spatial perception under the different strategies; and potentially running a pilot experiment.
  4. Principal Components Analysis in 3D Graphics and Visualization
  5. Superhuman vision using augmented reality: The objective of this project is to address some of the challenges of merging virtual graphical objects with dynamic real-world objects to provide information about an object to the user in augmented reality (AR). Many AR applications already exist that overlay textual and 2D-graphic information for similar purposes. In addition, various graphical applications in entertainment add augmented objects to real-world environments. This project will deal mainly with graphical (and, where possible, 3D) augmentations of the real world, and the augmentation should be done in real time. Example application areas include night vision, "audio vision", X-ray vision, enhancing the visibility of threats in an environment, etc. Microsoft ran a competition for proposals of serious applications for the HoloLens AR display, and some of the winners are listed [HERE]. As input, the shape of the model captured using imaging and depth sensors will be used. The data might then be post-processed or filtered before being blended into the real environment. Data from advanced sensors such as X-ray, audio etc. will likely not be available but might be simulated for the purposes of this project.

Stephen Barrett

Social Software Engineering

My research is focussed on the identification of the unique contribution and impact of the software engineering practice of individuals and the teams they work in. My approach is to treat software engineering as a sociological phenomenon, with source code as the primary artefact of a constructive social network process. My aim is to develop toolsets that capture expert domain knowledge regarding software engineering practice and management in a form that enables us, in the language of Daniel and Richard Susskind, to externalise its qualitative assessment.

The technical basis of the approach is the application of defeasible/non-monotonic argumentation schemes, borrowed from proponents of the strong AI model such as John L. Pollock, but applied to the assessment of human behaviour rather than the replication of human decision making. We apply this method to infer judgements regarding software engineering practice, this analysis being grounded in data derived from code quality measurement, software development process monitoring, and a social analysis of software construction.

This research work is being conducted in the context of a Haskell-based software platform that gathers and processes 'ultra-large' scale data sets regarding active open-source and closed-source software development. Project students will thus need to be willing at least to take on Haskell as a programming language. Prior experience is not necessary, but you should consider yourself a strong programmer to work with me.

Some example topics from which suitable projects can be developed include:

  • Automation of Software Development Methodology Adherence Testing: the use of fine grained behavioural measurement regarding software engineering to quantify adherence to development methodology. In this topic, we are interested in delivering practical tools and methods by which software teams can encourage and monitor process development goals.
  • Privacy Preserving Gamification of Software Engineering Processes: the use of gamification in the assessment and management of software engineering processes. In this topic, we are interested in exploring how gamification can positively impact on the performance of teams and individuals.
  • Situated Learning Framework for Software Engineering Community of Practice: the development of a model for the automated identification and recording of engineering activity for practice learning. In this topic, we are interested in developing ways in which the best practice and skill of senior and experienced team members can be automatically packaged as learning resources for the organisation.
  • Sociometrics in Software Engineering: the use of sociometric and biometric data to predict individual and team performance in software engineering. In this topic, we are interested in studying the environment and social network structure of software engineering teams in order to provide actionable measures of team performance and health.
  • A Platform for Social Software Engineering Research: the development of a scalable platform for social software engineering analysis. In this topic, we are interested in developing high level domain specific languages to enable sophisticated bespoke analysis by non-technologists of social network and software quality data pertaining to the software engineering effort.
  • High Scale Code Quality Measurement: a data evolution based cloud platform for the efficient computation and continuous re-computation of code quality metrics. In this topic, we are interested in exploring how predictive relationships might exist between various possible ways of measuring software engineering, such that more efficient and rapid result computation can be achieved.

Please note that I am unfortunately unable to take on projects outside this broad research space.

If these topics interest you, do send me an email, briefly summarising your interest, and software development experience.



Dr Hugh Gibbons





Support for Literate Programming in Java

Literate Programming is defined as the combination of documentation and source code put together in a fashion suited for reading by human beings. It is an approach to programming which emphasises that programs should be written to be read by people as well as compilers.
There are many tools available to support Literate Programming, but they are mostly available on Unix systems and for programming languages such as Pascal and C. While Javadoc is available to document Java programs, the aim of the project is to investigate the benefit of using Literate Programming in Java.

Using CASL to Model a software system

CASL (Common Algebraic Specification Language) offers the possibility of formal specification and development while allowing a more informal working style. The project would investigate using CASL to develop a formal model of some software problem which may or may not have already been presented in formalisms such as VDM or Z. This model could then be informally translated into a programming language such as Java.

Developing Programs using Perfect Developer or How to develop a Java program without writing Java.

Perfect Developer is a program development system which allows one to develop provably correct programs. First one develops the program in the notation of Perfect Developer and then the system can verify the program written. Once one has a correct program, Perfect Developer can automatically translate the notation to Java, C++ or Ada.
(See Perfect Developer: How it works)

Imperative Programming within a Functional Programming Setting

While Functional Programming (FP) supports lists better than arrays, it is possible to write FP programs that are based on arrays. Since FP programs are side-effect free, it is usually easier to prove FP programs correct than imperative programs. The aim of the project is to develop Java-style imperative programs within an FP setting.

Simulating Puzzles or Games in Functional Programming

Over the years there have been successful projects using Functional Programming to provide animations or simulations of puzzle-type problems. Since Functional Programming languages such as Haskell are very high-level languages, expressing solutions to puzzle-type problems may prove easier than in imperative languages or in declarative languages such as Prolog or Lisp/Scheme. Possible puzzle problems would be cryptarithms, where one has to fill in the missing digits in an arithmetic calculation, logic puzzles, or puzzles involving Graph Theory. Puzzles and games from the works of Martin Gardner would be an interesting starting point.
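Cryptarithms of the kind mentioned above are easy to state as a search problem. A brute-force sketch (written in Python rather than Haskell, purely to illustrate the search such a solver performs; a Haskell version would express the same search as a list comprehension over permutations):

```python
from itertools import permutations

def solve_cryptarithm(words, result):
    """Find a letter-to-digit assignment so that the words sum to the result,
    e.g. SEND + MORE = MONEY. Leading letters may not map to zero."""
    letters = sorted(set("".join(words) + result))
    assert len(letters) <= 10, "too many distinct letters"
    leading = {w[0] for w in words + [result]}
    for digits in permutations(range(10), len(letters)):
        env = dict(zip(letters, digits))
        if any(env[l] == 0 for l in leading):
            continue
        value = lambda w: int("".join(str(env[c]) for c in w))
        if sum(value(w) for w in words) == value(result):
            return env
    return None

# A small classic: TO + GO = OUT.
solution = solve_cryptarithm(["TO", "GO"], "OUT")
```

The interest of the project is precisely that a Haskell formulation of this search tends to read almost like the puzzle's own statement, whereas the imperative version above hides the specification inside loop plumbing.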

Support Systems for Teaching Logic and Logic Proofs

Systems such as Tarski's World and Hyperproof have proved very valuable in teaching an understanding of both propositional and predicate logic. These systems are part of a more general logic project, Openproof, at Stanford's Center for the Study of Language and Information (CSLI). An alternative logic proof system is provided by Jape, a system developed by Richard Bornat and Bernard Sufrin which supports Bornat's book Proof and Disproof in Formal Logic. A more modern logic proof system, KE, has been developed by Marco Mondadori and Marcello D'Agostino, with associated computer program systems WinKE by Ulle Endriss and LogicPalet by Jan Denef. It would be useful to provide support tools for these systems so that they could be more widely used. An example of a logic support system is the LogiCola system by Harry Gensler, which supports his book Introduction to Logic.

Ruler and Compass Construction within Vivio

Vivio is an animation tool that allows one to animate algorithms and simulations. The project would involve investigating the use of this tool for creating classical Euclidean constructions, for example, the construction of a pentagon using a compass and ruler.

Program Transformation of Z into JML

The development of the Java Modelling Language (JML) was influenced by specification languages such as Z. Many software projects make use of transforming specifications into imperative programs. An example of this approach can be seen, in particular, in the book "Introduction to Z" by Wordsworth. The examples in Wordsworth's book could be used as a starting point in transforming Z specifications into JML.

Annotated Java with JML

The Java Modeling Language (JML) is a behavioral interface specification language that can be used to specify the behavior of Java modules. It is based on the approach of Design by Contract (DBC). The draft paper Design by Contract with JML (by Gary T. Leavens and Yoonsik Cheon) explains the basic use of JML as a design by contract language for Java. See also the presentation Introduction to JML by Joe Kiniry (University of Copenhagen). A given project would investigate the use of JML, providing examples of its use. For example, how would a program for binary searching an array be implemented in JML?
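JML expresses such contracts as annotations on Java code. As a language-neutral illustration of the same design-by-contract idea, here is a Python sketch in which the pre- and postconditions that JML would state declaratively (via requires/ensures clauses) are written as runtime assertions; the contract conditions are the standard ones for binary search, not taken from any JML documentation:

```python
def binary_search(xs, target):
    """Return an index i with xs[i] == target, or -1 if target is absent.
    Precondition  (JML 'requires'): xs is sorted in ascending order.
    Postcondition (JML 'ensures'):  result == -1 implies target not in xs."""
    assert all(xs[i] <= xs[i + 1] for i in range(len(xs) - 1)), "precondition"
    lo, hi = 0, len(xs) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if xs[mid] == target:
            result = mid
            break
        elif xs[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    else:  # loop finished without a break: target was not found
        result = -1
    assert result == -1 or xs[result] == target, "postcondition"
    assert result != -1 or target not in xs, "postcondition"
    return result
```

The difference in JML is that such conditions are machine-checkable specifications rather than runtime checks, so tools can verify them statically for all inputs instead of testing them one call at a time.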


Developing High Integrity Code in Spark

Spark is a high-level programming language designed for developing software for high-integrity applications. Spark encourages the development of programs in an orderly manner, with the aim that the program should be correct by virtue of the techniques used in its construction. This 'correctness by construction' approach is in marked contrast to other approaches which aim to generate as much code as quickly as possible in order to have something to demonstrate. Quoting from the book on Spark, "High Integrity Software: The Spark Approach to Safety and Security" by John Barnes:
"There is strong evidence from a number of years of use of Spark in application areas such as avionics and railway signalling that indeed, not only is the program more likely to be correct, but the overall cost of development is actually less in total after all the testing and integration phases are taken into account."
Spark will be familiar to programmers with knowledge of imperative languages such as C, Java and Ada. There is some effort involved in learning how to use the annotations correctly.
A project using Spark would involve the development of reliable programs that can be proved correct by the Spark system.

Dr Gerard Lacey




087 2396567


I am currently a part-time academic, as I am working in a Trinity spinout. My main research areas are computer vision, robotics and augmented reality. My research focus is the development and empirical evaluation of mixed media solutions to real-world problems.

Problem / Background

Augmented reality (AR) is the overlay of interactive graphics onto live video such that it reacts to the content of the video image, e.g. selfie filters that track face movement. Mobile phones are becoming one of the main platforms for AR. This project focuses on the tracking of hands in mobile phone images for gesture recognition, content overlay and gaming. General-purpose hand pose tracking is a complex problem, but custom hardware solutions and complex software libraries are available.

One of the biggest challenges is achieving high-speed and reliable segmentation of the hands against real-world backgrounds and under variable lighting conditions. The next main challenge is the identification of the fingers and matching them to a hand pose model. If the hand gesture problem can be constrained, this may be simplified and good performance achieved.
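To see why lighting makes segmentation hard, consider the simplest possible baseline: a rule-based per-pixel skin classifier. A minimal sketch (the thresholds are illustrative rule-of-thumb values, and their fragility under non-uniform lighting is exactly what a project solution would need to overcome):

```python
def is_skin(r, g, b):
    """A rule-of-thumb RGB skin test. The fixed thresholds only behave
    reasonably under fairly uniform daylight; shadows, colour casts and
    skin-tone variation all break them."""
    return (r > 95 and g > 40 and b > 20
            and max(r, g, b) - min(r, g, b) > 15
            and abs(r - g) > 15 and r > g and r > b)

def segment(image):
    """Binary hand/background mask for an image given as rows of (r, g, b)."""
    return [[1 if is_skin(*px) else 0 for px in row] for row in image]

# A skin-toned pixel next to a bluish background pixel.
mask = segment([[(220, 170, 140), (30, 60, 90)]])
```

A practical solution would replace this fixed rule with something adaptive (e.g. a learned per-frame colour model or a small segmentation network), but the rule is a useful benchmark to beat.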

Solution / Goal of the Project:

This project will aim to develop a mixed reality application that will allow someone to “try on a glove” using their mobile phone. The goals of this project are to:

  • Reliably segment the hands on a mobile phone
  • Recognise and track the orientation of the hands
  • Render a 3D glove model over the live video image aligned to the hands
  • Develop a solution for finding an accurate measure of hand size
  • Develop a prototype application on iOS or Android
  • Perform user testing and formal evaluation of performance

The application will be developed using suitable mobile AR development tools.

Glenn Strong



ORI G.15


Many of these projects involve some knowledge of functional programming. No prior knowledge is needed before starting the projects; we provide support for learning this new programming paradigm for students who have not previously been exposed to it. Of course, if you already know a language like Haskell you'll be able to start the project a little quicker.

Functional programming and Live Music Composition

This project would involve working with the Haskell embedded DSL "Tidal" to produce one or more variants on the existing system. Some interesting options could include:
  • A Tidal server which would allow performers to publish (stream) their live performances so that users in other locations could experience them.
  • A more ambitious project would allow for users to remotely collaborate on live performances, perhaps using one of the existing collaboration frameworks (ShareDB, TogetherJS, etc)

Drag and drop Python

This project would involve building a structured programming editor for the Python programming language, probably using an existing framework such as Google's Blockly. While there are some tools that can generate Python from Blockly programs, the source that the users work with doesn't tend to look much like a Python program. The goal of this project would be to provide a Python-oriented system (perhaps in the style of the old Carnegie Mellon structure editors, or the Raise Toolkit).

An initial version of this wouldn't need to parse existing Python programs, only provide the editing environment, and perhaps capture the output of a running program. There are many potential extensions for this project, depending on the student's interests.

Other projects

I am happy to discuss project ideas in Functional Programming, Literate Programming, Theoretical Computer Science, or other similar areas. If you have a specific project in mind then send me an email. I am also willing to discuss software implementation projects with a bias towards rigour (using formal techniques, or design-by-contract ideas). I am also interested in creative ways to support novice programmers and in the study of Free Software related projects.

Lucy Hederman



ORI G.13


My proposed projects/areas should suit MSc CS students on the Intelligent Systems and Data Science strands. Some projects may be suitable as Final Year Projects for ICS and CSB students. To discuss, please email me.

Broadly I am interested in "data wrangling" for health IT and clinical research purposes.

The following projects relate to the AVERT project, which is concerned with predicting relapses (or flares) of ANCA vasculitis, a relapsing and remitting rare autoimmune disease that results in rapidly progressive kidney impairment and destruction of other organs. Epidemiological data seem to show a strong environmental impact on relapse in ANCA vasculitis, though it is unclear exactly which environmental factors are responsible. The rapidly emerging discipline of data science - alongside massive increases in computing capability, machine learning and artificial intelligence - is poised to allow the incorporation of such highly complex health big-data environments, and the generation of outputs with potential applicability in personalised medicine. We aim to integrate a wide array of unstructured data streams to define the signature of relapse of the disease. We believe this approach will represent a new paradigm in managing chronic conditions governed by interaction between patient-level factors and their environment, and, if successful, could be scaled up for use with other autoimmune diseases.

Data integration for AVERT uses linked data principles. Different streams of data are combined in an RDF triple store.

RDF modelling of data coming from the AVERT app

For the AVERT project, patients with the disease are using an app to capture data which may help to determine patterns which lead to flares of the disease (see above). We wish to "uplift" this data into the AVERT RDF data store to connect it with clinical data, weather data and pollution data. Issues which may lead to interesting research questions are around anonymity, appropriate representation of location data, appropriate provenance metadata, etc.

    AVERT uplift workbench

    This project involves developing a suite of Eclipse plugins to support uplift of data into RDF. (Details to follow.)

    Making an air pollution model available as a service for AVERT

    Environmental engineering colleagues have developed a model of air pollution in Ireland using ArcGIS. This project involves turning that model into (1) a service that can answer queries such as "What was the level of silica at location X on date D?" and (2) a user interface to facilitate engagement with the model.
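    A minimal sketch of the query side, assuming the ArcGIS model has been pre-run and its outputs exported to a lookup table keyed by location and date. The data, keys and `query_level` function are illustrative, not part of the real model:

    ```python
    # Pre-computed model outputs: (location, date) -> pollutant readings.
    # Values here are invented for illustration.
    pollution_levels = {
        ("53.344,-6.260", "2017-06-01"): {"silica": 4.2, "pm10": 18.0},
        ("53.344,-6.260", "2017-06-02"): {"silica": 3.1, "pm10": 15.5},
    }

    def query_level(pollutant, location, date):
        """Answer queries such as 'what was the level of silica at X on D?'."""
        readings = pollution_levels.get((location, date))
        if readings is None or pollutant not in readings:
            return None  # a real service would interpolate or report coverage gaps
        return readings[pollutant]

    print(query_level("silica", "53.344,-6.260", "2017-06-01"))
    ```

    Wrapping such a lookup behind an HTTP endpoint, and deciding how to interpolate between the model's grid points, would be part of the project.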

    Making clinical research project data shareable outside the project

    The AVERT project hopes to make its data available, in structured, semantically interoperable, de-identified, form, to other researchers, as part of an "information commons". This project will need to explore a broad range of technical and non-technical issues in devising a safe, useful and usable solution. How do bio-scientists work with data? How does data protection impact science? How do we ensure shared meaning of data? How do we protect patient identity? etc.

    AVERT application prototype

    The end goal of AVERT is to have a "realtime" decision support system to predict flares from realtime patient and environment data and advise clinicians on treatments. This project will develop a prototype of such a system, in anticipation of a future flare prediction model. It will combine data from the mobile app, clinical records, and location-based weather and pollution data (where available), feed it to a "black-box" flare model, and present the output in a form that would be useful to clinicians, and possibly patients.
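    One way to picture the architecture is as a thin data-fusion layer around a swappable flare model. In this sketch the data sources and the "black-box" model are stand-ins invented purely for illustration:

    ```python
    # Placeholder data sources -- in the prototype these would call the app
    # backend, the clinical record store, and the weather/pollution services.
    def fetch_app_data(patient_id):
        return {"fatigue": 3, "joint_pain": 1}

    def fetch_environment(location):
        return {"pm10": 18.0, "pollen": 0.7}

    def black_box_flare_model(features):
        # Stand-in until a real prediction model exists: any callable mapping
        # a feature dict to a probability can be swapped in here.
        return min(1.0, 0.1 * features["fatigue"] + 0.01 * features["pm10"])

    def flare_risk(patient_id, location):
        """Fuse the data streams and hand them to the model."""
        features = {**fetch_app_data(patient_id), **fetch_environment(location)}
        return black_box_flare_model(features)

    print(f"Flare risk: {flare_risk('p042', 'Dublin'):.2f}")
    ```

    The value of this structure is that the model boundary is explicit, so the prototype's fusion and presentation layers survive the arrival of a real prediction model.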

    Driving AVERT app user engagement

    (Suitable for an MSc CS (Information Systems) student).

    This project seeks to use state of the art (machine learning) techniques to source and serve content of relevance for the vasculitis patient group, with a view to increasing patient engagement with the app.

    Other project

    The final project is not AVERT related, but is not dissimilar.

    Serve probability data to an infectious disease risk prediction system

    We are developing a service to predict a person’s risk of being infected by a disease, given personal details (e.g. age, occupation) and environmental factors (weather, terrain) at their location. The probabilistic prediction at the core needs a service to provide probability data (e.g. probability of being under 5 in Indonesia; current incidence of TB in Indonesia) from online sources such as WHO, UNSD. The project could be broadened to provide a range of tools over the demographic and public health data, including an interactive user interface with visualisations. It would involve researching current systems and state of the art approaches. (The narrower project might suit a final year student).

    Updated 20 September 2017

    Dr. Hitesh Tewari



    Lloyd 131


    Last Updated 30th May 2016

    Fully Anonymous Transferable E-Cash (Assigned)

    In this project we will further develop and implement an anonymous E-Cash scheme which allows for the unlimited transfer of coins within the network. The work builds upon the foundations of David Chaum's "Blind Signature Protocol" and combines it with the "Discrete Logarithm Problem" to allow for delegated signatures and anonymous transfer of coins. The system also makes use of "Blockchain" technology for the network participants to collectively verify the authenticity of coins in the system to prevent double-spending of coins.

    The student who undertakes this project will be required to familiarize themselves with the number theory aspects of the cryptographic primitives used within the FATE system. They will be required to study and enhance the existing FATE protocols. More details about the FATE system can be found in this paper. Finally, they will be required to implement a working prototype of the system.
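    For intuition, Chaum's blind-signature idea can be demonstrated with textbook RSA and deliberately tiny, insecure parameters. This toy is not the FATE protocol itself, only the underlying number-theoretic trick:

    ```python
    # Toy Chaum blind signature with textbook RSA (insecure parameters).
    n, e, d = 3233, 17, 2753   # bank's RSA key (p=61, q=53)
    m = 65                     # coin serial number (in practice, a hash)
    r = 99                     # user's blinding factor, coprime with n

    # 1. User blinds the message before sending it to the bank.
    blinded = (m * pow(r, e, n)) % n

    # 2. Bank signs the blinded message without ever learning m.
    blind_sig = pow(blinded, d, n)

    # 3. User unblinds; r^(ed) = r mod n, so this yields m^d mod n.
    sig = (blind_sig * pow(r, -1, n)) % n

    # 4. Anyone can verify the signature with the bank's public key.
    print("signature verifies:", pow(sig, e, n) == m)
    ```

    The unblinding step works because `blind_sig = m^d * r^(ed) = m^d * r (mod n)`, so multiplying by `r^-1` leaves a valid signature on `m` that the bank never saw.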

    SSLChain - Secure Login App (Assigned)

    The basic premise of the SSLChain technology is to make use of "Blockchain" technology to store and disseminate X.509 certificates. This project is a continuation of work that was carried out in a FYP in 2015/16 whereby the student built a basic prototype of the backend using the Bitcoin distribution, and developed a cryptographic add-on for Gmail.

    In this project we would like to refactor the backend and make it more robust. We would then like to build a secure login application that allows for a one-time-password (OTP) to be sent to a user who is trying to login to a web service. This secure OTP mechanism will eliminate the need for passwords to be remembered by users for the various websites that they use on a daily basis.
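    One plausible building block for such an OTP mechanism (an assumption on our part, not a detail of the SSLChain design) is the standard HOTP algorithm from RFC 4226. A minimal sketch:

    ```python
    # HOTP one-time passwords (RFC 4226), using only the standard library.
    import hashlib
    import hmac
    import struct

    def hotp(secret, counter, digits=6):
        """Derive a short numeric one-time password from a shared secret."""
        msg = struct.pack(">Q", counter)                  # 8-byte big-endian counter
        digest = hmac.new(secret, msg, hashlib.sha1).digest()
        offset = digest[-1] & 0x0F                        # dynamic truncation
        code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
        return str(code % 10 ** digits).zfill(digits)

    # RFC 4226 test vector: this secret with counter 0 yields "755224".
    print(hotp(b"12345678901234567890", 0))
    ```

    The time-based variant (TOTP, RFC 6238) simply derives the counter from the current time, which is what most login apps use in practice.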

    The student undertaking this project will be required to have a good understanding of cryptography, and in particular the specific protocols used within the SSLChain system (X.509, Blockchain, XMPP etc.). They will quickly need to familiarize themselves with the Bitcoin distribution and the current codebase. Finally, they will be required to develop a robust prototype that we hope to trial on the College network.

    Secure Voting Protocols in the Irish Context of a Single Transferable Vote (Assigned)

    You may remember the fiasco of the electronic voting machines that the state bought in 2002 at a cost of millions of Euros, and which were finally scrapped in 2012 because they were deemed to be unsafe, as no one could guarantee the correctness of the results produced by the machines.

    Keeping that in mind, in this project we would like to explore the state of the art in secure voting protocols today, with a view to developing a system for the Irish context of a single transferable vote (STV), i.e. the proportional representation system that we have in Ireland. We would like to explore the possibility of using "blind signature" protocols, Blockchain technology, threshold cryptography etc. to try and solve the problem of electronic voting in Ireland.

    The student who takes up this project will be required to develop a solid understanding of the Irish electoral system. They will be required to study in detail the state of the art in the area of secure voting protocols, which is itself a challenge as it requires additional study of cryptographic primitives and number theory. Building on this understanding, we will then develop a secure STV protocol, and develop mathematical proofs to back up our claims.
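    For context, the count in an Irish STV election revolves around the Droop quota. A minimal sketch of the quota and a first-count check, with made-up figures:

    ```python
    def droop_quota(valid_votes, seats):
        """Smallest number of votes that guarantees election under STV."""
        return valid_votes // (seats + 1) + 1

    # Illustrative first-preference totals for a 3-seat constituency.
    first_preferences = {"A": 9000, "B": 7000, "C": 3500, "D": 500}
    seats = 3

    quota = droop_quota(sum(first_preferences.values()), seats)
    elected = [c for c, v in first_preferences.items() if v >= quota]

    print("quota:", quota)                      # 20000 votes, 3 seats -> 5001
    print("elected on first count:", elected)   # surpluses then transfer
    ```

    The hard part of the project is not this arithmetic but proving that surplus transfers and eliminations can be carried out verifiably without revealing individual ballots.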

    Banning Cookies - What's Next? (Assigned)

    As users have become both more aware and wary of cookies - the technology that tracks browsing activity for advertising purposes - many people avoid cookies, either by turning them off or by using services that block them. Companies, in turn, have started experimenting with new tracking methods that don't use cookies.

    One of those concepts is "user-agent fingerprinting", a technique that allows a web site to look at the characteristics of a computer, such as what plugins and software you have installed, the size of the screen, the time zone, fonts and other features of any particular machine. Another mechanism is to track users using some other persistent local/client-side storage. Your browser transmits all sorts of information that has nothing to do with cookies. All of those things put together form a unique identity that can be assigned an identifying number and then used just like a cookie.
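    To illustrate the idea, the sketch below hashes a set of browser attributes into a stable identifier. The attribute names are merely examples of what a tracker might collect:

    ```python
    import hashlib
    import json

    def fingerprint(attributes):
        """Hash a dict of browser attributes into a stable, cookie-like ID."""
        canonical = json.dumps(attributes, sort_keys=True)
        return hashlib.sha256(canonical.encode()).hexdigest()[:16]

    machine = {
        "user_agent": "Mozilla/5.0 ...",
        "screen": "1920x1080",
        "timezone": "Europe/Dublin",
        "fonts": ["Arial", "Calibri"],
        "plugins": ["pdf-viewer"],
    }

    print(fingerprint(machine))  # same attributes -> same ID, no cookie needed
    ```

    A defensive add-on of the kind proposed here would randomise or normalise these attributes so that the resulting hash differs between sites or sessions, breaking the linkage.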

    In this project we would like to explore the various new techniques that are currently being developed or used to track users. Armed with this knowledge we would like to develop a browser add-on (similar to ad-blockers) that will prevent transmission of machine/user specific information which allows organizations to track users.

    Georgios Iosifidis

    SCSS and CONNECT Centre

    Lloyd Institute



    Project 1


    Title: Resource Orchestration in Hybrid Cloud-Fog Networks


    Background: The emerging fifth-generation (5G) wireless networks are expected to offer various mobile services such as ultra high definition video delivery, augmented reality services, and machine learning-based applications. Due to the limited capabilities of the users' devices, these demanding mobile services can only be delivered with the support of cloud computing and storage resources. The latter can be located in distant data-centers, or in proximity with the end-users e.g., in Cloudlets or even nearby mobile devices. According to Forbes and Economist [1, 2], Cloud and Fog computing solutions are attracting increasing interest as a promising and cost-efficient solution for next generation communication networks. However, these hybrid architectures induce substantial network bandwidth costs as well as very high energy consumption in data centers, especially under high-load conditions. This is currently one of the largest obstacles hampering the large-scale adoption of these promising solutions.


    Goals: In this project, we will design algorithms for jointly optimising the allocation of computation, storage and communication resources that are located at the Fog (in proximity with the devices) or the Cloud, aiming to increase the quality of the offered services and reduce the system's expenditures. We will leverage Android programming tools (e.g., based on Java) to build mobile applications and MATLAB programming to execute trace-driven large-scale simulations and data processing. In the final step, this project will analyse the quality of service of various cloud-based applications and the efficiency of the resource management by performing experiments using mobile devices, local computing/storage servers and cloud platforms, e.g., Microsoft Azure [3].
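    As a toy illustration of the core trade-off (not the optimisation methods the project will actually use), a task can be placed wherever the estimated latency-plus-energy cost is lowest. All figures below are invented placeholders:

    ```python
    # Place a task at the fog or the cloud, whichever minimises a simple
    # latency + weighted-energy cost. Parameters are illustrative only.

    def offload_decision(task_cycles, data_bits, fog, cloud):
        def cost(node):
            compute_s = task_cycles / node["cpu_hz"]          # execution time
            transfer_s = data_bits / node["bandwidth_bps"]    # transfer time
            return compute_s + transfer_s + node["energy_weight"] * data_bits

        return min(("fog", cost(fog)), ("cloud", cost(cloud)), key=lambda x: x[1])

    fog = {"cpu_hz": 2e9, "bandwidth_bps": 1e8, "energy_weight": 1e-9}
    cloud = {"cpu_hz": 2e10, "bandwidth_bps": 1e7, "energy_weight": 5e-9}

    # Heavy computation, little data: the cloud's fast CPUs win.
    choice, total_cost = offload_decision(1e10, 8e6, fog, cloud)
    print(choice)
    ```

    The real problem is far richer (many users, time-varying loads, joint caching and routing decisions), which is why the project turns to formal optimisation rather than a greedy rule like this.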

    Student Info: This project is particularly suitable for M.Sc and MAI students interested in Cloud/Fog architectures and (i) system modeling and analytical methods (i.e., optimisation) and/or (ii) system design and performance evaluation. The student will collaborate with Dr. J. Kwak ( and Dr. G. Iosifidis (; and will have the opportunity to participate in ongoing (fast-paced) research projects and acquire important analytical and technical skills.



    [1] Economist, Shifting computer power to the cloud brings many benefits - but don't ignore the risks, URL:

    [2] Forbes, Is Fog computing the next big thing in Internet of Things? URL:

    [3] Microsoft Azure, URL:



    Project 2


    Title: Economics of the Internet of Things


    Background: The promise of the Internet-of-Things (IoT) is to enhance our physical world with connected and intelligent devices that can respond in real-time to environmental conditions, perform tasks with increased precision, augment human capabilities by operating in a semi-autonomous fashion, and improve resource utilization. Applications of IoT can be found in manufacturing, traffic control, energy grids, electric vehicles, environment monitoring and many other domains [1], [2]. IoT is expected to have a profound impact on our economy and society and is currently subject to intensive research in industry and academia. A particularly promising feature of IoT devices is their capability to interact with each other so as to jointly perform a task, or coordinate the execution of their missions. For example, consider a set of sensors that jointly monitor certain environmental parameters (e.g., air pollution) in a given area. The sensors might belong to the same or different business entities (e.g., different companies) and might have overlapping coverage. This enables them to cooperate and exchange measurements or support each other in case of failure, thereby improving their performance and reducing their costs.


    Goals: In this exciting era of ubiquitous connectivity that extends from humans and large systems to small-scale devices, a new type of cyber-physical economy emerges, offering novel opportunities for fruitful collaboration among users and their devices. In this project we will design and evaluate algorithms that enable IoT devices to cooperate by exchanging resources (such as energy and wireless bandwidth) and jointly improve their performance. We will combine tools from dynamic optimization and game theory to develop solutions that achieve efficient equilibria [4]. Different market scenarios will be considered, ranging from fully decentralized (peer-to-peer) to hierarchical markets where more resourceful users/devices sell their resources or services to smaller IoT nodes. The designed algorithms will be thoroughly evaluated in Matlab and/or R.


    Student Info:  This project is suitable for students interested in algorithms, optimisation, market mechanisms (game theory) and IoT business models. The student will collaborate with Dr. G. Iosifidis and CONNECT [3].



    [1] L. Atzori, A. Lera, and G. Morabito, The Internet of Things: A Survey, Elsevier Computer Networks, vol. 54, no. 15, 2010.

    [2] Cisco, Fog Computing and the Internet of Things: Extend the Cloud to Where the Things Are, White Paper, 2015; URL: 

    [3] CONNECT Centre, Pervasive Nation IoT Platform; URL:

    [4] G. Iosifidis, and L. Tassiulas, Dynamic Policies for Cooperative Networked Systems, in Proc. of ACM NetEcon Workshop, 2017, Boston, USA.


    Dr Jonathan Dukes



    Room F.27, O'Reilly Institute

    Firmware Updates for Bluetooth Low Energy Devices in the Internet of Things

    Several projects related to this topic are proposed. Traditional firmware updates for the constrained devices we find in the Internet of Things (IoT) employ a very simple approach. An update controller transmits the new firmware image in its entirety to the target device. This is wasteful of constrained target communication and memory resources.

    Alternative approaches that aim to reduce this cost are based on modular updates (updating a small part of the firmware only) and incremental updates (transmitting only the differences between the old and new versions.)
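    As a sketch of the incremental idea, a naive block-based diff sends only the blocks that differ between the two images. Real difference encodings such as bsdiff do considerably better, but the principle is the same:

    ```python
    # Block-based incremental update: transmit only blocks that changed.
    BLOCK = 4  # tiny block size for illustration; real devices use flash pages

    def make_patch(old, new):
        """List of (offset, bytes) for every block of `new` that differs."""
        patch = []
        for i in range(0, len(new), BLOCK):
            chunk = new[i:i + BLOCK]
            if old[i:i + BLOCK] != chunk:
                patch.append((i, chunk))
        return patch

    def apply_patch(old, patch, new_len):
        """Reconstruct the new image on the target from the old image + patch."""
        image = bytearray(old[:new_len].ljust(new_len, b"\xff"))  # 0xFF = erased flash
        for offset, chunk in patch:
            image[offset:offset + len(chunk)] = chunk
        return bytes(image)

    old = b"\x01\x02\x03\x04AAAABBBB"
    new = b"\x01\x02\x03\x04AAAACCCC"
    patch = make_patch(old, new)
    print(len(patch), "changed block(s)")  # only the final block is transmitted
    ```

    On a real target the interesting complications are exactly the ones listed in the project: applying the patch safely in flash (erase granularity, power loss mid-update) and fitting the protocol into BLE's small payloads.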

    Projects in this area will investigate both of these approaches for constrained devices that communicate using Bluetooth Low Energy (BLE).

    • [TAKEN] Incremental Updates: This project will investigate the use of incremental over-the-air firmware updates for BLE Devices. The key contributions of the project will be (i) identification of one or more appropriate representations of the binary difference between two firmware versions, (ii) extension of an existing BLE over-the-air firmware update protocol to support incremental updates, (iii) implementation of a prototype firmware update controller and target, including (iv) implementation of an algorithm to apply the firmware update in flash memory on the target device and (v) a performance evaluation of the prototype system.
    • [TAKEN] Modular Updates: This project will investigate the potential to perform modular updates of parts of the firmware on a target device. A number of approaches will be considered, but all approaches are likely to involve linking firmware modules on the target device, representing a significant departure from the "monolithic" firmware approach widely used in industry. The contributions of the project will be (i) identification of a suitable model for developing modular firmware (granularity), (ii) extension of an existing BLE over-the-air firmware update protocol to support modular updates, (iii) implementation of a prototype firmware update controller and target, including (iv) implementation of an algorithm to apply the firmware update in flash memory on the target device and (v) a performance evaluation of the prototype system.

    Opportunities for a collaborative comparison of the prototypes developed by the above two projects will be explored, including opportunities to implement hybrid (incremental/modular) approaches.

    Remote Patient Monitoring in the Internet of Things

    Two projects are proposed in this area. Both will explore the application of emerging, standards-based Internet of Things transport protocols to remote monitoring of patients (biometric data). It will be assumed that monitoring (e.g. temperature, motion, blood pressure, heart rate, ECG, EMG) is performed by resource constrained, wearable devices.

    In both cases, communication will be based on RFC7668 (IPv6 over BLUETOOTH(R) Low Energy) and other suitable higher-level protocols, making the constrained sensor nodes addressable IPv6 devices.

    • [TAKEN] On-Demand and Event Driven Remote Patient Monitoring: This project will explore two models for communicating biometric data from the constrained sensor device to a central or cloud-based service. In the on-demand model, sensors will store a small amount of recorded data in flash memory. By implementing a simple (standards-based?) query language, central services will be able to query the stored data, with query results transmitted back to the central service. In the event-driven model, in addition to storing data, sensors will communicate events of interest (e.g. when the measured heart rate exceeds some configurable threshold). Again, the application of a standards-based protocol (e.g. MQTT) will be explored.
    • [TAKEN] Real-Time (live) Remote Patient Monitoring: This project will explore the feasibility of transmitting real-time biometric data (e.g. ECG, EMG, motion) from constrained sensor devices to a remote central server. Communication will be over Bluetooth Low Energy initially to a router, which will forward the data to a central or cloud-based monitoring service.
    • [TAKEN] Remote Patient Monitoring using LPWAN Communication
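    The event-driven model above can be sketched as a sensor that buffers readings locally and emits an event only when a configurable threshold is crossed. Class and field names here are illustrative:

    ```python
    from collections import deque

    class HeartRateSensor:
        """Toy event-driven sensor: buffer locally, publish threshold events."""

        def __init__(self, threshold=120, buffer_size=32):
            self.threshold = threshold
            self.buffer = deque(maxlen=buffer_size)  # stand-in for flash storage
            self.events = []                         # would be published, e.g. via MQTT

        def record(self, bpm, t):
            self.buffer.append((t, bpm))
            if bpm > self.threshold:
                self.events.append({"time": t, "bpm": bpm})

    sensor = HeartRateSensor(threshold=120)
    for t, bpm in enumerate([72, 80, 135, 90]):
        sensor.record(bpm, t)

    print(sensor.events)
    ```

    The engineering questions in the project start where this sketch stops: how much can realistically be buffered in flash, and how events map onto MQTT topics over a constrained link.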

    Last update: Friday 02 June 2017

    Dr Jeremy Jones (updated 25-Sep-17)

    F.11 O'Reilly Institute top floor

    1.      A typical bioinformatics ancient DNA analysis program (e.g. BWBBLE, which is written in C) involves finding the location of short DNA sequences in a reference genome. The reference genome contains approximately 3×10^9 (three billion) base pairs and the short sequences 30 - 300 base pairs. The DNA sequences are stored in text files. At first sight, the algorithms are naturally parallel. The objective of this project is to speed up the analysis by investigating a number of different approaches: (i) more efficient implementation of the multi-threaded algorithms by, for example, taking advantage of the streaming SIMD extended instruction set (SSEx), (ii) using a compute cloud, (iii) using a graphics processor and (iv) using an external FPGA. Each approach could be a separate project. This project is being carried out in consultation with the Genetics Department.

    2.      In 2013, Intel introduced the Haswell CPU which supports restricted transactional memory (RTM) through the TSX extension to its instruction set. TSX can greatly simplify the implementation of many parallel algorithms. The objective of this project is to develop a self-tuning parallel implementation of a skip list (or any other interesting data structure or algorithm) that can determine dynamically the optimum settings needed to maximise throughput. This project would particularly suit students taking CS4021.

    3.      Most of you will have used the Vivio animations as part of the CS3021/3421 Computer Architecture II module. The Vivio animations have now been re-implemented using JavaScript and HTML5 so that they are truly portable and can run in any web browser. The objective of this project is to implement a touch interface (in JavaScript and HTML5) so that the animations can be controlled on a phone or tablet where there is no mouse or mouse wheel.

    4.      Have you ever had difficulty remembering someone’s name? Imagine what it’s like being a lecturer standing in front of a large class trying to remember student names. The object of this project is to use a Google Glass equivalent and face recognition software to recognise student faces and project their names onto a heads up display in real time.
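    The core task in project 1, locating short reads in a reference genome, parallelises naturally by splitting the reference into overlapping chunks. A pure-Python sketch of that decomposition (real tools instead build an index, e.g. an FM-index, over the reference):

    ```python
    from multiprocessing.pool import ThreadPool

    def find_in_chunk(args):
        """Find every occurrence of `read` inside one chunk of the reference."""
        chunk, chunk_start, read = args
        hits, pos = [], chunk.find(read)
        while pos != -1:
            hits.append(chunk_start + pos)
            pos = chunk.find(read, pos + 1)
        return hits

    def parallel_search(reference, read, n_chunks=4):
        step = max(1, len(reference) // n_chunks)
        overlap = len(read) - 1   # so matches straddling a boundary are not missed
        tasks = [(reference[i:i + step + overlap], i, read)
                 for i in range(0, len(reference), step)]
        with ThreadPool(n_chunks) as pool:
            return sorted(h for hits in pool.map(find_in_chunk, tasks) for h in hits)

    ref = "ACGT" * 100 + "GATTACA" + "ACGT" * 100
    print(parallel_search(ref, "GATTACA"))
    ```

    With an overlap of `len(read) - 1`, each match falls in exactly one chunk, so no hit is missed or double-counted; the same decomposition carries over to SIMD lanes, GPU thread blocks, or FPGA pipelines.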

    These projects could form the basis of a final year project, year 5 dissertation or taught MSc dissertation.


    Prof. Khurshid Ahmad

      My principal area of interest is in artificial intelligence, including expert systems, natural language processing, machine learning and neural networks. My other area of interest is in the ethical issues of privacy and dignity in the realm of social computing. I have recently finished a largish EU project on the impact of social media (monitoring) in exceptional circumstances ( - this brings together social media analytics (natural language processing/image analysis), geolocation, and ethics. My research has been applied to price prediction in financial markets and in forecasting election results. The projects on offer in this academic year are:

    • 1. Social Media Analytics and monitoring:

      Microblogs and social networks are a large and noisy source of data that is valuable for marketing and sales specialists, law enforcement agencies, disaster NGOs, and policy makers. This project will help you in acquiring social media data, in using natural language processing techniques to process this data, and in techniques to visualise the results of the analysis. You will be expected to include a brief discussion of questions of privacy and data ownership for social media users. A proficiency in Java and/or Python is required for this project.

    • 2. Sentiment Analysis:

      This is an exciting branch of computer science that attempts to discover sentiment in written text, in speech fragments and in visual image excerpts. The sentiment is extracted from streams of texts (messages on social media systems, digital news sources) and quantified for inclusion into econometric analysis or political opinion analysis systems that deal with quantitative data like prices or preferences: the aim is to predict price changes or the ups and downs of a political entity. You will write a brief note on questions of divulging the identities of people and places. You have the option of developing your own sentiment analysis system or of using a system developed in my research group.

    • 3. Machine Learning and Big Data:

      Large data sets, for example genomic data, high-frequency trading data, meteorological data, and image data sets, pose significant challenges for curating these data sets for subsequent analysis and visualisation. Automatic categorisation systems - systems that have learnt to categorise arbitrary data sets - are in the ascendant. One fast way of building such systems is to integrate components from large machine learning repositories, such as Google's TensorFlow, MATLAB, or Intel Data Analytics, to build prototype systems for text, video, or speech streams, for instance. Issues of data ownership will be briefly outlined in your final year project report.

    Dr. Kenneth Dawson-Howe

    I supervise projects which have a computer vision component (and most particularly in the area of object tracking at the moment!!). To give you a feel for this type of project have a look at some previous projects. For further information (about project organisation, platform, weighting for evaluation, etc.) see this page. If you want to talk about a project or suggest one in the area of computer vision my contact details are here.

    The following are my proposals for 2017-18. If any of them catch your imagination (or if you have a project idea of your own in computer vision) come and talk to me or send me an email.


    (AVAILABLE) CPR Assistant & Assessment. Cardiopulmonary resuscitation (CPR) is a technique used to keep oxygenated blood flowing in the human body when the heart and breathing of a person have stopped. It is a repeated combination of chest compressions (30 in a 15 second period) and artificial breaths (2 within a 5 second period), and has to be continued until other measures are taken to restore spontaneous operation of the heart and breathing. This project aims to create a mobile phone based assistant which (through analysing a video of CPR being performed) will guide the person giving CPR in the rate of compressions and the rate & timing of breaths.
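    Assuming the vision pipeline can already produce a timestamp for each detected chest compression (that detection step is the real substance of the project), the feedback step reduces to simple rate arithmetic:

    ```python
    def compression_rate(timestamps):
        """Compressions per minute over the observed interval."""
        if len(timestamps) < 2:
            return 0.0
        duration = timestamps[-1] - timestamps[0]
        return 60.0 * (len(timestamps) - 1) / duration

    # 30 compressions spread over 15 seconds -> 120 per minute,
    # which matches the 30-in-15-seconds target described above.
    times = [i * 0.5 for i in range(30)]
    print(compression_rate(times))
    ```

    A real assistant would compute this over a sliding window so that guidance ("faster"/"slower") reacts within a compression or two.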


    (AVAILABLE) Lobster Pot Monitoring. Believe it or not, sometimes people interfere with other people's lobster pots! This project aims to build a system which monitors the buoys of a number of lobster pots and identifies when they are removed. We will need to get ethical approval to record the video.


    (AVAILABLE) Automatic Traffic Light Recognition. Develop a system to implement automatic traffic light recognition using the LARA dataset (


    (AVAILABLE) A tool for organising personal photographs. Many people have years of digital photographs and videos organised into folders (or not) for each event or period of time. Most of the pictures will have peculiar names which are generated by the camera. The pictures

    • may come from multiple cameras,
    • probably have almost useless names (generated by the camera),
    • could have incorrect date information (as many digital cameras easily lose the current date and time),
    • may or may not have location metadata,
    • may be out of focus
    • may have poor contrast
    • may exhibit red eye
    This project aims to create a tool to manage personal digital photos allowing photos to be grouped, renamed, identified with different levels of importance (so that short or long photo slide shows can be created), enhanced, annotated (describing what/who is pictured), etc. The scope of this project is up to the student…

    (TAKEN) Detecting product placement OR smoking. Increasingly, advertising is being embedded into images and videos, and the detection of product logos has become of significant importance. Datasets of logo images are available (see and ), and this project aims to develop software to automatically detect product placement in videos (e.g. in movies), and perhaps develop a metric quantifying the amount of embedded advertising... Or we could look at how much smoking happens in a video...

    Prof Doug Leith

    The following projects are related to recent research in online privacy and recommender systems being carried out here in TCD, see They could form the basis of a Year 5 dissertation or of a final year project.

    Recommending WiFi/4G Access Points/Cells

    A recommender system collects information on items that a user likes/dislikes e.g. items which the user has bought, rated, viewed or clicked (such as a movie, hotel review, news item). By comparing this information with the information for other users, the system tries to predict which new items the user might like, to predict the rating which the user might give for a new item, what adverts/services are most likely to be of interest to the user etc. Such recommendations are often a core part of personalised web services. One popular approach to making recommendations (and which won the Netflix prize) is based on matrix factorization, where a matrix R containing user-item ratings is approximated by the product UV where the inner dimension of U and V is much smaller than the number of users or items (see reference below for more details). In this project the aim is to apply this approach to recommending WiFi hot spots to users based on ratings supplied by other users plus information on their location etc. We will also investigate the use of recent clustering approaches to enhance privacy, e.g. with regard to location. This project will require familiarity with matrices/linear algebra, probability and ideally Python programming.
    Reference: Y.Koren, R.Bell, C.Volinsky, 2009, "Matrix Factorization Techniques for Recommender Systems",
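    A tiny sketch of matrix factorisation trained by stochastic gradient descent, the technique behind the Koren et al. reference. The ratings here are invented WiFi quality scores, and the hyperparameters are arbitrary:

    ```python
    import random

    def factorise(ratings, n_users, n_items, k=2, lr=0.05, reg=0.02, epochs=500):
        """Fit R ~ U V^T by SGD on the observed (user, item, rating) triples."""
        random.seed(0)
        U = [[random.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
        V = [[random.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
        for _ in range(epochs):
            for u, i, r in ratings:
                err = r - sum(U[u][f] * V[i][f] for f in range(k))
                for f in range(k):
                    U[u][f] += lr * (err * V[i][f] - reg * U[u][f])
                    V[i][f] += lr * (err * U[u][f] - reg * V[i][f])
        return U, V

    # (user, access_point, rating) -- invented WiFi quality scores.
    ratings = [(0, 0, 5), (0, 1, 1), (1, 0, 4), (1, 1, 1), (2, 1, 5)]
    U, V = factorise(ratings, n_users=3, n_items=2)

    predict = lambda u, i: sum(U[u][f] * V[i][f] for f in range(2))
    print(round(predict(2, 0), 1))  # predicted rating for an unseen (user, AP) pair
    ```

    The point of the low-rank structure is exactly that last line: the model extrapolates to (user, access point) pairs it has never observed.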

    Pluggable Tor Transport

    The Tor browser supports the use of pluggable transports to allow shaping of traffic sent over the network so as to obfuscate its nature. These transports can also be used with VPN tunnels, e.g. see Shapeshifter-dispatcher. The aim of this project is to implement a pluggable transport based on recent research here in TCD on making traffic resistant to timing analysis attacks without the need to introduce high latency or many dummy packets. We'll take one of the existing Tor transports as a starting framework and then modify it as needed. The project will require good programming skills, but it's a great chance to contribute to Tor's development and improve existing VPNs.
    Reference: Feghhi,S., Leith,D.J., "On Making Encrypted Web Traffic Resistant to Timing-Analysis Attacks",

    Privacy-Enhanced HTTPS

    Recently a number of successful attacks have been demonstrated against encrypted HTTPS web traffic. Even though the contents of packets are encrypted, the packet size and timing information is often sufficient to allow details of the web pages being browsed to be inferred with high probability. Similar approaches can also be used to successfully attack VPNs and Tor. In this project we will look at possible defences against such attacks, focussing particularly on defending against timing attacks since these are amongst the hardest to defeat. The aim will be to implement recent server-side defences exploiting the push feature in HTTP/2 on either apache or node.js. The project will provide a good opportunity to learn about the next generation of web technology http/2. It will require good programming skills.
    Reference: Feghhi,S., Leith,D.J., 2015, "A Web Traffic Analysis Attack Using Only Timing Information", Technical Report,

    Mobile Handset Anomaly Detection

    Mobile handsets are largely black boxes to users, with little visibility or transparency available as to how apps are communicating with the internet, trackers etc. Handsets are also potentially compromised devices in view of the relatively weak security around apps, and so monitoring activity in a reliable way is important. This project aims to carry out a measurement study to record actual mobile phone network activity with a view to making this more visible to users (via a dashboard) and highlighting anomalies/potentially interesting activity. By routing traffic through a VPN we can log traffic externally to the phone in a straightforward way, so the main challenges are (i) organising a study to collect data for multiple users, (ii) developing a usable dashboard for them to inspect their own traffic and (iii) exploring machine learning methods for classifying traffic and anomaly detection.
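    A simple baseline for the anomaly-detection step is to flag an app's traffic volume when it deviates strongly from its own history. The z-score sketch below uses invented figures; real traffic would need per-destination and time-of-day features too:

    ```python
    import statistics

    def anomalies(daily_bytes, threshold=2.5):
        """Indices of days whose volume is more than `threshold` standard
        deviations from the mean of the series."""
        mean = statistics.mean(daily_bytes)
        sd = statistics.pstdev(daily_bytes)
        if sd == 0:
            return []
        return [i for i, v in enumerate(daily_bytes)
                if abs(v - mean) / sd > threshold]

    # Bytes sent per day by one app (illustrative); day 6 is a sudden spike.
    traffic = [1200, 1100, 1250, 1180, 1220, 1150, 90000, 1190]
    print(anomalies(traffic))
    ```

    A z-score is a crude detector (a single outlier also inflates the standard deviation it is measured against), which is why the project would go on to explore robust statistics and machine learning classifiers.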


    Peer-to-Peer Cookie Sharing/Management

    The project will involve implementing a peer-to-peer system for web cookie sharing/management. Our online "identity" is largely managed via cookies set by web sites that we visit, and these are also used to track browsing history etc. With this in mind, a number of approaches have recently been proposed for disrupting unwanted tracking by sharing and otherwise managing the cookies presented publicly to the web sites that we visit, the aim being to still allow personalisation to occur (so simply deleting cookies is not a solution) while avoiding fine-grained tracking of individual user activity. The project will involve implementing a browser plugin for cookie management and selection plus a service for cookie sharing. The project will require good programming skills.

    Wireless Augmented Reality (TAKEN)

    There is currently much interest in adding augmented reality to mobile handsets. Augmented reality systems typically involve heavy computational burdens e.g. using deep learning to tag objects in the scene currently being viewed. This computation is offloaded to a server or the cloud. Currently most augmented reality systems are tethered i.e. connected to this server via wires, in order to ensure that there exists a high bandwidth low delay connection between mobile handset and server. In this project we will investigate the impact of replacing this wired link with a wireless WiFi/LTE link and using local processing within the mobile handset to mask network impairments (latency, loss). The project will involve android app development, image processing and simple socket/WebRTC programming.

    Dr. Kris McGlinn

    3D Building Information Modeling with Google's Tango

    Google's Project Tango is a tool for smartphones and tablets that creates 3D models of buildings (or apartments) as you simply carry the device around! In this project you will explore the use of Tango to develop 3D models, and examine methods for annotating and linking these models to existing building information model standards like IFC (Industry Foundation Classes). You will examine how objects are identified and labelled using the Tango SDK, and whether you can tag those objects and export them as IFC entities. You will see whether walls and other objects can be given additional properties, such as materials and thickness. This integrated data can then be used for different applications, ranging from navigation to energy simulations.

    Dr Richard Millwood

    Room Lloyd Institute 0.29
    Extension 1548

    Keywords: education - programming - HCI - emulation - learning design

    My background is in education and technology; I am course director for the MSc Technology and Learning. You can read more about me at and you can also have a look at my recently completed PhD by Practice.

    Here are some areas of interest which may inspire a project, some based on my research plan:

    1. Learning computer programming

    Two ideas here:

    1. Developing in Blockly to meet some of the challenges made in my blog on Jigsaw Programming
    2. Constructing an online research instrument for tapping in to teachers' tacit knowledge about teaching computational thinking. This may be an extension using Python to an existing content management system such as Plone to add functionality for innovative interactive forms of survey.

    2. Collaborative support in learning computer programming

    This is directed at an 'educational github' to suit young learners. The development will create an interface that better clarifies and supports the roles and workflow in collaborative work online, so that these can be more readily learnt in use. It is not clear exactly what software development would be appropriate, and the project would suit someone with the imagination and drive to be very creative.

    3. UK National Archive of Educational Computing apps

    The design and development of device-responsive educational apps (for mobile, tablet and web) based on historical educational programs, such as Snooker:

    1. Original Snooker - from 1978 in BASIC
    2. Prototype Snooker Angles - from 2013 as iPhone Web app

    Key features are that the app includes a faithful emulation of the original educational program as a historical account, and that the modern app maintains similar educational objectives but may be updated to take advantage of new technology and new pedagogy. The app must be able to scale appropriately and work on phone, tablet and web page. This is an HTML5 development project using Scalable Vector Graphics for interactive visuals.

    I have a list of apps that I have prioritised to support the UK National Archive of Educational Computing.

    Dr Martin Emms



    O'Reilly LG.18


    I would be interested in supervising FYPs which centre around applying computational techniques to language.

    Machine Learning and Word Meanings

    An interesting question is the extent to which a machine can learn things about word meanings just from lots of textual examples, and there has been a lot of research into this (see Wikipedia intro), all based on the idea that different meanings show up in rather different contexts, e.g.

    move the mouse till the cursor ...

    dissect the mouse and extract its DNA ...

    Several kinds of project could be attempted in this area, starting either from scratch or building on code that I could supply.

    1. One kind of system would do what people call unsupervised word sense disambiguation, learning to partition occurrences of an ambiguous word into subsets all of which exhibit the same meaning.

      Someone last year added the diachronic twist of attempting to recognise, from time-stamped text, that the semantics of a word have undergone a change over a period of time: 'mouse' (1980s) and 'smashed it' (last 10 years?) have acquired novel meanings.

    2. Another possibility is to investigate to what extent it is possible to recognise that a particular word combination has a non-compositional meaning, that is, a meaning not entirely expectable given its parts; for example, 'shoot the breeze' means 'chat'.

    There are a number of corpora that can be used to drive such systems, such as the Google n-grams corpus, spanning several hundred years (viewable online here, also available off-line)
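    The unsupervised disambiguation idea in (1) can be sketched in a few lines: represent each occurrence of the ambiguous word by its bag of context words, then greedily merge occurrences whose contexts overlap. Everything here (the stopword list, the example sentences, the clustering rule) is an illustrative simplification of what a real project would build:

```python
STOPWORDS = {"the", "a", "and", "its", "till", "when", "was"}  # tiny illustrative list

def context(sentence, target):
    """Bag of content words around the target word."""
    return {w for w in sentence.lower().split() if w != target and w not in STOPWORDS}

def cluster_senses(sentences, target="mouse"):
    """Greedy single-link clustering of occurrences by shared context words:
    an occurrence joins the first cluster whose pooled context overlaps its own,
    otherwise it starts a new cluster (one cluster per putative sense)."""
    clusters = []  # each cluster: (pooled context set, member indices)
    for i, s in enumerate(sentences):
        ctx = context(s, target)
        for pooled, members in clusters:
            if pooled & ctx:
                pooled |= ctx
                members.append(i)
                break
        else:
            clusters.append((ctx, [i]))
    return [members for _, members in clusters]

uses = [
    "move the mouse till the cursor reaches the icon",
    "click the mouse when the cursor stops",
    "dissect the mouse and extract its dna",
    "the mouse dna was sequenced",
]
```

    On the four invented sentences the two 'computer' uses share "cursor" and the two 'animal' uses share "dna", so two clusters emerge.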


    I have interests in (and sometime knowledge in!) a number of areas in which a project would be possible

    Projects Exploiting Treebanks
    We have copies of several so-called treebanks, which are large collections of syntactically analysed English. Each treebank contains a large number of items of the following kind


    One issue that such corpora allow us to explore empirically is whether or not Multiple Centre Embedding occurs in English (left, right and centre embedding are illustrated below):

    [S I think [S he said [S I'm deaf]]] right embedded
    [NP_gen [NP_gen john's] father's] dog left embedded
    [NP the candidate that [NP the elector that I bribed] chose] centre embedded
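    These three patterns can be detected mechanically. A sketch, assuming a simple space-separated bracket notation rather than any particular treebank's exact format:

```python
def parse(s):
    """Parse '(S I think (S he said ...))' into (label, children) tuples."""
    toks = s.replace("(", " ( ").replace(")", " ) ").split()
    def helper(i):
        label, i, kids = toks[i + 1], i + 2, []
        while toks[i] != ")":
            if toks[i] == "(":
                node, i = helper(i)
                kids.append(node)
            else:
                kids.append(toks[i])
                i += 1
        return (label, kids), i + 1
    return helper(0)[0]

def leaves(n):
    return [n] if isinstance(n, str) else [w for k in n[1] for w in leaves(k)]

def embedding(tree):
    """Classify the outermost self-embedding as 'left', 'right' or 'centre'."""
    label = tree[0]
    def find(n):                      # first proper descendant with same label
        if isinstance(n, str):
            return None
        if n is not tree and n[0] == label:
            return n
        for k in n[1]:
            hit = find(k)
            if hit:
                return hit
        return None
    inner = find(tree)
    if inner is None:
        return None
    outer_l, inner_l = leaves(tree), leaves(inner)
    n, m = len(outer_l), len(inner_l)
    # locate the inner constituent's span inside the outer one
    i = next(j for j in range(n - m + 1) if outer_l[j:j + m] == inner_l)
    before, after = i > 0, i + m < n
    if before and after:
        return "centre"
    return "right" if before else "left"
```

    Applied to bracketed versions of the three examples above, `embedding` returns "right", "left" and "centre" respectively.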

    Tree distance
    Ways to calculate the difference between two trees, and things you might do with that (a paper)

    Lambek Calculus Categorial grammar
    for anyone keen on Logic, a logic inspired way to do grammar

    Continuations of projects from 15-16

    Virtual Erasmus
    Erasmus students spend a year of their study in a distant land. The idea is to capture as much as possible of that process in a program that outgoing students could interact with, probably telescoping a year out into a few weeks. One part would be to emulate the bureaucratic obstacle course that such a year entails: you want to open a bank account, is it open Tuesday afternoons, have you got your residence permit, you want your residence permit, have you got your insurance, have you got another 3 passport photos ....

    There is the possibility of continuing a development of this from last year

    Someone wrote an interactive Scrabble game, with a computer opponent. This could be reworked and developed further.

    DOM Projects

    Dr. Donal O'Mahony



    Dunlop-Oriel House


    My projects (for 2017-2018) will centre on themes related to networking, computer security and sustainability.

    Using the Blockchain as a general purpose identity mechanism

    For many years now, users have used public key encryption to assert their identity on the net. A vital support tool for this has been Public Key Infrastructure (PKI), which allows people to link their 'identities' to their public key. One of the main things holding back digital identity is the difficulty of managing this public key infrastructure: establishing the link in the first place, dealing with lost or stolen keys, etc.
    Blockchain technology has the potential to vastly improve on this - opening up a new digital identity system that can not only be used for electronic payment and participation in smart contracts, but can open up a whole new world of trusted network interaction. This project will investigate how this might be done on the Ethereum blockchain. It will draw lessons from the experience with PKI, the work of startup companies in this space, and previous TCD projects in this area. This year, one major focus will be on investigating the possibility of making a blockchain system that is in some ways PKI-compatible.

    Electricity Trading Between Smart Nano-Grids using Blockchain

    Today, we draw all of our electrical power from the electricity grid. This is largely powered by fossil fuels and distributes power to consumers 24/7 at a fixed price per kWh. In an energy-constrained future, it is more likely that energy will be generated in a distributed fashion from renewable sources such as wind and solar, with limited amounts of storage (batteries) to help even out the peaks and valleys of supply and demand. Managing the energy flows in such an environment will be challenging. This project will involve developing computer processes that will control small inter-connected sub-sets of the grid known as Nano-Grids. These will be implemented as Python programs running on the popular Raspberry Pi computing device. You will develop processes that communicate with each other in real-time over sockets/wi-fi.
    Since the primary way of communicating scarcity and surplus will be through so-called 'price signals', a real-time payment method will be key to the process. For this we will investigate an idea known as state channels, which allow micropayments to be made from node to node with settlement on the blockchain. This part of the project will investigate ideas from the Bitcoin Lightning Network and the Ethereum Raiden Network, with possibly a look at newer systems like Plasma. Ideally, the project will develop a prototype state-channel system to implement payment between the interacting nano-grids.
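    The inter-process communication could look like the following sketch, where an in-process socket pair stands in for the wi-fi link between two Raspberry Pis; the JSON message fields are invented for illustration, and the state-channel payment layer is not shown:

```python
import json
import socket

def encode_price_signal(node_id, surplus_kw, price_cents_kwh):
    """Serialise one price signal as a newline-terminated JSON message."""
    msg = {"node": node_id, "surplus_kw": surplus_kw, "price": price_cents_kwh}
    return (json.dumps(msg) + "\n").encode()

def decode_price_signal(raw):
    return json.loads(raw.decode())

# demo: an in-process socket pair stands in for the link between two nano-grids
a, b = socket.socketpair()
a.sendall(encode_price_signal("nano-grid-1", surplus_kw=2.5, price_cents_kwh=14))
signal = decode_price_signal(b.recv(4096))
```

    In the real system each controller would run this exchange in a loop over TCP sockets, raising or lowering its advertised price as local supply and demand shift.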

    Big Data Analysis of the Blockchain

    [Note: These projects are formulated specifically for students of the new MSc Computer Science (Data Science) ]

    All blockchains since Bitcoin have maintained their 'state' in an ever-growing chain of blocks that are cryptographically linked. This is a historical record of every single transaction that has taken place within the system since inception. This is a large amount of data: the Bitcoin blockchain (August 2017) is 154 GB and Ethereum's is 122.23 GB.
    I will supervise three projects that will treat this information as a Big Data set and attempt to analyse it to address questions such as:
    • determine traffic patterns e.g. distinguish exchange transactions from individual transactions
    • focus on frustrating money-laundering activities e.g. examine clusters of activity around sites like ShapeShift and other coin mixers.
    • identify fraudulent activity e.g. front-running by exchanges during Initial Coin Offerings (ICOs); tracing criminal activity such as extortion
    • tapping the power of Events - these are unique to Ethereum and may offer great insights into blockchain activity

    The first part of this project will be to source or create data sets representing blockchain activity on Amazon Web Services - this may be done collaboratively by all students involved in these projects. The second step is to devise data analytic approaches that can deliver insights within a sensible computing horizon.

    Dr. Eamonn O Nuallain



    WR 13.3.1


    Radio Tomography, RF Coverage Prediction and Channel Modelling in Wireless Networks

    My postgraduate students and I research Radio Tomography (seeing through things with Radio Frequency (RF)), RF propagation and wireless channel modelling. The objectives of this research are to enable the rapid and accurate prediction of RF coverage, frequency-domain response, data throughput and interference mitigation in wireless networks, in particular as they relate to Cognitive Radio and MANETs. We are also interested in intelligent handoff algorithms. Such information is of great interest to wireless network planners and regulators. We are concerned with both fixed and ad-hoc networks. The methodology we employ is largely computational mathematics, computational electromagnetics and RF propagation. We research, develop and code numerical techniques to achieve our objectives, and we test our results against field measurements. We are also very interested in exploiting the capabilities of parallel computing and software radio. Most of our programming work is done in C, C++ and Matlab, and using the NS3 simulator. These projects are available either as Final Year Projects or Masters dissertations. You should be interested in mathematics, ad-hoc networks and RF. If you are interested in pursuing a project in this area then contact me by e-mail and we can meet to discuss.

    Declan O'Sullivan

    Prof. Declan O'Sullivan

    I am generally interested in projects in the area of data/information integration.

    I also have an interest in the ethics of the use of digital content technology.

    Typically the projects I supervise use techniques such as linked data, semantic web, and semantic mapping, and technologies such as RDF, SPARQL, OWL and SPIN.

    There are also opportunities to work on aspects of research ongoing within the Science Foundation Ireland ADAPT centre focusing on Digital Content Technology, and ongoing digital projects with the TCD Library and Ordnance Survey Ireland.

    Professor Carol O'Sullivan

    Updated: 04/10/2016

    I am interested in supervising projects related to physical interaction and response in virtual and augmented reality. If you have an idea for another related project, we can also discuss that.

    Virtual Reality (VR) involves immersing yourself in a virtual world, perhaps wearing a head-mounted device such as the Oculus Rift or the HTC Vive. It can also involve a large projected environment or even a normal desktop monitor.

    Augmented Reality (AR) usually involves remaining fully or partially in the real environment, but augmenting that real environment with virtual objects, characters and other graphical elements. These virtual augmentations can be delivered through a device such as the new Microsoft Hololens or similar headset. It need not, however, involve wearing any glasses or device, and the virtual objects can be projected onto the real world surfaces.

    I am interested in interactions between the user, objects and the environment in such systems, and in particular in ensuring that the physics of objects is plausible and/or accurate. Therefore, I am proposing several project areas to explore these issues further. You will need to have a reasonable understanding of computer graphics and/or computer vision in order to undertake one of these projects. If you are interested, and have the necessary experience, please email me and we can meet to discuss.

    Capturing the physics of real objects

    Check out this paper for an idea of what might be involved. (Image from the paper)

    Interactions with real objects

    How do you drop or throw a virtual object in AR or VR so that the physics are believable or accurate? (Image from Ironman movie)

    Interactions with real environments

    How do you create the illusion of physical objects interacting with a real environment and the user? (Photo of Goofy's playhouse in Disneyland Tokyo. Check out the video here)

    Interactions with characters

    How do you create the illusion of virtual characters physically interacting with a real environment? (Image from MotionBuilder, though we may not necessarily use this system...)

    Prof Owen Conlan

    Location: O'Reilly Institute, Room F.29 Phone: +353-1-8962158

    On-Mobile Privacy-sensitive Personalisation

    Available. Current personalisation techniques, e.g. the tailoring of content to an individual user's preferences, rely heavily on server-side solutions that require potentially sensitive information about the user to be stored remotely. With the advent of more powerful mobile devices, the potential to achieve high degrees of personalisation on the mobile device, using existing approaches, is significant. This project will explore the design and development of a personalisation framework that is deployed on-device and does not share user model information with third parties.

    In the first instance, contact Prof Owen Conlan

    Augmented Video Search

    Available. Searching for content within videos is difficult. Current techniques rely heavily on author-created metadata to discover the video as a whole, but there are few solutions for searching within the video. This project will explore how off-the-shelf multimodal (i.e. image analysis, audio feature detection, speech-to-text) techniques may be used to support search within a video.

    In the first instance, contact Prof Owen Conlan

    Visual Search

    Available. Modern internet-based search prizes precision over recall, striving to present users with a select few relevant resources. Users can quickly examine these resources to determine if they meet their needs. There are other situations, such as patent search or performing research on Medieval corpora, where recall, i.e. retrieving all relevant documents, is essential. This project will examine visual techniques to support users in determining and refining recall in a search environment. The project builds on over 10 years of Personalisation, Entity-based Search and Visualisation work surrounding the 1641 Depositions and more recent work on the 1916 Pension Statements.

    In the first instance, contact Prof Owen Conlan

    Supporting the construction of Visual Narratives

    Available. Research in narrative visualisations or visual narratives has been growing in popularity in the Information Visualisation domain and in online journalism. However, there is limited support offered to authors in constructing visual narratives, particularly non-technical authors.

    This project will aim to advance the state of the art in visual narrative construction by supporting authors in building visual narratives, including automatic sequencing between the visualisations in the narrative.

    In the first instance, contact Dr Bilal Yousuf


    Collaborative Ethics Canvas

    Available. The ethical implications of modern digital applications grow as they encroach on more and more aspects of our daily lives. However, the techniques available for analysing such ethical implications struggle to keep up with the pace of innovation in digital businesses, and tend to require the mediation of a trained ethicist. The Ethics Canvas is a simple tool that enables application development teams to brainstorm the ethical implications of their designs without the oversight of a trained analyst. It is inspired by Alex Osterwalder's Business Model Canvas, which is now very widely used in digital business formation. The Ethics Canvas exists both as a paper-based layout and as a responsive web application. Currently the online version can only be used by individuals; it cannot be used in the collaborative mode that is a key benefit of the paper version.

    This project will extend the Ethics Canvas implementation to support remote collaborative editing of the canvas. Users should be able to form teams and then review, make changes, comment on and discuss the canvas, accept/reject changes, and track/resolve issues. Further, the digital application development community could benefit from sharing previous ethical analyses produced with the online Ethics Canvas. The benefit of such sharing would be magnified if it led to a convergence in the concepts used in different canvas analyses. Therefore the project will allow teams to publish their canvas into a public repository and to annotate its content with tags from a shared structured folksonomy, i.e. a community-formed ontology capturing concepts such as different types of users, user groups, personal data, data analyses, sensor data, and risks. Within an individual canvas, tags can be used to link entries in different boxes to provide more structure to the canvas. The aggregation of tags from different completed canvases forms a folksonomy that can be made available as an open, live linked-data dataset, searchable by Ethics Canvas users.

    In the first instance, contact Prof Owen Conlan

    Dr. Rob Brennan

    Senior Research Fellow, ADAPT Centre, School of Computer Science and Statistics.
    My projects are in the areas of data quality, data governance and data value with an emphasis on graph-based linked data or semantic web systems. Please note that I am unlikely to supervise your own project ideas due to current commitments. These projects are only for MSc students.


    1. Extracting Data Governance information and actions from Slack chat channels

    Main contact: Dr Alfredo Maldonado (address corrected on 26/9/2017)
    Data governance means controlling, optimising and recording the flow of data in an organisation. In the past, data governance systems have focused on formal, centralised authority and control, but new forms of enterprise communication like Slack need to be leveraged to make data governance more streamlined and easier to interact with. However, systems like Slack produce vast amounts of unstructured data that are hard to search or process, especially months or years later. Thus we need a way to extract the most relevant conversations in Slack and turn them into structured data or requests for specific data governance actions, like a change in a data sharing policy. This project looks at ways to extract relevant conversations and turn them into data governance actions via an interactive Slack bot that uses machine learning and natural language processing to identify relevant conversations and then interjects in Slack conversations to prompt users to interact with a data governance system.
    This project is conducted in collaboration with Collibra Inc., a world-leading provider of data governance systems.
    Keywords: Natural Language Processing, Machine Learning, Python, Data Governance
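    As a baseline for 'identifying relevant conversations', even a keyword filter shows the shape of the pipeline; the cue list below is an invented placeholder for the machine learning and NLP models the project would actually build:

```python
import re

# illustrative governance cues; a real bot would learn these from data
GOVERNANCE_CUES = [r"\bdata sharing\b", r"\bretention\b", r"\bGDPR\b",
                   r"\baccess request\b", r"\bpolicy\b"]

def is_governance_relevant(message, cues=GOVERNANCE_CUES):
    """True if any cue pattern appears in the Slack message."""
    return any(re.search(c, message, re.IGNORECASE) for c in cues)
```

    Messages that pass the filter would then be handed to the bot to prompt a data governance action.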

    2. Automated Collection and Classification of Data Value Web Content

    Main contact: Dr Rob Brennan
    Jointly supervised with: Prof. Seamus Lawless
    This research aims to automate the collection and classification of discussions of data value (e.g. "How much is your data worth?", "Data is the new Oil!") on sites like Gartner. This will complement our traditional survey of academic papers discussing data value management. The project will attempt to identify from the web content: the most important dimensions of data value (e.g. data quality), metrics for measuring them, the different models of data value proposed by authors, and applications of data value models. The research will explore new ways to classify and conceptualise the domain of data value. Ranking dimensions by importance is also an interesting potential challenge. The project may also consider how best to structure the conceptualisation of the domain for different roles or types of consumers.
    Keywords: Information Retrieval, Natural Language Processing, Knowledge and Data Engineering

    3. Adding W3C Linked Data Support to Open Source Database Profiling Application

    Main contact: Dr Rob Brennan
    Jointly supervised with: Dr. Judie Attard
    The Data Warehousing Institute has estimated that data quality problems currently cost US businesses more than $600 billion per year. Everywhere we see the rise in importance of data and the analytics based upon it. This project will extend open source tools with support for new types of web data (the W3C’s Linked Data) and sharing or integrating tool execution reports over the web.
    Data profiling is an important step in data preparation, integration and quality management. It is basically a first look at a dataset or database to gather statistics on the distributions and shapes of data values. This project will add support for the W3C's Linked Data technology to an open source data profiling tool. In addition to providing traditional reports and visualisations, we want the tool to be able to export the data profile statistics it collects using the W3C's data quality vocabulary and data catalog vocabulary. These vocabularies allow a tool to write a profile report as Linked Data and hence share the results with other data governance tools in a toolchain. This will be an opportunity to extend the use of these vocabularies beyond pure linked data use cases to include enterprise data sources such as relational databases.
    Keywords: Knowledge and Data Engineering, Java programming, Linked Data, Data Quality
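    To make the export step concrete, one profiling statistic can be rendered as a quality measurement using the W3C Data Quality Vocabulary (DQV); the metric IRI, dataset IRI and value below are invented examples:

```python
def profile_to_dqv(dataset_iri, metric_iri, value):
    """Render one profiling statistic as a W3C DQV quality measurement (Turtle)."""
    return (
        "@prefix dqv: <http://www.w3.org/ns/dqv#> .\n"
        "\n"
        "[] a dqv:QualityMeasurement ;\n"
        f"   dqv:computedOn <{dataset_iri}> ;\n"
        f"   dqv:isMeasurementOf <{metric_iri}> ;\n"
        f'   dqv:value "{value}" .\n'
    )

# e.g. the fraction of null values found while profiling a customer table
ttl = profile_to_dqv("http://example.org/db/customers",
                     "http://example.org/metrics/nullRatio", 0.03)
```

    A real exporter would emit typed literals and link the metric into a dqv:Dimension/dqv:Category hierarchy, but the shape of the report is the same.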

    4. Ethical Web Data Integration

    Main contact: Dr Rob Brennan
    Jointly supervised with: Prof. Declan O'Sullivan
    In an era of Big Data and ever more pervasive dataset collection and combination, how do we know the risks and whether we are doing the right thing? This project will investigate the characteristics and requirements for an ethical data integration process. It will examine how ADAPT's semantic models of the GDPR consent process can be leveraged to inform ethical decision-making and design as part of the data integration process. This work will extend the ADAPT M-Gov mapping framework.
    Keywords: Ethics, Knowledge and Data Engineering, Java programming

    5. Automatic Identification of the Domain of a Linked Open Data Dataset (New 25/9/2017)

    Main contact: Dr Rob Brennan
    Jointly supervised with: Dr Jeremy Debattista
    As the Web of Data grows, there are more and more datasets that are becoming available on the web [1]. One important challenge in selecting and managing these datasets is to identify the domain (topic area, scope) of a dataset. Typically a dataset aggregator (such as will mandate that minimal dataset metadata is registered along with the dataset but this is often insufficient for dataset selection or classification (such as the dataset types used by the LOD cloud).
    The aim of this dissertation topic is to create a process and tools to automatically identify the topical domain of a dataset (using metadata, querying the dataset vocabularies and clustering using ML algorithms). Thus it will go beyond traditional Semantic Web/Linked Data techniques by using a combination of ontology reasoning or queries and machine-learning approaches. Given an input dataset from, LOD Laundromat or the weekly dynamic linked data crawl, the datasets should be categorised into a specific topical domain so that consumers can filter this large network according to their needs.
    Keywords: Knowledge and Data Engineering, Machine Learning
    Further Reading

    6. Automated Selection of Comparable Web Datasets for Quality Assurance (New 25/9/2017)

    Main contact: Dr Rob Brennan
    Jointly supervised with: Dr Jeremy Debattista
    Many open Linked Data datasets suffer from poor quality and this limits their uptake and utility. There are now a number of linked data quality frameworks, e.g. Luzzu [1], designed to address the need for data quality assessment and publication of quality metadata. However, in order to apply some quality measures, e.g. "Completeness Quality" [2], it is necessary to have a comparable dataset to test against. For example, the comparable dataset could form a Gold Standard or benchmark which can be used to compare with other similar data.
    This project will investigate the methods required to (1) identify the requirements for a comparable dataset based on a specific set of quality checks and a dataset to be tested, and (2) then use these requirements to find the best possible dataset to act as a Gold Standard from a pool of open datasets such as Example requirements may include matching the domain, ontology language, presence of specific axiom types, ontology size, ontology structure, data instances present and so on.
    Keywords: Knowledge and Data Engineering, Data Quality

    Dr. Marco Ruffini

    Final year projects:

    End-to-end capacity reservation in Software Defined Networks

    Software Defined Networks have revolutionised computer networks by introducing a means to enhance network programmability through standardised, open access interfaces. The aim of this project is to implement an end-to-end capacity reservation mechanism across aggregation and core networks based on the use of stacked Multi-Protocol Label Switching (MPLS) labels. User requests are forwarded to a centralised controller that takes available capacity into account when allocating the requested capacity over an end-to-end link. A background in network programming and the Python programming language is strongly advised.
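    The controller's admission logic can be sketched as simple per-link bookkeeping: a reservation is committed only if every hop on the end-to-end path can carry it. The class, link names and API below are invented for illustration; the real project would realise the allocation with stacked MPLS labels:

```python
class CapacityController:
    """Centralised bookkeeping of free capacity (Mb/s) per directed link."""

    def __init__(self, link_capacity):
        self.free = dict(link_capacity)

    def reserve(self, path, mbps):
        """Admit the request only if every hop has capacity, then allocate."""
        if any(self.free.get(link, 0) < mbps for link in path):
            return False
        for link in path:
            self.free[link] -= mbps
        return True

# two-hop aggregation/core topology with 100 and 40 Mb/s of headroom
ctrl = CapacityController({("a", "b"): 100, ("b", "c"): 40})
```

    Checking the whole path before touching any counter keeps a rejected request from leaving partial allocations behind.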

    Prof. Siobhan Clarke



    Lloyd 1.17


    I am interested in software systems that make cities smarter! Below are some examples that I am co-supervising. If you have any ideas of your own for smart cities - e.g., smart transport, smart energy management, smart water management, do please contact me, as I am happy to supervise projects in this area.

    Co-Supervised with Dr. Mauro Dragone

    HeatMap App (Participatory version)

    The goal of this project is to build an Android application that can be used to assess the number of users present at the entrance of museums, shopping malls, in buses and around bus stops, art exhibitions, car parks, or any other public, shared places where people occasionally congregate and/or queue. To this end, the student will build a solution using one of the available frameworks for peer-to-peer communication between multiple handsets [1].

    HeatMap App (Vision version):

    The goal of this project is to build a system that is able to estimate the length  of the queue of visitors waiting to enter a museum, art exhibition or other place of public interest,  such as the Old Library and the Book of Kells Exhibition in Trinity College. The student will use a Galileo single-board computer and a pan & tilt camera, and will develop a computer vision algorithm using the OpenCV library [2] to segment, track and count people in the queue. There is also scope to develop adaptive solutions to account for different visibility conditions, and to build an Android application.

    Bus Tracker:

    The goal of this project is to build an Android application to infer and gather useful knowledge about the travel habits of users carrying smart mobile phones. Specifically, the target application should be able to recognize which public transport route (e.g. train, bus, LUAS), and between which stops, the user is currently traveling. The student will use current publish/subscribe middleware for location-aware applications, such as [3][4][5], and investigate the adoption of machine learning techniques, such as neural networks, to classify routes based on the analysis of streams of noisy sensor data.

    Extension of Funf:

    The Funf Open Sensing Framework [6] is an extensible sensing and data processing framework for Android mobile devices. The core concept is to provide an open source, reusable set of functionalities, enabling the collection, uploading, and configuration of a wide range of data signals accessible via mobile phones. The goal of this project is to extend Funf with support for peer-to-peer communication between multiple handsets, in order to enable the coordination of the efforts of multiple users involved in participatory sensing campaigns.

    Urban GeoLocation:

    The goal of this project is to assess and improve the ability to locate users carrying smart mobile phones while driving, cycling, or simply walking along urban pathways. In particular, the student will tackle the problems suffered by GPS-based location in urban environments, where the signals from the positioning satellites are often blocked or bounced off buildings and other structures. Contrary to existing approaches which try to explicitly account for these  phenomena, the student will assess the benefits of using multiple sensor data and the feedback gathered from multiple users over time, to build solutions that are able to exploit the power of the crowd to acquire complex models and improve their accuracy over time. The work will require the student to familiarise themselves with Particle Filter [7] as the overall framework that is likely to be used to integrate the various components of this project.
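    To make the Particle Filter framework concrete, here is a minimal 1-D version tracking a pedestrian who advances roughly one unit per step past noisy GPS-like fixes; the motion and observation models are invented for illustration:

```python
import math
import random

def particle_filter(observations, n=1000, motion_sd=0.5, obs_sd=5.0):
    """Track a 1-D position from noisy observations.
    Assumed motion model: +1 unit per step plus Gaussian noise."""
    rng = random.Random(42)
    particles = [observations[0] + rng.gauss(0, obs_sd) for _ in range(n)]
    estimates = []
    for z in observations[1:]:
        # predict: advance each particle by the assumed motion model
        particles = [p + 1.0 + rng.gauss(0, motion_sd) for p in particles]
        # update: weight each particle by the likelihood of the observation
        w = [math.exp(-(z - p) ** 2 / (2 * obs_sd ** 2)) for p in particles]
        # resample in proportion to weight
        particles = rng.choices(particles, weights=w, k=n)
        estimates.append(sum(particles) / n)
    return estimates
```

    Predict, weight, resample is the whole loop; the project would extend the state to 2-D position and fold in the additional sensor and crowd-sourced inputs described above.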


    Sensordrone Applications:

    The goal of this project is to develop an Android application using the Sensordrone kit [8]. Sensordrone is a modular device the size of a key-chain, equipped with temperature, luminosity, 3-axis accelerometer and air-quality sensors. The device can be paired with the user's mobile phone over low-energy Bluetooth. A number of useful applications may be built by exploiting the combination of sensors available on the Sensordrone with the sensors and geolocation functions available on the user's smart phone. Of particular interest are applications targeting:

    • Road quality information - is the road deteriorating in specific locations? E.g., early identification of pothole formation.
    • Bike scheme monitoring - real-time information on where and when the cycle fleet is being used and what the cycles are encountering.
    • Mapping urban pollution data - noxious gases, noise, temperature.
    • Cyclist routing - using information on pollution, journey times for bikes, and statistics on areas where cyclists swerve or brake suddenly.
    • Localised weather alerts for cyclists (and potentially data collection on the device).
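    As a rough illustration of the first application above, the following sketch flags accelerometer readings that spike well above a ride's baseline; the sampling format and threshold are assumptions for demonstration, not the project design.

```python
from statistics import mean, stdev

def detect_road_events(samples, z_threshold=3.0):
    """Flag accelerometer readings that spike well above the ride's baseline.

    samples: list of (timestamp, vertical_acceleration) pairs, as might be
    logged from the phone or Sensordrone accelerometer while cycling.
    Returns the timestamps of candidate pothole / rough-surface events.
    """
    accels = [a for _, a in samples]
    mu, sigma = mean(accels), stdev(accels)
    return [t for t, a in samples
            if sigma > 0 and (a - mu) / sigma > z_threshold]

# Synthetic ride: steady readings around gravity, one sharp jolt at t=100.
samples = [(i, 9.8 + (0.05 if i % 2 else -0.05)) for i in range(100)]
samples.append((100, 25.0))
events = detect_road_events(samples)
```

    In practice the events would be tagged with the phone's GPS fix, so repeated jolts at the same location across many riders indicate a deteriorating surface.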

    Smart Home Projects:

    Project ideas are also welcome for projects addressing the development of smart home services and their integration within city-wide participatory sensing frameworks [9]. The student will be required to develop software prototypes for the OpenHAB open source software platform for home automation [10]. A range of hardware is available for these projects, including a single-board computer and home automation sensors and actuators, such as occupancy sensors, energy monitors and wireless switches.

    Links to relevant technologies and further readings:

    [1] Peer-to-peer frameworks for Android
    [2] OpenCV
    [3] MQTT
    [4] OwnTracks
    [5] Google Play services for Android developers
    [6] Funf
    [7] Particle Filter
    [8] SensorDrone
    [9] CityWatch
    [10] OpenHAB

    Co-Supervised with Dr. Ivana Dusparic

    Smart energy grid: Intelligent Residential Demand Response

    The European Union's 2050 roadmap is resulting in the increasing penetration of renewable energy sources and electric vehicles (EVs) in Europe. In Ireland, it is expected that 80% of electricity will come from renewable sources by 2050, and 60% of new cars sold in 2050 will be electric. As a consequence, the electrical energy grid is facing significant changes in the supply of resources as well as changes in the type, scale, and patterns of residential user demand.

    In order to optimize residential energy usage, demand response (DR) techniques are being investigated to shift device usage to periods of low demand and of high renewable-energy availability. DR refers to the modification of end-users' energy consumption with respect to their originally predicted consumption patterns.

    This project will investigate the use of intelligent, learning-based techniques in the implementation of large-scale DR aggregation schemes suitable for residential customers. Some of the aspects to be addressed within the scope of the project include: household energy-use learning and prediction (as enabled by, e.g., smart meters or smart heating devices like Nest and Climote), evaluation of centralized versus decentralized DR approaches, responsiveness of the techniques to different usage patterns and different renewable energy generation patterns, the types of devices most suitable for DR programmes (e.g., heating, EVs), etc.

    Smart energy grid: Home energy usage prediction and optimization based on sensor data

    This project will investigate how home energy usage can be learnt, predicted, and optimized. Patterns of energy use can be learnt and predicted from occupants' historical behaviours (e.g., learning that the user generally leaves for work at 8:15am, plays football after work on Wednesdays, goes out straight after work on Fridays, etc.), combined with various sensors and data sources that provide more accurate, amended predictions (e.g., the mobile phone calendar, the GPS location of the user, the level of battery charge in an electric vehicle, the outside temperature, etc.). Learning and intelligent-agent techniques will be investigated and applied to learning the observed patterns and establishing demands and constraints on device usage (e.g., the duration of charging an electric vehicle will require based on the length of the daily trip, the time heating needs to be turned on to achieve the optimal temperature by the user's arrival time, the estimated time by which hot water is required for a shower, etc.). Multi-objective optimization techniques will then be applied to schedule the required device usage so as to satisfy these usage requirements and constraints, as well as the policies set by users (e.g., minimize energy price, maximize use of renewable energy, etc.).
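    As a toy illustration of the scheduling side, the sketch below greedily places an EV charging requirement into the cheapest available hours. The price figures and the single-objective greedy strategy are simplifying assumptions; the project itself would apply proper multi-objective optimization across several devices and policies.

```python
def schedule_charging(prices, available_hours, hours_needed):
    """Pick the cheapest hours to charge within the user's constraints.

    prices          : dict hour -> electricity price (illustrative values)
    available_hours : hours the vehicle is plugged in (e.g. overnight)
    hours_needed    : charging time derived from the day's driving
    """
    candidates = sorted(available_hours, key=lambda h: prices[h])
    return sorted(candidates[:hours_needed])

# Illustrative overnight tariff; a real schedule would also weigh
# renewable availability and predicted departure time.
night_prices = {22: 0.18, 23: 0.14, 0: 0.09, 1: 0.08, 2: 0.08,
                3: 0.10, 4: 0.12, 5: 0.16, 6: 0.21}
plan = schedule_charging(night_prices, list(night_prices), hours_needed=3)
```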


    Prof. Seamus Lawless

    I supervise projects in the areas of Information Retrieval, Personalisation and Digital Humanities. The common focus of these projects is the application of technology to support enhanced, personalised access to knowledge. If you would like to talk about a project or suggest one in these areas, email me at

    Project Details

    Information Retrieval and Web Search

    Personalised Talent Search on LinkedIn
    This research will aim to design and implement complex personalized search approaches for the purpose of helping employers to locate individuals with desirable talents on LinkedIn. This research will be based on a preliminary study that has explored and validated the effectiveness of the use of machine learning techniques in personalized talent search. The focus of the research will be to further explore and implement more complex and effective machine learning methods for delivering an expertise search experience.

    Triple Linking from Unstructured Text
    Identifying the people, places, events and dates mentioned in news articles, blog posts and other unstructured text on the web is a difficult task. Linking these entities using Linked Open Data presents a further challenge. This project will investigate the generation of “Triples” from online text content. The input to this process will be a webpage or document, and the output should be a list of DBpedia triples which describe the entities that are mentioned in the text, together with a confidence score. There are some existing tools to help, such as OpenIE, which extracts triples from unstructured text; entity-linking tools (such as TAGME, DBpedia Spotlight or Nordlys) can be used for mapping entities to DBpedia.
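    One possible shape for the pipeline is sketched below. The `TOY_LINKER` lookup is a hand-made stand-in for a real entity linker such as TAGME or DBpedia Spotlight, and the confidence heuristic (product of linker scores) is an assumption for illustration only.

```python
from dataclasses import dataclass

@dataclass
class Triple:
    subject: str       # DBpedia resource URI
    predicate: str
    obj: str
    confidence: float  # product of the two linker scores (simple heuristic)

# Stand-in for a real entity linker; maps surface forms to (URI, score).
TOY_LINKER = {
    "Dublin": ("http://dbpedia.org/resource/Dublin", 0.9),
    "Ireland": ("http://dbpedia.org/resource/Ireland", 0.95),
}

def link_triples(raw_triples):
    """Map (subject, relation, object) strings from an OpenIE-style
    extractor onto DBpedia URIs, keeping a confidence score."""
    out = []
    for subj, rel, obj in raw_triples:
        if subj in TOY_LINKER and obj in TOY_LINKER:
            (s_uri, s_conf), (o_uri, o_conf) = TOY_LINKER[subj], TOY_LINKER[obj]
            out.append(Triple(s_uri, rel, o_uri, s_conf * o_conf))
    return out

triples = link_triples([("Dublin", "capitalOf", "Ireland")])
```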

    The Search for the Searcher: User Identification using Web Search Log Mining
    When making decisions about how best to support users in web search, it is extremely helpful if we can determine who they are, why they are searching and how they interact with the search site. This type of information can be difficult to obtain, particularly when users are anonymous. Server log files contain a huge amount of information about user actions: what pages they viewed; what queries they issued; how much time elapsed between interactions with the site. But there is a deeper meaning behind this data. Buried within the contents of a log file is a rough outline of who each user is. Are we watching a novice investigator gradually learn about a topic? Do we see them refine the precision of their question based on new knowledge? Are we witnessing an expert scholar, rapidly issuing specific, targeted queries as they gather sources for their research? Did they find an answer to their question? If so, how much effort did it take for them to complete their search? How does their behaviour change depending on what the site presents? How does the relevance change as the user expertise and interactions change? This project will investigate the mining of a large search log with the aim of deriving information about the users' interests and levels of expertise. A number of search logs will be made available to the student as part of this project.
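    A small sketch of the kind of log mining involved is shown below. The log format (user, timestamp, QUERY/CLICK action) is invented for illustration; each real server log will need its own parser, and far richer behavioural features would be derived in the project itself.

```python
import re
from collections import defaultdict

# Assumed toy format: "<user> <unix-timestamp> QUERY|CLICK <detail>"
LOG_LINE = re.compile(
    r'^(?P<user>\S+)\s+(?P<ts>\d+)\s+(?P<action>QUERY|CLICK)\s+(?P<detail>.*)$')

def session_stats(lines):
    """Aggregate per-user behaviour from a simplified search log:
    queries issued, clicks made, and mean time between actions."""
    events = defaultdict(list)
    for line in lines:
        m = LOG_LINE.match(line)
        if m:
            events[m['user']].append((int(m['ts']), m['action']))
    stats = {}
    for user, evts in events.items():
        evts.sort()
        gaps = [b[0] - a[0] for a, b in zip(evts, evts[1:])]
        stats[user] = {
            'queries': sum(1 for _, a in evts if a == 'QUERY'),
            'clicks': sum(1 for _, a in evts if a == 'CLICK'),
            'mean_gap': sum(gaps) / len(gaps) if gaps else 0.0,
        }
    return stats

stats = session_stats([
    "u1 100 QUERY trinity library",
    "u1 160 CLICK doc7",
    "u1 220 QUERY trinity library hours",
])
```

    Features like query refinement rate and inter-action gaps are the raw material for inferring expertise: rapid, specific queries suggest an expert, while gradual refinement suggests a novice learning the topic.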

    Word Embeddings for Improved Search in Digital Humanities
    If I gave you a document that was written entirely in Japanese Kanji and asked you to tell me which symbols had similar meanings, how would you do it (assuming you cannot read Kanji)? Even for a human, finding a realistic answer to this question is extremely difficult. Yet this question reflects a fundamental problem with how computers perceive texts. Unless a human annotator provides some form of descriptive mark-up, a computer simply does not understand the meaning behind the text it curates. Word embeddings are a recent development in the text analysis community. By applying a family of algorithms collectively known as Word2Vec, a computer is able to examine a large collection of documents and derive relationships between words based solely on their contextual usage (e.g. the word "King" has a strong association with the word "Queen". Also, the vectors produced are additive and subtractive: by subtracting "Man" from "King" and adding "Woman" to the result, we obtain a vector which is extremely close to the vector for "Queen"). This Masters project aims to investigate the use of word embeddings in supporting better search and exploration of a collection of 17th century historical documents. This may involve generating suggestions for alternative query formulations in a search interface. In more advanced terms, we may seek to build a retrieval model based on the word vectors generated by Word2Vec.
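    The additive property mentioned above can be demonstrated with toy vectors. The tiny hand-made three-dimensional vectors below merely stand in for real Word2Vec output, which would be trained on the historical corpus itself (typically with hundreds of dimensions).

```python
import math

# Toy 3-D vectors standing in for real trained embeddings.
VECS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def analogy(positive, negative):
    """king - man + woman -> nearest remaining word: the additive
    property a search interface could exploit for query expansion."""
    target = [0.0, 0.0, 0.0]
    for w in positive:
        target = [t + v for t, v in zip(target, VECS[w])]
    for w in negative:
        target = [t - v for t, v in zip(target, VECS[w])]
    exclude = set(positive) | set(negative)
    return max((w for w in VECS if w not in exclude),
               key=lambda w: cosine(target, VECS[w]))

best = analogy(["king", "woman"], ["man"])
```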

    Data Analysis and Data Science

    #ITK #DoneDeal #FakeNews
    The summer Transfer Window is a busy time for football clubs, supporters and the media. This is reflected in the volume of activity on social media platforms such as Twitter. Clubs are continually linked with signing and releasing players, and rumours circulate that clubs are interested in particular players and are making moves to recruit them. A very small fraction of these rumours actually come to pass. There are lots of Twitter accounts which claim to be #ITK - "In The Know" - but is this actually the case? I am interested in collecting a large Twitter dataset related to the Summer Transfer Window, particularly focused on the English Premier League. I would like to apply Machine Learning techniques to that dataset to look for patterns in the tweets and to perform an analysis of the performance of certain accounts in predicting transfers.
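    One simple starting analysis is sketched below: scoring each account by the fraction of its rumours that came true. The record format is an assumption for illustration; building it would require matching the collected tweets against confirmed transfers.

```python
from collections import defaultdict

def itk_precision(rumour_log):
    """Score each Twitter account by how often its transfer rumours
    actually happened. rumour_log is a list of
    (account, player, came_true) records built from the dataset."""
    hits, totals = defaultdict(int), defaultdict(int)
    for account, _player, came_true in rumour_log:
        totals[account] += 1
        hits[account] += int(came_true)
    return {a: hits[a] / totals[a] for a in totals}

# Hypothetical records, purely for demonstration.
scores = itk_precision([
    ("@itk_account", "Player A", True),
    ("@itk_account", "Player B", False),
    ("@rumour_mill", "Player C", False),
])
```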

    Credibility and Trust

    Perception of Trust in Search Result Presentation
    When searching for content online, trust in the impartiality of the search algorithm and the ranking and presentation of results is of paramount importance. Different users display varying levels of trust, and this trust can be impacted by the design of a search interface and the visual aspects of results presentation. This project will investigate user characteristics and the design and common features of search interfaces which impact users' perceived level of trust. An evaluation will be conducted on a collection of well-known search sites. Each site will be 'distorted' by removing or adding visual features of the website design. Users will then be asked to rate their trust of each website. An in-depth statistical analysis of the results will be performed.

    Kicking them while they're down
    Previous research has been conducted into bias in the images used in news. It was shown that certain newspapers were prone to using unflattering images of politicians whose position they opposed, and more flattering images of politicians they supported or had endorsed [1]. We propose updating and extending this study. We aim to examine the news media's likelihood of using good or bad images based on individual politicians' popularity ratings. The intuition is that news media are more likely to use unflattering images of politicians whenever they have a 'bad week'. It would also be interesting to look at the use of images of politicians in relation to specific political issues, for example, images of An Taoiseach Leo Varadkar used in articles related to the "Repeal the 8th" campaign when compared to pictures of Mr. Varadkar in news articles related to jobs announcements.

    [1] Barrett, A. W., and Barrington, L. W. (2005). Bias in newspaper photograph selection. Political Research Quarterly, 58(4), 609–618.

    Imagery in Political Propaganda
    Previous research has been conducted into bias in the images used in news. It was shown that certain newspapers were prone to using unflattering images of politicians whose position they opposed, and more flattering images of politicians they supported or had endorsed [1]. However, limited research has been conducted into the imagery that political parties themselves use in the content they generate. We propose a study into the images used in election material, to investigate which political parties used the least flattering images of opposition party members for political purposes. This study does not need to be limited to Ireland; however, an archive of election material from GE16 does exist, including leaflets related to contentious issues such as Irish Water [2].

    [1] Barrett, A. W., and Barrington, L. W. (2005). Bias in newspaper photograph selection. Political Research Quarterly, 58(4), 609–618.


    Credibility in Graphical Presentation of News Content
    This project will investigate the credibility of graphics used in news media content, to assess if there is a pattern of Accuracy / Bias / Trust / Credibility / Fairness etc. Previous research has demonstrated that it is possible to design and communicate misleading graphics and other visual representations of statistical information. It has also been shown that bias exists in the content of news articles; what about the information news media provide us in other formats? This project will examine a number of news items which have been covered in the media to a significant extent, e.g. Ireland's bailout. We will identify articles which have used graphics to summarise and explain the issue. The researcher will then create, as much as is possible, an independent and accurate depiction of the same information using current information presentation guidelines. A comparison study can then be conducted to measure people's opinions of each representation in relation to any/all of Accuracy / Bias / Trust / Credibility / Fairness etc. The intuition is that certain news media sources are more likely to use biased graphics than others.

    2017/2018 FYP/MSc project topics for Stephen Farrell

    If interested send mail

    Detailed project scope and goals can be adjusted to fit student skills and the level of effort available.

    1. Implement and test a proof-of-concept for MPLS opportunistic security

      Multi-Protocol Label Switching (MPLS) is a sort-of layer 2.5 that carries a lot of Internet traffic in backbone networks. There is currently no standard for how to encrypt traffic at the MPLS "layer." I am a co-author on an Internet-draft that specifies a way to opportunistically encrypt in MPLS. The task here is to implement and test that, which will likely result in changes to the specification (and in adding the student's name to the eventual RFC). There are existing simulation/emulation tools that support MPLS and IPsec which should make implementation fairly straightforward, for example Open vSwitch. A proof-of-concept demonstration with performance figures vs. cleartext and IPsec is the goal. Comparison against MACsec would also be good but may be too hard to test in this environment.

    2. Compare and contrast existing domain/web-site security/privacy measurement web sites

      A number of web sites (see below for examples) offer one the opportunity to "score" or test some other domain or web site for security and privacy issues with their deployment of HTTPS, SSH, DNS, BGP or other visible behaviours. Typically, these are intended for site administrators, to help them know if their current site configuration is good or bad, and if bad, in what respect, and how to mitigate that. The sets of tests applied differ, and it is unclear if individual tests applied are the same over different test sites and over time. (The sets of tests certainly change over time as new vulnerabilities are discovered.) The goal of this project is to identify and describe such test sites, to compare their various sets of tests, and to establish whether or not tests that appear to be the same are in fact the same (likely via setting up a web site with known flaws as a test article). A "test-the-testers" web site may be a good outcome here, one that can be updated as the test-sites evolve. The scope here may be limited to HTTPS or could be extended to SSH, DNSSEC, BGP or other Internet technologies depending on effort and student interest.
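      One way to quantify how much two test sites agree is a pairwise Jaccard overlap of their (normalised) test sets, sketched below; the site names and test names are placeholders, and the hard part in practice is the normalisation that decides when two differently-named tests are really the same.

```python
def compare_test_sites(site_tests):
    """Pairwise Jaccard overlap between the test sets run by each
    security-scoring site; low overlap flags tests to examine by hand.

    site_tests: dict of site name -> set of normalised test names.
    """
    names = sorted(site_tests)
    overlap = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            inter = site_tests[a] & site_tests[b]
            union = site_tests[a] | site_tests[b]
            overlap[(a, b)] = len(inter) / len(union) if union else 1.0
    return overlap

overlap = compare_test_sites({
    "siteA": {"hsts", "tls1.2", "cert-chain"},
    "siteB": {"hsts", "tls1.2", "headers"},
})
```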

    3. Deploy and run a local instance and test against Irish web sites

      The tool in question is an open source web-site testing platform (currently in public beta) that aims to allow users to score web sites on how well or badly they are implementing visible security and privacy features. The goal here is to deploy a local instance of the tool, and run that to test the security and privacy properties of some local (Irish) web sites (possibly within college, so you'll likely get to tell college it fails:-). As well as deploying and testing the tool, potential improvements to the tool will likely be identified, and possibly implemented and fed back to the developers. It'd be a fine thing to identify local sites that are interesting to test, to interpret the tool output, communicate that to site owners and help make the web a bit better.

    4. Deploy a local DPRIVE recursive resolver and test performance

      DPRIVE is a specification for how to run the DNS protocol over TLS, in order to mitigate the privacy problems associated with use of the Domain Name System. There are implementations of DPRIVE available now that are ready for experimental deployments. The goal of this project is to deploy a local DNS/TLS recursive resolver and to test its effectiveness and efficiency via artificial test queries and responses but also, where possible, by handling real DNS queries from clients that have opted in to being part of the experiment.
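      For reference, DNS over TLS (RFC 7858) reuses the standard DNS-over-TCP framing: a two-byte length prefix before each DNS message. The sketch below builds such a framed query; actually sending it would additionally require an SSL-wrapped TCP connection to port 853 of the resolver, which is omitted here.

```python
import struct

def build_dns_query(name, qtype=1, qid=0x1234):
    """Encode a DNS query for `name` (qtype 1 = A record) with the
    two-byte length prefix used when DNS runs over TCP/TLS."""
    # Header: ID, flags (RD bit set), QDCOUNT=1, AN/NS/AR counts zero.
    header = struct.pack('!HHHHHH', qid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label preceded by its length, terminated by a zero byte.
    qname = b''.join(bytes([len(label)]) + label.encode('ascii')
                     for label in name.split('.')) + b'\x00'
    question = qname + struct.pack('!HH', qtype, 1)  # QCLASS = IN
    message = header + question
    return struct.pack('!H', len(message)) + message

query = build_dns_query("example.com")
```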

    5. Solar-powered LoRa gateway power management

      In January 2017 we deployed a solar-powered LoRa gateway in TCD. Power management for that is relatively simple - the device aims to be "up" from 11am to 4pm each day, but handles low-power situations by sleeping until the batteries have been sufficiently charged. The goal here is to analyse the power consumption and traffic patterns recorded since January 2017 in order to improve system up-time via enhanced power management. (For example, the device could decide not to sleep from 4pm if the battery level is above a new threshold.) The current power management daemon is a simple C program that monitors battery voltage and sets the device to sleep or wake according to the simple policy described above. Modifications to that can be designed, validated against existing data, and then implemented, deployed and tested with the existing hardware.
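      An enhanced policy could be prototyped and validated offline against the recorded data before touching the deployed C daemon. The sketch below captures the baseline window plus one possible extension rule; all thresholds and the extension window are purely illustrative, not the deployed values.

```python
def should_sleep(hour, battery_volts, low_threshold=3.5, extend_threshold=3.9):
    """Sketch of an enhanced duty-cycle policy for the solar LoRa gateway.

    Baseline behaviour: awake 11:00-16:00, asleep otherwise, and always
    asleep when the battery is critically low. Proposed refinement: stay
    awake after 16:00 while charge remains above `extend_threshold`.
    """
    if battery_volts < low_threshold:
        return True                      # protect the battery
    if 11 <= hour < 16:
        return False                     # core operating window
    if 16 <= hour < 20 and battery_volts >= extend_threshold:
        return False                     # opportunistic extra up-time
    return True
```

      Replaying the logged voltage traces through such a policy gives predicted up-time per day, which can be compared against the baseline before any firmware change is made.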

    6. Foo/QUIC prototype

      QUIC is a new transport protocol being developed in the IETF aiming to provide the same properties that TLS/TCP provides for HTTP but for other applications. The goal here is to select and prototype some application ("Foo") that could benefit from running over QUIC but where that Foo/QUIC combination has yet to be investigated. For example, at the time of writing, I'm not aware of any work on NTP/QUIC (though that might make less sense than it seems, not sure). The goal is to identify and prototype an interesting application/protocol that hasn't been widely tested running over QUIC in order to provide more input into the development of the new transport protocol.

    7. Prototype TLS1.3 SNI Encryption proposals/fronting

      The Server Name Indication (SNI) extension to TLS is extremely widely used to support multiple web sites on the same host (e.g. VirtualHosts in Apache2) but represents a major privacy leak, as SNI has to be sent in the clear, given that no scalable way of hiding SNI has been found. The TLS working group have just adopted a draft that describes various opt-in ways in which SNI can be protected should a web site wish to "front" for another. (Think of a CDN like Akamai being "cover" for some human-rights organisation that would be censored in the relevant jurisdiction.) The goal here is to prototype and test the various SNI encryption schemes currently under study in order to assist in selecting the eventual mechanism to standardise.

    8. Critically analyse proposals for weakening the TLS specification

      (This one is a tad political but may be attractive for just that reason:-) There have been ongoing attempts to weaken the security guarantees provided by the TLS protocol by standardising ways of making plaintext visible to third parties. Those are typically claimed to be needed for network management or content scanning, and often involve a man-in-the-middle attack on TLS or leaking keys to a third party. All such proposals could also be used for pervasive monitoring/wiretapping, censorship or other "unintended" use-cases. In reaction to the most recent attempt to break TLS in this way, I documented a range of arguments against breaking TLS. The goal here is to provide a more rigorous analysis. The study should analyse both the technical and social/political pitfalls and claimed benefits of such schemes. The scope could be extended to include the deployed TLS-MITM products that are used in various networks and on some hosts. (Or that could be a separate project itself.)

    9. Play with leaked password hashes

      How fast can you check for the presence of a password (hash) in a list of 320 million leaked hashes? The naive shell script I wrote in a few minutes takes 30 seconds on my laptop. The goal here is speed, without requiring any networking (so no sending password hashes over any network), on a reasonably "normal" machine. The list takes 12GB to store in a file, one hash per line. The list may also be updated occasionally, though in bulk, not as a trickle. Some side-channel resistance (e.g. considering timing, OS observation) is also a goal here. As you'd expect, the list was mostly reversed within a few weeks.
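      One plausible approach: if the list is kept sorted as fixed-width text (40 hex characters plus newline per SHA-1 line), a memory-mapped binary search needs only O(log n) record reads and no separate index. The sketch below assumes that layout; note that plain binary search is not constant-time, so the side-channel goal would need further work.

```python
import mmap

RECORD = 41  # 40 hex characters + newline, one SHA-1 hash per line

def hash_present(path, target):
    """Binary-search a sorted, fixed-width hash list without reading it
    all into memory: O(log n) record reads over the file, no network.
    Assumes upper-case hex lines, lexicographically sorted."""
    target = target.upper().encode('ascii')
    with open(path, 'rb') as f, \
         mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        lo, hi = 0, mm.size() // RECORD
        while lo < hi:
            mid = (lo + hi) // 2
            row = mm[mid * RECORD: mid * RECORD + 40]
            if row == target:
                return True
            if row < target:
                lo = mid + 1
            else:
                hi = mid
    return False
```

      For a 320-million-line file this is around 28 record reads per lookup, and the OS page cache keeps the hot upper levels of the implicit search tree in memory across queries.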

    10. Develop a useful survey tool of Irish mail server deployments

      There are services that collate Internet-scale surveys and make the results public via their web sites and APIs. A quick search of one such service says there are 12582 email servers in Ireland, of which 62% do some form of STARTTLS for mail transport security. The goal here is to use the API to develop a tool that can be run periodically, that analyses that data and produces a list of email addresses to which one might send useful advice as to how they could improve their mail transport security. For example, if the data indicate some deployments are running a server that could easily be fixed (say by just updating a certificate), then crafting a mail with deployment-specific instructions as to how to do that could be good. Actually contacting the postmasters involved is not a part of this project but may be done later based on results found.

    Prof. Aljosa Smolic



    Room 2-004, Stack B


    Thesis and Final Year Projects proposals

    Quality Control in 360-Video

    360-degree video, also called live-action virtual reality (VR), is one of the latest and most powerful trends in immersive media, with increasing potential for the coming decades. However, capturing 360-degree videos is not an easy task, as there are many physical limitations which need to be overcome, especially for capturing and post-processing in stereoscopic 3D (S3D). In general, such limitations result in artifacts which cause visual discomfort when watching the content with an HMD. The artifacts or issues can be divided into three categories: binocular rivalry issues, conflicts of depth cues, and artifacts which occur in both monocular and stereoscopic 360-degree content production.

    Within V-SENSE, we developed a framework for quality control in 360-videos which should be improved by new analysis methods or extended with additional artifact detection and, if possible, correction methods.

    The following topics are available for student projects (BSc or MSc thesis):

    • Disparity- / Optical Flow Estimation in 360° Spherical Representation

    • Detection of Stitching and Blending Artifacts in 360° Video / Detection of Local Pseudo-3D in Stereoscopic 360° Video

    Real-time 3D skeleton reconstruction using multi-view sequences

    This project will investigate the possibilities of real-time 3D skeleton estimation using multi-view video sequences. As it is already possible to detect the skeletons of different people in video sequences in real time using deep learning, the key components of the project would be creating the mathematical models and 3D optimisation algorithms needed for real-time 3D triangulation.
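    The core geometric step is triangulating each joint from its 2-D detections in multiple views. A minimal two-view linear (DLT) triangulation sketch using NumPy is shown below, assuming calibrated 3x4 projection matrices are available for each camera; a real-time system would run this per joint after associating skeletons across cameras, then refine with optimisation.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3-D point from two calibrated
    views. P1, P2 are 3x4 projection matrices; x1, x2 are the matching
    2-D image points. Returns the 3-D point in world coordinates."""
    # Each view contributes two rows of the homogeneous system A X = 0.
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector of A with smallest value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]   # back from homogeneous coordinates
```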

    3D point cloud motion tracking