Trinity College Dublin


FYP Projects 2016/17 and Proposals for 2017/18

1. Introduction

The information on this page applies to students taking final year projects, Year 5 dissertations, and M.Sc. dissertations in the School of Computer Science and Statistics under the following programmes:

  • BA (Mod) in Computer Science
  • BA (Mod) in Computer Science and Language
  • BA (Mod) in Computer Science and Business or Business and Computing
  • BAI in Computer Engineering (D Stream)
  • BA Management Science and Information System Studies (MSISS)
  • BAI in Computer Engineering and Microelectronics (CD stream)
  • BA (Mod) Mathematics
  • Master in Computer Science (MCS)
  • Master in Computer Science (M.Sc.)
  • MAI in Computer Engineering

2. Guidelines for students

Important dates and deadlines for academic year 2017/18

Integrated Computer Science (Yr 4) - Final Year Project

  • Project Selection: Fri Oct 13, 2017
  • Project Demonstration Period: April 16-20, 2018
  • Project Presentation Material and Poster Submission Date: Mon Apr 30, 2018
  • Project Report Due: Thu May 3, 2018
  • Project Presentation and Poster Session: Fri May 4, 2018

Integrated Computer Science (Yr 4) - Internship

  • Internship Details Form Submission Date: Fri Dec 15, 2017
  • Internship Goals Document Submission Date: Fri Feb 9, 2018
  • Poster Submission Date: Mon Apr 9, 2018
  • Poster Presentation Date: Thu Apr 19, 2018
  • Mid Point Submission of Reflective Diary: Fri Apr 27, 2018
  • Technical Report Submission Date: Tues Aug 7, 2018
  • Final Submission of Reflective Diary: Tues Aug 7, 2018

Master in Computer Science (Integrated, Yr 5)

  • Project Demonstration Period: Mon May 21 & Tues May 22, 2018
  • Project Presentation Material and Poster Submission Date: Mon May 21, 2018
  • Project Presentation and Poster Session: Thu May 24, 2018
  • Dissertation Submission Date: Fri May 25, 2018

Master in Computer Science (M.Sc.)

  • Research Supervisor Confirmed: Wed Nov 15, 2017
  • Research proposal written up and shared with supervisor for signing off: Thur Dec 14, 2017
  • Signed-off research proposal submitted: Fri Dec 15, 2017
  • Ethics application deadline for any dissertation where a human study/trial is an integral part of the dissertation: Fri March 2, 2018
  • Project Demonstration Period: Mon 6th - Fri 17th August, 2018
  • Submission date for printed and bound copies of the dissertation: Thur August 30, 2018

Computer Engineering (Yr 4) - Final Year Project

  • CS4E2/CE4E2 Ethics Clearance Application Deadline: Mon Dec 11 - Fri Dec 15, 2017
  • Project Demonstration Period: April 16-20, 2018
  • Project Report Due: Thu May 3, 2018

Computer Engineering (Yr 4) - Internship

  • Internship Details Form Submission Date: Fri Dec 15, 2017
  • Internship Goals Document Submission Date: Fri Feb 9, 2018
  • Poster Submission Date: Mon Apr 9, 2018
  • Poster Presentation Date: Thu Apr 19, 2018
  • Mid Point Submission of Reflective Diary: Fri Apr 27, 2018
  • Technical Report Submission Date: Tues Aug 7, 2018
  • Final Submission of Reflective Diary: Tues Aug 7, 2018

Master in Computer Engineering

  • CS5E2 Research Methods - Preparation of a Research Proposal: Mon Nov 6 - Fri Nov 10, 2017
  • CS5E2 Research Methods - Presentation of Research Proposal: Mon Nov 6 - Fri Nov 10, 2017
  • CS5E1 Ethics Clearance Application Deadline: Mon Dec 11 - Fri Dec 15, 2017
  • CS5E1 Interim Report Due: Mon Dec 11 - Fri Dec 15, 2017
  • CS5E2 Research Methods - A short discussion on research ethics related to CS5E1: Fri March 2, 2018
  • CS5E2 Research Methods - Research paper submission: Mon April 16, 2018
  • Project Demonstration Period: Mon May 21 & Tues May 22, 2018
  • Dissertation Submission Date: Fri May 25, 2018

Management Science and Information System Studies (MSISS)

  • Interim Presentations: Mon Dec 4 - Fri Dec 8, 2017
  • Project Report Due: Thu March 22, 2018

Computer Science & Business / Business and Computing

  • Project Demonstration Period: April 16-20, 2018
  • Project Report Due: Thu May 3, 2018

Computer Science, Linguistics and Language

  • Project Demonstration Period: April 9-13, 2018
  • Project Report Due: Thu May 3, 2018

 

* Due to scheduling constraints it may be necessary to hold some demonstrations later in the week.

When to choose a project

An initial list of project proposals (from lecturing staff) will be released on the Thursday of the last week of Semester 2 in your Junior Sophister year. Supervisors will not accept supervision requests before this time. Further project proposals may be added to this list by lecturing staff over the summer vacation.

Students should select a final year project before the end of the third week of Semester 1. Where students have not selected a project by the deadline, a project supervisor will be allocated to them, in consultation with the relevant course director, from among the supervisors who have not yet reached their supervision limits. The chosen supervisor will assign the student a project or help them to specify a project in an area selected by the supervisor.


How to choose a project

Students may either

  • select a project from the list of project proposals put forward by the lecturing staff, or
  • alternatively propose their own project. If you have a project proposal of your own and are having trouble finding an appropriate supervisor, contact your course director.


In either case students must get the agreement of a supervisor before they will be considered as having selected a project. Supervisors may require a meeting with the student to discuss the project before accepting a supervision request. Once a supervisor agrees to supervise a project, details of the project assignment will be recorded centrally by the supervisor.

Students may only select a single project, but they may change their minds and select an alternative project before the end of the third week of Semester 1. However, if a student selects a new project, they must notify both the old and new supervisors that their previously chosen project is to be cancelled.


Choosing a project supervisor

Students should note that each supervisor will only take a limited number of students. If you find that any of the supervision information on this page is incorrect, please send details to Final.Year.Project.Coordinator@scss.tcd.ie

Students should also note that there are only a limited number of supervisors in any area. Hence students are not guaranteed a project in their area of choice.


Project demonstrations and reports

See the following documents:


 

3. Supervisors' project areas

The following table indicates the broad areas within which projects are generally supervised, together with the potential supervisors in these areas. Each name is linked to a list of projects proposed by that lecturer.

Subject Area: Supervisors willing to supervise projects in this area

Artificial Intelligence: Michael Brady, Vincent Wade, Martin Emms, Tim Fernando, Rozenn Dahyot, Carl Vogel, Khurshid Ahmad, Ivana Dusparic, Joeran Beel
Computational Linguistics: Martin Emms, Tim Fernando, Carl Vogel, Khurshid Ahmad
Computer Architecture: Jeremy Jones, David Gregg, Michael Manzke, John Waldron, Jonathan Dukes
Computer Vision: Kenneth Dawson-Howe, Gerard Lacey
Distributed Systems: Vinny Cahill, Stefan Weber, Mads Haahr, Dave Lewis, Jonathan Dukes, Melanie Bouroche, Siobhan Clarke, Ivana Dusparic
Foundations and Methods: Hugh Gibbons, Andrew Butterfield, Glenn Strong, Tim Fernando, Vasileios Koutavas
Graphics, Vision and Visualisation: Kenneth Dawson-Howe, Fergal Shevlin, Gerard Lacey, Michael Manzke, John Dingliana, Carol O'Sullivan, Rozenn Dahyot, Khurshid Ahmad, Rachel McDonnell, Aljosa Smolic
Health Informatics: Lucy Hederman, Gaye Stephens, Mary Sharp, Joeran Beel
Information Systems: Mary Sharp, Joeran Beel
Instructional Technology: Brendan Tangney, Mary Sharp, Glenn Strong, Richard Millwood
Knowledge and Data Engineering: Vincent Wade, Lucy Hederman, Mary Sharp, Declan O'Sullivan, Dave Lewis, Owen Conlan, Khurshid Ahmad, Rob Brennan, Seamus Lawless, Kris McGlinn, Kevin Koidl, Alex O'Connor, Joeran Beel
Networks and Telecommunications: Donal O'Mahony, Hitesh Tewari, Stefan Weber, Eamonn O'Nuallain, Meriel Huggard, Ciaran McGoldrick, Jonathan Dukes, Stephen Farrell, Melanie Bouroche, Marco Ruffini, Douglas Leith, Lory Kehoe, Georgios Iosifidis
Other: David Abrahamson, Michael Brady, Stephen Barrett, Khurshid Ahmad, Melanie Bouroche, Marco Ruffini, Vasileios Koutavas, Douglas Leith, Joeran Beel
Statistics: Mary Sharp, Rozenn Dahyot, John Haslett, Simon Wilson, Brett Houlding, Jason Wyse, Arthur White, Douglas Leith, Bernardo Nipoti, Mimi Zhang


4. Project proposals for the academic year 2017/18

The following is a list of suggested projects for final year BA (CS), BA (CSLL), BA (CS&B /B&C), BAI, MAI, MCS, M.Sc., and MSISS students for the current year. Note that this list is subject to continuous update. If you are interested in a particular project you should contact the member of staff under whose name it appears.

This is not an exhaustive list and many of the projects proposed can be adapted to suit individual students.


Dr. Arthur White

Updated 29/09/17. Email: arwhite@tcd.ie or phone +1062. I am based in room 144, Lloyd Institute.

I am interested in problems in computational statistics, where we use algorithms to infer the parameters of a model. The following project areas are suitable for MSc students taking the Data Science strand. I'm afraid that I'm unable to supervise final year undergraduate students this year. In all cases a good working knowledge of statistical methods, e.g., maximum likelihood estimation, Bayesian inference, and Monte Carlo methods will be helpful, and a general interest in statistics will be essential. Each project will be expected to involve:

  • A review of methodology in the area.
  • Implementing an inference routine for the model, probably using R.
  • Applying the model to data in a detailed analysis.


Scalable clustering methods

Model-based approaches are a popular way to perform clustering in a principled and coherent framework. The standard approach to clustering involves running an iterative algorithm that computes summary statistics using the entire dataset at every iteration. In this project we would investigate alternative approaches that locally re-assign observations to different clusters. There is scope to parallelise elements of this algorithm, or to cluster only a subset of the data at a single iteration. This would make it possible to scale the clustering method up to much larger datasets.
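
As a rough illustration of the "subset per iteration" idea, the sketch below uses scikit-learn's MiniBatchKMeans in Python; this is only an assumed starting point, since the project itself would investigate model-based analogues and would likely be implemented in R.

    # Minimal sketch of re-estimating cluster centres from small batches so the
    # full dataset is never processed in a single iteration. Data is synthetic.
    import numpy as np
    from sklearn.cluster import MiniBatchKMeans

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, size=(50000, 2)),
                   rng.normal(5, 1, size=(50000, 2))])   # two toy clusters

    model = MiniBatchKMeans(n_clusters=2, batch_size=1000, random_state=0)
    for start in range(0, len(X), 1000):
        # each partial_fit call updates the centroids using only one batch
        model.partial_fit(X[start:start + 1000])

    labels = model.predict(X)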

Social network analysis PROJECT NOW TAKEN

Social network analysis involves studying the relationships between a set of objects. In many situations, there are patterns in the types of relationships that are formed - for example, communities of people who are more likely to link to each other than to other people in the network, and leader/follower dynamics. The stochastic blockmodel is a popular statistical method for detecting these patterns. The project would involve investigating several novel areas of interest, including overlapping community detection, degree-corrected blockmodels, or non-binary edges (for example, looking at email exchanges between users). Reference: Arthur White and Thomas B. Murphy, "Mixed-Membership of Experts Stochastic Blockmodel", Network Science, Volume 4, Issue 1, March 2016, pp. 48-80.

Clustering with distal outcomes

A recent area of research involves using the output of a clustering method as a predictor variable for a regression. For example, we cluster students by study habits, then use the clusters to predict their module grade. Estimation for such methods is fundamentally divided into two steps: 1) performing the clustering, and 2) performing the regression. For statistical inference to be valid, the second step of the estimation process has to take into account the estimation uncertainty of the first step. The project would involve investigating new approaches to valid inference for this problem. Reference: Stephanie T. Lanza, Xianming Tan, and Bethany C. Bray: "Latent Class Analysis With Distal Outcomes: A Flexible Model-Based Approach" Struct Equ Modeling. 2013 Jan; 20(1): 1-26.
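
The sketch below shows the naive two-step procedure (cluster, then regress the outcome on cluster membership) that the project would improve on by propagating the first-step uncertainty; the data and model choices are purely illustrative, and in practice the work would probably be done in R.

    # Naive two-step "distal outcome" analysis: (1) cluster, (2) regress.
    # A valid-inference approach would carry the clustering uncertainty into step 2.
    import numpy as np
    from sklearn.mixture import GaussianMixture
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(1)
    study_habits = rng.normal(size=(300, 4))   # illustrative predictors
    grades = rng.normal(size=300)              # illustrative distal outcome

    # Step 1: model-based clustering; the hard assignment discards uncertainty.
    gmm = GaussianMixture(n_components=3, random_state=1).fit(study_habits)
    cluster = gmm.predict(study_habits)

    # Step 2: regress the outcome on dummy-coded cluster membership.
    dummies = np.eye(3)[cluster][:, 1:]
    reg = LinearRegression().fit(dummies, grades)
    print(reg.intercept_, reg.coef_)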

Probabilistic record linkage

Data linkage is the activity of matching data from multiple sources that correspond to the same individual. As more and more sources of data become available, this activity has become increasingly popular. The goal of this project will be to investigate statistical approaches to record linkage, so that even when imperfect matches occur between data sources, the uncertainty surrounding a match can be quantified. The project will apply these methods to data in the AVERT programme. Reference: Rebecca C. Steorts, Rob Hall, and Stephen E. Fienberg: "A Bayesian Approach to Graphical Record Linkage and De-duplication", Journal of the American Statistical Association, Volume 111, 2016, Issue 516.
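
As a rough illustration of probabilistic linkage, the Python sketch below scores a candidate record pair in a Fellegi-Sunter style from field agreements; the fields, m/u probabilities and toy records are assumptions for illustration only.

    # Score a record pair by field agreement; a high score suggests a match.
    from difflib import SequenceMatcher
    from math import log

    # m = P(field agrees | true match), u = P(field agrees | non-match); assumed values
    WEIGHTS = {"name": (0.95, 0.05), "dob": (0.98, 0.01)}

    def agrees(a, b, field):
        if field == "name":
            return SequenceMatcher(None, a.lower(), b.lower()).ratio() > 0.85
        return a == b

    def match_score(rec_a, rec_b):
        score = 0.0
        for field, (m, u) in WEIGHTS.items():
            if agrees(rec_a[field], rec_b[field], field):
                score += log(m / u)        # agreement adds evidence for a match
            else:
                score += log((1 - m) / (1 - u))
        return score

    a = {"name": "Joan Smith", "dob": "1990-04-12"}
    b = {"name": "Joan Smyth", "dob": "1990-04-12"}
    print(match_score(a, b))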

Dr. Joeran Beel

Position: Ussher Assistant Professor
Affiliation: ADAPT Centre & Intelligent Systems Discipline / Knowledge and Data Engineering Group (KDEG)
Contact: If you want to do one of the projects, or have your own idea, please read about how to continue in my WIKI (you need to register to get access, and be signed in to read the WIKI, otherwise you will get a 404/dead-page error).
Last update: 2017-10-13

The following projects are only suggestions, and I am open to your own ideas in the areas of:

  • Recommender Systems
  • Machine Learning
  • User Modelling
  • Information Retrieval
  • Artificial Intelligence
  • Information Extraction
  • Natural Language Processing
  • Text Mining
  • Citation Analysis
  • Bibliometrics
  • Altmetrics
  • Scientometrics
  • Plagiarism Detection
  • Blockchain
  • Digital Libraries
  • Digital Humanities
  • Finance (FinTech)
  • LegalTech
  • Tourism
  • Healthcare
  • Business start-ups

Please note that I am not always sure that the following ideas are novel and feasible. It is your responsibility to do some research before you start the project to find out if the idea is novel and if you are capable of completing the project in the given time frame. Many of the projects are suitable for business start-ups. If you are interested in doing a business start-up based on one of the project ideas (as part of your FYP or in some other context), contact me to discuss the details.

Improving Research-Paper Recommendations with One of Various Methods (Machine Translation, Machine Learning, Natural Language Processing, ...)

One of my main projects is Mr. DLib, a recommender system as-a-service that delivers several million research-paper recommendations per month via an API to partners such as JabRef, Sowiport, MediaTUM and soon also TCD's TARA. In the context of Mr. DLib there are many projects you could do. The advantage of participating in Mr. DLib is that you will work with a real-world system that is used by real users, i.e. you can evaluate your work with thousands of users instead of a small user study as you would probably do in many other projects. In addition, you will work closely with the Mr. DLib team, i.e. you are involved in an active, ongoing project instead of sitting alone at your desk pursuing your FYP. To work with Mr. DLib you need good Java programming skills and basic Linux knowledge. Knowledge of APIs and (REST) web services, Python, and web standards (XML, JSON, HTTP, ...) is helpful, though not a requirement. A few project ideas are outlined in the following.

Self-Learning Recommender Systems

Problem/Background: Our different partners have different needs when it comes to calculating and displaying recommendations. Currently, we manually tune our recommender system to find the ideal algorithm(s) for each partner. However, especially in the long run, when we hopefully have dozens or even hundreds of partners, manually tuning the algorithms is not feasible any more.

Solution/Goal of the Project: We aim to develop a "self-learning" recommender system that identifies, for each partner (and each user/item of the partner), the potentially most effective algorithm. The idea is that once we have delivered some recommendations to a partner, machine learning is able to identify in which scenario (i.e. which partner, which user of the partner, ...) which recommendation algorithm is most effective. If a partner then requests recommendations again in a similar scenario, our self-learning recommendation framework should use the potentially most promising algorithm. To accomplish the goal, there are two challenges to solve (you would focus on one of the two in your FYP). First, we need a method to effectively collect data, i.e. A/B testing, to gain insights into which algorithms and parameters might be most effective. Second, we need to find out which machine learning algorithms are best suited to learn from the collected data.
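
A minimal sketch of the first challenge, framed as an epsilon-greedy bandit that picks a recommendation algorithm per partner based on observed click-through, is given below; the algorithm names and reward signal are hypothetical and not Mr. DLib's actual API.

    # Per-partner epsilon-greedy choice between candidate recommendation algorithms.
    import random
    from collections import defaultdict

    ALGORITHMS = ["content_based", "stereotype", "most_popular"]   # illustrative names
    stats = defaultdict(lambda: {a: [0, 0] for a in ALGORITHMS})   # [clicks, shown]

    def choose_algorithm(partner, epsilon=0.1):
        if random.random() < epsilon:                  # explore occasionally
            return random.choice(ALGORITHMS)
        best, _ = max(stats[partner].items(),          # otherwise exploit best CTR so far
                      key=lambda kv: kv[1][0] / kv[1][1] if kv[1][1] else 0.0)
        return best

    def record_feedback(partner, algorithm, clicked):
        stats[partner][algorithm][1] += 1
        if clicked:
            stats[partner][algorithm][0] += 1

    # Usage: pick an algorithm for a request, then log whether the result was clicked.
    alg = choose_algorithm("jabref")
    record_feedback("jabref", alg, clicked=True)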

Machine-Translation-Based Recommendations: improve our content-based recommendations with machine-translations

Problem/Background: In Mr. DLib, we have millions of documents in various languages (English, German, French, Russian, ...). This leads to a problem when users look at e.g. a German document, and Mr. DLib should recommend related documents. Assuming that every researcher speaks English, it would make sense to recommend English documents, even when a user currently looks at a German document. However, Mr. DLib's current content-based recommendation approach can only recommend documents in the same language as the input document.

Solution/Goal of the Project: You apply different machine-translation frameworks to translate all non-English documents (titles and abstracts) into English. This way, all documents are available in the same language in our database, and we can also recommend e.g. English documents when a user looks at a German document. You will find out which machine-translation frameworks are best suited for this task, and analyse how this approach improves the overall effectiveness of Mr. DLib.

Personalized Research-Paper Recommendations for JabRef Users

Problem/Background: We have already integrated Mr. DLib into the reference management software JabRef. However, so far, users can only receive non-personalized related-article recommendations. This means a user looks at one article and receives a list of similar articles. The problem with such recommendations is that they do not take into account which articles a user has previously looked at.

Solution/Goal of the Project: You extend the integration of Mr. DLib in JabRef so that more comprehensive data about the users is transferred to Mr. DLib's servers, where personalized recommendations are generated. You will work with REST web services, Java, MySQL, and recommendation frameworks during this project.

Taken: "Nobel Prize or Not?" (Academic Career/Performance Prediction)

Problem / Background: Researchers' performance needs to be assessed in many situations - when they apply for a new position (e.g. a professorship), when they apply for research grants, or when they are considered for an award (e.g. the Nobel Prize). To assess a researcher's performance, the number of publications, the reputation of the journals and conferences in which the publications appeared, and the citations a researcher has received are often used. However, up to now these numbers have been rather focused on the past. For instance, knowing that a researcher has published 50 papers by now and accumulated 2,000 citations says little about how the researcher will perform in the future.

Solution / Goal of the project: Your goal is to develop a tool (website or desktop application) that predicts how well a researcher will perform in the future, i.e. in which venues the researcher will publish, how many citations s/he will receive, etc. Ideally, a user could specify a name in that tool, and a chart or table would then be shown with the predicted citation counts etc. In addition, the tool might predict at which university the researcher could work next.

Methodology (how to achieve the goal): One way to achieve the goal is to:

  1. Create a dataset that contains historic data about many researchers. The data should include the researchers' publication lists, citation counts, and ideally a work history (at which universities the researcher worked, and when) and the reputation of the venues the researcher has published in. Such data could be obtained from Google Scholar, SemanticScholar, LinkedIn, ResearchGate, and Scimago.
  2. Based on this data, train a machine learning algorithm that learns how well researchers perform.
  3. Apply the machine learning algorithm to predict a researcher's future performance.

Infer/predict demographics based on data such as names, location, email address, or text

Problem/Background: There are many applications that utilize users' demographics (age, gender, nationality, ...) such as recommender systems or personalized advertisement. However, demographic data is not always available.

Solution/Goal of the Project: Your goal is to develop a method that can predict a user's demographics (age, income, gender, education, ...) based on some input data (e.g. a name, email address, postal address, IP, tweet). For instance, the gender should be rather easily inferable from a person's name. Similarly, an email like ...@gmail.de strongly indicates that a user is German, while the email ...@aol.com probably indicates that the user is 30+ years old. Your task would be to collect suitable data and train machine-learning algorithms so they can predict a user's demographics.
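
A minimal sketch of the gender-from-name case, assuming a character n-gram classifier in Python with scikit-learn; the tiny training list is purely illustrative, and a real project would need a much larger labelled dataset (and similar models for age, nationality, ...).

    # Predict gender from a first name using character n-gram features.
    from sklearn.pipeline import make_pipeline
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    names  = ["anna", "julia", "maria", "sophie", "peter", "james", "karl", "david"]
    labels = ["f", "f", "f", "f", "m", "m", "m", "m"]   # toy training data

    model = make_pipeline(
        TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
        LogisticRegression(),
    )
    model.fit(names, labels)
    print(model.predict(["laura", "michael"]))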

"Pimp That Voice" (Eliminate Annoying Accents from Audio/Video or Live Talks)

Problem / Background: You watch a movie on YouTube or on Coursera, and the person who talks in the video has a horrible voice. For instance, the person might have a terrible accent (e.g. German), a super-high-pitched voice that hurts your ears, or the person starts every sentence with 'so', ends every sentence with 'right?' and uses the word 'like' twice in every sentence.

Solution / Goal of the project: You develop either a software library or an entire application that gets a video (or audio) file as input and returns the file with the “pimped” audio. The “pimping” could focus on:

  1. Removing the speaker’s accent or simply replacing the voice with a completely different one. This means, from a user’s perspective, a user could either select to “make the original voice more pleasant” or “replace the original voice with a completely new one”.
  2. Improving the speaker’s grammar, i.e. remove unnecessary words such as ‘so’ or unnecessary uses of ‘like’.

Ideally, the tool works in real time, i.e. it could process video streams e.g. from YouTube while you watch, or even a public speaker talking into a microphone. However, for this project it is also fine to work with ordinary video files, even if the processing takes a while. You could also slightly shift the focus of the application from improving the voice to helping people become better speakers. This means a user would talk into a microphone in private, and whenever the user starts a sentence with e.g. "So," or ends a sentence with a rhetorical "right?", a beep would remind the speaker not to do that. A very simple version of this project could be that you simply count how often a user says some of the "prohibited" words and display some statistics at the end of the talk. Other variations include changing the problem from video/audio to phone calls (e.g. with customer support centres). Your goal would then be to develop a tool that allows a company with e.g. a call centre in India to give all employees a British accent, or the same voice, when they talk to customers. There is huge business potential in this. See also http://nationalpost.com/news/canada/canadian-speech-software-could-make-thickly-accented-overseas-operators-easier-to-understand

Mass Job-Application Detector

Problem/Background: Every professor and every HR person knows the problem: you get an email from someone asking for a job, and, among other things, you need to assess how much the applicant really wants to work with you, or whether the applicant has sent the application to dozens of other companies/professors.

Solution/Goal of the Project: Develop a tool that detects mass job applications. The tool could be, for instance, an add-on for Gmail that warns the user when an incoming job application (probably) is a mass application. I see two options for realizing such a tool (but there might be more):

  • Provide a probability score that the email is a mass application. This score could be machine-learned based on the email's text and maybe other features. This task would be similar to detecting spam mails.
  • Check how many other users have received the same (or a very similar) email. The more users received the email, the more likely it is a mass application. For professors, it would probably already be enough if just a few other colleagues in the department also used the add-on to get reliable results (a minimal sketch of this option follows below).
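
The sketch below illustrates the second option by comparing a new email against previously seen applications with TF-IDF cosine similarity; the threshold, example texts and storage are assumptions for illustration.

    # Flag an email as a likely mass application if it is near-identical to one seen before.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    previously_seen = [
        "Dear Professor, I am very interested in your group and ...",
        "Dear Sir/Madam, please find attached my CV for the position ...",
    ]

    def looks_like_mass_application(new_email, seen, threshold=0.9):
        all_texts = seen + [new_email]
        vectors = TfidfVectorizer().fit_transform(all_texts)
        # similarity of the new email (last row) to every previously seen email
        sims = cosine_similarity(vectors[len(seen)], vectors[:len(seen)])
        return float(sims.max()) >= threshold

    print(looks_like_mass_application(
        "Dear Professor, I am very interested in your group and ...",
        previously_seen))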

Detect Drones with Crowdsourced Distributed Cameras

Problem/Background: There are more and more drones flying around, and the risk of e.g. terror attacks by drones is increasing. Consequently, the need to be able to detect drones as early as possible is also increasing.

Solution/Goal of the Project: Imagine a distributed network of house owners who all install cameras (video or photo) on the roofs of their houses. The cameras could communicate over the Internet and together detect whether an object in the air is a drone or a bird, cloud, airplane, ... Recognized drones would be shown on a map, and if they enter no-go zones, the police would be informed. Many extensions are possible. For instance, cameras could be mounted on rotatable installations: once one camera detects a suspicious object, other cameras could rotate in that direction and take additional photos.

Plag.me: The automatic plagiarism creator

Problem/Background: None :-)

Solution/Goal of the Project: Develop a tool that takes a text as input (optionally a text with additional data such as citations, figures, tables, ...) and then rephrases the text and/or translates it, replaces (some) references, and redraws the figures and tables. All this should be done with machine learning. The goal of this project would be to demonstrate how good machine learning is nowadays and how difficult it is to detect such plagiarism. This project would be for demonstration purposes only, and/or to eventually support the detection of plagiarism. You would have to think of a way to minimize the risk of your tool being abused for actually creating plagiarism (one solution could be that all submitted texts and returned output are publicly available).

The Kaggle Machine-Learning Competition Solver

Problem/Background: Kaggle is a platform for machine learning, and many companies offer competitions on Kaggle with the winners receiving significant prizes. Participating in these competitions is not really difficult in theory, as it is usually about cleaning the data, trying different machine learning algorithms and calculating whatever evaluation metric the company wants. However, this is a time-consuming process.

Solution/Goal of the Project: You write a tool that automates the process of participating in Kaggle competitions as much as possible. One potential solution would be to use "automated machine learning" frameworks, which are capable, to some extent, of automatically running a large number of algorithms on a given dataset and finding the most effective one. Such a tool might also be used by Kaggle as a baseline in the long run, i.e. other participants would need to be better than the automated machine learning libraries.
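
A minimal sketch of the core loop such a tool would automate, i.e. trying several scikit-learn models and keeping the best cross-validated one; the dataset and candidate models are illustrative, and a real tool would also automate data cleaning and submission.

    # Try several candidate models and keep the one with the best cross-validated score.
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import cross_val_score
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

    X, y = load_breast_cancer(return_X_y=True)
    candidates = {
        "logreg": LogisticRegression(max_iter=2000),
        "random_forest": RandomForestClassifier(n_estimators=200),
        "gbdt": GradientBoostingClassifier(),
    }

    scores = {name: cross_val_score(model, X, y, cv=5).mean()
              for name, model in candidates.items()}
    best = max(scores, key=scores.get)
    print(best, round(scores[best], 3))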

"Photo2Location" (Guide a User to the Location a Photo was taken)

Problem / Background: When people visit a nice restaurant or sightseeing spot, they usually take photos. However, when they look at the photos some days, weeks, or even months later, they often do not remember where exactly the photo was taken, or how to get there again, e.g. to eat in that nice restaurant once more.

Solution / Goal of the project: You develop an app for iOS and/or Android that allows the user to open a photo in the app; the app then displays the location of the photo on a map and guides them there. This way, users can easily find e.g. the nice restaurant where they ate some weeks ago.

The project as described here is probably not comprehensive and novel enough to justify a Final Year Project, so you would have to come up with some additions to extend the project scope and make it a bit more original (maybe add a recommender system?).

Time-Normalized TF-IDF Term Weighting for Enhanced Search and Document Similarity Calculations

Problem/Background: TF-IDF is one of the most common term-weighting schemes for calculating how relevant a document is for a given search query. 'TF' stands for 'term frequency' and the rationale is that the more often a query term appears in a document, the more relevant that document is for the query. 'IDF' stands for 'inverse document frequency' and the rationale is that a document is more relevant for a query term the fewer documents in the corpus contain that term. The problem with this approach is that it does not consider how long a term has been in use. For instance, the term 'bitcoin' has only appeared in documents for a few years because Bitcoin was only introduced in 2009. In contrast, terms like "search engine" have been used for decades, so many more documents can be expected to contain them.

Solution/Goal of the Project: You modify the traditional TF-IDF formula (and/or other term-weighting schemes) to take into account how long the query term has been in use, and then evaluate whether retrieval performance can be improved.
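
One possible (assumed) formulation is sketched below: instead of dividing by the total number of documents, the IDF divides only by the number of documents published since the term first appeared; all counts are made up for illustration.

    # Standard IDF vs. a time-normalised variant that only counts documents
    # that could plausibly have contained the term.
    from math import log

    def idf_standard(n_docs, doc_freq):
        return log(n_docs / doc_freq)

    def idf_time_normalised(docs_since_first_use, doc_freq):
        # 'bitcoin' is only penalised against documents written after 2009
        return log(docs_since_first_use / doc_freq)

    n_docs = 1_000_000                                            # illustrative corpus size
    print(idf_standard(n_docs, doc_freq=5_000))                   # whole-corpus IDF
    print(idf_time_normalised(docs_since_first_use=60_000, doc_freq=5_000))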

Voice2Text2Voice2Video2...

Problem/Background (1): You get a text message from your dearest and wish she/he had sent that message as a voice or video message.

Problem/Background (2): You are on a voice or video call with your dearest and the quality is too low to really hear or see the other person.

Solution/Goal of the Project: You develop a tool, e.g. a WhatsApp add-on, that is capable of playing a text message in the voice of the person who sent the message. This means, for instance, that when your mother sends you a text message, you can press a button "Read out loud" and the message is played in your mother's voice. To accomplish this, your mother's voice would have to be learned, e.g. from previous WhatsApp voice calls with her. Or, even cooler, the text message would not only be read out loud, but you could see a video of your mother. Similarly, to address the second problem, you could develop a tool, e.g. a WhatsApp add-on, that transforms the speech of user 1 to text, sends the text to user 2, and on user 2's phone plays the text back as voice or video.

There are variations possible. For instance, it would not necessarily have to be Text2Voice or Text2Video. It could also be Voice2Video.

The Journal and Conference Health Monitor (How active/"healthy" are academic conferences and journals?)

Problem/Background: Researchers usually publish their work in academic journals or at academic conferences. The reputation of such venues is an important metric for judging the reputation and impact of a researcher. To estimate the reputation of a conference or journal, citation-related measures are typically used, and there are many platforms that provide such rankings, typically ranging from A (top venue) to C (mediocre venue) and unranked. However, these rankings provide a rather static image of a conference or journal. Recently, researchers calculated the "health" of a few selected conferences. They calculated the health not only from a static citation measure, but considered the development over time, and used additional criteria including the number of authors who publish at a conference, the number of new authors, and the number of programme committee members. However, this was a semi-manual process that involved a lot of work.

Goal of the project: The goal is to develop a method that can automatically calculate the "health" of academic venues on a large scale. Your tasks would include defining what exactly makes an academic venue "healthy", collecting the required data, and calculating the health for a large number of conferences. Ideally, you would also develop a method that predicts how the health will evolve over the next years.

Sketcha: Captchas based on Sketches

Problem/Background: Captchas are common tools to detect spammers and bots, and there is a constant battle between captcha designers and attackers. There are audio captchas, voice captchas, text captchas, and so on. Sketches have also been used to detect spammers and bots (http://gfx.cs.princeton.edu/pubs/Ross_2010_SAC/index.php). However, Google recently released a huge dataset of sketches (https://techcrunch.com/2017/08/25/google-releases-millions-of-bad-drawings-for-you-and-your-ai-to-paw-through/amp/). To the best of our knowledge, this data has not been used to implement a novel captcha method.

Solution/Goal of the Project: Be creative and find a way to develop a novel captcha method based on sketches (either the ones of Google, or some other method).

A Generic Machine-Learning Based Website Parser/Scraper (for research articles)

Problem/Background: In many situations it is necessary to parse a website and identify e.g. the title, authors or abstract of the text displayed on a web page. This is important for web crawling, but also e.g. for importing research articles from the web into your reference manager (see e.g. https://scraper.bibsonomy.org ). As far as I know (please let me know if I am wrong), current parsers use heuristics and templates to identify e.g. the title of a website. Given the huge advances in machine learning, this no longer seems appropriate.

Solution/Goal of the Project: Develop a web page scraper/parser that automatically identifies certain elements of a web page. The parser should be trained with machine learning and compared against state-of-the-art parsing tools. I am particularly interested in parsers for academic content, similar to https://scraper.bibsonomy.org, but am also open to other domains (e.g. parsing news websites).

Extending Word-Embeddings with Citation Context

Problem/Background: When it comes to indexing documents, each term that appears in the documents typically represents one dimension in a vector space. Consequently, a large document corpus can easily have thousands or even millions of dimensions. Word embeddings have changed this: with word embeddings created by machine learning, the vector space is reduced to a few hundred dimensions. Hence indexes are much smaller, and retrieval performance is often increased. However, some document types such as research articles contain not only text but also additional data such as citations. So far, this data is ignored by traditional word embeddings.

Solution/Goal of the Project: The idea is to replace citations with e.g. the titles of the cited documents. So, while normally a text used for learning word embeddings would look like this:

One of the most common recommendation approaches is content based filtering. Beel et al. (2015) found that 53% of all research-paper recommender systems use content-based filtering.

... the "extended" approach would add the title of the cited document to that text and use this extended text for learning. Hence, the text would change to

One of the most common recommendation approaches is content based filtering. Research Paper Recommender Systems: A Literature Survey found that 53% of all research-paper recommender systems use content-based filtering.
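
A minimal Python sketch of this preprocessing step is shown below; the citation-marker-to-title mapping is illustrative, and the expanded text would then be fed to any standard word-embedding trainer (word2vec, GloVe, fastText, ...).

    # Replace in-text citation markers with the titles of the cited documents
    # before the text is used for learning word embeddings.
    cited_titles = {
        "Beel et al. (2015)": "Research Paper Recommender Systems: A Literature Survey",
    }

    def expand_citations(text, mapping):
        for marker, title in mapping.items():
            text = text.replace(marker, title)
        return text

    text = ("One of the most common recommendation approaches is content based "
            "filtering. Beel et al. (2015) found that 53% of all research-paper "
            "recommender systems use content-based filtering.")
    print(expand_citations(text, cited_titles))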

Citation-Embeddings: Applying the Idea of Machine-Learned Word-Embeddings to Citations in the Context of Research-Paper Recommender Systems

Problem/Background: When it comes to indexing documents, each term that appears in the documents typically represents one dimension in a vector space. Consequently, a large document corpus can easily have thousands or even millions of dimensions. Word embeddings have changed this: with word embeddings created by machine learning, the vector space is reduced to a few hundred dimensions. Hence indexes are much smaller, and retrieval performance is often increased. However, some document types such as research articles contain not only text but also additional data such as citations. So far, this data is ignored by traditional word embeddings.

Solution/Goal of the Project: Instead of terms, citations are used for the embedding. The approach would either use citations only, or be a hybrid of citations and terms. Citation embeddings can be created rather easily when each citation is given a unique document ID. For instance, if two documents both cite the same document ...

One of the most common recommendation approaches is content based filtering. Beel et al. (2015) found that 53% of all research-paper recommender systems use content-based filtering.

It was found that many research-paper recommender systems use content-based filtering [4].

these texts would be converted to

One of the most common recommendation approaches is content based filtering. unique_document_id-4854564 found that 53% of all research-paper recommender systems use content-based filtering.

It was found that many research-paper recommender systems use content-based filtering unique_document_id-4854564.
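
The sketch below illustrates the idea with gensim's word2vec implementation (assuming gensim 4 or later): the citation IDs are just tokens, so they receive embedding vectors like any other word; the texts and hyperparameters are illustrative.

    # Train word2vec on texts in which citations were replaced by unique document IDs,
    # so the IDs themselves become embedded vectors.
    from gensim.models import Word2Vec

    docs = [
        "one of the most common recommendation approaches is content based "
        "filtering . unique_document_id-4854564 found that 53% of all "
        "research-paper recommender systems use content-based filtering .",
        "it was found that many research-paper recommender systems use "
        "content-based filtering unique_document_id-4854564 .",
    ]
    sentences = [d.split() for d in docs]

    model = Word2Vec(sentences, vector_size=50, window=5, min_count=1, epochs=50)
    print(model.wv["unique_document_id-4854564"][:5])   # the citation's embedding vector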

A Citation-Proximity-Based Ground-Truth to Train Text-Based Recommender Systems / Learning to Predict Citation Proximity based on Terms

Problem/Background: Many recommender systems (e.g. for news, web pages, research articles, ...) need to be able to identify related documents for a given input document. For instance, if a user reads a news article, the website might want to recommend related news articles to keep the user reading. Calculating document relatedness is not a trivial task. Often, text similarity is used (e.g. cosine similarity), or relatedness is learned from some ground truth. For instance, a machine-learning-based recommender system for research articles might learn that all articles published in the same journal or by the same author are somewhat related. However, this approach often does not achieve satisfying results.

Solution/Goal of the Project: A solution could be to learn text-based document similarity based on citation proximity analysis (CPA) (https://en.wikipedia.org/wiki/Co-citation_Proximity_Analysis). You would need to find a document corpus that contains the full text of research articles and their in-text references. You would then train a machine learning algorithm with the terms and citation proximity. This means the algorithm should learn how closely two documents will be cited, given their text as input. When you later have a new, as yet uncited document, you will be able to predict, based on the document's text, which other documents would potentially be cited in close proximity to it.

Taken: The 1-billion Citation Dataset for Machine-learning Citation Styles and Entity Extraction from Citation Strings

Problem/Background: Effective citation parsing is crucial for academic search engines, patent databases and many other applications in academia, law, and intellectual property protection. It helps to identify related documents, or to calculate the impact of researchers and journals (e.g. the h-index). "Citation parsing" refers to identifying and extracting a reference like [4] in the full text, and the author names, journal, publication year etc. from the bibliography. For instance, in the following example the citation parser would have to identify the citation markers [1], [2], [3], and [4], and then extract from the bibliography that for the first entry "K. Balog", "N. Takhirov" etc. are the authors.


1 Introduction
Retrieving a list of ‘related documents’ for a given source document – e.g. a web page, patent, or research article – is a common feature of many applications, including recommender systems and search engines (Figure 1 ). Document relatedness is typically calculated based on documents’ text (title, abstract, full-text) [1] and metadata (authors, journal, …) [2], or based on citations/hyperlinks [3], [4].



6 Bibliography
[1]    K. Balog, N. Takhirov, H. Ramampiaro, and K. Nørvåg, “Multi-step Classification Approaches to Cumulative Citation Recommendation,” in Proceedings of the OAIR’13, 2013.
[2]    D. Aumueller, “Retrieving metadata for your local scholarly papers,” 2009.
[3]    B. Gipp and J. Beel, “Citation Proximity Analysis (CPA) - A new approach for identifying related work based on Co-Citation Analysis,” in Proceedings of the 12th International Conference on Scientometrics and Informetrics (ISSI’09), 2009, vol. 2, pp. 571–575.
[4]    S. Liu and C. Chen, “The Effects of Co-citation Proximity on Co-citation Analysis,” in Proceedings of the Conference of the International Society for Scientometrics and Informetrics, 2011.

Over the years many approaches to reference parsing have been proposed, including regular expressions, knowledge-based approaches and supervised machine learning. Machine learning-based solutions, in particular those falling into the category of supervised sequence tagging, are considered a state-of-the-art technique for reference parsing. Unfortunately, they still suffer from two issues: the lack of sufficiently big and diverse data sets and problems with generalization to unseen reference formats. Especially for deep learning, much larger datasets would be needed than exist today.

Solution/Goal of the Project: Your goal would be 1) to create a massive citation dataset 2) use that dataset to train (deep) machine learning approaches to parse citation strings.

Methodology: To achieve the goal, you could do the following

  1. Download/parse the structured metadata of millions of academic publications, e.g. from the ACM Digital Library, IEEE, PubMed, ... (they all offer their metadata as BibTeX, Endnote, ...). For instance, you would have millions of entries like this:

    @INPROCEEDINGS{Beel2017g,
      author = {Beel, Joeran and Aizawa, Akiko and Breitinger, Corinna and Gipp, Bela},
      title = {Mr. DLib: Recommendations-as-a-Service (RaaS) for Academia},
      booktitle = {Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL)},
      year = {2017}
    }
  2. Use http://citationstyles.org/ and https://github.com/michel-kraemer/citeproc-java to create millions or even billions of citation strings based on the parsed metadata. "Citationstyles.org" is a collection of thousands of citation styles, and citeproc-java is a framework to convert e.g. BibTeX into one of those thousands of styles, i.e. you could output the previously parsed metadata in thousands of different citation styles, e.g.
    • J. Beel, A. Aizawa, C. Breitinger, and B. Gipp, “Mr. DLib: Recommendations-as-a-Service (RaaS) for Academia,” in Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL), 2017.
    • Beel, J., Aizawa, A., Breitinger, C. & Gipp, B. Mr. DLib: Recommendations-as-a-Service (RaaS) for Academia. Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL) (2017).
    • Beel, Joeran et al. “Mr. DLib: Recommendations-as-a-Service (RaaS) for Academia.” In Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL), 2017.
    • ...
  3. In addition, it might make sense to create further artificial citations with a knowledge base. For instance, you download a list of journal names (e.g. from http://www.scimagojr.com/) and person names (first name, last name), create random page numbers etc., and then create billions of new citation strings.
  4. It could also make sense to create the citation strings, make a Word or LaTeX document that contains a bibliography with e.g. 5-30 citation strings, create a PDF out of it, parse the PDF with one of the many PDF parsing tools to identify the bibliography and citation strings, and then use that data for learning. This would be a more realistic scenario, because the PDF creation and parsing would probably introduce some errors/noise into the citation strings.
  5. Use machine learning frameworks like scikit-learn or TensorFlow to learn the elements of a citation string (a small labelling sketch follows below).
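
As a small illustration of how the generated strings could be labelled for step 5, the sketch below locates each known metadata field inside a rendered citation string; a real pipeline would convert such spans into token-level tags (e.g. BIO) before training a sequence tagger, and the example values are illustrative.

    # Because we rendered the citation string ourselves (step 2), we know the
    # field values and can locate them in the output to produce training labels.
    citation = ('J. Beel, A. Aizawa, C. Breitinger, and B. Gipp, '
                '"Mr. DLib: Recommendations-as-a-Service (RaaS) for Academia," '
                'in Proceedings of the ACM/IEEE-CS Joint Conference on Digital '
                'Libraries (JCDL), 2017.')
    fields = {
        "TITLE": "Mr. DLib: Recommendations-as-a-Service (RaaS) for Academia",
        "BOOKTITLE": ("Proceedings of the ACM/IEEE-CS Joint Conference on "
                      "Digital Libraries (JCDL)"),
        "YEAR": "2017",
    }

    def field_spans(citation, fields):
        spans = []
        for label, value in fields.items():
            start = citation.find(value)
            if start != -1:                    # field rendered verbatim in the string
                spans.append((label, start, start + len(value)))
        return spans

    print(field_spans(citation, fields))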

"Unshredd Me" (Reconstruct Shredded Documents)

Problem/Background: Criminal investigators and others often face the problem that suspects have shredded documents, i.e. destroyed evidence. The investigators then need to restore the shredded documents, which is a lot of work and sometimes impossible.

Solution/Goal of the Project: You develop an "unshredder" tool (website or desktop application) that takes as input a photo of a shredded document and then returns the unshredded documents. To accomplish the project, you probably have to create a dataset of photos showing shredded documents and the original unshredded versions. With this dataset, you can train a machine learning algorithm and then evaluate how well your algorithm works. When doing the project, you should decide if you want to focus on machine-shredded documents (probably easier) or documents that were torn apart.

"ASEO Me" (Optimize Research Articles for Academic Search Engines)

Problem / Background: Researchers have an interest in their research articles being indexed by academic search engines such as Google Scholar and SemanticScholar, as this increases the articles' visibility in the academic community. In addition, researchers should be interested in how well their articles rank for certain keyword searches. Some years ago, I published an article about "academic search engine optimization" (ASEO) that gave advice on how to optimize research articles to make them easily indexable by academic search engines. Nowadays, ASEO is used by many publishers. However, many researchers are not yet aware of the importance of ASEO and/or they do not have the skills to perform it.

Solution / Goal of the project: Your goal is to develop a website on which researchers can upload a research article; your website analyses the article and 1. predicts how well the article will rank for certain keywords, 2. makes suggestions on how to improve the article's ranking, and 3. (optionally) modifies the article automatically to make it more readable/indexable for academic search engines.

ASEO Reloaded (Academic Search Engine Optimization 2)

Problem / Background: As described in the previous project, I published an article about "academic search engine optimization" that gives advice on how to optimize research articles to make them better indexable by academic search engines. The article was published some years ago, hence not all of its advice may still be sensible, or, due to advances in search engine ranking, some additional aspects might need to be considered.

Solution / Goal of the project: Your goal is to find out how to optimize research articles for academic search engines. In contrast to the previous project, the focus here is on research, i.e. you will run experiments to find new ways of optimizing research articles, while the focus of the previous project is more on the application (i.e. enabling a user to upload a paper and get advice).

Deep Information Extraction

Background: Information extraction from documents makes it possible to automatically obtain information such as the document title, author names or dates directly from the file content. Extracting information is usually done in a sequence of steps. For example, first we might recognize objects such as words, text lines and blocks within a PDF document, then such objects could be classified into categories (such as "title", "authors", "date", "section title", "paragraph", etc.), and finally we might perform additional tasks such as splitting author lists into individual names and surnames. Usually, the most important step is the classification of document fragments (blocks of text) into categories. Traditionally this is done by supervised machine learning, which operates on features such as the words or phrases in the text, the formatting of the text, its font, size and position on the page, etc. These features are manually developed by researchers, which takes time and effort.

Goal: In this project, you will explore the possibility of applying deep neural networks to the problem of classifying such document fragments into categories. One of the strengths of deep networks is that they do not require "smart" hand-crafted features on the input, but are able to work very well even with the raw input. For example, in image classification, we do not have to write code for recognizing basic objects like edges or other geometric shapes in order to provide this information to the network. Instead, it is enough to feed the neural network with only RGB values of the image pixels, and the layers of the network will automatically learn to recognize basic shapes needed to accurately classify the image.

The project will focus on developing a good alternative representation of a text block, one that results in high classification accuracy without the need to manually engineer features. For example, a text block can be represented as an image from the original PDF file, which could let the network automatically learn features such as text size or justification. To capture the position of the block on the page, as well as its neighborhood, it could be useful to also provide an image of the entire page with the block of interest highlighted in some way. It is also possible to add word embeddings as input features, to capture the textual layer of the block as well.
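
A minimal sketch of the block-as-image idea, assuming tf.keras and random arrays standing in for rendered block images; the image size and category set are illustrative.

    # Classify rendered text-block images into layout categories with a small CNN.
    import numpy as np
    import tensorflow as tf

    NUM_CLASSES = 5            # e.g. title, authors, date, section title, paragraph
    block_images = np.random.rand(32, 64, 256, 1).astype("float32")   # fake block renders
    labels = np.random.randint(0, NUM_CLASSES, size=32)               # fake labels

    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(64, 256, 1)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(block_images, labels, epochs=1, batch_size=8)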

Why cited?

Background: Referencing other researchers' work is very important in scholarly communication. Usually, we assume that the more citations a certain paper has, the better it is. On the other hand, there are many reasons why we cite a paper: maybe we use the data or tool described in another publication, maybe we extend someone's work by adding new functionality, or we are comparing our results to a competitor's. Having such detailed information about why papers are cited might help to better assess their quality.

Goal: The goal of the project is to develop an automated tool able to determine the reason why a paper was cited in another paper. The most natural solution for such a task is supervised machine learning and textual sentiment analysis. A supervised classifier could use features related to the context of a citation (the sentence the citation appears in); these could be words, phrases, word embeddings, etc. Other features might also be useful, for example how many times the paper is cited, whether it is a self-citation, or in which section the citation appears ("state of the art", "methodology", "evaluation"?). In addition, it might also be interesting to take a large set of citation data and try to automatically (or semi-automatically) discover common reasons for citing papers, without assuming we already know those reasons. This task could be approached with unsupervised machine learning (clustering).
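
A minimal sketch of the supervised variant, assuming a scikit-learn text-classification pipeline over citation contexts; the labelled examples and reason categories are made up for illustration.

    # Classify the sentence containing a citation into a "reason for citing" category.
    from sklearn.pipeline import make_pipeline
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    contexts = [
        "We use the dataset published by [1] for all experiments.",
        "Our approach extends the model of [2] with a temporal component.",
        "We compare our results against the method described in [3].",
        "We reuse the evaluation toolkit released by [4].",
    ]
    reasons = ["uses_data", "extends_work", "compares_against", "uses_tool"]

    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
    clf.fit(contexts, reasons)
    print(clf.predict(["We build on the architecture proposed in [5]."]))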

Named Entity Recognition for Computer Science

Background: Named Entity Recognition refers to finding entities like people's names, companies, places or dates in textual documents. Traditionally this is achieved using a supervised classifier that analyses the text represented as a sequence of tokens (words or similar units). Tokens are labeled sequentially based on various features, as well as their neighborhood in the text.

Goal: In this project, you will build an entity recognizer for the Computer Science domain, able to process, for example, academic papers or other documentation. This recognizer will be able to find entities such as library names, algorithms, complexity, datasets, operating systems, devices, licenses, programming languages, etc. The project will require building a training set of documents with marked entities (this can be automated to some extent using, for example, existing knowledge bases and searching for known entities in the document corpus). The second part will aim at training and evaluating a token tagger. Various features from traditional named entity recognition, as well as word embeddings, could be used.

This project can also be extended to extract relations between entities, such as "library uses programming language", "algorithm has given complexity", etc. This part would also be based on supervised machine learning.
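
The sketch below illustrates how the training set could be bootstrapped from known entity lists; the entity dictionaries are illustrative stand-ins for a real knowledge base, and the resulting labels would feed a proper sequence tagger.

    # Auto-label tokens by looking them up in lists of known entities.
    KNOWN_ENTITIES = {
        "PROGRAMMING_LANGUAGE": {"Haskell", "Python", "Java"},
        "LIBRARY": {"TensorFlow", "scikit-learn"},
        "OPERATING_SYSTEM": {"Linux", "Android"},
    }

    def auto_label(tokens, known):
        labels = []
        for token in tokens:
            tag = "O"                                  # default: outside any entity
            for entity_type, names in known.items():
                if token in names:
                    tag = "B-" + entity_type           # single-token entities only here
                    break
            labels.append(tag)
        return list(zip(tokens, labels))

    sentence = "The parser is written in Haskell and evaluated on Linux".split()
    print(auto_label(sentence, KNOWN_ENTITIES))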

Extraction of Phrases Answering the Five Journalistic W-Questions using ML

Problem / Background: News articles typically answer the five journalistic W-questions (5W) within the first few sentences, i.e., who did what, when, where, and why. The 5W phrases answering these questions describe the main event of an article. Thus, the 5W phrases are used for various applications in automated analysis and processing of news, including news clustering (finding related articles), news aggregation (such as Google News), and summarization.

Solution / Goal of the project: Your goal is to find suitable features that machine learning (ML) and deep learning methods can use to extract, from a given article, one phrase per 5W question that best describes the main event. You can start by experimenting with features and methods that we implemented in a previous, non-ML-based 5W extractor. After you have devised your method, you should compare the results to our previous system using a gold-standard dataset.

Dr Mélanie Bouroche

I am happy to supervise projects in the Smart Cities area; if you have any idea that might make our cities smarter, get in touch! I am particularly interested in (connected) autonomous cars and their effect on cities, addressing questions such as: how can such smart cars share the road with human-driven cars? What proportion of them is needed to make traffic safer and more efficient for everybody? While those are big research questions, a number of projects can be carved out depending on students' specific skills and interests.

Traffic Analysis from Advanced Bus Transportation Systems (ABTS)

    This project would be to take publicly available datasets generated from ABTS and possibly elsewhere to estimate where congestion and delays might be occurring.

Open Street Map Contribution.

    This project would be to condition, process and upload good-quality road grade information to publicly available maps.

Dashboard Visualisation for ABTS Data.

    This project would be to build a dashboard-type visualisation for Advanced Bus Transportation Systems (ABTS)-related data.

Dr. Andrew Butterfield

Room: ORI G.39
Extension: 2517

Background

My research interests are in the areas of Formal Methods, for verifying the correctness of hardware and software systems through the use of mathematical modelling and proof. The current focus is on the use of the so-called Unifying Theories of Programming paradigm (UTP) for linking together theories that model different types of programming languages. Also of interest is the use of Functional Languages, particularly pure lazy ones like Haskell and Clean.

Most of these projects can be scoped to suit Senior Sophister projects or 5th year MCS/MAI dissertations.

Application Areas

Spaceflight Software

Work for the European Space Agency (ESA) has raised the issue of using formal techniques to reason about the correctness of C code that handles interrupts in hypervisors. Of interest is seeing how high level models might be connected to tools that analyse code with appropriate formal annotation (e.g. Frama-C).

A set of baseline requirements for a Separation Kernel is now available. Of interest is building formal models of these using Haskell and/or CSP in conjunction with the FDR4 refinement checker.

A connected project is to take the CSP parser associated with FDR4, which is open-source at libcspm, and revise it to support so-called literate programming, in a similar fashion to that available in Haskell. The parser is written in Haskell, by the way.

Work Processes

I have been working with John Noll who is a colleague in Lero, the Irish Software Research Centre, based in UL. He and colleagues from his former job in Santa Clara had developed a language originally for describing software development processes, called Process Modelling Language (PML). They developed some tools to analyse PML descriptions. I have been collaborating with him to give PML a formal semantics, and to develop new tools written in Haskell. The language can be used to describe a wide range of processes, from general business, through to so-called clinical healthcare pathways. We are also interested in using them to model software development standards for medical device software. Projects could range from progressing the analysis tools, to using it to model some real process.

Facebook's infer

Facebook have open-sourced their radical new approach to verifying memory safety in Android and iOS code (see Infer). It would be interesting to explore the use and extension of this technique.

Haskell based Projects

In addition to specific projects listed below, I am willing to supervise projects that use Haskell or similar functional languages (e.g. OCaml, Lisp).

UTP Calculator (a.k.a. "Theory Hacking")

Developing UTP theories involves a lot of test calculations. A rapid theory prototyping tool UTPCalc has been developed in Haskell that allows the user to code up theory definitions using the Haskell language itself. Possible projects include extending and improving the calculator, and/or applying it to check new or existing UTP theories.

Theorem Prover Tool Support

Of particular interest at the moment is the development of proof support tools for UTP, most notably the "Unifying Theories of Programming Theorem-Prover - U(TP)2", a proof assistant written in Haskell.

I am willing to supervise projects that either use the tool to build a theory of some interesting language, or help to improve the power and quality of the tool in some way. Example project ideas include:

  • Improved pretty-printing of large expressions and predicates, using nice fonts, and with every component "clickable".
  • Enhancing (induction) proofs through the use of "rippling"
  • Connecting the proof-engine to known secure theorem provers to justify/verify basic proof steps.

Working with the tool requires knowledge of Haskell, as would be obtained by taking module CS3016 in JS year and ideally followed up with CS4012 in the SS year.

Dr Ciarán Mc Goldrick

URL: www.scss.tcd.ie/Ciaran.McGoldrick :: www.ciaranmcgoldrick.net

Room

Extension

Lloyd 1.10a (inside 1.11)

3626

I am happy to supervise projects at both senior undergraduate and MSc level. In recent years I have predominantly been mandated to supervise MSc projects.

In general you should have a strong academic record, an interest in networking, communications and control/signal processing, be motivated to succeed and solve problems, be a solid programmer and have some affinity for hardware.

I will be offering a variety of projects in 2017/18, some of which may include the opportunity to collaborate with colleagues in UCLA.

Vehicular Communications

I will be offering two projects on vehicular communications.
One will be a continuation of a project on the use of Visible Light Communications and associated systems as a side channel for secure V2V and V2I communications. This project will involve the evolution and development of existing hardware circuits and control systems.
A second project will focus on efficient, low-loss medium switching in response to rapidly changing and dynamic vehicular mobility.

Underwater Networking

I will be offering two projects on Underwater Networking.
Both will involve (contribution to) the development of a community-accessible underwater networking test and evaluation platform. There will be two separate development, integration and practical evaluation strands that will complement activities in our H2020 project.

Control

I will be offering a project involving the development and evolution of a new form of primitives for use in distributed control modalities.

Your project ideas ...

If you have an interesting or compelling idea in the networking, communications, security, control or STEM education domains please feel free to get in touch. In doing so please be able to clearly and concisely tell me: i) what you propose to do; ii) why you want to do it; iii) what the interesting (research) hypothesis is; iv) how or why anyone would be interested in your completed project.

Further info: Ciaran Mc Goldrick

Last updated: 12/7/2017

Prof Dave Lewis

email with "PROJECT IDEA" in the subject line.

Privacy Canvas:

The Business Model Canvas is a popular tool for the iterative modeling of business ideas. Recently we have adapted the affordances of the business model canvas (simplicity, graphical layout, ease of design iteration) to the problem of modelling the ethical issues of a digital application project. This has resulted in the Ethics Canvas design and application, which has been used to help teach ethics considerations at undergrad, postgrad and postdoc levels. A similar tool may be useful when considering and teaching privacy and data protection concerns. This project will refactor or redesign the ethics canvas code to offer a canvas-style interface for brainstorming the data protection issues in a digital application design, in a way suitable for supporting training in this topic in remote groups.

Multilingual GDPR annotator:

With multiple approaches emerging to support compliance with the EU’s new General Data Protection Regulation, supporting the linking of different privacy policy or privacy impact assessment documents back to the GDPR source text becomes of interest to those needing to demonstrate compliance. This project will provide web annotation tool support for linking GDPR text with organisation specific data protection documents, and enable this for different languages. This could then be used for other regulations or standards requiring compliance tracking internationally. The project should follow a standardized web annotation approach and should build on the linked data representation of the GDPR text developed in the school. This project would suit a student with strong language skills in a European language in addition to English.
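
As a rough illustration of the intended output (assuming the W3C Web Annotation Data Model; the IRIs and the German policy excerpt below are invented), a single annotation linking a clause of a localised privacy policy back to a GDPR article might be constructed like this:

    # Minimal sketch of one Web Annotation; identifiers are illustrative only.
    import json

    annotation = {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "motivation": "linking",
        "body": {
            # Linked-data IRI for the GDPR source text (hypothetical identifier)
            "id": "https://example.org/gdpr/article/17",
            "language": "en",
        },
        "target": {
            # Organisation-specific document being linked back to the GDPR
            "source": "https://example.org/policies/privacy-policy-de.html",
            "language": "de",
            "selector": {
                "type": "TextQuoteSelector",
                "exact": "Recht auf Löschung",   # "right to erasure" in German
            },
        },
    }

    print(json.dumps(annotation, ensure_ascii=False, indent=2))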

Generic Data Management Artefact Meta-Management:

Data management is becoming an increasingly complex and vital part of any organisation attempting to leverage big data assets. Declarative data objects using standard vocabularies and data manipulation languages provide powerful data management features, but as they become popular these objects themselves must be managed over their useful lifecycle, so they can be indexed, discovered, revised, corrected etc. This project will explore open vocabularies and tools to provide support for such lifecycle management over a small sample set of artefacts, namely, semantic mapping in SPARQL and its explicit representation in SPIN, data uplift mapping in R2RML, data protection compliance queries in SPARQL/SPIN.

Open Data for Research Ethics:

Research ethics clearance needs to be secured for scientific studies within research institutes, but the details and provenance of such clearance are typically not available if the experimental data is later shared with other researchers. This project will explore a linked open data vocabulary to complement existing open science data models (e.g. that of OpenAire) to allow the ethics clearance associated with that data to be recorded and shared in an interoperable manner between research institutes via an open API.

Asserting Collective Control over the Means of Cognition:

Big web-based companies, often referred to as digital ‘platforms’, are able to leverage personal data on a massive scale for use in targeted advertising and other opaque behavioural influencing activities. Modern machine learning techniques lead to a massive information asymmetry between users and such companies, i.e. asymmetry between what they know about us and what we know about how they leverage, share and use our data. While data protection regulation aims to redress this balance, it only operates at the level of the rights of individuals, so this power asymmetry may not be greatly impacted for the population of users overall. This project will explore ways in which social media groups can be used to share concerns about the aggregation, sharing and processing of personal data and to organise collective action around these concerns, up to and including mass transfer of personal data to another platform. Tools to enable mass, collectively organised transfer of data to another platform can exploit both the enhanced right to portability users now enjoy and interoperability standards from the World Wide Web Consortium’s working group on the Social Web.

Digital Ethics Observatory:

News stories about Big Data and AI ethics appear in the media daily. However, there are few resources available for those wishing to monitor these fast-moving issues. This project will develop an application that allows news stories to be archived and then annotated by interested volunteers using the ethics canvas tool (ethicscanvas.org), to provide an open, searchable index of digital ethics news stories for researchers, journalists and concerned citizens alike.

Data Protection Process Browser Widget:

The EU’s new General Data Protection Regulation offers users across the EU common rights on how their data is processed by organisations. This project will develop and evaluate a web widget that can be integrated into different web sites and offer a simple graphical, process-oriented visualization for exploring the rights offered by a specific service’s privacy policy, based on an existing model developed in the ADAPT Centre for Digital Content Research.

Blockchain for Value-chain Consent Management:

The EU’s new General Data Protection Regulation offers users the right to rectify or erase data previously provided to a service provider. Responses to requests that exercise this right must be propagated to any other organisations with whom that user’s data has been shared, and their implementation must be recorded for regulatory compliance purposes. This potentially adds significant complexity to systems for sharing data along business value chains. This project will explore the extent to which existing blockchain platforms can reduce this complexity and the cost involved, especially in order to mitigate the risk of this regulation becoming an excessive burden on data sharing by small and medium enterprises.

Visualising provenance for data and consent lifecycles:

The upcoming General Data Protection Regulation requires companies and organisations to maintain a record of the user’s consent and data lifecycles. These lifecycles can be complex, as the same consent and data can be used in several activities, which makes it difficult to track their usage. Visualisations are a great way to display information in a concise and simple manner, and can be helpful in navigating complex pathways such as these lifecycles. The project explores various ways to visualise provenance traces in a granular manner so as to enable tracing data and consent from a user to all the activities that use it, based on an existing model developed in the ADAPT Centre for Digital Content Research.

Integration of Building Data Sets in a Geospatial Context:

Currently, building information is often dispersed and fragmented across different storage devices, different file formats and different schemas. This data must be integrated in order to support a range of use cases relevant to smart buildings and cities, such as those related to navigation, building control and energy efficiency. In this project you will explore available standards and data sets, and using established methodologies for data uplift, convert these datasets into Linked Data, making them available over the web and linking them to other available data sets, such as geospatial data. You will answer the question: can existing open datasets be used to derive useful information about buildings to support the aforementioned use cases?

Exploratory technologies for supporting data uplift - https://opengogs.adaptcentre.ie/debruync/r2rml
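
As a very small illustration of the uplift step (using rdflib directly rather than a full R2RML mapping; all IRIs and the input record below are invented), one building record might be converted to Linked Data with a GeoSPARQL geometry like this:

    # Minimal sketch: one tabular building record uplifted to RDF.
    # Assumes rdflib is installed; namespaces and values are illustrative only.
    from rdflib import Graph, Literal, Namespace, RDF

    BLDG = Namespace("https://example.org/building/")
    GEO = Namespace("http://www.opengis.net/ont/geosparql#")

    record = {"id": "B1234", "use": "Office", "wkt": "POINT(-6.2546 53.3438)"}

    g = Graph()
    subject = BLDG[record["id"]]
    g.add((subject, RDF.type, BLDG.Building))
    g.add((subject, BLDG.currentUse, Literal(record["use"])))
    g.add((subject, GEO.asWKT, Literal(record["wkt"], datatype=GEO.wktLiteral)))

    print(g.serialize(format="turtle"))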

Conversion of building information geometry into geospatial geometric data:

The Industry Foundation Classes (IFC) is a standard for exchanging building information. Currently, a large part of the standard is dedicated to storing and exchanging the geometry of the building and building elements. A complex set of relations is maintained within the IFC schema to support geometry, which when converted to RDF leads to significant overhead in terms of storage as triples. In this project you will explore methods for reducing the size of the geometry of IFC models, in particular through their conversion to Geographical Information Systems standards such as Well Known Text (WKT), answering the question: are GIS geometry models a suitable way to store building geometries?

Exploratory technologies for working with IFC geometry (removes geometry from an IFC OWL conversion - https://github.com/pipauwel/IFCtoSimpleBIM)
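
For a sense of the target representation (a sketch assuming the shapely library; the footprint coordinates are invented), a building footprint reduced to a single Well Known Text geometry looks like this:

    # Minimal sketch: a 2D building footprint as a GIS-style WKT string,
    # in place of the web of IFC geometry entities it originally came from.
    from shapely.geometry import Polygon

    # Footprint, e.g. extracted from a ground-floor slab (coordinates invented)
    footprint = Polygon([(0.0, 0.0), (12.5, 0.0), (12.5, 8.0), (0.0, 8.0)])

    print(footprint.wkt)                  # e.g. POLYGON ((0 0, 12.5 0, 12.5 8, 0 8, 0 0))
    print("area (m^2):", footprint.area)  # simple derived property kept by the GIS model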

Visualisation of building geometry in a geospatial context:

Open and accessible building information can support multiple use cases relevant to smart buildings and cities. The OSi has a large dataset of building data, which includes geospatial data about building location and other properties such as current use and building type (form and function). In this project you will explore an interface for visualising the OSi building data that supports the querying of buildings, as well as interaction with the building geometry through a web interface, e.g. point-and-click selection of buildings (with HTML5 and the three.js WebGL library). You will examine what an appropriate way to visualise building data is, so that it can support users when generating and exploring queries. Exploratory technology for visualising building information is available that shows a very simple three.js GIS model integrating OSi county data.

Online questionnaire tool for GDPR compliance assessment:

The General Data Protection Regulation (GDPR), agreed upon by the European Parliament and Council in April 2016, will replace the EU Data Protection Directive (EU DPD). Organizations dealing with the personal data of EU citizens must ensure that they are compliant with the new requirements of the GDPR before it becomes effective in 2018. It is important for organizations dealing with personal data to assess their compliance with the GDPR to identify risks before regulatory violations occur, as fines under the GDPR can be up to 4% of a company's global turnover. This project will build an online support tool for GDPR compliance based on assessment questionnaires. The tool will present the important aspects of the GDPR and, based on the answers to the compliance assessment questions, show whether the organization is fully compliant or needs to do further work in particular areas.
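
A minimal sketch of the assessment logic (the questions, groupings and scoring below are invented placeholders, not an authoritative reading of the GDPR) might look like this:

    # Minimal sketch: score questionnaire answers per GDPR theme and flag gaps.
    QUESTIONNAIRE = {
        "consent":      ["Do you record when and how consent was obtained?",
                         "Can users withdraw consent as easily as they gave it?"],
        "data_subject": ["Can you export a user's data in a portable format?",
                         "Can you erase a user's data on request?"],
        "breach":       ["Can you notify the regulator within 72 hours of a breach?"],
    }

    def assess(answers):
        """answers maps each question to True (compliant) or False."""
        report = {}
        for area, questions in QUESTIONNAIRE.items():
            score = sum(answers.get(q, False) for q in questions) / len(questions)
            report[area] = "compliant" if score == 1.0 else "needs work ({:.0%})".format(score)
        return report

    example = {q: True for qs in QUESTIONNAIRE.values() for q in qs}
    example["Can you erase a user's data on request?"] = False
    print(assess(example))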

Prof. David Gregg

Room

Extension

027 Lloyd Institute

3693

Note: I will not be supervising projects in the 2017/18 academic year. Sorry about that.

Dr. John Dingliana

https://www.cs.tcd.ie/John.Dingliana/

Room

Extension

02-014 Stack B

+3680


Some general info

I am a member of the graphics, vision and visualisation research group interested in the areas of:

  • 3D Visualisation
  • Computer Graphics and Virtual Reality
  • Graphical aspects of Mixed and Augmented Reality
  • Stylised Rendering / Non photo-realistic rendering
  • Physically-based Animation

Suggested Projects:

  1. Topics in Visualization: I am interested in visualization of spatialized 2D and 3D data structures (i.e. data that has a geometric structure). Some possible topics:
    • Multi-modal spatialized data visualization e.g. fused visualization of data from different sources
    • Multi-variate visualization e.g. visualizing a vector field with many variables.
    • Time-varying spatialized data visualization
    • Spatio-temporal visualization
    • Visualization using Virtual and Augmented Reality devices
    • Perception in Visualization
  2. [TAKEN] Remote telemetry using head-mounted Virtual Reality/Augmented Reality displays: video and 3D information acquired from a remote sensor will be transmitted live to a user wearing an HMD (an Oculus Rift or Metavision AR head-mounted display will be available for use). Instead of the user merely seeing through "the eyes of the camera", the spatial information gathered will be displayed asynchronously in order to minimize motion sickness effects and allow the user to explore the data independently.
    Potential challenges include the following (each of which could be the focus of a project):
    • reducing latency between acquisition, processing and display of 3d environments on AR/VR displays
    • adaptive level of detail in interactive AR/VR
    • blending different modalities (e.g. fusion of different sensors or seamless integration of real and virtual).
    • ensuring accuracy/fidelity of the visualization
  3. [TAKEN]Spatial perception in AR : This project will explore how users perceive relative distances of objects (e.g. real vs virtual) in mixed environments. Can users reliably judge which object or feature is closer, do users have an accurate sense of scale, can users be convinced that a real and virtual object are collocated/connected? In particular there is limited work in up-and-coming "see-through AR" devices such as the Microsoft Hololens.
    The effort in the project will be in using one or more AR displays to render experimental 3D graphical scenes wherein virtual objects are embedded in the real world; implementing a number of strategies (mostly from the existing literature) to improve spatial perception in such scenes; implementing a testing scenario to compare spatial perception from different strategies; and potentially running a pilot experiment.

  4. [TAKEN] Interactive Anatomy Illustration from Real-world 3D Data: In anatomy education, it is quite common to use artistic illustrations such as those by Frank Netter, who created thousands of images used in most of the anatomy textbooks worldwide. Illustrations are purported to be easier to understand, learn, and remember when compared to scanned or photographed images of anatomy. However, such illustrations require a skilled artist who is also well versed in anatomy, and a large amount of effort and time to produce.
     
    This project looks at the challenges of creating art-like illustrations (also called non-photorealistic renderings) directly from 3D data such as MRI scans. The images generated should be created in real-time from arbitrary anatomy datasets with little or no pre-processing or parameter tuning. In addition, it should be possible to interactively manipulate the views, for instance by zooming, clipping, or rotating the rendered data.
     
  5. [TAKEN] Abstraction of images and videos: Most forms of illustration or art imply some degree of abstraction, in other words simplifying the image to an optimal degree so that the most significant parts of the scene are retained, whilst extraneous elements are removed. In hand-drawn art, the artist makes this choice based on skill and intuition. In computer graphics a number of techniques have been proposed to abstract images, such as by retaining detail in important edges whilst smoothing other areas. This project will examine how Principal Components Analysis (PCA) could be used effectively as an automated means of abstracting images, models and 3D animations (a short sketch follows this list).
  6. [TAKEN] Spatial perception in games [This is an implementation project, best suited for a 4th Year FYP but could be extended for an MSc Dissertation]: The objective of the project is to implement a simple game or several mini-games to test how different rendering styles and display techniques affect user performance at spatial perception tasks. Some simple examples are 3D versions of the classic games Pong or Breakout. Many variants of these have been implemented but a major challenge for the user is accurately gauging how far away an object is supposed to be in the z-direction (coming out of / going into the screen). Proposed solutions to enhance a sense of depth include shadows, focal blur with distance (depth-of-field), stereoscopy, size, motion, parallax etc.
    Pre-requisites: students must have taken (or be in the process of taking) a computer graphics module or have some experience in 3D graphics programming.
  7. Output-driven optimization of 3D visualizations: more details soon.
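
For the abstraction topic (item 5 above), the following is a minimal sketch (assuming NumPy and scikit-learn; the patch size and component count are arbitrary) of PCA-based abstraction: image patches are projected onto their top principal components and reconstructed, discarding fine detail while keeping dominant structure.

    # Minimal sketch: abstract a greyscale image by PCA reconstruction of patches.
    import numpy as np
    from sklearn.decomposition import PCA

    def pca_abstract(image, patch=8, n_components=4):
        """Reconstruct non-overlapping patches from their top principal components."""
        h, w = image.shape
        h, w = h - h % patch, w - w % patch          # crop to a multiple of the patch size
        patches = (image[:h, :w]
                   .reshape(h // patch, patch, w // patch, patch)
                   .swapaxes(1, 2)
                   .reshape(-1, patch * patch))
        pca = PCA(n_components=n_components)
        reconstructed = pca.inverse_transform(pca.fit_transform(patches))
        return (reconstructed
                .reshape(h // patch, w // patch, patch, patch)
                .swapaxes(1, 2)
                .reshape(h, w))

    # Example with random data standing in for a real image:
    abstracted = pca_abstract(np.random.rand(128, 128))
    print(abstracted.shape)  # (128, 128)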

Dr. Ivana Dusparic


Please note: Due to project selection timelines, I am currently accepting MSc/MAI projects only.

I am open to supervising projects developing novel artificial intelligence techniques and applying these techniques in intelligent urban mobility and transport systems, and smart cities in general.

In particular, I am interested in learning-based agents and multi-agent systems, with particular focus on reinforcement learning, including deep reinforcement learning, transfer learning, multi-agent collaboration and self-aware systems. I am interested in applying these techniques to emerging urban transport models and their impact on cities, e.g. intelligent urban traffic control, car sharing, ride sharing, mobility as a service, multi-modal travel planning, learning-based personalization of travel etc. I am particularly interested in applications leveraging large amounts of diverse sensor data for learning-based optimization.

I am also open to proposals applying learning-based multi-agent techniques to management and optimization of other large-scale infrastructures - if you have an interest in learning-based optimization and have an application area in mind, let me know!

Stephen Barrett

Social Software Engineering

My research is focussed on the identification of the unique contribution and impact of the software engineering practice of individuals and the teams they work in. My approach is to treat software engineering as a sociological phenomenon, with source code as the primary artefact of a constructive social network process. My aim is to develop toolsets that capture expert domain knowledge regarding software engineering practice and management in a form that enables us, in the language of Daniel and Richard Susskind, to externalise its qualitative assessment.

The technical basis of the approach is the application of defeasible/non-monotonic argumentation schemes, borrowed from proponents of the strong AI model such as John L. Pollock, but applied to the assessment of human behaviour rather than the replication of human decision making. We apply this method to infer judgements regarding software engineering practice, this analysis being grounded in data derived from code quality measurement, software development process monitoring, and a social analysis of software construction.

This research work is being conducted in the context of a Haskell-based software platform that gathers and processes 'ultra-large' scale data sets regarding active open-source and closed-source software development. Project students will thus need to be willing, at least, to take on Haskell as a programming language. Prior experience is not necessary but you should consider yourself to be a strong programmer to work with me.

Some example topics from which suitable projects can be developed include:

  • Automation of Software Development Methodology Adherence Testing: the use of fine grained behavioural measurement regarding software engineering to quantify adherence to development methodology. In this topic, we are interested in delivering practical tools and methods by which software teams can encourage and monitor process development goals.
  • Privacy Preserving Gamification of Software Engineering Processes: the use of gamification in the assessment and management of software engineering processes. In this topic, we are interested in exploring how gamification can positively impact on the performance of teams and individuals.
  • Situated Learning Framework for Software Engineering Community of Practice: the development of a model for the automated identification and recording of engineering activity for practice learning. In this topic, we are interested in developing ways in which the best practice and skill of senior and experienced team members can be automatically packaged as learning resources for the organisation.
  • Sociometrics in Software Engineering: the use of sociometric and biometric data to predict individual and team performance in software engineering. In this topic, we are interested in studying the environment and social network structure of software engineering teams in order to provide actionable measures of team performance and health.
  • A Platform for Social Software Engineering Research: the development of a scalable platform for social software engineering analysis. In this topic, we are interested in developing high level domain specific languages to enable sophisticated bespoke analysis by non-technologists of social network and software quality data pertaining to the software engineering effort.
  • High Scale Code Quality Measurement: a data evolution based cloud platform for the efficient computation and continuous re-computation of code quality metrics. In this topic, we are interested in exploring how predictive relationships might exist between various possible ways of measuring software engineering, such that more efficient and rapid result computation can be achieved.

Please note that I am unfortunately unable to take on projects outside this broad research space.

If these topics interest you, do send me an email, briefly summarising your interest, and software development experience.

thanks,

Stephen.

Fergal Shevlin, Ph.D.

Room

Email

Lloyd UB.73

fshevlin@tcd.ie

Note!

    I feel that project ideas conceived by students are usually the most interesting. If you have any ideas related to the following then let's talk about them to see whether we can specify an appropriate project tailored to your own unique interests and abilities.

    Android Vision

    Projects in the general area of "Computer Vision" (viz. image processing and analysis,) implemented on the Android platform for mobile devices. Thus the programming language(s) required would be at least Java with possibly some C/C++.

    Mathematical Methods

    Projects in the general area of "Mathematical Methods for Computer Graphics, Computer Vision, Robotics, Physical Simulation, and Control" implemented using appropriate method libraries. The most appropriate programming languages are likely to be C/C++ or Python.

Dr Hugh Gibbons

Room

Extension

LG20

1781

Support for Literate Programming in Java

Literate Programming is defined as the combination of documentation and source put together in a fashion suited for reading by human beings. It is an approach to programming which emphasises that programs should be written to be read by people as well as compilers.
There are many tools available to support Literate Programming, but they are mostly available on Unix systems and for programming languages such as Pascal and C. While Javadoc is available to document Java programs, the aim of the project is to investigate the benefit of using Literate Programming in Java.

Using CASL to Model a software system

CASL (Common Algebraic Specification Language) offers the possibility of formal specification and development while allowing a more informal working style. The project would investigate using CASL to develop a formal model of some software problem which may or may not have previously been presented in formalisms such as VDM or Z. This model could then be informally translated into a programming language such as Java.

Developing Programs using Perfect Developer or How to develop a Java program without writing Java.

Perfect Developer is a program development system which allows one to develop provably correct programs. First one develops the program in the notation of Perfect Developer and then the system can verify the program written. Once one has a correct program, Perfect Developer can automatically translate the notation to Java, C++ or Ada.
(See Perfect Developer: How it works)

Imperative Programming within a Functional Programming Setting

While Functional Programming (FP) supports Lists better than Arrays, it is possible to write FP programs that are based on arrays. Since FP programs are side-effect free, it is usually easier to prove FP programs correct than Imperative programs. The aim of the project is to develop Java type programs within an FP setting.

Simulating Puzzles or Games in Functional Programming

Over the years there have been successful projects using Functional Programming to provide animations or simulations of puzzle-type programs. Since Functional Programming languages such as Haskell are very high-level languages, expressing solutions to puzzle-type problems may prove easier than in imperative languages or declarative languages such as Prolog or Lisp/Scheme. Possible puzzle problems would be cryptarithms, where one has to fill in the missing digits in an arithmetic calculation, logic puzzles, or puzzles involving Graph Theory. Puzzles and games from the works of Martin Gardner would be an interesting starting point.

Support Systems for Teaching Logic and Logic Proofs

Systems such as Tarski's World and Hyperproof have proved very valuable in teaching an understanding of both propositional and predicate logic. These systems are part of a more general Logic project, Openproof, at Stanford's Center for the Study of Language and Information (CSLI). An alternative logic proof system is provided by Jape, a system developed by Richard Bornat and Bernard Sufrin which supports Bornat's book Proof and Disproof in Formal Logic. A more modern Logic Proof System, KE, has been developed by Marco Mondadori and Marcello D'Agostino, with associated computer program systems WinKE by Ulle Endriss and LogicPalet by Jan Denef. It would be useful to provide support tools for these systems so that they could be more widely used. An example of a logic support system is the Logicola system by Harry Gensler, which supports his logic book Introduction to Logic.

Ruler and Compass Construction within Vivio

Vivio is an animation tool that allows one to animate algorithms and simulations. The project would involve investigating the use of this tool for creating classical Euclidean constructions, for example, the construction of a pentagon using a compass and ruler.

Program Transformation of Z into JML

The development of the Java Modelling Language (JML) was influenced by specification languages such as Z. Many software projects make use of transforming specifications into imperative programs. An example of this approach can be seen, in particular, in the book "Introduction to Z" by Wordsworth. The examples in Wordsworth's book could be used as a starting point in transforming Z specifications into programs annotated with JML.

Annotated Java with JML

The Java Modeling Language (JML) is a behavioral interface specification language that can be used to specify the behavior of Java modules. It is based on the approach of Design By Contract (DBC). The draft paper Design by Contract with JML (by Gary T. Leavens and Yoonsik Cheon) explains the basic use of JML as a design by contract (DBC) language for Java. See also Joe Kiniry's (University of Copenhagen) presentation, Introduction to JML. A given project would investigate the use of JML, providing examples of its use. For example, how would a program for binary searching an array be implemented in JML?

 

Developing High Integrity Code in Spark

Spark is a high level programming language designed for developing software for high integrity applications. Spark encourages the development of programs in an orderly manner, with the aim that the program should be correct by virtue of the techniques used in its construction. This 'correctness by construction' approach is in marked contrast to other approaches which aim to generate as much code as quickly as possible in order to have something to demonstrate. Quoting from the book on Spark, "High Integrity Software: the Spark Approach to Safety and Security" by John Barnes:
"There is strong evidence from a number of years of use of Spark in application areas such as avionics and railway signalling that indeed, not only is the program more likely to be correct, but the overall cost of development is actually less in total after all the testing and integration phases are taken into account."
SPARK will be familiar to programmers with knowledge of imperative languages such as C, Java and Ada. There is some effort involved with learning how to use the annotations correctly. 
A project using Spark would involve the development of reliable programs that can be proved correct by the Spark system.

Dr Gerard Lacey

Room

Extension

TTEC

087 2396567

Project

I am currently a part-time academic, as I am working in a Trinity spin-out, www.surewash.com. My main research areas are computer vision, robotics and augmented reality. My research focus is the development and empirical evaluation of mixed media solutions to real-world problems.

Problem / Background

Augmented reality (AR) is the overlay of interactive graphics onto live video such that it reacts to the content of the video image, e.g. selfie filters that track face movement. Mobile phones are becoming one of the main platforms for AR. This project focuses on the tracking of hands in mobile phone images for gesture recognition, content overlay and gaming. General-purpose hand pose tracking is a complex problem, but custom hardware solutions (www.leapmotion.com, www.usens.com) and complex software libraries (www.manomotion.com) are available.

One of the biggest challenges is achieving high speed and reliable segmentation of the hands against real-world backgrounds and under variable lighting conditions. The next main challenge is the identification of the fingers and matching them to a hand pose model. If the hand gesture problem can be constrained this may be simplified and good performance achieved.
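
Although the project itself will be built with Unity and Emgu, the segmentation step can be illustrated with a minimal Python/OpenCV sketch (the HSV skin range below is a rough placeholder, not a tuned model, and real backgrounds will need a more robust approach):

    # Minimal sketch: colour-based hand segmentation and largest-contour selection.
    import cv2
    import numpy as np

    def segment_hand(frame_bgr):
        """Return a binary mask and the largest skin-coloured contour (the hand)."""
        hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
        lower, upper = np.array([0, 30, 60]), np.array([20, 150, 255])  # rough skin range
        mask = cv2.inRange(hsv, lower, upper)
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        hand = max(contours, key=cv2.contourArea) if contours else None
        return mask, hand

    # Example usage: mask, hand = segment_hand(cv2.imread("frame.jpg"))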

Solution / Goal of the Project:

This project will aim to develop a mixed reality application that will allow someone to “try on a glove” using their mobile phone. The goals of this project are to:

  • Reliably segment the hands on a mobile phone
  • Recognise and track the orientation of the hands
  • Render a 3D glove model over the live video image aligned to the hands
  • Develop a solution for finding an accurate measure of hand size
  • Develop a prototype application on iOS or Android
  • Perform user testing and formal evaluation of performance

The application will be developed using www.unity3d.com and www.emgu.com .

Glenn Strong

Room

Extension

ORI G.15

3629

Many of these projects involve some knowledge of functional programming. No prior knowledge is needed before starting the projects; we provide support for learning this new programming paradigm for students who have not previously been exposed to it. Of course, if you already know a language like Haskell you'll be able to start the project a little more quickly.

Functional programming and Live Music Composition

This project would involve working with the Haskell embedded DSL "Tidal" to produce one or more variants on the existing system. Some interesting options could include:
  • A Tidal server which would allow performers to publish (stream) their live performances so that users in other locations could experience them.
  • A more ambitious project would allow for users to remotely collaborate on live performances, perhaps using one of the existing collaboration frameworks (ShareDB, TogetherJS, etc)

Drag and drop Python

This project would involve building a structured programming editor for the Python programming language, probably using an existing framework such as Google's Blockly. While there are some tools that can generate Python from Blockly programs, the source that users work with doesn't tend to look much like Python. The goal of this project would be to provide a Python-oriented system (perhaps in the style of the old Carnegie Mellon structure editors, or the Raise Toolkit).

An initial version of this wouldn't need to parse existing Python programs, only provide the editing environment, and perhaps capture the output of a running program. There are a lot of potential extensions possible for this project, depending on the student's interests.
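
To give a flavour of the mapping involved, here is a minimal sketch (the block structure is invented for illustration and is not Blockly's real serialisation format) of rendering a small tree of blocks as readable Python source:

    # Minimal sketch: render a nested block structure as Python source text.
    def emit(block, indent=0):
        pad = "    " * indent
        kind = block["type"]
        if kind == "assign":
            return f"{pad}{block['target']} = {block['value']}"
        if kind == "while":
            body = "\n".join(emit(b, indent + 1) for b in block["body"])
            return f"{pad}while {block['condition']}:\n{body}"
        if kind == "print":
            return f"{pad}print({block['value']})"
        raise ValueError(f"unknown block type: {kind}")

    program = {"type": "while", "condition": "n > 0",
               "body": [{"type": "print", "value": "n"},
                        {"type": "assign", "target": "n", "value": "n - 1"}]}

    print(emit(program))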

Other projects

I am happy to discuss project ideas in the Functional Programming, Literate Programming, Theoretical Computer Science, or other similar areas. If you have a specific project in mind then send me an email. I am also willing to discuss software implementation projects with a bias towards rigour (using formal techniques, or design-by-contract ideas). I am also interested in creative ways to support novice programmers and in the study of Free Software related projects.

Mads Haahr

Room

Extension

Stack B, 2013

1540

AI for card-based game

Gambrinous is an indie game studio based in Dublin, best known for Guild of Dungeoneering. They are now working on their second game and have an opportunity for someone to improve the AI for the project.

The game is a 1-on-1 card game in the style of Magic: The Gathering, Ascension, or Hearthstone. The player plays against an AI opponent taking turns playing cards until one wins. We have a prototype built in Unity with a very simplistic AI.

Gambrinous are looking for the following improvements:

  • an improved tactical AI that plays smarter with the cards it has in a given turn
  • a strategic AI that works towards a plan or strategy over the course of many turns in a single match
  • the game involves blind-bidding for resources using the cards from your hand each turn. We'd like to see an AI that did things a human opponent would do (like bluffing or delaying tactics to see what the opponent is going for)
  • varied levels of AI strength (eg for easy/hard difficulty) and AI playstyles (eg for aggressive / rushing / slow build styles of play)
  • as part of this work it would also be interesting to pit AIs against each other in automated matches (also a great way of testing AI tweaks).

Lucy Hederman

Room

Extension

ORI G.13

2245

My proposed projects/areas should suit MSc CS students on the Intelligent Systems and Data Science strands. Some projects may be suitable for Final Year Projects for ICS and CSB students. To discuss, please email me at hederman@tcd.ie

Broadly I am interested in "data wrangling" for health IT and clinical research purposes.

The following projects relate to the AVERT project, which is concerned with predicting relapses (or flares) of ANCA vasculitis, a relapsing and remitting rare autoimmune disease that results in rapidly progressive kidney impairment and destruction of other organs. Epidemiological data seem to show a strong environmental impact on relapse in ANCA vasculitis, though it is unclear exactly which environmental factors are responsible for this. The rapidly emerging discipline of data science - alongside massive increases in computing capability, machine learning and artificial intelligence - is poised to allow the incorporation of such highly complex health big data, and the generation of outputs with potential applicability in personalised medicine. We aim to integrate a wide array of unstructured data streams to define the signature of relapse of the disease. We believe this approach will represent a new paradigm in managing chronic conditions governed by the interaction between patient-level factors and their environment, and it could, if successful, be scaled up for use with other autoimmune diseases.

Data integration for AVERT is using linked data principles. Different streams of data are combined in an RDF triple store.

RDF modelling of data coming from the AVERT app - TAKEN

For the AVERT project, patients with the disease are using an app to capture data which may help to determine patterns which lead to flares of the disease (see above). We wish to "uplift" this data into the AVERT RDF data store to connect it with clinical data, weather data and pollution data. Issues which may lead to interesting research questions are around anonymity, appropriate representation of location data, appropriate provenance metadata, etc.

AVERT uplift workbench

This project involves developing a suite of Eclipse plugins to support uplift of data into RDF. (Details to follow.)

Making an air pollution model available as a service for AVERT:

Environmental engineering colleagues have developed a model of air pollution in Ireland using ArcGIS. This project involves turning that model into (1) a service that can answer queries such as "What was the level of silica at location X on date D?" and (2) a user interface to facilitate engagement with the model.
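
A minimal sketch of the service half of the project (assuming Flask; the lookup function is a stub standing in for the real ArcGIS model and the returned value and units are placeholders):

    # Minimal sketch: expose a pollution query as a small HTTP service.
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    def query_model(pollutant, lat, lon, date):
        """Stub: in the real project this would interrogate the ArcGIS model."""
        return 0.42  # illustrative value only

    @app.route("/pollution")
    def pollution():
        value = query_model(request.args["pollutant"],
                            float(request.args["lat"]),
                            float(request.args["lon"]),
                            request.args["date"])
        return jsonify(pollutant=request.args["pollutant"], value=value)

    # Run with e.g. `FLASK_APP=<this file> flask run`, then query:
    #   GET /pollution?pollutant=silica&lat=53.34&lon=-6.25&date=2017-10-01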

Making clinical research project data shareable outside the project

The AVERT project hopes to make its data available, in structured, semantically interoperable, de-identified, form, to other researchers, as part of an "information commons". This project will need to explore a broad range of technical and non-technical issues in devising a safe, useful and usable solution. How do bio-scientists work with data? How does data protection impact science? How do we ensure shared meaning of data? How do we protect patient identity? etc.

AVERT application prototype - TAKEN

The end goal of AVERT is to have a "realtime" decision support system to predict flares from realtime patient and environment data and advise clinicians on treatments. This project will develop a prototype of such a system, in anticipation of a future flare prediction model. It will combine data from mobile app, clinical records, and location based weather and pollution data (where available), feed it to a "black-box" flare model, and present the output in some form that would be useful to clinicians, and possibly patients.

Driving AVERT app user engagement

(Suitable for an MSc CS (Information Systems) student).

This project seeks to use state of the art (machine learning) techniques to source and serve content of relevance for the vasculitis patient group, with a view to increasing patient engagement with the app.

Other project - TAKEN

The final project is not AVERT related, but is not dissimilar.

Serve probability data to an infectious disease risk prediction system - TAKEN

We are developing a service to predict a person’s risk of being infected by a disease, given personal details (e.g. age, occupation) and environmental factors (weather, terrain) at their location. The probabilistic prediction at the core needs a service to provide probability data (e.g. probability of being under 5 in Indonesia; current incidence of TB in Indonesia) from online sources such as WHO, UNSD. The project could be broadened to provide a range of tools over the demographic and public health data, including an interactive user interface with visualisations. It would involve researching current systems and state of the art approaches. (The narrower project might suit a final year student).

Updated 18 October 2017

Dr. Hitesh Tewari

Room

Extension

Lloyd 131

2896

Last Updated 30th May 2016


Fully Anonymous Transferable E-Cash (Assigned)

In this project we will further develop and implement an anonymous E-Cash scheme which allows for the unlimited transfer of coins within the network. The work builds upon the foundations of David Chaum's "Blind Signature Protocol" and combines it with the "Discrete Logarithm Problem" to allow for delegated signatures and anonymous transfer of coins. The system also makes use of "Blockchain" technology for the network participants to collectively verify the authenticity of coins in the system to prevent double-spending.

The student who undertakes this project will be required to familiarize themselves with the number theory aspects of the cryptographic primitives used within the FATE system. They will be required to study and enhance the existing FATE protocols. More details about the FATE system can be found in this paper. Finally, they will be required to implement a working prototype of the system.
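
To illustrate the primitive the FATE system builds on, here is a minimal sketch of a Chaum-style RSA blind signature in Python (textbook parameters that are far too small for real use; the actual FATE protocols add delegation and transferability on top of this idea):

    # Minimal sketch: RSA blind signature (blind, sign, unblind, verify).
    import secrets
    from math import gcd

    # Toy RSA key pair; a real deployment would use a proper key size.
    p, q = 104729, 1299709                 # small primes, for illustration only
    n, e = p * q, 65537
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)                    # private exponent (Python 3.8+)

    m = 123456789                          # coin serial number (hashed in practice)

    # User: blind the serial number with a random factor r coprime to n
    r = secrets.randbelow(n - 2) + 2
    while gcd(r, n) != 1:
        r = secrets.randbelow(n - 2) + 2
    blinded = (m * pow(r, e, n)) % n

    # Bank: sign the blinded value without ever seeing m
    blind_sig = pow(blinded, d, n)

    # User: unblind; the result is an ordinary RSA signature on m
    sig = (blind_sig * pow(r, -1, n)) % n
    print("signature verifies:", pow(sig, e, n) == m)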

SSLChain - Secure Login App (Assigned)

The basic premise of the SSLChain technology is to make use of "Blockchain" technology to store and disseminate X.509 certificates. This project is a continuation of work that was carried out in a FYP in 2015/16 whereby the student built a basic prototype of the backend using the Bitcoin distribution, and developed a cryptographic add-on for Gmail.

In this project we would like to refactor the backend and make it more robust. We would then like to build a secure login application that allows a one-time password (OTP) to be sent to a user who is trying to log in to a web service. This secure OTP mechanism will eliminate the need for passwords to be remembered by users for the various websites that they use on a daily basis.

The student undertaking this project will be required to have a good understanding of cryptography, and in particular the specific protocols used within the SSLChain system (X.509, Blockchain, XMPP etc.). They will quickly need to familiarize themselves with the Bitcoin distribution and the current codebase. Finally, they will be required to develop a robust prototype that we hope to trial on the College network.

Secure Voting Protocols in the Irish Context of a Single Transferable Vote (Assigned)

You may remember the fiasco of the electronic voting machines that the state bought in 2002 at a cost of millions of Euros, and which were finally scrapped in 2012 because they were deemed to be unsafe, as no one could guarantee the correctness of the results produced by the machines.

With that in mind, in this project we would like to explore the state of the art in secure voting protocols, with a view to developing a system for the Irish context of a single transferable vote (STV), i.e. the proportional representation system that we have in Ireland. We would like to explore the possibility of using "blind signature" protocols, Blockchain technology, threshold cryptography etc. to try and solve the problem of electronic voting in Ireland.

The student who takes up this project will be required to develop a solid understanding of the Irish electoral system. They will be required to study in detail the state of the art in secure voting protocols, which is itself a challenge as it will require additional study of cryptographic primitives and number theory. Building on this understanding, we will then develop a secure STV protocol and mathematical proofs to back up our claims.

Banning Cookies - What's Next? (Assigned)

As users have become both more aware and wary of cookies - the technology that tracks browsing activity for advertising purposes - many people now avoid cookies, either by turning them off or by using services that block them. Companies in turn have started experimenting with new tracking methods that don't use cookies.

One of those concepts is "user-agent fingerprinting", a technique that allows a web site to look at the characteristics of a computer such as what plugins and software you have installed, the size of the screen, the time zone, fonts and other features of any particular machine. Another mechanism is to track users using some other persistent local/client-side storage. Your browser transmits all sorts of information that has nothing to do with cookies. All of those things put together form a unique identity that can be assigned an identifying number and then used just like a cookie.

In this project we would like to explore the various new techniques that are currently being developed or used to track users. Armed with this knowledge we would like to develop a browser add-on (similar to ad-blockers) that will prevent transmission of machine/user specific information which allows organizations to track users.

Meriel Huggard

Room

Extension

Lloyd 1.09 (inside lab 1.11)

3690

I'm happy to supervise projects at both senior undergraduate and MSc level. In recent years I've predominantly supervised MSc projects.

I'm offering a variety of projects in 2017/18, some of which include the opportunity to collaborate with colleagues in the USA.

Project I and II: Using Machine learning to monitor and predict Quality of Experience AVAILABLE

    The separate areas of machine learning and wireless/cellular connection quality management (through both Quality of Service (QoS) and Quality of Experience (QoE)) have rarely been combined. Recent research has produced algorithms that employ machine learning models to enable distributed QoE monitoring and prediction. This project seeks to evaluate the performance of these novel algorithms and to compare them with existing QoE estimation techniques.
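
As a baseline to compare such algorithms against, a centralised supervised model can be sketched in a few lines (assuming scikit-learn; the features and the synthetic QoE scores below are purely illustrative):

    # Minimal sketch: predict a QoE score (e.g. a mean opinion score) from QoS metrics.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    # Features per sample: throughput (Mbit/s), RTT (ms), jitter (ms), packet loss (%)
    X = rng.uniform([1, 10, 0, 0], [50, 300, 50, 5], size=(500, 4))
    # Synthetic MOS in [1, 5]: throughput helps, delay and loss hurt (toy model only)
    y = np.clip(1 + 4 * (X[:, 0] / 50) - 0.005 * X[:, 1] - 0.3 * X[:, 3], 1, 5)

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)
    print("R^2 on held-out data:", round(model.score(X_test, y_test), 3))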

Project III: Quality of Experience based Admission Control in Cellular Networks AVAILABLE

    5G networks are expected to be able to accommodate a very large number of wireless devices and users. One way of handling the high volume of traffic expected on these networks is through the use of small cells and by handover to wifi and other networks. This project will use simulation methods to evaluate the performance of admission control and handover algorithms for these systems for different traffic mixes and loads.

Project IV: Control Systems AVAILABLE

    Dynamic Watermarking in Cyber-physical Systems
As we deploy large-scale networked cyber-physical systems, they become more vulnerable to attack. For example, sensor data may be tampered with, causing actuators to become malicious agents. One approach that can be adopted to mitigate this is to watermark the data transmitted by the sensors, so that the system can detect when signals have been tampered with. This project will explore and evaluate watermarking techniques for such large-scale distributed systems.
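
The core detection idea can be sketched in a few lines of Python (the plant model, noise levels and threshold below are invented; real dynamic watermarking schemes use statistical hypothesis tests on the closed-loop dynamics):

    # Minimal sketch: inject a private watermark into the control signal and check
    # that returned sensor data is correlated with it; a replayed stream is not.
    import numpy as np

    rng = np.random.default_rng(1)
    T = 2000
    watermark = rng.normal(0, 0.1, T)                  # private excitation known only to the controller
    control = np.sin(np.linspace(0, 20, T)) + watermark

    def plant(u):
        """Toy sensor response: scaled actuation plus measurement noise."""
        return 0.8 * u + rng.normal(0, 0.2, len(u))

    honest = plant(control)                            # sensors report the true response
    replayed = plant(np.sin(np.linspace(0, 20, T)))    # attacker replays plausible data without the watermark

    def accepts(measurement, threshold=0.1):
        """Correlate the unexplained part of the measurement with the watermark."""
        residual = measurement - 0.8 * (control - watermark)
        return np.corrcoef(residual, watermark)[0, 1] > threshold

    print("honest stream accepted:  ", accepts(honest))
    print("replayed stream accepted:", accepts(replayed))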

Project V: Your project ideas... AVAILABLE

    If you have interesting ideas in the networking/communications/control domains please feel free to get in touch. You will need to be able to clearly articulate (i) what you are proposing to do, (ii) what the underlying research question/hypothesis you intend to explore is, (iii) why the outcomes of your work will be of interest to others and (iv) why you want to do this project.
For further information or to arrange a meeting: Meriel Huggard

Georgios Iosifidis

SCSS and CONNECT Centre

Lloyd Institute

 

 

Project 1

 

Title: Resource Orchestration in Hybrid Cloud-Fog Networks

 

Background: The emerging fifth-generation (5G) wireless networks are expected to offer various mobile services such as ultra high definition video delivery, augmented reality  services, and machine learning-based applications. Due to the limited capabilities of the users devices, these demanding mobile services can only be delivered with the support of cloud computing and storage resources. The latter can be located in distant data-centers, or in proximity with the end-users e.g., in Cloudlets or even nearby mobile devices. According to Forbes and Economist [1, 2], Cloud and Fog computing solutions are attracting increasing interest as a promising and cost-efficient solution for next generation communication networks. However, these hybrid architectures induce substantial network bandwidth costs as well as very high energy consumption in data centers, especially under high-load conditions. This is currently one of the largest obstacles hampering the large-scale adoption of these promising solutions.

 

Goals: In this project, we will design algorithms for jointly optimising the allocation of computation, storage and communication resources that are located at the Fog (in proximity with the devices) or the Cloud, aiming to increase the quality of the offered services and reduce the system's expenditures. We will leverage Android programming tools (e.g., https://developer.android.com/studio/index.html based on Java) to build mobile applications and MATLAB programming to execute trace-driven large-scale simulations and data processing. In the final step, this project will analyse the quality of service of various cloud-based applications and the efficiency of the resource management by performing experiments using mobile devices, local computing/storage servers and cloud platforms, e.g., Microsoft Azure [3].


Student Info: This project is particularly suitable for M.Sc and MAI students interested in Cloud/Fog architectures and (i) system modeling and analytical methods (i.e., optimisation) and/or (ii) system design and performance evaluation. The student will collaborate with Dr. J. Kwak (https://sites.google.com/site/jeonghokwak/home) and Dr. G. Iosifidis (www.FutureNetworksTrinity.net); and will have the opportunity to participate in ongoing (fast-paced) research projects and acquire important analytical and technical skills.

 

References

[1] Economist, Shifting computer power to the cloud brings many benefits—but dont ignore the risks, URL: https://www.economist.com/news/leaders/21674714-shifting-computer-power-cloud-brings-many-benefitsbut-dont-ignore-risks-skys-limit

[2] Forbes, Is Fog computing the next big thing in Internet of Things? URL: https://www.forbes.com/sites/janakirammsv/2016/04/18/is-fog-computing-the-next-big-thing-in-internet-of-things/#64279faf608d

[3] Microsoft Azure, URL: https://azure.microsoft.com/en-gb/?&wt.mc_id=AID623280_SEM_

 

 

Project 2

 

Title: Economics of the Internet of Things

 

Background: The promise of the Internet-of-Things (IoT) is to enhance our physical world with connected and intelligent devices that can respond in real-time to environmental conditions, perform tasks with increased precision, augment human capabilities by operating in a semi-autonomous fashion, and improve resource utilization. Applications of IoT can be found in manufacturing, traffic control, energy grid, electric vehicles, environment monitoring and many other domains [1], [2]. IoT is expected to have a profound impact on our economy and society and is currently subject to intensive research in industry and academia. A particularly promising feature of IoT devices is their capability to interact with each other so as to jointly perform a task, or coordinate the execution of their missions. For example, consider a set of sensors that jointly monitor certain environmental parameters (e.g., air pollution) in a given area. The sensors might belong to the same or different business entities (e.g., different companies) and might have overlapping coverage. This enables them to cooperate and exchange measurements or support each other in case of failure, improving this way their performance and reducing their costs.

 

Goals: In this exciting era of ubiquitous connectivity that extends from humans and large systems to small-scale devices, a new type of cyber-physical economy emerges offering novel opportunities for fruitful collaboration among the users and their devices. In this project we will design and evaluate algorithms that enable IoT devices to cooperate by exchanging resources (such as energy and wireless bandwidth) and jointly improve their performance. We will combine tools from dynamic optimization and game theory to develop solutions that achieve efficient equilibriums [4]. Different market scenarios will be considered, ranging from fully decentralized (peer-to-peer) to hierarchical markets where more resourceful users/devices sell their resources or services to smaller IoT nodes. The designed algorithms will be thoroughly evaluated in Matlab and/or R.

 

Student Info:  This project is suitable for students interested in algorithms, optimisation, market mechanisms (game theory) and IoT business models. The student will collaborate with Dr. G. Iosifidis and CONNECT [3].

 

References

[1] L. Atzori, A. Lera, and G. Morabito, The Internet of Things: A Survey, Elsevier Computer Networks, vol. 54, no. 15, 2010.

[2] Cisco, Fog Computing and the Internet of Things: Extend the Cloud to Where the Things Are, White Paper, 2015; URL: https://www.cisco.com/c/dam/en_us/solutions/trends/iot/docs/computing-overview.pdf 

[3] CONNECT Centre, Pervasive Nation IoT Platform; URL: https://connectcentre.ie/pervasive-nation/

[4] G. Iosifidis, and L. Tassiulas, Dynamic Policies for Cooperative Networked Systems, in Proc. of ACM NetEcon Workshop, 2017, Boston, USA.

 

Dr Jonathan Dukes

Email

jdukes@scss.tcd.ie

Room

Room F.27, O'Reilly Institute

Firmware Updates for Bluetooth Low Energy Devices in the Internet of Things

Several projects related to this topic are proposed. Traditional firmware updates for the constrained devices we find in the Internet of Things (IoT) employ a very simple approach. An update controller transmits the new firmware image in its entirety to the target device. This is wasteful of constrained target communication and memory resources.

Alternative approaches that aim to reduce this cost are based on modular updates (updating a small part of the firmware only) and incremental updates (transmitting only the differences between the old and new versions.)

Projects in this area will investigate both of these approaches for constrained devices that communicate using Bluetooth Low Energy (BLE).

  • [TAKEN] Incremental Updates: This project will investigate the use of incremental over-the-air firmware updates for BLE Devices. The key contributions of the project will be (i) identification of one or more appropriate representations of the binary difference between two firmware versions, (ii) extension of an existing BLE over-the-air firmware update protocol to support incremental updates, (iii) implementation of a prototype firmware update controller and target, including (iv) implementation of an algorithm to apply the firmware update in flash memory on the target device and (v) a performance evaluation of the prototype system.
  • [TAKEN] Modular Updates: This project will investigate the potential to perform modular updates of parts of the firmware on a target device. A number of approaches will be considered, but all approaches are likely to involve linking firmware modules on the target device, representing a significant departure from the "monolithic" firmware approach widely used in industry. The contributions of the project will be (i) identification of a suitable model for developing modular firmware (granularity), (ii) extension of an existing BLE over-the-air firmware update protocol to support modular updates, (iii) implementation of a prototype firmware update controller and target, including (iv) implementation of an algorithm to apply the firmware update in flash memory on the target device and (v) a performance evaluation of the prototype system.

Opportunities for a collaborative comparison of the prototypes developed by the above two projects will be explored, including opportunities to implement hybrid (incremental/modular) approaches.
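As a rough illustration of the incremental idea, the sketch below (an assumption of how one might prototype the concept in Python, not part of the proposed protocol) computes a block-level delta between two firmware images and reapplies it on the "target"; a production system would more likely use byte-level bsdiff-style deltas and write real flash pages.

# Minimal sketch of a block-level incremental firmware update, assuming the
# controller and target both hold the old image. Only blocks that changed are
# transmitted; byte-level deltas would typically be smaller still.
BLOCK = 64  # bytes per block; a real flash page would usually be 256-4096 bytes

def make_delta(old, new):
    """Return a list of (block_index, new_block_bytes) for blocks that differ."""
    delta = []
    for i in range(0, len(new), BLOCK):
        if new[i:i+BLOCK] != old[i:i+BLOCK]:
            delta.append((i // BLOCK, new[i:i+BLOCK]))
    return delta

def apply_delta(old, delta, new_len):
    """Rebuild the new image on the target from the old image plus the delta."""
    image = bytearray(old[:new_len].ljust(new_len, b"\xff"))
    for idx, data in delta:
        image[idx*BLOCK : idx*BLOCK + len(data)] = data
    return bytes(image)

if __name__ == "__main__":
    old_fw = bytes(range(256)) * 4
    new_fw = bytearray(old_fw)
    new_fw[100:104] = b"PATC"                  # simulate a small code change
    d = make_delta(old_fw, bytes(new_fw))
    assert apply_delta(old_fw, d, len(new_fw)) == bytes(new_fw)
    print("blocks changed:", [i for i, _ in d], "of", len(new_fw) // BLOCK)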

Remote Patient Monitoring in the Internet of Things

Two projects are proposed in this area. Both will explore the application of emerging, standards-based Internet of Things transport protocols to remote monitoring of patients (biometric data). It will be assumed that monitoring (e.g. temperature, motion, blood pressure, heart rate, ECG, EMG) is performed by resource constrained, wearable devices.

In both cases, communication will be based on RFC7668 (IPv6 over BLUETOOTH(R) Low Energy) and other suitable higher-level protocols, making the constrained sensor nodes addressable IPv6 devices.

  • [TAKEN] On-Demand and Event Driven Remote Patient Monitoring: This project will explore two models for communicating biometric data from the constrained sensor device to a central or cloud-based service. In the on-demand model, sensors will store a small amount of recorded data in flash memory. By implementing a simple (standards-based?) query language, central services will be able to query the stored data, with query results transmitted back to the central service. In the event-driven model, in addition to storing data, sensors will communicate events of interest (e.g. when the measured heart rate exceeds some configurable threshold.) Again, the application of a standards-based protocol (e.g. MQTT) will be explored (a minimal publish sketch follows this list).
  • [TAKEN] Real-Time (live) Remote Patient Monitoring: This project will explore the feasibility of transmitting real-time biometric data (e.g. ECG, EMG, motion) from constrained sensor devices to a remote central server. Communication will be over Bluetooth Low Energy initially to a router, which will forward the data to a central or cloud-based monitoring service.
  • [TAKEN] Remote Patient Monitoring using LPWAN Communication
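For the event-driven model mentioned above, a minimal Python sketch using the paho-mqtt client is given below; the broker address, topic layout and heart-rate threshold are hypothetical placeholders, and the real project would run equivalent logic on the constrained device itself.

# Minimal sketch of the event-driven model using MQTT (requires the paho-mqtt
# package and a reachable broker; broker address, topic name and threshold
# below are hypothetical placeholders).
import json
import paho.mqtt.client as mqtt

HEART_RATE_THRESHOLD = 120  # beats per minute; configurable per patient

client = mqtt.Client()
client.connect("broker.example.org", 1883)   # hypothetical broker
client.loop_start()

def on_new_sample(patient_id, heart_rate):
    """Called whenever the wearable reports a new heart-rate sample."""
    if heart_rate > HEART_RATE_THRESHOLD:
        event = {"patient": patient_id, "hr": heart_rate}
        # QoS 1: the broker acknowledges receipt, suitable for alert events
        client.publish("patients/%s/alerts" % patient_id, json.dumps(event), qos=1)

on_new_sample("patient-42", 135)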

Last update: Friday 02 June 2017

Dr Jeremy Jones (updated 25-Sep-17)

F.11 O'Reilly Institute top floor

1. A typical bioinformatics ancient DNA analysis program (e.g. BWBBLE, which is written in C) involves finding the location of short DNA sequences in a reference genome. The reference genome contains approximately 3×10^9 (3 billion) base pairs and the short sequences 30-300 base pairs. The DNA sequences are stored in text files. At first sight, the algorithms are naturally parallel. The objective of this project is to speed up the analysis by investigating a number of different approaches: (i) more efficient implementation of the multi-threaded algorithms by, for example, taking advantage of the streaming SIMD extensions (SSEx) instruction set, (ii) using a compute cloud, (iii) using a graphics processor and (iv) using an external FPGA. Each approach could be a separate project. This project is being carried out in consultation with the Genetics Department.

2. In 2013, Intel introduced the Haswell CPU, which supports restricted transactional memory (RTM) through the TSX extension to its instruction set. TSX can greatly simplify the implementation of many parallel algorithms. The objective of this project is to develop a self-tuning parallel implementation of a skip list (or any other interesting data structure or algorithm) that can determine dynamically the optimum settings needed to maximise throughput. This project would particularly suit students taking CS4021.

3. Most of you will have used the Vivio animations as part of the CS3021/3421 Computer Architecture II module. The Vivio animations have now been re-implemented using JavaScript and HTML5 so that they are truly portable and can run in any web browser. The objective of this project is to implement a touch interface (in JavaScript and HTML5) so that the animations can be controlled on a phone or tablet where there is no mouse or mouse wheel.

4. Have you ever had difficulty remembering someone’s name? Imagine what it’s like being a lecturer standing in front of a large class trying to remember student names. The objective of this project is to use a Google Glass equivalent and face recognition software to recognise student faces and project their names onto a heads-up display in real time.

These projects could form the basis of a final year project, year 5 dissertation or taught MSc dissertation.


 

Prof. Khurshid Ahmad

    My principal area of interest is in artificial intelligence, including expert systems, natural language processing, machine learning and neural networks. My other area of interest is in the ethical issues of privacy and dignity in the realm of social computing. I have recently finished a largish EU project on the impact of social media (monitoring) in exceptional circumstances (slandail.eu); this brings together social media analytics (natural language processing/image analysis), geolocation, and ethics. My research has been applied to price prediction in financial markets and in forecasting election results. The projects on offer in this academic year are:

  • 1. Social Media Analytics and monitoring:

    Microblogs and social networks are large and noisy sources of data that are valuable for marketing and sales specialists, law enforcement agencies, disaster NGOs, and policy makers. This project will help you to acquire social media data, to use natural language processing techniques to process this data, and to visualise the results of the analysis. You will be expected to include a brief discussion of questions of privacy and ownership for social media users. Proficiency in Java and/or Python is required for this project.

  • 2. Sentiment Analysis:

    This is an exciting branch of computer science that attempts to discover sentiment in written text, in speech fragments and in visual image excerpts. The sentiment is extracted from streams of texts (messages on social media systems, digital news sources) and quantified for inclusion in econometric analysis or political opinion analysis systems that deal with quantitative data such as prices or preferences: the aim is to predict price changes or the ups and downs of a political entity. You will write a brief note on questions of divulging the identities of people and places. You have the option of developing your own sentiment analysis system or using a system developed in my research group (a minimal scoring sketch follows this list of projects).

  • 3. Machine Learning and Big Data:

    Large data sets, for example genomic data, high-frequency trading data, meteorological data, and image data sets, pose significant challenges for curating these data sets for subsequent analysis and visualisation. Automatic categorisation systems, systems that have learnt to categorise arbitrary data sets, are in the ascendant. One fast way of building such systems is to integrate components from large machine learning repositories like Google's TensorFlow, MATLAB, or Intel Data Analytics, to build prototype systems for text, video, or speech streams, for instance. Issues of data ownership will be briefly outlined in your final year project report.
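As a concrete starting point for the sentiment analysis project (item 2 above), the following minimal Python sketch scores short texts with NLTK's off-the-shelf VADER analyser; the example messages are invented, and students may equally build their own scoring system as noted above.

# Minimal sketch of lexicon-based sentiment scoring with NLTK's VADER analyser
# (one possible off-the-shelf baseline, not the required approach).
# Requires: pip install nltk, plus a one-off download of the VADER lexicon.
import nltk
nltk.download("vader_lexicon", quiet=True)
from nltk.sentiment.vader import SentimentIntensityAnalyzer

analyser = SentimentIntensityAnalyzer()
messages = [
    "Shares surged after the upbeat earnings report",     # hypothetical examples
    "Commuters furious as floods close the main bridge",
]
for text in messages:
    scores = analyser.polarity_scores(text)
    print("%+.2f  %s" % (scores["compound"], text))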

Dr. Kenneth Dawson-Howe

I supervise projects which have a computer vision component (and most particularly in the area of object tracking at the moment!!). To give you a feel for this type of project have a look at some previous projects. For further information (about project organisation, platform, weighting for evaluation, etc.) see this page. If you want to talk about a project or suggest one in the area of computer vision my contact details are here.

The following are my proposals for 2017-18. If any of them catch your imagination (or if you have a project idea of your own in computer vision) come and talk to me or send me an email.


(TAKEN) Queueing time for the Book of Kells. This project aims to determine how long people have to wait from joining the queue for the Book of Kells until they get admitted to the library building. We will have to apply for permission to take videos of the queue probably from somewhere in the Arts Block....


(AVAILABLE) A tool for organising personal photographs. Many people have years of digital photographs and videos organised into folders (or not) for each event or period of time. Most of the pictures will have peculiar names which are generated by the camera. The pictures

  • may come from multiple cameras,
  • probably have almost useless names (generated by the camera),
  • could have incorrect date information (as many digital cameras easily lose the current date and time),
  • may or may not have location metadata,
  • may be out of focus
  • may have poor contrast
  • may exhibit red eye
This project aims to create a tool to manage personal digital photos, allowing photos to be grouped, renamed, identified with different levels of importance (so that short or long photo slide shows can be created), enhanced, annotated (describing what/who is pictured), etc. The scope of this project is up to the student. Note, though, that there is no real research component to this project, so it is only appropriate for a year 4 student.

(TAKEN) CPR Assistant & Assessment. Cardiopulmonary resuscitation (CPR) is a technique used to keep oxygenated blood flowing in the human body when a person's heart and breathing have stopped. It is a repeated combination of chest compressions (30 in a 15 second period) and artificial breaths (2 within a 5 second period), and has to be continued until other measures are taken to restore spontaneous operation of the heart and breathing. This project aims to create a mobile phone based assistant which (by analysing a video of CPR being performed) will guide the person giving CPR in the rate of compressions and the rate & timing of breaths.


(TAKEN) Detecting product placement OR smoking. Increasingly, advertising is being embedded into images and videos, and the detection of product logos has become of significant importance. Datasets of logo images are available (see http://www-sop.inria.fr/members/Alexis.Joly/BelgaLogos/BelgaLogos.html and http://www.multimedia-computing.de/flickrlogos/) and this project aims to develop software to automatically detect product placement in videos (e.g. in movies), and perhaps develop a metric quantifying the amount of embedded advertising... Or we could look at how much smoking happens in a video...

Prof Doug Leith

The following projects are related to recent research in online privacy and recommender systems being carried out here in TCD, see www.scss.tcd.ie/doug.leith/. They could form the basis of a Year 5 dissertation or of a final year project.

Recommending WiFi/4G Access Points/Cells (TAKEN)

A recommender system collects information on items that a user likes/dislikes e.g. items which the user has bought, rated, viewed or clicked (such as a movie, hotel review, news item). By comparing this information with the information for other users, the system tries to predict which new items the user might like, to predict the rating which the user might give for a new item, what adverts/services are most likely to be of interest to the user etc. Such recommendations are often a core part of personalised web services. One popular approach to making recommendations (and the one which won the Netflix prize) is based on matrix factorization, where a matrix R containing user-item ratings is approximated by the product UV, where the inner dimension of U and V is much smaller than the number of users or items (see the reference below for more details). In this project the aim is to apply this approach to recommending WiFi hot spots to users based on ratings supplied by other users plus information on their location etc. We will also investigate the use of recent clustering approaches to enhance privacy, e.g. with regard to location. This project will require familiarity with matrices/linear algebra, probability and ideally Python programming.
Reference: Y.Koren, R.Bell, C.Volinsky, 2009, "Matrix Factorization Techniques for Recommender Systems", http://www2.research.att.com/~volinsky/papers/ieeecomputer.pdf
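The toy numpy sketch below illustrates the factorization idea from the reference: learn low-dimensional user and item factors by stochastic gradient descent and use their inner product as a predicted rating. The ratings, dimensions and hyper-parameters are hypothetical toy values.

# Minimal numpy sketch of matrix factorisation by stochastic gradient descent,
# in the spirit of the Koren et al. reference above.
import numpy as np

ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (2, 1, 1.0)]  # (user, item, rating)
n_users, n_items, k = 3, 2, 2
rng = np.random.default_rng(0)
U = 0.1 * rng.standard_normal((n_users, k))   # user factor matrix
V = 0.1 * rng.standard_normal((n_items, k))   # item factor matrix
lr, reg = 0.05, 0.02                          # learning rate, regularisation

for epoch in range(200):
    for u, i, r in ratings:
        err = r - U[u] @ V[i]                 # prediction error on one rating
        U[u] += lr * (err * V[i] - reg * U[u])
        V[i] += lr * (err * U[u] - reg * V[i])

# Predict a rating for a (user, item) pair, observed or not
print("predicted rating of item 1 by user 1:", round(float(U[1] @ V[1]), 2))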

Pluggable Tor Transport

The Tor browser supports the use of pluggable transports to allow shaping of traffic sent over the network so as to obfuscate its nature. These transports can also be used with VPN tunnels, e.g. see Shapeshifter-dispatcher. The aim of this project is to implement a pluggable transport based on recent research here in TCD on making traffic resistant to timing analysis attacks without the need to introduce high latency or many dummy packets. We'll take one of the existing Tor transports as a starting framework and then modify it as needed. The project will require good programming skills, but it's a great chance to contribute to Tor's development and improve existing VPNs.
Reference: Feghhi,S., Leith,D.J., "On Making Encrypted Web Traffic Resistant to Timing-Analysis Attacks", https://arxiv.org/abs/1610.07141

Privacy-Enhanced HTTPS

Recently a number of successful attacks have been demonstrated against encrypted HTTPS web traffic. Even though the contents of packets are encrypted, the packet size and timing information is often sufficient to allow details of the web pages being browsed to be inferred with high probability. Similar approaches can also be used to successfully attack VPNs and Tor. In this project we will look at possible defences against such attacks, focussing particularly on defending against timing attacks since these are amongst the hardest to defeat. The aim will be to implement recent server-side defences exploiting the push feature in HTTP/2 on either apache or node.js. The project will provide a good opportunity to learn about the next generation of web technology http/2. It will require good programming skills.
Reference: Feghhi,S., Leith,D.J., 2015, "A Web Traffic Analysis Attack Using Only Timing Information", Technical Report, http://arxiv.org/abs/1410.2087

Mobile Handset Anomaly Detection (TAKEN)

Mobile handsets are largely black boxes to users, with little visibility or transparency available as to how apps are communicating with the internet, trackers etc. Handsets are also potentially compromised devices in view of the relatively weak security around apps, and so monitoring activity in a reliable way is important. This project aims to carry out a measurement study to record actual mobile phone network activity with a view to making this more visible to users (via a dashboard) and highlighting anomalies/potentially interesting activity. By routing traffic through a VPN we can log traffic externally to the phone in a straightforward way, so the main challenges are (i) organising a study to collect data for multiple users, (ii) developing a usable dashboard for them to inspect their own traffic and (iii) exploring machine learning methods for classifying traffic and anomaly detection. This project will include the opportunity to collaborate with Dublin-based startup Corrata.
References:
AntMonitor: Network Traffic Monitoring and Real-Time Prevention of Privacy Leaks in Mobile Devices
Haystack: A Multi-Purpose Mobile Vantage Point in User Space

OpenNym (TAKEN)

The project will involve implementing a peer-to-peer system for web cookie sharing/management. Our online "identity" is largely managed via cookies set by web sites that we visit, and these are also used to track browsing history etc. With this in mind, recently a number of approaches have been proposed for disrupting unwanted tracking by sharing and otherwise managing the cookies presented publicly to the web sites that we visit, the aim being to allow personalisation to still occur (so simply deleting cookies is not a solution) but avoiding fine-grained tracking of individual user activity. The project will involve implementing a browser plugin for cookie management and selection plus a service for cookie sharing. The project will require good programming skills.

Wireless Augmented Reality (TAKEN)

There is currently much interest in adding augmented reality to mobile handsets. Augmented reality systems typically involve heavy computational burdens e.g. using deep learning to tag objects in the scene currently being viewed. This computation is offloaded to a server or the cloud. Currently most augmented reality systems are tethered i.e. connected to this server via wires, in order to ensure that there exists a high bandwidth low delay connection between mobile handset and server. In this project we will investigate the impact of replacing this wired link with a wireless WiFi/LTE link and using local processing within the mobile handset to mask network impairments (latency, loss). The project will involve android app development, image processing and simple socket/WebRTC programming.

Dr. Michael Manzke

https://www.cs.tcd.ie/Michael.Manzke/

Room: 02-012 Stack B

Extension: 2400


Some general info

I am a member of the Graphics, Vision and Visualisation research group interested in the areas of:

  • michael
  • peter
  • tbd
  • tbd
  • tbd

Suggested Projects:

  1. Topics
    • tbd
    • tbd
    • tbd
    • tbd
    • tbd
  2. tbd
    • tbd

Dr. Kris McGlinn

3D Building Information Modeling with Google's Tango

Google's Project Tango is a tool for smartphones and tablets for creating 3D models of buildings (or apartments) by simply carrying the device around! In this project you will explore the use of Tango to develop 3D models, and examine methods for annotating and linking these models to existing building information modelling standards like IFC (Industry Foundation Classes). You will examine how objects are identified and labelled using the Tango SDK, and see if you can tag those objects and export them as IFC entities. You will see whether walls and other objects can be given additional properties, like materials and thickness. This integrated data can then be used for different applications, ranging from navigation to energy simulations.

Dr Richard Millwood

Room Lloyd Institute 0.29
Extension 1548

Keywords: education - programming - HCI - emulation - learning design

My background is in education and technology, and I am course director for the MSc Technology and Learning. You can read more about me at richardmillwood.net and you can also have a look at my recently completed PhD by Practice.

Here are some areas of interest which may inspire a project, some based on my research plan:

1. Learning computer programming

Two ideas here:

  1. Developing in Blockly to meet some of the challenges made in my blog on Jigsaw Programming
  2. Constructing an online research instrument for tapping in to teachers' tacit knowledge about teaching computational thinking. This may be an extension using Python to an existing content management system such as Plone to add functionality for innovative interactive forms of survey.

2. Collaborative support in learning computer programming

This is directed at an 'educational GitHub' to suit young learners. The development will create an interface that better clarifies and supports the roles and workflow in collaborative work online, so that these can be more readily learnt in use. It is not clear exactly what software development would be appropriate, so this would suit someone with the imagination and drive to be very creative.

3. UK National Archive of Educational Computing apps

The design and development of device-responsive educational apps (for mobile, tablet and web) based on historical educational programs, such as Snooker:

  1. Original Snooker - from 1978 in BASIC
  2. Prototype Snooker Angles - from 2013 as iPhone Web app

Key features are that the app includes a faithful emulation of the original educational program as a historical account, and that the modern app maintains similar educational objectives but may be updated to take advantage of new technology and new pedagogy. The app must be able to scale appropriately and work on phone, tablet and web page. This is an HTML5 development project using Scalable Vector Graphics for interactive visuals.

I have a list of apps that I have prioritised to support the UK National Archive of Educational Computing.

Dr Martin Emms

Room: O'Reilly LG.18

Extension: 1542

I would be interested in supervising FYPs which centre around applying computational techniques to language.

Machine Learning and Word Meanings

An interesting question is the extent to which a machine can learn things about word meanings just from lots of textual examples, and there has been a lot of research into this (see the Wikipedia intro), all based on the idea that different meanings show up in rather different contexts, e.g.

move the mouse till the cursor ...

dissect the mouse and extract its DNA ...

Several kinds of project could be attempted in this area, starting either from scratch or building on code that I could supply.

  1. One kind of system would do what people call unsupervised word sense disambiguation, learning to partition occurrences of an ambiguous word into subsets all of which exhibit the same meaning.

    Someone last year added the diachronic twist of attempting to recognise, from time-stamped text, that semantic matters concerning that word have undergone a change over a period of time: mouse (1980s) and smashed it (last 10 years?) have acquired novel meanings.

  2. Another possibility is to investigate to what extent it is possible to recognise that a particular word combination has a non-compositional meaning, that is a meaning not entirely expectable given its parts, for example that shoot the breeze means chat

There are a number of corpora that can be used to drive such systems, such as the Google n-grams corpus, spanning several hundred years (viewable online here, also available off-line)
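A minimal sketch of the unsupervised disambiguation idea, assuming scikit-learn is available: represent each occurrence of the ambiguous word by its surrounding context and cluster the contexts, in the hope that clusters line up with senses. The toy sentences below are invented; a real project would draw contexts from one of the corpora above.

# Minimal sketch of unsupervised sense discovery: cluster the contexts in
# which an ambiguous word occurs. Requires scikit-learn.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

contexts = [
    "move the mouse until the cursor reaches the icon",
    "click the left mouse button to select the file",
    "dissect the mouse and extract its DNA in the lab",
    "the mouse was injected and its genome sequenced",
]
X = TfidfVectorizer(stop_words="english").fit_transform(contexts)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
for label, sentence in zip(labels, contexts):
    print(label, sentence)   # occurrences grouped by (hopefully) sense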

Others

I have interests in (and sometime knowledge in!) a number of areas in which a project would be possible

Projects Exploiting Treebanks
We have copies of several so-called treebanks, which are large collections of syntactically analysed English. Each treebank contains a large number of items of the following kind

(example treebank tree diagram omitted)

One issue that such corpora allow the empirical exploration of is whether or not Multiple Centre Embedding occurs in English (left, right and centre embedding is illustrated below):

[S I think [S he said [S I'm deaf]]] right embedded
[NP_gen [NP_gen john's] father's] dog left embedded
[NP the candidate that [NP the elector that I bribed] chose] centre embedded

Tree distance
Ways to calculate the difference between two trees, and things you might do with that (a paper)

Lambek Calculus Categorial grammar
for anyone keen on Logic, a logic inspired way to do grammar

Continuations of projects from 15-16

Virtual Erasmus
Erasmus students spend a year of their study in a distant land. The idea is to simulate as much of that process as possible in a program that outgoing students could interact with, probably telescoping a year abroad into a few weeks. One part would be to emulate the bureaucratic obstacle course that such a year entails: you want to open a bank account, is the bank open on Tuesday afternoons, have you got your residence permit, you want your residence permit, have you got your insurance, have you got another 3 passport photos ....

There is the possibility of continuing a development of this from last year

Scrabble
Someone wrote an interactive scrabble game, with computer opponent. This could be reworked and developed further.

DOM Projects

Dr. Donal O'Mahony

Room: Dunlop-Oriel House

Extension: 8445

My projects (for 2017-2018 ) will center on themes related to networking, computer security and sustainability.

Using the Blockchain as a general purpose identity mechanism

For many years now, users have used public key encryption to assert their identity on the net. A vital support tool for this has been Public Key Infrastructure (PKI), which allows people to link their 'identities' to their public key. One of the main things holding back digital identity is the difficulty in managing this public key infrastructure: establishing the link in the first place, dealing with lost or stolen keys, etc.
Blockchain technology has the potential to vastly improve on this - opening up a new digital identity system that can not only be used for electronic payment and participation in smart contracts, but can also enable a whole new world of trusted network interaction. This project will investigate how this might be done on the Ethereum blockchain. It will draw lessons from the experience with PKI and also the work of startup companies in this space (e.g. uport.me), as well as previous TCD projects in this area. This year, one major focus will be on investigating the possibility of making a blockchain system that is in some ways PKI-compatible.

Electricity Trading Between Smart Nano-Grids using Blockchain


Today, we draw all of our electrical power from the electricity grid. This is largely powered by fossil fuels and distributes power to consumers 24/7 based on a fixed price per kWh. In an energy constrained future, it is more likely that energy will be generated in a distributed fashion from renewable sources such as wind and solar, with limited amounts of storage (batteries) to help even out the peaks and valleys of supply and demand. Managing the energy flows in such an environment will be challenging. This project will involve developing computer processes that will control small inter-connected sub-sets of the grid known as nano-grids. These will be implemented as Python programs running on the popular Raspberry Pi computing device. You will develop processes that communicate with each other in real-time over sockets/Wi-Fi.
Since the primary way of communicating scarcity and surplus will be through so-called 'price signals', a real-time payment method will be key to the process. For this we will investigate an idea known as state channels, which allow micropayments to be made from node to node with settlement on the blockchain. This part of the project will investigate ideas from the Bitcoin Lightning Network and the Ethereum Raiden network, with possibly a look at newer systems like Plasma. Ideally, the project will develop a prototype state-channel system to implement payment between the interacting nano-grids.
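As a flavour of the inter-process communication involved, here is a minimal Python sketch of one nano-grid controller broadcasting a price signal to another over a UDP socket; the addresses, port and prices are hypothetical, and the actual project would add authentication and the state-channel payment layer described above.

# Minimal sketch of nano-grid controllers exchanging a price signal over UDP
# sockets, as might run on each Raspberry Pi. Addresses, port and prices are
# hypothetical placeholders.
import json
import socket
import threading

PORT = 50007
ready = threading.Event()

def listener():
    # Nano-grid B waits for a price signal
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("127.0.0.1", PORT))
    ready.set()
    msg, addr = sock.recvfrom(1024)
    print("nano-grid B received price signal from", addr[0], "->", json.loads(msg.decode()))
    sock.close()

t = threading.Thread(target=listener)
t.start()
ready.wait()

# Nano-grid A announces a surplus of 1.5 kWh at 0.09 EUR/kWh
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(json.dumps({"surplus_kwh": 1.5, "price": 0.09}).encode(), ("127.0.0.1", PORT))
sender.close()
t.join()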


Big Data Analysis of the Blockchain

[Note: These projects are formulated specifically for students of the new MSc Computer Science (Data Science) ]

All blockchains since Bitcoin have maintained their 'state' in an ever-growing chain of blocks that are cryptographically linked. This is a historical record of every single transaction that has taken place within the system since inception.  This is a large amount of data - the Bitcoin blockchain (August, 2017) is 154GB and Ethereum is 122.23 GB.
I will supervise three projects that will treat this information as a Big Data set and attempt to analyse it to address questions such as:
  • determine traffic patterns e.g. distinguish exchange transactions from individual transactions
  • focus on frustrating money-laundering activities e.g. examine clusters of activity around sites like ShapeShift and other coin mixers.
  • identify fraudulent activity e.g. front-running by exchanges during Initial Coin Offerings (ICOs); tracing criminal activity such as extortion
  • Tapping the power of Events - these are unique to Ethereum and may offer great insights into blockchain activity

The first part of this project will be to source or create data sets representing blockchain activity on Amazon Web Services - this may be done collaboratively by all students involved in these projects. The second step is to devise data analytic approaches that can deliver insights within a sensible computing horizon.
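By way of illustration only, and assuming the collaboratively produced data set has been exported to a flat transaction table (the column names below are an assumption, not a defined format), a first pandas pass might flag unusually busy sender addresses as candidates for exchange-like behaviour:

# Minimal sketch of one possible first analysis step over an assumed CSV
# export with columns: timestamp, from, to, value. Flags addresses whose
# daily transaction count is far above the norm.
import pandas as pd

tx = pd.read_csv("transactions.csv",            # hypothetical export file
                 parse_dates=["timestamp"])
daily = (tx.groupby([tx["from"], tx["timestamp"].dt.date])
           .size()
           .rename("tx_per_day")
           .reset_index())
threshold = daily["tx_per_day"].mean() + 3 * daily["tx_per_day"].std()
busy = daily[daily["tx_per_day"] > threshold].sort_values("tx_per_day", ascending=False)
print(busy)   # candidate exchange-like or mixer-like addresses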


Dr. Eamonn O Nuallain

Room: WR 13.3.1

Extension: 3691

Radio Tomography, RF Coverage Prediction and Channel Modelling in Wireless Networks

My postgraduate students and I research Radio Tomography (seeing through things with Radio Frequency (RF)), RF propagation and wireless channel modelling. The objectives of this research are to enable the rapid and accurate prediction of RF coverage, frequency domain response, data throughput and interference mitigation in wireless networks - in particular as they relate to Cognitive Radio and MANETs. We are also interested in intelligent handoff algorithms. Such information is of great interest to wireless network planners and regulators. We are concerned with both fixed and ad-hoc networks. The methodology we employ is largely computational mathematics, computational electromagnetics and RF propagation: we research, develop and code numerical techniques to achieve our objectives, and we test our results against field measurements. We are also very interested in exploiting the capabilities of parallel computing and software radio. Most of our programming work is done in C, C++ and Matlab, and using the NS3 simulator. These projects are available either as Final Year Projects or Masters dissertations. You should be interested in mathematics, ad-hoc networks and RF. If you are interested in pursuing a project in this area then contact me by e-mail and we can meet to discuss.

Declan O'Sullivan

Prof. Declan O'Sullivan

I am generally interested in projects in the area of data/information integration.

I also have an interest in the ethics of the use of technology in digital content technology. (see http://adaptcentre.ie/ethics-privacy/).

Typically the projects I supervise use techniques such as linked data, semantic web, semantic mapping, and XML-based technologies such as RDF, SPARQL, OWL and SPIN.

There are also opportunities to work on aspects of research ongoing within the Science Foundation Ireland ADAPT centre focusing on Digital Content Technology, and ongoing digital projects with the TCD Library and Ordnance Survey Ireland .



Professor Carol O'Sullivan

Updated: 04/10/2016

I am interested in supervising projects related to physical interaction and response in virtual and augmented reality. If you have an idea for another related project, we can also discuss that.

Virtual Reality (VR) involves immersing yourself in a virtual world, perhaps wearing a head-mounted device such as the Oculus Rift or the HTC Vive. It can also involve a large projected environment or even a normal desktop monitor.

Augmented Reality (AR) usually involves remaining fully or partially in the real environment, but augmenting that real environment with virtual objects, characters and other graphical elements. These virtual augmentations can be delivered through a device such as the new Microsoft Hololens or similar headset. It need not, however, involve wearing any glasses or device, and the virtual objects can be projected onto the real world surfaces.

I am interested in interactions between the user, objects and the environment in such systems, and in particular in ensuring that the physics of objects is plausible and/or accurate. Therefore, I am proposing several project areas to explore these issues further. You will need to have a reasonable understanding of computer graphics and/or computer vision in order to undertake one of these projects. If you are interested, and have the necessary experience, please email me and we can meet to discuss.

Capturing the physics of real objects


Check out this paper for an idea of what might be involved. (Image from the paper)

Interactions with real objects


How do you drop or throw a virtual object in AR or VR so that the physics are believable or accurate? (Image from Ironman movie)

Interactions with real environments


How do you create the illusion of physical objects interacting with a real environment and the user? (Photo of Goofy's playhouse in Disneyland Tokyo. Check out the video here)

Interactions with characters


How do you create the illusion of virtual characters physically interacting with a real environment? (Image from MotionBuilder, though we may not necessarily use this system...)

Prof Owen Conlan

Location: O'Reilly Institute, Room F.29 Phone: +353-1-8962158

On-Mobile Privacy-sensitive Personalisation

Available. Current personalisation techniques, e.g. the tailoring of content to an individual user's preferences, rely heavily on server-side solutions that require potentially sensitive information about the user to be stored remotely. With the advent of more powerful mobile devices, the potential to achieve high degrees of personalisation on the mobile device, using existing approaches, is significant. This project will explore the design and development of a personalisation framework that is deployed on the device and does not share user model information with third parties.

In the first instance, contact Prof Owen Conlan

Augmented Video Search

Available. Searching for content within videos is difficult. Current techniques rely heavily on author-created metadata to discover the video as a whole, but there are few solutions for searching within the video. This project will explore how off-the-shelf multimodal (i.e. image analysis, audio feature detection, speech-to-text) techniques may be used to support search within a video.

In the first instance, contact Prof Owen Conlan

Visual Search

Available. Modern internet-based search prizes precision over recall, striving to present users with a select few relevant resources. Users can quickly examine these resources to determine if they meet their needs. There are other situations, such as patent search or performing research on Medieval corpora, where recall, i.e. retrieving all relevant documents, is essential. This project will examine visual techniques to support users in determining and refining recall in a search environment. The project builds on over 10 years of Personalisation, Entity-based Search and Visualisation work surrounding the 1641 Depositions and more recent work on the 1916 Pension Statements.

In the first instance, contact Prof Owen Conlan

Supporting the construction of Visual Narratives

Available. Research in narrative visualisations or visual narratives has been growing in popularity in the Information Visualisation domain and in online journalism. However, there is limited support offered to authors, specifically non-technical authors, in constructing visual narratives.

This project will aim to advance the state of the art in visual narrative construction by supporting authors to build visual narratives, namely the visualisations in the narrative, including automatic sequencing between the visualisations.

In the first instance, contact Dr Bilal Yousuf

Ethics-by-Design

Available. The ethical implications of modern digital applications are growing as they encroach on more and more aspects of our daily lives. However, the techniques available for analysing such ethical implications struggle to keep up with the pace of innovation in digital businesses, and tend to require the mediation of a trained ethicist. The Ethics Canvas is a simple tool to enable application development teams to brainstorm the ethical implications of their designs without the oversight of a trained analyst. It is inspired by Alex Osterwalder’s Business Model Canvas, which is now very widely used in digital business formation. The Ethics Canvas exists both as a paper-based layout and as a responsive web application (see https://www.ethicscanvas.org/). Currently the online version can only be used by individuals, and cannot be used in the collaborative mode that is a key benefit of the paper version. This project will extend the Ethics Canvas implementation to support remote collaborative editing of the canvas. Users should be able to form teams and then review, make, comment on and discuss changes, accept/reject changes, and track/resolve issues. Further, the digital application development community could benefit from sharing previous ethical analyses made using the online Ethics Canvas. The benefit of such sharing would be magnified if it led to a convergence in the concepts used in different canvas analyses. Therefore the project will allow teams to publish their canvas into a public repository and to annotate its content with tags from a shared structured folksonomy, i.e. a community-formed ontology capturing concepts such as different types of users, user groups, personal data, data analyses, sensor data, and risks. Within an individual canvas, tags can be used to link entries in different boxes to provide more structure to the canvas. The aggregation of tags from different completed canvases forms a folksonomy that can be made available as an open live linked-data data set, searchable by Ethics Canvas users.

In the first instance, contact Prof Owen Conlan



Dr. Rachel McDonnell

Updated: 19/09/2017

I am an Assistant Professor in Creative Technologies and I am available to supervise projects in the area of Computer Graphics. I am interested in all aspects of computer graphics, but particularly in the animation, rendering and perception of realistic virtual humans.

I have a range of projects on offer (see below). These projects involve developing an experiment in Virtual Reality (using a head-mounted device such as the Oculus Rift or HTC Vive) and a game engine (such as Unreal Engine 4). I am also open to novel ideas for projects from students on virtual humans or Virtual Reality.

[TAKEN] Interpersonal distance in VR

Would you be afraid to walk close to a zombie in VR? Develop an experiment system which monitors the personal distance from the user to a virtual avatar. The avatar should react to the user to increase the believability of the interaction. Record the user's heart rate to determine if you can frighten them. Read this and this paper for an idea of what is involved.

[TAKEN] Virtual Embodiment

Can you make a user feel what their virtual avatar feels? Develop an experiment system which induces the illusion of ownership of a virtual body or face. Read this paper for an idea on how to do this.

[TAKEN] Giant in the city

If you controlled a virtual avatar that was the size of a giant, would your behaviour change in a game? The Proteus Effect is a phenomenon seen in digital environments whereby an individual conforms to their digital self-representation, their avatar’s appearance, regardless of how others perceive them. Develop an experiment system to allow the user to move around an environment like a giant (bigger footsteps, different field of view, etc.). Image from here. Read this paper for an example of how this was done with a child-like virtual body.

[MSc Level] Cartoon Motion

In the entertainment industry, cartoon animators have a long tradition of creating expressive characters that are highly appealing to audiences. Creating these highly appealing animations is a labor-intensive task that is not supported by automatic tools. In this project, you will create a cartoon motion filter to make a motion-captured animation more cartoony. You can use either a data-driven approach (such as in this paper) or design a novel filter (as in this paper).

[MSc Level] Realistic Virtual Human Rendering

Virtual humans are becoming increasingly more realistic, even in real-time. In this project, you will focus on photorealistic faces, using the most up to date techniques for skin, eye and hair rendering. (e.g., using high resolution scanned data from the Digital Emily project)

Please e-mail me at ramcdonn [at] scss.tcd.ie if you are interested in any of my projects, or if you have your own graphics project proposal that you would like to discuss with me. Strong technical skills will be necessary for these projects.

Dr. Rob Brennan

Senior Research Fellow, ADAPT Centre, School of Computer Science and Statistics.
Email:rob.brennan@scss.tcd.ie
My projects are in the areas of data quality, data governance and data value with an emphasis on graph-based linked data or semantic web systems. Please note that I am unlikely to supervise your own project ideas due to current commitments. These projects are only for MSc students.

Projects

TAKEN 1. Extracting Data Governance information and actions from Slack chat channels

Main contact: Dr Alfredo Maldonado (address corrected on 26/9/2017)
Data governance means controlling, optimising and recording the flow of data in an organisation. In the past, data governance systems have focused on formal, centralised authority and control, but new forms of enterprise communication like Slack need to be leveraged to make data governance more streamlined and easier to interact with. However, systems like Slack produce vast amounts of unstructured data that are hard to search or process, especially months or years later. Thus we need a way to extract the most relevant conversations in Slack and turn them into structured data or requests for specific data governance actions, like a change in a data sharing policy. This project looks at ways to extract relevant conversations and turn them into data governance actions via an interactive Slack bot that uses machine learning and natural language processing to identify relevant conversations and then interjects in Slack conversations to prompt users to interact with a data governance system.
This project is conducted in collaboration with Collibra Inc., a world-leading provider of data governance systems.
Keywords: Natural Language Processing, Machine Learning, Python, Data Governance

NO LONGER AVAILABLE 2. Automated Collection and Classification of Data Value Web Content

Main contact: Dr Rob Brennan
Jointly supervised with: Prof. Seamus Lawless
This research aims to automate the collection and classification of discussions of data value (e.g. "How much is your data worth?", "Data is the new Oil!") on sites like Gartner or CIO.com. This will complement our traditional survey of academic papers discussing data value management. The project will attempt to identify from the web content: the most important dimensions of data value (e.g. data quality), metrics for measuring them, the different models of data value proposed by authors, and applications of data value models. The research will explore new ways to classify and conceptualise the domain of data value. Ranking dimensions for importance is also an interesting potential challenge. The project may also consider how best to structure the conceptualisation of the domain for different roles or types of consumers.
Keywords: Information Retrieval, Natural Language Processing, Knowledge and Data Engineering

TAKEN 3. Adding W3C Linked Data Support to Open Source Database Profiling Application

Main contact: Dr Rob Brennan
Jointly supervised with: Dr. Judie Attard
The Data Warehousing Institute has estimated that data quality problems cost US businesses more than $600 billion per year. Everywhere we see the rise in importance of data and the analytics based upon it. This project will extend open source tools with support for new types of web data (the W3C's Linked Data) and for sharing or integrating tool execution reports over the web.
Data profiling is an important step in data preparation, integration and quality management. It is basically a first look at a dataset or database to gather statistics on the distributions and shapes of data values. This project will add support for the W3C's Linked Data technology to an open source data profiling tool. In addition to providing traditional reports and visualisations, we want the tool to be able to export the data profile statistics it collects using the W3C's data quality vocabulary and data catalog vocabulary. These vocabularies allow a tool to write a profile report as Linked Data and hence share the results with other data governance tools in a toolchain (a minimal export sketch follows this project description). This will be an opportunity to extend the use of these vocabularies beyond pure linked data use cases to include enterprise data sources such as relational databases.
Keywords: Knowledge and Data Engineering, Java programming, Linked Data, Data Quality
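A minimal rdflib sketch of what such an export might look like is shown below; the dataset URI, metric URI and measured value are hypothetical, and the exact terms should be checked against the published W3C Data Quality Vocabulary.

# Minimal rdflib sketch of exporting one profiling statistic as Linked Data
# using the W3C Data Quality Vocabulary (DQV). URIs and values are
# hypothetical examples.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, XSD

DQV = Namespace("http://www.w3.org/ns/dqv#")
EX = Namespace("http://example.org/profiles/")      # hypothetical namespace

g = Graph()
g.bind("dqv", DQV)
measurement = EX["measurement/completeness-001"]
g.add((measurement, RDF.type, DQV.QualityMeasurement))
g.add((measurement, DQV.computedOn, URIRef("http://example.org/dataset/customers")))
g.add((measurement, DQV.isMeasurementOf, EX["metrics/nullValueRatio"]))
g.add((measurement, DQV.value, Literal(0.02, datatype=XSD.double)))

print(g.serialize(format="turtle"))   # profile statistic as shareable Linked Data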

TAKEN 4. Ethical Web Data Integration

Main contact: Dr Rob Brennan
Jointly supervised with: Prof. Declan O'Sullivan
In an era of Big Data and ever more pervasive dataset collection and combination, how do we know the risks and whether we are doing the right thing? This project will investigate the characteristics and requirements for an ethical data integration process. It will examine how ADAPT's semantic models of the GDPR consent process can be leveraged to inform ethical decision-making and design as part of the data integration process. This work will extend the ADAPT M-Gov mapping framework.
Keywords:
Ethics, Knowledge and Data Engineering, Java programming

TAKEN 5. Automatic Identification of the Domain of a Linked Open Data Dataset (New 25/9/2017)

Main contact: Dr Rob Brennan
Jointly supervised with: Dr Jeremy Debattista
As the Web of Data grows, there are more and more datasets that are becoming available on the web [1]. One important challenge in selecting and managing these datasets is to identify the domain (topic area, scope) of a dataset. Typically a dataset aggregator (such as datahub.io) will mandate that minimal dataset metadata is registered along with the dataset but this is often insufficient for dataset selection or classification (such as the dataset types used by the LOD cloud).
The aim of this dissertation topic is to create a process and tools to automatically identify the topical domain of a dataset (using metadata, querying the dataset vocabularies and clustering using ML algorithms). Thus it will go beyond traditional Semantic Web/Linked Data techniques by using a combination of ontology reasoning or queries and machine-learning approaches. Given an input dataset from datahub.io, LOD Laundromat or the weekly dynamic linked data crawl (http://swse.deri.org/dyldo/data/), the dataset should be categorised into a specific topical domain so that consumers can filter this large network according to their needs.
Keywords: Knowledge and Data Engineering, Machine Learning
[1] http://lod-cloud.net
Further Reading
[2] http://dws.informatik.uni-mannheim.de/fileadmin/lehrstuehle/ki/pub/SchmachtenbergBizerPaulheim-AdoptionOfLinkedDataBestPractices.pdf
[3] http://data.dws.informatik.uni-mannheim.de/lodcloud/2014/ISWC-RDB/
[4] http://linkeddatacatalog.dws.informatik.uni-mannheim.de/state/

NO LONGER AVAILABLE 6. Automated Selection of Comparable Web Datasets for Quality Assurance (New 25/9/2017)

Main contact: Dr Rob Brennan
Jointly supervised with: Dr Jeremy Debattista
Many open Linked Data datasets suffer from poor quality and this limits their uptake and utility. There are a now number of linked data quality frameworks, eg Luzzu[1], designed to address the need for data quality assessment and publication of quality metadata. However in order to apply some quality measures, e.g. "Completeness Quality"[2], it is necessary to have a comparable dataset to test against. For example, the comparable dataset could form a Gold Standard or benchmark which can be used to compare with other similar data.
This project will investigate the methods required to (1) identify the requirements for a comparable dataset based on a specific set of quality checks and a dataset to be tested, and (2) then use these requirements to find the best possible dataset to act as a Gold Standard from a pool of open datasets such as datahub.io. Example requirements may include matching the domain, ontology language, presence of specific axiom types, ontology size, ontology structure, data instances present and so on.
Keywords:Knowledge and Data Engineering, Data Quality
[1] http://eis-bonn.github.io/Luzzu/
[2] http://www.semantic-web-journal.net/system/files/swj773.pdf

TAKEN 7. Data Quality Dashboard (New 29/9/2017)

Main contact: Dr Rob Brennan
Jointly supervised with: Dr Jeremy Debattista
The Luzzu data quality assessment framework is a flexible, open source, Java-based toolset for assessing the quality of Linked Data that is now being maintained by the ADAPT Centre at TCD. Luzzu supports semantic reporting of quality assessments by using the dataset quality vocabulary [2], the quality problem ontology and the Luzzu metric implementation ontology. However, it is still a command-line tool and the semantic reports it generates are optimised for machine readability. In this project we will build a data quality dashboard that visualises the semantic outputs of Luzzu and makes it easy for quality managers or data stewards to infer the implications of a data quality assessment task.
Keywords:Knowledge and Data Engineering, Data Quality, User Interface Design
[1] http://eis-bonn.github.io/Luzzu/
[2] http://theme-e.adaptcentre.ie/daq/daq.html

Dr. Marco Ruffini

Final year projects:

End-to-end capacity reservation in Software Defined Networks

Software Defined Networks have revolutionised computer networks by introducing a means to enhance network programmability through the use of standardised and open access interfaces. The aim of this project is to implement an end-to-end capacity reservation mechanism across aggregation and core networks based on the use of stacked Multi-Protocol Label Switching (MPLS) labels. User requests are forwarded to a centralised controller that takes into account available capacity to allocate the requested capacity over an end-to-end link. A background in network programming and the Python programming language is strongly advised.
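A minimal Python sketch of the controller-side admission logic is given below; the topology, link capacities and node names are hypothetical, and installing the corresponding stacked MPLS labels on the switches is left to the southbound interface (e.g. OpenFlow).

# Minimal sketch of capacity-aware path selection and reservation in the
# centralised controller. Topology and rates are hypothetical toy values.
capacity = {  # spare Mbit/s on each directed link
    ("A", "B"): 100, ("B", "C"): 40, ("A", "D"): 80, ("D", "C"): 60,
}

def find_path(src, dst, demand, path=None):
    """Depth-first search for any path whose links all have >= demand spare."""
    path = path or [src]
    if src == dst:
        return path
    for (u, v), spare in capacity.items():
        if u == src and v not in path and spare >= demand:
            found = find_path(v, dst, demand, path + [v])
            if found:
                return found
    return None

def reserve(src, dst, demand):
    path = find_path(src, dst, demand)
    if path is None:
        return None
    for u, v in zip(path, path[1:]):
        capacity[(u, v)] -= demand          # commit the reservation
    return path

print(reserve("A", "C", 50))   # expected: A-D-C, since B-C has only 40 spare
print(reserve("A", "C", 50))   # second request cannot be satisfied -> None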

Prof. Siobhan Clarke

Room: Lloyd 1.17

Extension: 2224

I am interested in software systems that make cities smarter! Below are some examples that I am co-supervising. If you have any ideas of your own for smart cities - e.g., smart transport, smart energy management, smart water management, do please contact me, as I am happy to supervise projects in this area.

Co-Supervised with Dr. Mauro Dragone

HeatMap App (Participatory version)

The goal of this project is to build an Android application that can be used to assess the number of users present at the entrance of museums, shopping malls, in buses and around bus stops, art exhibitions, car parks, or any other public, shared places where people occasionally congregate and/or queue. To this end, the student will build a solution using one of the available frameworks for peer-to-peer communication between multiple handsets [1].

HeatMap App (Vision version):

The goal of this project is to build a system that is able to estimate the length  of the queue of visitors waiting to enter a museum, art exhibition or other place of public interest,  such as the Old Library and the Book of Kells Exhibition in Trinity College. The student will use a Galileo single-board computer and a pan & tilt camera, and will develop a computer vision algorithm using the OpenCV library [2] to segment, track and count people in the queue. There is also scope to develop adaptive solutions to account for different visibility conditions, and to build an Android application.
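One possible starting point, sketched below in Python with OpenCV's built-in HOG pedestrian detector, is simply to count detected people in a frame as a crude proxy for queue length; the camera index, detector parameters and the mapping from count to waiting time are assumptions the project would need to refine.

# Minimal OpenCV sketch: count people in one camera frame with the built-in
# HOG pedestrian detector as a rough queue-length estimate.
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

camera = cv2.VideoCapture(0)            # hypothetical pan & tilt camera feed
ok, frame = camera.read()
if ok:
    boxes, weights = hog.detectMultiScale(frame, winStride=(8, 8), padding=(8, 8))
    print("people detected in frame:", len(boxes))
camera.release()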

Bus Tracker:

The goal of this project is to build an Android application to infer and gather useful knowledge about the travel habits of users carrying smart mobile phones. Specifically, the target application should be able to recognize which public transport route (e.g. train, bus, LUAS), and between which stops, the user is currently traveling. The student will use current publish/subscribe and location-aware middleware, such as [3][4][5], and investigate the adoption of machine learning techniques, such as neural networks, to classify routes based on the analysis of streams of noisy sensor data.

Extension of Funf:

The Funf Open Sensing Framework [6] is an extensible sensing and data processing framework for Android mobile devices. The core concept is to provide an open source, reusable set of functionalities, enabling the collection, uploading, and configuration of a wide range of data signals accessible via mobile phones. The goal of this project is to extend Funf with support for peer-to-peer communication between multiple handsets, in order to enable the coordination of the efforts of multiple users involved in participatory sensing campaigns.

Urban GeoLocation:

The goal of this project is to assess and improve the ability to locate users carrying smart mobile phones while driving, cycling, or simply walking along urban pathways. In particular, the student will tackle the problems suffered by GPS-based location in urban environments, where the signals from the positioning satellites are often blocked or bounced off buildings and other structures. Contrary to existing approaches which try to explicitly account for these  phenomena, the student will assess the benefits of using multiple sensor data and the feedback gathered from multiple users over time, to build solutions that are able to exploit the power of the crowd to acquire complex models and improve their accuracy over time. The work will require the student to familiarise themselves with Particle Filter [7] as the overall framework that is likely to be used to integrate the various components of this project.
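For orientation, a minimal one-dimensional particle filter sketch in Python is given below (position along a street, a noisy motion model and a single unreliable GPS fix); all noise levels and values are hypothetical toy numbers, and the real project would fuse many sensors and crowd-sourced data.

# Minimal 1-D particle filter sketch: predict with a noisy motion model,
# weight particles by how well they explain a noisy GPS fix, then resample.
import math
import random

N = 500
particles = [random.uniform(0.0, 100.0) for _ in range(N)]   # metres along a street

def step(particles, moved, gps_fix, motion_noise=1.0, gps_noise=15.0):
    # predict: every particle moves by the odometry estimate plus noise
    particles = [p + moved + random.gauss(0, motion_noise) for p in particles]
    # update: weight each particle by the likelihood of the (noisy) GPS fix
    weights = [math.exp(-((p - gps_fix) ** 2) / (2 * gps_noise ** 2)) for p in particles]
    total = sum(weights) or 1.0
    weights = [w / total for w in weights]
    # resample particles in proportion to their weights
    return random.choices(particles, weights=weights, k=len(particles))

particles = step(particles, moved=5.0, gps_fix=42.0)
print("estimated position: %.1f m" % (sum(particles) / len(particles)))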

SensorDrone:

The goal of this project is to develop an Android application using the sensordrone kit [8]. Sensordrone is a modular device the size of a key-chain, equipped with temperature, luminosity, 3-axis accelerometer and air-quality sensors. The device can be paired with the users' mobile phone over low energy bluetooth. A number of useful applications may be built by exploiting the combination of sensors available on the Sensordrone and the sensors and the geolocation functions available on the user's smart phone. Of particular interest are applications targeting:

  • Road quality information - is the road deteriorating in specific locations? E.g. early pothole formation identification
  • Bike scheme monitoring - real time info. on where and when the cycle fleet is being used and what the cycles are encountering.
  • Map urban pollution data information - noxious gases, noise, temperature.
  • Cyclist routing - using information on pollution, journey times for bikes, stats on areas where cyclists swerve or brake suddenly.
  • Localised weather alerts for cyclists (and potentially data collection on the device)

Smart Home Projects:

Project ideas are also welcome for projects addressing the development of smart home services and their integration within city-wide participatory sensing frameworks [9]. The student will be required to develop software prototypes for the OpenHAB open source software platform for home automation [10]. A range of hardware is available for these projects, including a  single board computer and home automation sensors and actuators, such as occupancy sensors, energy monitors and wireless switches.

Links to relevant technologies and further readings:

[1] Peer-to-peer frameworks for Android:http://code.google.com/p/p2p-communication-framework-for-android/,https://code.google.com/p/peerdroid/,http://developer.android.com/guide/topics/connectivity/wifip2p.html, https://github.com/monk-dot/SPAN
[2] OpenCV: http://www.opencv.org/
[3] MQTT: http://mqtt.org/
[4] OwnTracks: http://owntracks.org/
[5] Google Play services for Android developers: https://developer.android.com/google/play-services/location.html
[6] Funf: http://www.funf.org/about.html
[7] Particle Filter: www.igi.tugraz.at/pfeiffer/documents/particlefilters.pdf
[8] SensorDrone: http://www.sensordrone.com/
[9] CityWatch: http://www.citywatch.ie
[10] OpenHAB: http://www.openhab.org

Co-Supervised with Dr. Ivana Dusparic

Smart energy grid: Intelligent Residential Demand Response

The European Union's 2050 roadmap is resulting in the increasing penetration of renewable energy sources and electric vehicles (EVs) in Europe. In Ireland, it is expected that 80% of electricity will come from renewable sources by 2050, and 60% of new cars sold in 2050 will be electric. As a consequence, the electrical energy grid is facing significant changes in the supply of resources as well as changes in the type, scale, and patterns of residential user demand.

In order to optimize residential energy usage, demand response (DR) techniques are being investigated to shift device usage to periods of low demand and of high renewable energy availability. DR refers to the modification of end-user energy consumption with respect to originally predicted consumption patterns.

This project will investigate the use of intelligent, learning-based techniques in the implementation of large-scale DR aggregation suitable for residential customers. Some of the aspects to be addressed within the scope of the project include: household energy use learning and prediction (as enabled by, e.g., smart meters or smart heating devices like Nest and Climote), evaluation of centralized vs decentralized DR approaches, responsiveness of techniques to different usage patterns and different renewable energy generation patterns, and the types of devices most suitable for DR programmes (e.g., heating, EVs).
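As a toy illustration of the load-shifting idea (not a prescribed approach), the Python sketch below greedily moves flexible loads into the cheapest hourly slots; the prices, devices and one-device-per-slot assumption are all invented for the example.

    # Greedy shift of flexible loads to the cheapest (or greenest) hours.
    prices = [0.25, 0.22, 0.10, 0.08, 0.09, 0.30]          # EUR/kWh for six hourly slots (made up)
    flexible_loads = [("EV charge", 2), ("hot water", 1)]   # (device, hours of run-time needed)

    schedule = {}
    free_slots = sorted(range(len(prices)), key=lambda h: prices[h])   # cheapest first
    for device, hours in flexible_loads:
        schedule[device] = sorted(free_slots[:hours])                  # take the cheapest remaining hours
        free_slots = free_slots[hours:]

    print(schedule)   # -> {'EV charge': [2, 3], 'hot water': [4]}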

Smart energy grid: Home energy usage prediction and optimization based on sensor data

This project will investigate how home energy usage can be learnt, predicted and optimized. Patterns of energy use can be learnt and predicted from occupants' historical behaviours (e.g., learning that the user generally leaves for work at 8:15am, plays football after work on Wednesdays, goes out straight after work on Fridays, etc.), combined with various sensors and data sources to provide more accurate, amended predictions (e.g., the mobile phone calendar, the user's GPS location, the level of battery charge in an electric vehicle, the outside temperature, etc.). Learning and intelligent agent techniques will be investigated and applied to learning the observed patterns and establishing the demands and constraints on device usage (e.g., the duration of charging an electric vehicle will require based on the length of the daily trip, the time heating needs to be turned on to achieve the desired temperature by the user's arrival time, the estimated time by which hot water is required for the shower, etc.). Multi-objective optimization techniques will then be applied to schedule the required device usage so as to satisfy these requirements and constraints as well as the policies set by users (e.g., minimize energy price, maximize use of renewable energy, etc.).

Prof. Seamus Lawless

I supervise projects in the areas of Information Retrieval, Personalisation and Digital Humanities. The common focus of these projects is the application of technology to support enhanced, personalised access to knowledge. If you would like to talk about a project or suggest one in these areas, email me at seamus.lawless@scss.tcd.ie

Project Details

Mobile Application Design and Development

Tracking the Altitude of Ireland
This project will develop a mobile application (iOS and Android) that, when installed, allows anyone in Ireland to keep track of the metres that they ascend while out walking/hiking/climbing. This data will then be grouped to enable league tables to be created for each school, college, town, county etc. The UI/UX of the app should be designed so as to be user friendly and intuitive. The data storage architecture should also be flexible and scalable. Data visualisations, including spatio-temporal visualisation, could be investigated. The idea behind the app is to generate friendly competition in order to promote an active lifestyle and get people out hiking.

Information Retrieval and Web Search

Personalised Talent Search on LinkedIn
This research will aim to design and implement complex personalized search approaches for the purpose of helping employers to locate individuals with desirable talents on LinkedIn. This research will be based on a preliminary study that has explored and validated the effectiveness of the use of machine learning techniques in personalized talent search. The focus of the research will be to further explore and implement more complex and effective machine learning methods for delivering an expertise search experience.

Triple Linking from Unstructured Text
Identifying the people, places, events and dates mentioned in news articles, blog posts and other unstructured text on the web is a difficult task. Linking these entities using Linked Open Data presents a further challenge. This project will investigate the generation of “Triples” from online text content. The input to this process will be a webpage or document, and the output should be a list of DBpedia triples which describe the entities that are mentioned in the text, in addition to a confidence score. Some existing tools can help: OpenIE extracts triples from unstructured text, and entity-linking tools (such as TagMe, DBpedia Spotlight or Nordlys) can be used to map entities to DBpedia.
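As one possible starting point, the sketch below calls the public DBpedia Spotlight REST endpoint to link entities in a piece of text; the endpoint URL, request parameters and JSON field names are assumptions about the public service and should be verified, as they may change.

    import requests

    SPOTLIGHT = "https://api.dbpedia-spotlight.org/en/annotate"   # assumed public endpoint

    def link_entities(text, confidence=0.5):
        """Return (surface form, DBpedia URI, similarity score) for each entity found."""
        resp = requests.get(
            SPOTLIGHT,
            params={"text": text, "confidence": confidence},
            headers={"Accept": "application/json"},
            timeout=30,
        )
        resp.raise_for_status()
        resources = resp.json().get("Resources", [])
        return [(r["@surfaceForm"], r["@URI"], float(r["@similarityScore"]))
                for r in resources]

    print(link_entities("Leo Varadkar spoke in Dublin about Irish Water."))

The linked DBpedia URIs can then be used to query DBpedia for triples about each entity, with the Spotlight similarity score serving as a first approximation of the required confidence score.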

The Search for the Searcher: User Identification using Web Search Log Mining
When making decisions about how best to support users in web search, it is extremely helpful if we can determine who they are, why they are searching and how they interact with the search site. This type of information can be difficult to obtain, particularly when users are anonymous. Server log files contain a huge amount of information about user actions: what pages they viewed; what queries they issued; how much time elapsed between interactions with the site. But there is a deeper meaning behind this data. Buried within the contents of a log file is a rough outline of who each user is. Are we watching a novice investigator gradually learn about a topic? Do we see them refine the precision of their question based on new knowledge? Are we witnessing an expert scholar, rapidly issuing specific, targeted queries as they gather sources for their research? Did they find an answer to their question? If so, how much effort did it take for them to complete their search? How does their behaviour change depending on what the site presents? How does the relevance change as the user expertise and interactions change? This project will investigate the mining of a large search log with the aim of deriving information about the users' interests and levels of expertise. A number of search logs will be made available to the student as part of this project.

Word Embeddings for Improved Search in Digital Humanities
If I gave you a document that was written entirely in Japanese Kanji and asked you to tell me which symbols had similar meanings, how would you do it (assuming you cannot read Kanji)? Even for a human, finding a realistic answer to this question is extremely difficult. Yet this question reflects a fundamental problem with how computers perceive texts. Unless a human annotator provides some form of descriptive mark-up, a computer simply does not understand the meaning behind the text it curates. Word embeddings are a recent development in the text analysis community. By applying a family of algorithms collectively known as Word2Vec, a computer is able to examine a large collection of documents and derive relationships between words based solely on their contextual usage (e.g. the word "King" has a strong association with the word "Queen". Also, the vectors produced are additive and subtractive - by subtracting "Man" from "King" and adding "Woman" to the result, we will obtain a vector which is extremely close to the vector for "Queen"). This Masters project aims to investigate the use of word embeddings in supporting better search and exploration of a collection of 17th century historical documents. This may involve generating suggestions for alternative query formulations in a search interface. In more advanced terms, we may seek to build a retrieval model based on the word vectors generated by Word2Vec.
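A minimal sketch of training and querying word embeddings with the gensim library is shown below. Parameter names follow gensim 4.x (older releases use size rather than vector_size), and the toy corpus is far too small to yield meaningful vectors; the real project would train on the tokenised 17th century collection.

    from gensim.models import Word2Vec

    # Tokenised documents (toy stand-in for the historical collection).
    sentences = [["king", "rules", "the", "realm"],
                 ["queen", "rules", "the", "realm"],
                 ["man", "walks"], ["woman", "walks"]]

    model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, epochs=50, seed=1)

    # Nearest neighbours in the embedding space, e.g. for query expansion.
    print(model.wv.most_similar("king", topn=3))

    # The "King - Man + Woman is close to Queen" style of analogy query.
    print(model.wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))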

Data Analysis and Data Science

[Taken] #ITK #DoneDeal #FakeNews
The summer Transfer Window is a busy time for Football clubs, supporters and the media. This is reflected in the volume of activity on Social Media platforms such as Twitter. Clubs are continually linked with signing and releasing players; rumours circulate that clubs are interested in particular players and are making moves to recruit them. A very small fraction of these rumours actually come to pass. There are lots of Twitter accounts which claim to be #ITK - "In The Know"; is this actually the case? I am interested in collecting a large Twitter dataset related to the Summer Transfer Window, particularly focused on the English Premier League. I would like to apply Machine Learning techniques to that dataset to look for patterns in the tweets and to perform an analysis of the performance of certain accounts in predicting transfers.
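As a rough illustration of one possible approach (not a prescribed method), a tiny scikit-learn text-classification sketch over invented tweets is shown below; the real dataset would be collected from Twitter and labelled against confirmed transfers.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Made-up examples: did the rumoured transfer actually happen (1) or not (0)?
    tweets = ["Here we go! Medical booked for tomorrow #ITK",
              "Club denies any interest in the player",
              "Fee agreed, personal terms done #DoneDeal",
              "Agent says talks never took place"]
    labels = [1, 0, 1, 0]

    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
    clf.fit(tweets, labels)
    print(clf.predict(["Personal terms agreed, medical tomorrow"]))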

Credibility and Trust

Perception of Trust in Search Result Presentation
When searching for content online, trust in the impartiality of the search algorithm and the ranking and presentation of results is of paramount importance. Different users display varying levels of trust, and this trust can be impacted by the design of a search interface and the visual aspects of results presentation. This project will investigate user characteristics and the design and common features of search interfaces which impact users' perceived level of trust. An evaluation will be conducted on a collection of well-known search sites. Each site will be 'distorted' by removing or adding visual features of the website design. Users will then be asked to rate their trust of each website. An in-depth statistical analysis of the results will be performed.

Kicking them while they're down
Previous research has been conducted into bias in the images used in news. It was shown that certain newspapers were prone to using unflattering images of politicians whose position they opposed, and more flattering images of politicians they supported or had endorsed [1]. We propose updating and extending this study. We aim to examine news media's likelihood of using good or bad images based on individual politicians' popularity ratings. The intuition is that it is more probable that news media will use unflattering images of politicians whenever they have a 'bad week'. It would also be interesting to look at the use of images of politicians in relation to specific political issues, for example, images of An Taoiseach Leo Varadkar used in articles related to "Repeal the 8th" when compared to pictures of Mr. Varadkar in news articles related to jobs announcements.

[1] Barrett, A. W., and Barrington, L. W. (2005). Bias in newspaper photograph selection. Political Research Quarterly, 58(4), 609–618.

Imagery in Political Propaganda
Previous research has been conducted into bias in the images used in news. It was shown that certain newspapers were prone to using unflattering images of politicians whose position they opposed, and more flattering images of politicians they supported or had endorsed [1]. However, limited research has been conducted into the imagery that political parties themselves use in the content they generate. We propose a study into the images used in election material to investigate which political parties used the least flattering images of opposition party members, for political purposes. This study does not need to be limited to Ireland; however, an archive of election material from GE16 does exist, including leaflets related to contentious issues such as Irish Water [2].

[1] Barrett, A. W., and Barrington, L. W. (2005). Bias in newspaper photograph selection. Political Research Quarterly, 58(4), 609–618.

[2] https://irishelectionliterature.com/index-of-electionsparty-literature/index-of-2016-election-leaflets-ge16/.

[Taken] Credibility in Graphical Presentation of News Content
This project will investigate the credibility of graphics used in news media content to assess if there is a pattern of Accuracy / Bias / Trust / Credibility / Fairness etc. Previous research has demonstrated that it is possible to design and communicate misleading graphics and other visual representations of statistical information. It has also been shown that bias exists in the content of news articles; what about the information news media provide us in other formats? This project will examine a number of news items which have been covered in the media to a significant extent, e.g. Ireland's Bailout. We will identify articles which have used graphics to summarise and explain the issue. The researcher will then create, as much as is possible, an independent and accurate depiction of the same information using current information presentation guidelines. A comparison study can then be conducted to measure people's opinion of each representation with relation to any/all of Accuracy / Bias / Trust / Credibility / Fairness etc. The intuition is that certain news media sources are more likely to use biased graphics than others.

2017/2018 FYP/MSc project topics for Stephen Farrell

If interested send mail

Detailed project scope and goals can be adjusted to fit student skills and the level of effort available.

  1. Implement and test a proof-of-concept for MPLS opportunistic security

    Multi-Protocol Label Switching (MPLS) is a sort-of layer 2.5 that carries a lot of Internet traffic in backbone networks. There is currently no standard for how to encrypt traffic at the MPLS "layer." I am a co-author on an Internet-draft that specifies a way to opportunistically encrypt in MPLS. The task here is to implement and test that, which will likely result in changes to the specification (and add the student's name to the eventual RFC). There are existing simulation/emulation tools that support MPLS and IPsec that should make implementation fairly straightforward, for example Open vSwitch. A proof-of-concept demonstration with performance figures vs. cleartext and IPsec is the goal. Comparison against MACsec would also be good but may be too hard to test in this environment.

  2. Assigned: Compare and contrast existing domain/web-site security/privacy measurement web sites

    A number of web sites (see below for examples) offer one the opportunity to "score" or test some other domain or web site for security and privacy issues with their deployment of HTTPS, SSH, DNS, BGP or other visible behaviours. Typically, these are intended for site administrators to help them know if their current site configuration is good or bad, and if bad, in what respect, and how to mitigate that. The sets of tests applied differ, and it is unclear if individual tests applied are the same over different test sites and over time. (The sets of tests certainly change over time as new vulnerabilities are discovered.) The goal of this project is to identify and describe such test sites, to compare their various sets of tests, and to establish whether or not tests that appear to be the same are in fact the same (likely via setting up a web site with known flaws as a test article). A "test-the-testers" web site may be a good outcome here, that can be updated as the test-sites evolve. The scope here may be limited to HTTPS or could be extended to SSH, DNSSEC, BGP or other Internet technologies depending on effort and student interest.

  3. Assigned: Deploy and run a local privacyscore.org instance and test against Irish web sites

    PrivacyScore.org is an open source tool (currently in public beta) and web site testing platform that aims to allow users to score web sites on how well or badly they implement visible security and privacy features. The goal here is to deploy a local instance of the tool, and run that to test the security and privacy properties of some local (Irish) web sites (possibly within tcd.ie, so you'll likely get to tell college it fails:-). As well as deploying and testing the tool, potential improvements to the tool will likely be identified and possibly implemented and fed back to the developers. It'd be a fine thing to identify local sites that are interesting to test, to interpret the tool output, communicate that to site owners and help make the web a bit better.

  4. Deploy a local DPRIVE recursive resolver and test performance

    DPRIVE is a specification for how to run the DNS protocol over TLS in order to attempt to mitigate the privacy problems with the use of the Domain Name System. There are implementations of DPRIVE available now that are ready for experimental deployments. The goal of this project is to deploy a local DNS/TLS recursive resolver and to test its effectiveness and efficiency via artificial test queries and responses but also, where possible, by handling real DNS queries from clients that have opted in to being part of the experiment.

  5. Assigned: Solar-powered LoRa gateway power management

    In January 2017 we deployed a solar-powered LoRa gateway in TCD. Power management for that is relatively simple - the device aims to be "up" from 11am to 4pm each day, but handles low-power situations by sleeping until batteries have been sufficiently charged. The goal here is to analyse the power consumption and traffic patterns recorded since January 2017 in order to improve system up-time via enhanced power management. (For example, the device could decide to not sleep from 4pm if the battery level is above a new threshold.) The current power management daemon is a simple C program that monitors battery voltage and sets the device to sleep or wake according to the simple policy described above. Modifications to that can be designed, validated based on existing data, and then implemented, deployed and tested with the existing hardware.

  6. Foo/QUIC prototype

    QUIC is a new transport protocol being developed in the IETF aiming to provide the same properties that TLS/TCP provides for HTTP but for other applications. The goal here is to select and prototype some application ("Foo") that could benefit from running over QUIC but where that Foo/QUIC combination has yet to be investigated. For example, at the time of writing, I'm not aware of any work on NTP/QUIC (though that might make less sense than it seems, not sure). The goal is to identify and prototype an interesting application/protocol that hasn't been widely tested running over QUIC in order to provide more input into the development of the new transport protocol.

  7. Prototype TLS1.3 SNI Encryption proposals/fronting

    The Server Name Indication (SNI) extension to TLS is extremely widely used to support multiple web sites on the same host (e.g. VirtualHosts in Apache2) but represents a major privacy leak, as SNI has to be sent in clear, given that no scalable way of hiding SNI has been found. The TLS working group have just adopted a draft that describes various opt-in ways in which SNI can be protected should a web site wish to "front" for another. (Think of a CDN like akamai being "cover" for some human-rights organisation that would be censored in the relevant jurisdiction.) The goal here is to prototype and test the various SNI encryption schemes currently under study in order to assist in selecting the eventual mechanism to standardise.

  8. Critically analyse proposals for weakening the TLS specification

    (This one is a tad political but may be attractive for just that reason:-) There have been ongoing attempts to weaken the security guarantees provided by the TLS protocol by standardising ways of making plaintext visible to third parties. Those are typically claimed to be needed for network management or content scanning, and often involve a man-in-the-middle attack on TLS or leaking keys to a third party. All such proposals could also be used for pervasive monitoring/wiretapping, censorship or other "unintended" use-cases. In reaction to the most recent attempt to break TLS in this way, I documented a range of arguments against breaking TLS. The goal here is to provide a more rigorous analysis. The study should analyse both the technical and social/political pitfalls and claimed benefits of such schemes. The scope could be extended to include the deployed TLS-MITM products that are used in various networks and on some hosts. (Or that could be a separate project itself.)

  9. Play with leaked password hashes

    How fast can you check for the presence of a password (hash) in a list of 320 million leaked hashes? The naive shell script I wrote in a few minutes takes 30 seconds on my laptop. The goal here is speed, without requiring any networking (so no sending password hashes over any network), on a reasonably "normal" machine. The list takes 12GB to store in a file, one hash per line. The list may also be updated occasionally, though in bulk, not as a trickle. Some side-channel resistance (e.g. considering timing, OS observation) is also a goal here. As you'd expect, the list was mostly reversed within a few weeks. (A minimal lookup sketch appears after this list of projects.)

  10. Assigned: Develop a useful survey tool of Irish mail server deployments

    censys.io collate Internet-scale surveys and make the results of those public via their web site and an API. A quick search there says there are 12582 email servers in Ireland, of which 62% do some form of STARTTLS for mail transport security. The goal here is to use the API to develop a tool that can be run periodically, that analyses that data and produces a list of email addresses to which one might send useful advice as to how they could improve their mail transport security. For example, if the data indicate some deployments are running a server that could easily be fixed (say, by just updating a certificate), then crafting a mail with deployment-specific instructions as to how to do that could be good. Actually contacting the postmasters involved is not a part of this project but may be done later based on results found.
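Returning to project 9, the sketch below shows one way to beat a naive scan: a binary search over a pre-sorted file of fixed-width SHA-1 hashes, one 40-character uppercase hex digest per line as described above. Those storage assumptions, and the neglect of side-channel resistance, are simplifications for illustration only.

    import hashlib

    LINE_LEN = 41   # 40 hex characters of SHA-1 plus a newline

    def hash_present(path, password):
        """Binary search the sorted hash file by seeking directly to line offsets."""
        target = hashlib.sha1(password.encode()).hexdigest().upper()
        with open(path, "rb") as f:
            f.seek(0, 2)
            n_lines = f.tell() // LINE_LEN
            lo, hi = 0, n_lines
            while lo < hi:                          # classic lower-bound search
                mid = (lo + hi) // 2
                f.seek(mid * LINE_LEN)
                line = f.read(LINE_LEN - 1).decode()
                if line < target:
                    lo = mid + 1
                else:
                    hi = mid
            if lo < n_lines:
                f.seek(lo * LINE_LEN)
                return f.read(LINE_LEN - 1).decode() == target
        return False

Each lookup touches only a logarithmic number of lines of the 12GB file; a Bloom filter or an in-memory index over hash prefixes could reduce that further.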


Prof Aljosa Smolic

Thesis and Final Year Project proposals 2017-2018

Please visit our web page: V-SENSE Student Projects
https://v-sense.scss.tcd.ie/?page_id=1100

Dr. Stefan Weber

Room: Lloyd 0.32
Extension: 8423

This year I will be offering projects that are in the general area of distributed systems, information-centric networking, wireless sensor networks and mobile games. My interests include wireless communication among mobile devices such as laptops, mobile phones and embedded devices. Below are a number of projects that cover some of my interests. If you have an idea for an interesting project that involves any of the areas above, please get in touch.

Message Adaptation in the Internet of Things

Transport protocols such as the Transmission Control Protocol (TCP) were developed to support the transfer of files from a source to a destination. The Internet of Things will present a significant challenge for this architecture if large numbers of sensors transmit individual small messages. This project will investigate the effect of small messages on routers in the current Internet architecture and develop a solution that attempts to prevent the flooding of networks with small messages.

Game Design using Information-Centric Networking

Information-centric networking enables the caching of content within infrastructure components in a network in order to reduce network traffic and latency. This project will investigate the design of a communication protocol for games based on an information-centric approach such as NDN/CCN.

Design Characteristics of Honeypots - with an Aim to Study Botnets

This project will investigate the design of honeypots and the available information on botnets, and will develop a design for a honeypot that allows the gathering and evaluation of information about the distribution and design of botnets, such as their size, use, etc. As a means to study these botnets, a honeypot or set of honeypots should attract the attention of botnets, collect samples of the communication of these networks and provide an analysis of the collected traffic.

Adaptive Communication in Mobile Networks

Communication between sets of mobile devices in networks such as 3G/4G networks generally relies on well-known hosts that coordinate the communication between the devices. Similar to approaches to multicast communication, this project will investigate the development of a protocol that allows the adaptation of communication from an initial centralized model via a rendezvous node to a distributed model where the devices will communicate directly with one another.

Brendan Tangney

Room: 316, Lloyd Building
Extension: 1223


  1. Bray's Heuristics for Math Education. PROJECT ASSIGNED
  2. A Collaborative Tool to Scaffold the Ph.D. Process. CAWriter is a web-based computer-supported collaborative working toolkit to support research students in the academic writing process (Byrne J.R. and Tangney B. 2012). This project will take an existing prototype, extend its capabilities and engage in a user study of the tool's efficacy.
  3. A Collaborative Tool to Scaffold Skills Acquisition in Project Based Learning. A specification has been developed (Ellis N., 2017) for a tool to assist instructors and students in a portfolio-based approach to skills acquisition, in a manner similar to the collection of Scout Badges. This project will take that specification and design, implement and test the tool in a real learning setting.
  4. Argumentation Visualisation. Many arguments, particularly in Plato's dialogues, have a clear structure or flow. For example, backtracking occurs frequently when one partner in the dialogue presents a proposition only to have to later retract it and backtrack. This tool would, given some text, assist an instructor in creating a visualisation of the argument in that text.

Dr Tim Fernando

Room: ORI LG.17
Extension: 3800

I offer projects on knowledge representation and/or natural language semantics. Apart from programming, some mathematical maturity would be useful to survey the research literature on formal methods in artificial intelligence. Below is a list of specific topics, but I am happy to discuss other interests you may have that are broadly related.

Timelines and the semiotic triangle

    Timelines (such as this) order events chronologically, while the semiotic triangle relates the world, language and mind. The aim of this project is to design timelines that distinguish between an event E in the world, a linguistic event S that describes E, and a reference point R representing a perspective from which E and S are viewed (following Reichenbachian accounts of tense and aspect).

Finite State Semantics

    How can we apply finite automata and transducers to represent meaning? More specifically, (i) how can we structure semantic information through the successor relation on which strings are based, and (ii) how far can finite state methods process that information? A project within this topic can take a number of directions, including intensionality, temporality, comics, and formal verification tools.

Frames and grounded cognition

    This project examines the role of frames in a general theory of concepts investigated by, for example, the DFG Collaborative Research Centre 991: The Structure of Representations in Language, Cognition, and Science. The focus is to test the idea of grounded cognition (e.g., Barsalou) against various formulations of frames. Related to this are notions of force (from Talmy to Copley) and image schema in cognitive semantics.

Constraint satisfaction problems and institutions

    The goal of this project is to formulate Constraint Satisfaction Problems (CSPs) in terms of institutions in the sense of Goguen and Burstall, analyzing relations between CSPs category-theoretically as institution (co)morphisms.

Textual entailments from a temporal perspective

    Computational approaches to recognizing textual entailments need to be refined when dealing with time. Temporal statements typically refer to a temporal period that cannot simply be quantified away by a Priorean past or future operator. The challenge facing this project is how to incorporate that period into approaches such as Natural Logic that have thus far ignored it.

Bounded granularity and incremental change: scales

    This project is for someone particularly interested in linguistic semantics. The project's aim is to examine whether or not granularity can be bounded in accounts of incremental change in natural language semantics involving scales (e.g. Beavers) as well as mereology and mereotopology. Attention will be paid to how to model refinements (and coarsenings) of granularity.

Monadic Second Order Logic for natural language temporality

    This project is for someone particularly interested in logic. A fundamental theorem in logic due to Büchi, Elgot and Trakhtenbrot identifies regular languages with the models definable in Monadic Second-Order Logic (for one binary relation). The aim of this project is to explore applications of this theorem to natural language temporality --- including tense and aspect.

Prof. Vinny Cahill

My interests are in middleware and programming models for mobile, ubiquitous and autonomic computing with application to optimisation of urban resource usage and service delivery. My current work is addressing the design of vehicle coordination protocols for connected and autonomous vehicles in mixed traffic environments with the objective of optimizing journey time predictability in both highway and urban settings.

Project 1. Developing a Common Situational Model for Urban Traffic Management

The increasing availability of fine-grained sensor data, especially that expected to be available from individual vehicles as the level of automation increases towards fully autonomous vehicles, means that it will be possible for public authorities to build an ever more detailed picture of the current state of the road network. This might potentially include the intentions of drivers and vehicles, as derived from on-board navigation systems, as well as their locations and trajectories. Such a model, referred to as a Common Situational Model, could be the basis for the implementation of a wide variety of enhanced traffic management services including vehicle routing, urban traffic control, and bus arrival time prediction.

The information required to build such a model will necessarily come from a variety of diverse stakeholders including public authorities, traffic services providers, fleet managers, individual (participating) drivers, and, increasingly, individual vehicles. Moreover, subsets of this information will be useful (typically with different levels of detail) to those stakeholders and others, with different quality of service requirements.

This project will explore the architectural, information, and communication services that will be needed to support the construction and distribution of such a model at different scales and levels of granularity. The project will be expected to prototype the model and develop some small-scale proof-of-concept applications that make use of it.

Project 2. Cooperative Driving

Large-scale events (such as concerts and mass-participation sporting events) often result in traffic congestion in surrounding areas (see, for example, [1]). This may arise due to the presence of an exceptional volume of traffic in the vicinity of and approaches to the event, (planned) road closures, mass jaywalking, or the impact of vehicles searching for, entering and leaving parking spaces (even where adequate parking is available). Our hypothesis is that this is, at least, in part due to competition and, more generally, poor coordination between vehicles and that the situation could be improved if the actions of individual vehicles were coordinated. This might be achieved by providing appropriate (fine-grained) advice and/or directions to drivers to smooth the flow of traffic. Indeed, this is often attempted by the deployment of traffic police or stewards.

This project will explore the design of an app to provide such advice to drivers approaching a major event with the goal of smoothing the flow of traffic either to an appropriate (potentially reserved) parking space or away from the area in the case of through or departing traffic. Advice will be based on a model of vehicle coordination that we term slot-based driving. In slot-based driving, each vehicle is allocated a location-based time slot in which to travel for the duration of its journey, in a way not dissimilar to the way in which time-division multiple access (TDMA) is used in data communication systems to allocate slots to messages in transit. The project is expected to prototype the design of the app and underlying control system and simulate its operation in a sample deployment, e.g., see [2].
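As an illustration of the slot-based idea only (the real allocation would be continuous and location-based), the toy Python sketch below assigns each approaching vehicle the earliest free time slot at a shared bottleneck, such as a car park entrance; the slot length and requests are invented.

    SLOT_SECONDS = 10

    def allocate_slots(requested_arrivals):
        """requested_arrivals: {vehicle_id: desired arrival time in seconds}.
        Returns {vehicle_id: assigned slot start}, serving earlier requests first."""
        assigned, next_free = {}, 0
        for vid, t in sorted(requested_arrivals.items(), key=lambda kv: kv[1]):
            start = max(t, next_free)
            start = ((start + SLOT_SECONDS - 1) // SLOT_SECONDS) * SLOT_SECONDS  # snap to slot grid
            assigned[vid] = start
            next_free = start + SLOT_SECONDS
        return assigned

    print(allocate_slots({"car A": 5, "car B": 8, "car C": 40}))
    # -> {'car A': 10, 'car B': 20, 'car C': 40}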

[1] Two-hour traffic delay spoils Dublin Half Marathon
https://www.irishtimes.com/news/ireland/irish-news/two-hour-traffic-delay-spoils-dublin-half-marathon-1.3231877

[2] Traffic Management Bloom Festival 2016
https://garda.maps.arcgis.com/apps/SimpleViewer/index.html?appid=88aeb8c266824c9587b57f98ba08832f

Project 3. Driver Information Systems for Autonomous/Mixed Traffic

Large-scale deployment of (semi-)autonomous vehicles (AVs) is inevitable. However, the benefits of this deployment for traffic management in a world in which AVs and other vehicles will necessarily coexist (i.e., in mixed traffic) remain unclear. Reduced congestion, greater energy efficiency, and improved resilience of the traffic system to unexpected events are expected. In this context, our hypothesis is that full advantage of large-scale deployment of AVs for traffic management will only be achieved if AVs and (the drivers of) conventional vehicles can coordinate their actions.

To allow this hypothesis to be tested, this project will develop a framework for testing advanced driver information systems designed to provide drivers with advice on driving actions to be taken to optimise traffic flow in the presence of AVs. The project will include deployment of a 3D driving simulator connected to a mixed traffic simulator as well as the driver information system and advisory system framework. See, for example, [3], [4].

[3] https://www.opends.eu/home

[4] https://sourceforge.net/projects/torcs/

Project 4. Distributed Application Simulator

This project will investigate the design of a distributed systems/network simulator that can be used to facilitate deployment and testing of a wide range of distributed applications and algorithms especially in the presence of a variety of potential failure situations including node failures and network outages. The simulator should be capable of allowing a wide range of different network technologies and deployment scenarios to be simulated, different failure scenarios to be modelled, and experiments conducted to evaluate the resulting application behaviour. Critically, it should be ridiculously easy to deploy potentially large-scale applications and to configure and execute test scenarios and gather results. A developer should not need to write any code that is specific to the use of the simulator and should be able to develop applications using their choice of programming language(s) and middleware. Test scenarios would be defined either graphically or using a high-level declarative language.

Dr. Vasileios Koutavas

Areas of interest: programming language implementation and theory, concurrent and distributed systems, formal methods, software verification.

A number of projects are available in these areas which involve the development of part of a compiler, or the use of tools such as Why3 and Coq for verifying software systems. Theoretical projects are also available. These projects are particularly suitable for students who have taken at least one module on compilers, software verification or programming languages.

Dr Carl Vogel

Room: ORI.LG16
Extension: 1538

All projects include review of the relevant literature, and where appropriate, argumentation in support of analyses given.

Note that implementation is not an essential component of every project in computational linguistics -- there's definitely more to the field than computer applications -- however, formal rigor is quite essential.

Don't worry if you don't recognize the name of the systems/languages mentioned. If the theme itself interests you we can sort out the technical details in person. Of course, these are all just suggestions, we're assuming that the final project description will be individually tailored in most cases.

Students who do projects with me will agree to regular weekly meetings at which we discuss the preceding week's work, and plans for the following week's. The initial weeks typically involve a considerable amount of diverse readings. Students intending to work with me on their project are encouraged to contact students who have done projects with me in the past. (See here for some details on that.)

Projects listed here are suitable for final year students on the CSLL/CSL course; students from other undergraduate and postgraduate courses may also find suitable topics here.

  1. Develop an HPSG (Head-driven Phrase Structure Grammar) grammar for a fragment of Irish and implement it in the LKB, focusing on the syntax of one of the following construction types:
    • Noun Phrases
    • Embedding Verbs
      1. proposition embedding verbs
      2. question embedding verbs

    Some examples of comparable projects are available for Irish, French, and German.
  2. Design and implement a chart parser for a CFG grammar with a dominance interpretation for phrase structure rules. This is essentially a framework for underspecified semantics. A disambiguation method must also be provided.
  3. Extend the semantic coverage in one of the frameworks included in the CLEARS (Computational Linguistics Education and Research Tool for Semantics) system.
    Particular areas of interest might be: negation, spatial modifiers, belief reports. An example of a project that did this in the past is available here.
  4. Extend the functionality of a generic interface for web-based experimentation in cognitive science (this will involve empirical research in an area of cognitive science to be agreed upon).
    This offers several possible topics and with varying degrees of implementational requirements. For all, some implementational extensions to the underlying system are necessary. Some will involve more or less actual experimentation using the system. Previous stages of the system are described, among other places, here, here, and here.
  5. Improve on the design and implementation of a web-based multiplayer Scrabble game, with the feature that point assignments to letters are calculated dynamically, on the basis of frequencies derived from corpora. A description of the base system is provided here. An extension of that work is described here. There are many delightful ways in which this work can be extended. One example is including a facility for team play. Another is implementing an automated player that humans can choose to play against.
  6. Extend and experiment with a platform for experimenting with models of dynamic systems, with particular attention to modeling the evolution of linguistic behaviors. A starting point is described here; subsequent work is described here.
  7. Extend work on utilities for statistical analysis of linguistic corpora and apply them to specific tasks such as detection of grammatical errors, and automated correction suggestion.
  8. Develop and validate lexical resources for sentiment analysis.
  9. Develop methods within computational stylistics for investigating the relationship between text-internal linguistic variables and external variables using large online textual resources. A comparable project is described here.
  10. Develop methods for tracking events under varying descriptions in journalistic prose.
  11. Develop a Prolog implementation simulating the operation of theories in dynamic semantics.
  12. Develop a Prolog implementation of real-time belief revision systems.
  13. Extend an automatic crossword generator implemented in Java and Prolog. Documentation of its 2003 state is available here. A more recent version is documented here. One avenue in which to extend this is to establish it as a system fully anchored on the Suns, with application in language learning and other topical areas.
  14. Develop online tools for other forms of fun with words -- an innovative anagram server, a crossword clue generator, etc.
  15. Formal syntactic and semantic analysis of dialogue. Example past attempts at this are available here and here.
  16. Extend a computational model of Latin morphology, with lexical lookup to achieve Latin to English Machine Translation.
  17. Extend a prototype grammar checker for Irish implemented in Prolog, integrating it with a spelling checker for Irish.
  18. Implement an efficient spelling checker for Irish in Java, in the context of a webserver that collects words and their frequencies of use in checked documents, along with some other utilities for corpus linguistics.
  19. Incorporate an Irish language spelling checker and general proofing tools facilities into StarOffice/OpenOffice.
  20. Parse a large Irish-English Dictionary (the O Donaill). A description of the comparable project is provided here, and here.
  21. Projects in psycholinguistics. Past Examples appear here, here, here and here.
    Some specific topics I would like to explore further:
    1. Linguistic priming and unconscious coordination in written communication.
    2. Degrees of grammaticality and acceptability.
    3. Human reasoning with mildly inconsistent information.
    4. Computational stylistics (corpus driven syntactic and semantic analysis).
  22. Some general purpose utilities that can replicate standard offerings such as "DoodlePolls" and shared calendars, but with local data stores that accommodate varying levels of privacy and data protection.
  23. Develop tools to harvest from online sources a multi-lingual database of named entities.
  24. Build computational tools in support of structuralist analysis of myth and mythic-metaphorical representation (in the style of Levi Strauss).
  25. Test empirical dimensions of theories of holism in formulaic language associated with (im)politeness expressions.
  26. Test empirical predictions of recent theories of (im)politeness with respect to third-party and projected self-perception.
  27. Test empirical consequences of theories of gender differences in language use.
  28. Analyze proxy measures of mutual understanding in dialogue.
  29. Examine parameters that influence perception and choice in the ultimatum game.
  30. Topics in collaboration with Dr. Maria Koutsombogera: Analysis and modelling of multimodal and multiparty interactions. The projects will exploit a newly created corpus of multimodal interactions between three participants. The objective of the projects is to address some of the challenges in developing intelligent collaborative systems and agents that are able to hold a natural conversation with human users. A starting point in dealing with these challenges is the analysis and modelling of human-human interactions. The projects consist in the analysis of the low-level signals of speakers (e.g. gaze, head pose, gestures, speech), as well as the perception and inference of high-level features, such as the speakers' attention, the level of engagement in the discussion, and their conversational strategies. Some examples of similar work are documented here and here. Indicative literature is available here. Samples of other existing corpora will also be made available to interested parties.
    1. Prediction of the next speaker in multiparty interactions based on multimodal information provided by the participants' (a) gaze, (b) head turn/pose, (c) mouth opening and (d) verbal content.
    2. Measuring participants' conversational dominance in multiparty interactions by exploring (a) turn length, (b) speech duration, (c) interruptions, (d) feedback responses and (e) non-verbal signals (mouth opening, gaze, etc.)
    3. Create a successful attentive listener: investigate and decide upon the features that constitute an active listener, based on the analysis of feedback responses, as well as their frequency, duration, and intensity.
    4. Prediction of success in collaborative task-based interactions: investigate the factors on which the perception of the success on a task depends. This will involve a series of perception tests examining the team role of the speakers and their conversational behavior.
  31. Topics in collaboration with Dr. Erwan Moreau: supervised and unsupervised methods for author verification and related application.
    The author verification problem consists in identifying whether two texts A and B (or two groups of texts) have been written by the same person. This task is the keystone of authorship-related questions, and has a range of applications (e.g. forensics). This problem can be addressed in a number of different ways, in particular in a supervised or unsupervised setting: in the former case, an annotated set of cases is provided (each case is a pair of texts A and B, labelled "yes" or "no" depending on whether they have the same author); in the latter case, no answer is provided.
    Given the availability of several datasets as well as a state of the art authorship software system, the project consists in exploring a certain aspect or application of the topic, for example:
    1. What makes a case more difficult to answer than another? The task would be to study this question through experiments, and then implement a method to predict the level of difficulty of a given case.
    2. Design and implementation of a web interface around the authorship system, possibly presented as some kind of game with text.
    3. While ML systems can be good at giving the right answer, they are not always able to give a human-understandable explanation of the result. The task would consist in studying how to explain the results of some of the methods.
    4. It is harder to answer the question of authorship verification across genres (e.g. by comparing an email and a research paper). One way to improve the system in this case is to distinguish the features which are related to the author from those which are related to the genre.
  32. Other topics to appear.
  33. Still other topics to be agreed upon individually.

Last Modified: Wed Jul 19 13:44:27 2017 (vogel)

Dr. Jason Wyse

Updated 09/10/17. email: wyseja@tcd.ie

Areas of interest to me are classification, model selection and sequential inference, all with a Bayesian flavour. Generally, projects will involve students reviewing appropriate literature, implementing methods that extend/explore those in the literature, and evaluating methods critically, i.e. what are the advantages/disadvantages, and how could the methods be improved further? The projects below are suitable for students in the data science strand of the MSc.


Dynamic logistic regression

Logistic regression is a staple of the analyst's toolbox. Usually, analysts will want to carry out a retrospective analysis using logistic regression, i.e. gather all the data, then fit the model. This project will examine dynamic logistic regression, where beliefs are updated as new data arrives in an online fashion. A typical application domain would be streaming data. This project will review the various approaches available for dynamic logistic regression, but will specifically focus on using Sequential Monte Carlo (SMC) to carry out Bayesian sequential inference. In order to complete this project you will need a good knowledge of simulation-based inference, MCMC in particular. You will also need to be very familiar with Bayesian methods and models.
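By way of illustration, a minimal Sequential Monte Carlo sketch for Bayesian logistic regression is given below: particles are posterior samples of the coefficient vector, each new observation reweights them by its likelihood, and a resample-with-jitter move is applied when the effective sample size collapses. The prior, jitter scale and simulated data stream are all invented for the example.

    import numpy as np

    rng = np.random.default_rng(0)

    def smc_update(particles, weights, x, y):
        """One SMC step for a new observation (x, y) with y in {0, 1}."""
        p = 1.0 / (1.0 + np.exp(-(particles @ x)))      # per-particle success probability
        weights = weights * (p if y == 1 else 1.0 - p)  # reweight by the likelihood
        weights /= weights.sum()
        ess = 1.0 / np.sum(weights ** 2)                # effective sample size
        if ess < 0.5 * len(weights):                    # resample when degenerate
            idx = rng.choice(len(weights), size=len(weights), p=weights)
            particles = particles[idx] + rng.normal(0, 0.05, particles.shape)  # jitter move
            weights = np.full(len(weights), 1.0 / len(weights))
        return particles, weights

    # Prior: standard normal over a 2-dimensional coefficient vector.
    particles = rng.normal(0, 1, size=(2000, 2))
    weights = np.full(2000, 1.0 / 2000)

    # Simulated data stream with true coefficients (1.0, -2.0).
    for _ in range(500):
        x = rng.normal(0, 1, 2)
        y = int(rng.random() < 1.0 / (1.0 + np.exp(-(x @ np.array([1.0, -2.0])))))
        particles, weights = smc_update(particles, weights, x, y)

    print("posterior mean of the coefficients:", weights @ particles)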

Bayesian logistic regression at scale

Big data has brought us to the limits of our ability to analyse large datasets. Classic models that have been well studied and understood are still preferable to use, but we now face challenges in how these models can be applied to large datasets. In some instances, computing a likelihood can be extremely challenging. This project will investigate methods and procedures in the statistics literature that are available for scaling logistic regression. These can comprise Markov chain Monte Carlo (MCMC) methods for tall data and methods such as coresets. In order to complete this project you will need a good knowledge of simulation-based inference, MCMC in particular. You will also need to be very familiar with Bayesian methods and models. References:
  • Bardenet, R., Doucet, A. and Holmes, C. (2017). On Markov chain Monte Carlo methods for tall data. Journal of Machine Learning Research, 18(47), 1-43.
  • Huggins, J. H., Campbell, T. and Broderick, T. (2017). Coresets for Scalable Bayesian Logistic Regression. arXiv preprint arXiv:1605.06423.



Last updated 27 October 2017 by .