Trinity College Dublin


FYP Projects 2016/17 and Proposals for 2017/18

1. Introduction

The information on this page applies to students taking final year projects, Year 5 dissertations, and M.Sc. dissertations in the School of Computer Science and Statistics under the following programmes:

  • BA (Mod) in Computer Science
  • BA (Mod) in Computer Science and Language
  • BA (Mod) in Computer Science and Business or Business and Computing
  • BAI in Computer Engineering (D Stream)
  • BA Management Science and Information System Studies (MSISS)
  • BAI in Computer Engineering and Microelectronics (CD stream)
  • BA (Mod) Mathematics
  • Master in Computer Science (MCS)
  • Master in Computer Science (M.Sc.)
  • MAI in Computer Engineering

2. Guidelines for students

Important dates and deadlines for academic year 2017/18

Course: Integrated Computer Science (Yr 4) - Final Year Project
  • Project Selection: Fri Oct 13, 2017
  • Project Demonstration Period: April 16-20, 2018
  • Project Presentation Material and Poster Submission: Mon Apr 30, 2018
  • Project Report Due: Thu May 3, 2018
  • Project Presentation and Poster Session: Fri May 4, 2018

Course: Integrated Computer Science (Yr 4) - Internship
  • Internship Details Form Submission: Fri Dec 15, 2017
  • Internship Goals Document Submission: Fri Feb 9, 2018
  • Poster Submission: Mon Apr 9, 2018
  • Poster Presentation: Thu Apr 19, 2018
  • Mid-Point Submission of Reflective Diary: Fri Apr 27, 2018
  • Technical Report Submission: Tues Aug 7, 2018
  • Final Submission of Reflective Diary: Tues Aug 7, 2018

Course: Master in Computer Science (Integrated, Yr 5)
  • Project Demonstration Period: Mon May 21 & Tues May 22, 2018
  • Project Presentation Material and Poster Submission: Mon May 21, 2018
  • Project Presentation and Poster Session: Thu May 24, 2018
  • Dissertation Submission: Fri May 25, 2018

Course: Master in Computer Science (M.Sc.)
  • Project Demonstration Period: Mon Aug 6 - Fri Aug 17, 2018
  • Dissertation Submission: Fri Aug 31, 2018

Course: Computer Engineering (Yr 4) - Final Year Project
  • CS4E2/CE4E2 Ethics Clearance Application Deadline: Mon Dec 11 - Fri Dec 15, 2017
  • Project Demonstration Period: April 16-20, 2018
  • Project Report Due: Thu May 3, 2018

Course: Computer Engineering (Yr 4) - Internship
  • Internship Details Form Submission: Fri Dec 15, 2017
  • Internship Goals Document Submission: Fri Feb 9, 2018
  • Poster Submission: Mon Apr 9, 2018
  • Poster Presentation: Thu Apr 19, 2018
  • Mid-Point Submission of Reflective Diary: Fri Apr 27, 2018
  • Technical Report Submission: Tues Aug 7, 2018
  • Final Submission of Reflective Diary: Tues Aug 7, 2018

Course: Master in Computer Engineering
  • CS5E2 Research Methods - Preparation of a Research Proposal: Mon Nov 6 - Fri Nov 10, 2017
  • CS5E2 Research Methods - Presentation of Research Proposal: Mon Nov 6 - Fri Nov 10, 2017
  • CS5E1 Ethics Clearance Application Deadline: Mon Dec 11 - Fri Dec 15, 2017
  • CS5E1 Interim Report Due: Mon Dec 11 - Fri Dec 15, 2017
  • CS5E2 Research Methods - A Short Discussion on Research Ethics Related to CS5E1: Fri March 2, 2018
  • CS5E2 Research Methods - Research Paper Submission: Mon April 16, 2018
  • Project Demonstration Period: Mon May 21 & Tues May 22, 2018
  • Dissertation Submission: Fri May 25, 2018

Course: Management Science and Information System Studies (MSISS)
  • Interim Presentations: Mon Dec 4 - Fri Dec 8, 2017
  • Project Report Due: Thu March 22, 2018

Course: Computer Science & Business / Business and Computing
  • Project Demonstration Period: April 16-20, 2018
  • Project Report Due: Thu May 3, 2018

Course: Computer Science, Linguistics and Language
  • Project Demonstration Period: April 9-13, 2018
  • Project Report Due: Thu May 3, 2018

* Due to scheduling constraints it may be necessary to hold some demonstrations later in the week.

When to choose a project

An initial list of project proposals (from lecturing staff) will be released on the Thursday of the last week of Semester 2 in your Junior Sophister year. Supervisors will not accept supervision requests before this time. Further project proposals may be added to this list by lecturing staff over the summer vacation.

Students should select a final year project before the end of the third week of Semester 1. Where students have not selected a project by the deadline, a project supervisor who has not yet reached their supervision limit will be allocated to them in consultation with the relevant course director. The chosen supervisor will assign the student a project or help them to specify a project in an area selected by the supervisor.


How to choose a project

Students may either

  • select a project from the list of project proposals put forward by the lecturing staff, or
  • alternatively propose their own project. If you have a project proposal of your own and are having trouble finding an appropriate supervisor, contact your course director.


In either case students must get the agreement of a supervisor before they will be considered as having selected a project. Supervisors may require a meeting with the student to discuss the project before accepting a supervision request. Once a supervisor agrees to supervise a project, details of the project assignment will be recorded centrally by the supervisor.

Students may only select a single project, but they may change their minds and select an alternative project before the end of the third week of Semester 1. However, if a student selects a new project, they must notify both the old and new supervisors that their previously chosen project is to be cancelled.


Choosing a project supervisor

Students should note that each supervisor will only take a limited number of students. If you find that this information is incorrect, please send details to Final.Year.Project.Coordinator@scss.tcd.ie

Students should also note that there are only a limited number of supervisors in any area. Hence students are not guaranteed a project in their area of choice.


Project demonstrations and reports

See the following documents:



 

3. Supervisors' project areas

The following table indicates the broad areas within which projects are generally supervised, together with the potential supervisors in these areas. Each name is linked to a list of projects proposed by that lecturer.

Subject Area: supervisors willing to supervise projects in this area

  • Artificial Intelligence: Michael Brady, Vincent Wade, Martin Emms, Tim Fernando, Rozenn Dahyot, Carl Vogel, Khurshid Ahmad, Ivana Dusparic, Joeran Beel
  • Computational Linguistics: Martin Emms, Tim Fernando, Carl Vogel, Khurshid Ahmad
  • Computer Architecture: Jeremy Jones, David Gregg, Michael Manzke, John Waldron, Jonathan Dukes
  • Computer Vision: Kenneth Dawson-Howe, Gerard Lacey
  • Distributed Systems: Vinny Cahill, Stefan Weber, Mads Haahr, Dave Lewis, Jonathan Dukes, Melanie Bouroche, Siobhan Clarke, Ivana Dusparic
  • Foundations and Methods: Hugh Gibbons, Andrew Butterfield, Glenn Strong, Tim Fernando, Vasileios Koutavas
  • Graphics, Vision and Visualisation: Kenneth Dawson-Howe, Fergal Shevlin, Gerard Lacey, Michael Manzke, John Dingliana, Carol O'Sullivan, Rozenn Dahyot, Khurshid Ahmad, Rachel McDonnell, Aljosa Smolic
  • Health Informatics: Lucy Hederman, Gaye Stephens, Mary Sharp, Joeran Beel
  • Information Systems: Mary Sharp, Joeran Beel
  • Instructional Technology: Brendan Tangney, Mary Sharp, Glenn Strong, Richard Millwood
  • Interaction, Simulation and Graphics: John Dingliana
  • Knowledge and Data Engineering: Vincent Wade, Lucy Hederman, Mary Sharp, Declan O'Sullivan, Dave Lewis, Owen Conlan, Khurshid Ahmad, Rob Brennan, Seamus Lawless, Kris McGlinn, Kevin Koidl, Alex O'Connor, Joeran Beel
  • Networks and Telecommunications: Donal O'Mahony, Hitesh Tewari, Stefan Weber, Eamonn O'Nuallain, Meriel Huggard, Ciaran McGoldrick, Jonathan Dukes, Stephen Farrell, Melanie Bouroche, Marco Ruffini, Douglas Leith, Lory Kehoe, Georgios Iosifidis
  • Other: David Abrahamson, Michael Brady, Stephen Barrett, Khurshid Ahmad, Melanie Bouroche, Marco Ruffini, Vasileios Koutavas, Douglas Leith, Joeran Beel
  • Statistics: Mary Sharp, Rozenn Dahyot, John Haslett, Simon Wilson, Brett Houlding, Jason Wyse, Arthur White, Douglas Leith, Bernardo Nipoti


4. Project proposals for the academic year 2017/18

The following is a list of suggested projects for final year BA (CS), BA (CSLL), BA (CS&B /B&C), BAI, MAI, MCS, M.Sc., and MSISS students for the current year. Note that this list is subject to continuous update. If you are interested in a particular project you should contact the member of staff under whose name it appears.

This is not an exhaustive list and many of the projects proposed can be adapted to suit individual students.



Dr. Joeran Beel

Position: Ussher Assistant Professor
Affiliation: ADAPT Centre & Intelligent Systems Discipline / Knowledge and Data Engineering Group (KDEG)
Contact: If you want to do one of these projects, or have your own idea, please read about how to proceed in my WIKI (you need to register to get access and be signed in to read the WIKI; otherwise you will get a 404/dead-page error).
Last update: 2017-10-13

The following projects are only suggestions, and I am open to your own ideas in the areas of:

  • Recommender Systems
  • Machine Learning
  • User Modelling
  • Information Retrieval
  • Artificial Intelligence
  • Information Extraction
  • Natural Language Processing
  • Text Mining
  • Citation Analysis
  • Bibliometrics
  • Altmetrics
  • Scientometrics
  • Plagiarism Detection
  • Blockchain
  • Digital Libraries
  • Digital Humanities
  • Finance (FinTech)
  • LegalTech
  • Tourism
  • Healthcare
  • Business start-ups

Please note that I am not always sure whether the following ideas are novel and feasible. It is your responsibility to do some research before you start the project, to find out whether the idea is novel and whether you are capable of completing the project in the given time frame. Many of the projects are suitable for business start-ups. If you are interested in doing a business start-up based on one of the project ideas (as part of your FYP or in some other context), contact me to discuss the details.

Improving Research-Paper Recommendations with One of Various Methods (Machine Translation, Machine Learning, Natural Language Processing, ...)

One of my main projects is Mr. DLib, a recommender-system-as-a-service that delivers several million research-paper recommendations per month via an API to partners such as JabRef, Sowiport, MediaTUM, and soon also TCD's TARA. In the context of Mr. DLib there are many projects you could do. The advantage of participating in Mr. DLib is that you will work with a real-world system used by real users, i.e. you can evaluate your work with thousands of users instead of the small user study you would probably run in many other projects. In addition, you will work closely with the Mr. DLib team, i.e. you are involved in an active, ongoing project instead of sitting alone at your desk pursuing your FYP. To work with Mr. DLib you need good Java programming skills and basic Linux knowledge. Knowledge of APIs and (REST) web services, Python, and web standards (XML, JSON, HTTP, ...) is helpful, though not a requirement. A few project ideas are outlined in the following.

Self-Learning Recommender Systems

Problem/Background: Our different partners have different needs when it comes to calculating and displaying recommendations. Currently, we manually tune our recommender system to find the ideal algorithm(s) for each partner. However, especially in the long run, when we hopefully have dozens or even hundreds of partners, manually tuning the algorithms is no longer feasible.

Solution/Goal of the Project: We aim to develop a "self-learning" recommender system that identifies, for each partner (and each user/item of the partner), the potentially most effective algorithm. The idea is that once we have delivered some recommendations to a partner, machine learning can identify in which scenario (i.e. which partner, which user of the partner, ...) which recommendation algorithm is most effective. If a partner then requests recommendations again in a similar scenario, our self-learning recommendation framework should use the potentially most promising algorithm. To accomplish this, there are two challenges to solve (you would focus on one of the two in your FYP). First, we need a method to collect data effectively, i.e. to run A/B tests and gain insights about which parameters of which algorithms might be most effective. Second, we need to find out which machine-learning algorithms are best suited to learning from the collected data.
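
As a rough sketch of the first challenge (collecting data while already exploiting what has been learned), an epsilon-greedy bandit could pick, per scenario, mostly the best-known algorithm and sometimes a random one. This is a minimal Python sketch; the algorithm names and scenario keys are invented, not Mr. DLib's actual identifiers:

    import random
    from collections import defaultdict

    ALGORITHMS = ["content_based", "stereotype", "most_popular"]  # hypothetical names
    EPSILON = 0.1  # fraction of requests used for exploration (A/B testing)

    # Per-scenario statistics: scenario -> algorithm -> [clicks, deliveries]
    stats = defaultdict(lambda: {a: [0, 1] for a in ALGORITHMS})

    def choose_algorithm(scenario):
        """Mostly exploit the algorithm with the best click-through rate for
        this scenario, but keep exploring so the statistics stay informative."""
        if random.random() < EPSILON:
            return random.choice(ALGORITHMS)
        return max(ALGORITHMS, key=lambda a: stats[scenario][a][0] / stats[scenario][a][1])

    def record_feedback(scenario, algorithm, clicked):
        """Update the statistics after a recommendation set was delivered."""
        stats[scenario][algorithm][0] += int(clicked)
        stats[scenario][algorithm][1] += 1

    # Example: partner "jabref" requests recommendations for one of its users.
    scenario = ("jabref", "user_42")
    algorithm = choose_algorithm(scenario)
    record_feedback(scenario, algorithm, clicked=True)

The second challenge would replace the simple click-through statistics with a learned model that maps scenario features to the expected effectiveness of each algorithm.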

Machine-Translation-Based Recommendations: Improve Our Content-Based Recommendations with Machine Translation

Problem/Background: In Mr. DLib, we have millions of documents in various languages (English, German, French, Russian, ...). This leads to a problem when users look at, e.g., a German document and Mr. DLib should recommend related documents. Assuming that every researcher speaks English, it would make sense to recommend English documents even when a user is currently looking at a German document. However, Mr. DLib's current content-based recommendation approach can only recommend documents in the same language as the input document.

Solution/Goal of the Project: You apply different machine-translation frameworks to translate all non-English documents (titles and abstracts) to English. This way, all documents are available in the same language in our database, and we can recommend, e.g., English documents when a user looks at a German document. You will find out which machine-translation frameworks are best suited to this task, and analyze how this approach improves the overall effectiveness of Mr. DLib.

Personalized Research-Paper Recommendations for JabRef Users

Problem/Background: We have already integrated Mr. DLib into the reference-management software JabRef. However, so far users can only receive non-personalized related-article recommendations: a user looks at one article and receives a list of similar articles. The problem with such recommendations is that they do not take into account which articles a user has previously looked at.

Solution/Goal of the Project: You extend the integration of Mr. DLib in JabRef so that more comprehensive data about users is transferred to Mr. DLib's servers, and personalized recommendations are generated on the servers. You will work with REST web services, Java, MySQL, and recommendation frameworks during this project.

Taken: "Nobel Prize or Not?" (Academic Career/Performance Prediction)

Problem / Background: Researchers' performance needs to be assessed in many situations: when they apply for a new position (e.g. a professorship), when they apply for research grants, or when they are considered for an award (e.g. the Nobel Prize). To assess a researcher's performance, the number of publications, the reputation of the journals and conferences in which the publications appeared, and the citations the researcher has received are often used. However, these numbers focus on the past. For instance, knowing that a researcher has published 50 papers and accumulated 2,000 citations so far says little about how the researcher will perform in the future.

Solution / Goal of the project: Your goal is to develop a tool (website or desktop application) that predicts how well a researcher will perform in the future, i.e. in which venues the researcher will publish, how many citations s/he will receive, etc. Ideally, a user could enter a name into the tool, and a chart or table would be shown with the predicted citation counts etc. In addition, the tool might predict which university the researcher could work at next.

Methodology (how to achieve the goal): One way to achieve the goal, sketched in code below, is to:

  1. Create a dataset that contains historic data for many researchers. The data should include the researchers' publication lists, citation counts, ideally a work history (at which universities the researcher worked, and when), and the reputation of the venues the researcher has published in. Such data could be obtained from Google Scholar, SemanticScholar, LinkedIn, ResearchGate, and Scimago.
  2. Based on the data, train a machine-learning algorithm that learns how well researchers perform given the collected data.
  3. Apply the machine-learning algorithm to predict a researcher's future performance.
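
A minimal sketch of steps 2 and 3, assuming the dataset from step 1 already exists; all feature names and numbers below are invented for illustration:

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    # One row per researcher: [papers so far, citations so far, years active,
    # mean venue rank]; the label is citations gained over the next 3 years.
    X = np.array([
        [50, 2000, 10, 1.5],
        [12,  150,  4, 2.8],
        [80, 5400, 18, 1.2],
        [ 5,   30,  2, 3.0],
    ])
    y = np.array([900, 120, 1800, 40])

    model = GradientBoostingRegressor(random_state=0)
    model.fit(X, y)

    # Predicted future citations for a new researcher profile.
    print(model.predict([[30, 800, 7, 2.0]]))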

Infer/predict demographics based on data such as names, location, email address, or text

Problem/Background: Many applications utilize users' demographics (age, gender, nationality, ...), for example recommender systems or personalized advertising. However, demographic data is not always available.

Solution/Goal of the Project: Your goal is to develop a method that can predict a user's demographics (age, income, gender, education, ...) based on some input data (e.g. a name, email address, postal address, IP, or tweet). For instance, gender should be rather easily inferable from a person's name. Similarly, an email address like ...@gmail.de strongly indicates that a user is German, while ...@aol.com probably indicates that the user is 30+ years old. Your task would be to collect suitable data and train machine-learning algorithms so they can predict a user's demographics.
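
A minimal sketch of the name-to-gender case, using character n-grams so the classifier can pick up suffixes like "-a" or "-ie"; the tiny training list is invented and far too small for real use:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    names  = ["anna", "maria", "sophie", "julia", "peter", "james", "john", "klaus"]
    gender = ["f", "f", "f", "f", "m", "m", "m", "m"]

    model = make_pipeline(
        CountVectorizer(analyzer="char_wb", ngram_range=(2, 3)),  # character n-grams
        LogisticRegression(),
    )
    model.fit(names, gender)
    print(model.predict(["laura", "thomas"]))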

"Pimp That Voice" (Eliminate Annoying Accents from Audio/Video or Live Talks)

Problem / Background: You watch a video on YouTube or on Coursera, and the person talking in the video has a horrible voice. For instance, the person might have a terrible accent (e.g. German), a super-high-pitched voice that hurts your ears, or the person starts every sentence with ‘so’, ends every sentence with ‘right?’, and uses the word ‘like’ twice in every sentence.

Solution / Goal of the project: You develop either a software library or an entire application that gets a video (or audio) file as input and returns the file with the “pimped” audio. The “pimping” could focus on:

  1. Removing the speaker’s accent or simply replacing the voice with a completely different one. This means, from a user’s perspective, a user could either select to “make the original voice more pleasant” or “replace the original voice with a completely new one”.
  2. Improving the speaker’s grammar, i.e. remove unnecessary words such as ‘so’ or unnecessary uses of ‘like’.

Ideally, the tool works in real time, i.e. it could process video streams, e.g. from YouTube, while a video is being watched, or even while a public speaker talks into a microphone. However, for this project it is also OK to work with normal video files, and if the processing takes a while. You could also slightly shift the focus of the application from improving the voice to helping people become better speakers. This means a user would talk into a microphone in private, and whenever the user starts a sentence with e.g. "So," or ends a sentence with a rhetorical "right?", a beep would occur to remind the speaker not to do that. A very simple version of this project could be to simply count how often a user says some of the "prohibited" words and display some statistics at the end of the talk. Other variations include changing the problem from video/audio to phone calls (e.g. with customer-support centres). Your goal would then be to develop a tool that allows a company with, e.g., a call centre in India to give all employees a British accent, or the same voice, when they talk to customers. There is huge business potential in this. See also http://nationalpost.com/news/canada/canadian-speech-software-could-make-thickly-accented-overseas-operators-easier-to-understand

Mass Job-Application Detector

Problem/Background: Every professor and every HR person knows the problem: you get an email from someone asking for a job and, among other things, you need to assess how much the applicant really wants to work with you, or whether the applicant has sent the application to dozens of other companies/professors.

Solution/Goal of the Project: Develop a tool that detects mass job applications. The tool could be, for instance, an add-on for Gmail that warns the current user when an incoming job application (probably) is a mass application. I see two options for realizing such a tool (but there might be more):

  • Provide a probability score that the email is a mass application. This score could be machine-learned based on the email's text and maybe other features. This task would be similar to detecting spam emails.
  • Check how many other users have received the same (or a very similar) email. The more users received the email, the more likely it is a mass application. For professors, having just a few colleagues in the department also use the add-on would probably already be enough to get reliable results (see the sketch after this list).
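
A minimal sketch of the second option: flag an incoming email as a likely mass application if it is nearly identical to mail that other users received. The emails are invented, and a real add-on would compare against a shared (and privacy-preserving, e.g. hashed) index rather than plain text:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    emails_seen_by_colleagues = [
        "Dear Professor, I am very interested in joining your esteemed group...",
        "Hi, a quick question about the reading list for your course...",
    ]
    incoming = "Dear Professor, I am very interested in joining your esteemed team..."

    vec = TfidfVectorizer().fit(emails_seen_by_colleagues + [incoming])
    sims = cosine_similarity(vec.transform([incoming]),
                             vec.transform(emails_seen_by_colleagues))

    # A best match close to 1.0 means several recipients got (almost) the same text.
    if sims.max() > 0.8:
        print("Warning: this looks like a mass application.")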

Detect Drones with Crowdsourced Distributed Cameras

Problem/Background: There are more and more drones flying around, and the risk of, e.g., terror attacks by drones is increasing. Consequently, the need to detect drones as early as possible is also increasing.

Solution/Goal of the Project: Imagine a distributed network of house owners who all install cameras (video or photo) on the roofs of their houses. The cameras could communicate over the Internet and together detect whether an object in the air is a drone or a bird, cloud, airplane, ... Recognized drones would be shown on a map, and if they entered no-go zones, the police would be informed. Many extensions are possible. For instance, cameras could be mounted on rotatable installations. Once one camera detects a suspicious object, other cameras could rotate in that direction and take additional photos.

Plag.me: The automatic plagiarism creator

Problem/Background: None :-)

Solution/Goal of the Project: Develop a tool that takes a text as input (optionally a text with additional data such as citations, figures, tables, ...) and then rephrases the text and/or translates it, replaces (some) references, and redraws the figures and tables. All this should be done with machine learning. The goal of this project is to demonstrate how good machine learning is nowadays and how difficult it is to detect such plagiarism. This project would be for demonstration purposes only, and/or to eventually support the detection of plagiarism. You would have to think of a way to minimize the risk of your tool being abused for actually creating plagiarism (one solution could be that all submitted texts and returned outputs are publicly available).

The Kaggle Machine-Learning Competition Solver

Problem/Background: Kaggle is a platform for machine learning, and many companies offer competitions on Kaggle, with the winner receiving significant prizes. Participating in these competitions is not really difficult in theory, as it usually comes down to cleaning the data, trying different machine-learning algorithms, and calculating whatever evaluation metric the company wants. However, this is a time-consuming process.

Solution/Goal of the Project: You write a tool that automates the process of participating in Kaggle competitions as far as possible. One potential solution would be to use "automated machine learning" frameworks, which are capable, to some extent, of automatically running a large number of algorithms on a given dataset and finding the most effective one. Such a tool might also be used by Kaggle as a baseline in the long run, i.e. other participants would need to beat the automated machine-learning libraries.
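
A minimal sketch of the core loop such a tool would automate: try several model families on a dataset and keep the best by cross-validation. Real AutoML frameworks additionally search preprocessing steps and hyperparameters:

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)  # stand-in for a Kaggle dataset

    candidates = {
        "logreg": LogisticRegression(max_iter=5000),
        "tree": DecisionTreeClassifier(),
        "forest": RandomForestClassifier(),
    }
    scores = {name: cross_val_score(model, X, y, cv=5).mean()
              for name, model in candidates.items()}
    best = max(scores, key=scores.get)
    print(best, round(scores[best], 3))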

"Photo2Location" (Guide a User to the Location a Photo was taken)

Problem / Background: When people visit a nice restaurant or sightseeing spot, they usually take photos. However, when they look at the photos days, weeks, or even months later, they often do not remember where exactly the photo was taken, or how to get there, e.g. to eat in that nice restaurant again.

Solution / Goal of the project: You develop an app for iOS and/or Android that allows users to open a photo in the app; the app then displays the location of the photo on a map and guides the user there. This way, users can easily find, e.g., the nice restaurant where they ate some weeks ago.
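
A minimal sketch of the core step, assuming the photos carry EXIF GPS tags (most smartphone photos do) and a recent version of Pillow:

    from PIL import Image
    from PIL.ExifTags import GPSTAGS, TAGS

    def gps_coordinates(path):
        """Return (latitude, longitude) in decimal degrees, or None."""
        exif = Image.open(path)._getexif()
        if not exif:
            return None
        raw = next((v for t, v in exif.items() if TAGS.get(t) == "GPSInfo"), None)
        if not raw:
            return None
        gps = {GPSTAGS.get(t, t): v for t, v in raw.items()}

        def to_degrees(dms):  # EXIF stores degrees/minutes/seconds as rationals
            d, m, s = (float(x) for x in dms)
            return d + m / 60 + s / 3600

        lat = to_degrees(gps["GPSLatitude"])
        lon = to_degrees(gps["GPSLongitude"])
        if gps.get("GPSLatitudeRef") == "S":
            lat = -lat
        if gps.get("GPSLongitudeRef") == "W":
            lon = -lon
        return lat, lon

    # The coordinates can then be handed to the platform's maps application.
    print(gps_coordinates("restaurant.jpg"))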

The project as described here is probably not comprehensive and novel enough to justify a final year project, so you would have to come up with some additions to extend the project's scope and make it a bit more original (maybe add a recommender system?).

Time-Normalized TF-IDF Term Weighting for Enhanced Search and Document Similarity Calculations

Problem/Background: TF-IDF is one of the most common term-weighting schemes for calculating how relevant a document is to a given search query. 'TF' stands for 'term frequency': the more often a query term appears in a document, the more relevant that document is to the query. 'IDF' stands for 'inverse document frequency': the fewer documents in the corpus contain a term, the more relevant a document containing that term is. The problem with this approach is that it does not consider how long a term has been in use. For instance, the term 'bitcoin' has appeared in documents only for a few years, because Bitcoin was introduced in 2009. In contrast, a term like 'search engine' has been in use for decades, so many more documents can be expected to contain it.

Solution/Goal of the Project: You modify the traditional TF-IDF formula (and/or other term-weighting schemes) to take into account how long the query term has been in use, and then evaluate whether retrieval performance improves.
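
As a starting point, the standard weight and one conceivable (entirely hypothetical) time-normalized variant could be written as follows, where N is the corpus size, df_t the number of documents containing term t, and N_{y_t} the number of documents published since term t first appeared:

    % standard TF-IDF weight of term t in document d
    w_{t,d} = \mathrm{tf}_{t,d} \cdot \log \frac{N}{\mathrm{df}_t}

    % hypothetical time-normalized variant: restrict the corpus to documents
    % published since t first appeared, so a young term such as "bitcoin" is
    % not penalized for being absent from older documents
    w^{\mathrm{time}}_{t,d} = \mathrm{tf}_{t,d} \cdot \log \frac{N_{y_t}}{\mathrm{df}_t}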

Voice2Text2Voice2Video2...

Problem/Background (1): You get a text message from your dearest and wish she/he had sent that message as a voice or video message.

Problem/Background (2): You have a voice or video call with your dearest and the quality is too low to really hear or see the other person.

Solution/Goal of the Project: You develop a tool, e.g. a WhatsApp add-on, that is capable of playing a text message in the voice of the person who sent it. This means, for instance, that when your mother sends you a text message, you can press a "Read out loud" button and the message is played in your mother's voice. To accomplish this, your mother's voice would have to be learned, e.g. from previous WhatsApp voice calls with her. Or, even cooler, the text message would not only be read out loud but you could also see a video of your mother. Similarly, to address the second problem, you could develop a tool, e.g. a WhatsApp add-on, that transforms user 1's speech to text, sends the text to user 2, and plays the text as voice or video on user 2's phone.

There are variations possible. For instance, it would not necessarily have to be Text2Voice or Text2Video; it could also be Voice2Video.

The Journal and Conference Health Monitor (How active/"healthy" are academic conferences and journals?)

Problem/Background: Researchers usually publish their work in academic journals or at academic conferences. The reputation of such venues is an important metric for judging the reputation and impact of a researcher. To estimate the reputation of a conference or journal, citation-related measures are typically used, and there are many platforms that provide such rankings, which typically range from A (top venue) to C (mediocre venue) and unranked. However, these rankings provide a rather static image of a conference or journal. Recently, researchers calculated the "health" of a few selected conferences. They calculated the health not only from a static citation measure but considered the development over time, and used additional criteria including the number of authors who publish at a conference, the number of new authors, and the number of programme committee members. However, this was a semi-manual process that involved a lot of work.

Goal of the project: The goal is to develop a method that can automatically calculate the "health" of academic venues on a large scale. Your tasks would include defining what exactly makes an academic venue "healthy", collecting the required data, and calculating the health of a large number of conferences. Ideally, you would also develop a method that predicts how the health will evolve over the next years.
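
A minimal sketch of one conceivable health signal, the share of first-time authors per year, computed from a (here invented) table of paper metadata:

    import pandas as pd

    papers = pd.DataFrame({
        "venue": ["JCDL"] * 6,
        "year": [2015, 2015, 2016, 2016, 2017, 2017],
        "author": ["beel", "gipp", "beel", "lee", "kim", "lee"],
    })

    seen, rows = set(), []
    for year, group in papers.sort_values("year").groupby("year"):
        authors = set(group["author"])
        rows.append((year, len(authors - seen) / len(authors)))
        seen |= authors

    # A steadily falling ratio of new authors could indicate a "closed shop".
    print(pd.DataFrame(rows, columns=["year", "new_author_ratio"]))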

Sketcha: Captchas based on Sketches

Problem/Background: Captchas are a common tool for detecting spammers and bots, and there is a constant battle between the two groups. There are audio captchas, voice captchas, text captchas, and so on. Sketches, too, have been used to detect spammers and bots (http://gfx.cs.princeton.edu/pubs/Ross_2010_SAC/index.php). Moreover, Google recently released a huge dataset of sketches (https://techcrunch.com/2017/08/25/google-releases-millions-of-bad-drawings-for-you-and-your-ai-to-paw-through/amp/). To the best of our knowledge, this data has not yet been used to implement a novel captcha method.

Solution/Goal of the Project: Be creative and find a way to develop a novel captcha method based on sketches (either Google's, or some other approach).

A Generic Machine-Learning Based Website Parser/Scraper (for research articles)

Problem/Background: In many situations it is necessary to parse a website and identify, e.g., the title, authors, or abstract of the text displayed on a web page. This is important for web crawling, but also, e.g., for importing research articles from the web into your reference manager (see e.g. https://scraper.bibsonomy.org). As far as I know (please let me know if I am wrong), current parsers use heuristics and templates to identify, e.g., the title of a website. Given the huge advances in machine learning, this no longer seems appropriate.

Solution/Goal of the Project: Develop a web-page scraper/parser that identifies certain elements of a web page automatically. The parser should be trained with machine learning and compared against state-of-the-art parsing tools. I am particularly interested in parsers for academic content, similar to https://scraper.bibsonomy.org, but I am also open to other domains (e.g. parsing news websites).

Extending Word-Embeddings with Citation Context

Problem/Background: When it comes to indexing documents, each term that appears in the documents typically represents one dimension in a vector space. Consequently, a large document corpus can easily have thousands or even millions of dimensions. Word embeddings have changed this: with word embeddings created by machine learning, the vector space is reduced to a few hundred dimensions. Hence indexes are much smaller, and retrieval performance often increases. However, some document types, such as research articles, do not only contain text but also additional data such as citations. So far, this data is ignored by traditional word embeddings.

Solution/Goal of the Project: The idea is to replace citations with, e.g., the titles of the cited documents. So, while normally a text used for learning word embeddings would look like this:

One of the most common recommendation approaches is content based filtering. Beel et al. (2015) found that 53% of all research-paper recommender systems use content-based filtering.

... the "extended" approach would add the title of the cited document to that text and use this extended text for learning. Hence, the text would change to

One of the most common recommendation approaches is content based filtering. Research Paper Recommender Systems: A Literature Survey found that 53% of all research-paper recommender systems use content-based filtering.
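
A minimal sketch of that preprocessing step, assuming a lookup table from citation markers to paper titles (invented here); the regular expression only covers the "Author et al. (year)" pattern, while real citation formats vary widely:

    import re

    titles = {
        "Beel et al. (2015)": "Research Paper Recommender Systems: A Literature Survey",
    }

    def expand_citations(text):
        """Replace each known citation marker with the cited paper's title."""
        pattern = re.compile(r"[A-Z][a-z]+ et al\. \(\d{4}\)")
        return pattern.sub(lambda m: titles.get(m.group(), m.group()), text)

    text = ("One of the most common recommendation approaches is content based "
            "filtering. Beel et al. (2015) found that 53% of all research-paper "
            "recommender systems use content-based filtering.")
    print(expand_citations(text))  # feed the expanded text to the embedding trainer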

Citation-Embeddings: Applying the Idea of Machine-Learned Word-Embeddings to Citations in the Context of Research-Paper Recommender Systems

Problem/Background: When it comes to indexing documents, each term that appears in the documents typically represents one dimension in a vector space. Consequently, a large document corpus can easily have thousands or even millions of dimensions. Word embeddings have changed this: with word embeddings created by machine learning, the vector space is reduced to a few hundred dimensions. Hence indexes are much smaller, and retrieval performance often increases. However, some document types, such as research articles, do not only contain text but also additional data such as citations. So far, this data is ignored by traditional word embeddings.

Solution/Goal of the Project: Instead of terms, citations are used for the embedding. The approach would use either citations only or a hybrid of citations and terms. Citation embeddings can be created rather easily when each citation carries a unique document ID. For instance, if two documents both cite the same document ...

One of the most common recommendation approaches is content based filtering. Beel et al. (2015) found that 53% of all research-paper recommender systems use content-based filtering.

It was found that many research-paper recommender systems use content-based filtering [4].

... these texts would be converted to:

One of the most common recommendation approaches is content based filtering. unique_document_id-4854564 found that 53% of all research-paper recommender systems use content-based filtering.

It was found that many research-paper recommender systems use content-based filtering unique_document_id-4854564.
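
Once the markers are replaced, the document IDs are ordinary tokens, so a standard word2vec model learns citation embeddings alongside the words. A minimal sketch with gensim (4.x API) on the two toy sentences above:

    from gensim.models import Word2Vec

    sentences = [
        ("one of the most common recommendation approaches is content based "
         "filtering unique_document_id-4854564 found that 53% of all "
         "research-paper recommender systems use content-based filtering").split(),
        ("it was found that many research-paper recommender systems use "
         "content-based filtering unique_document_id-4854564").split(),
    ]

    model = Word2Vec(sentences, vector_size=50, window=5, min_count=1, epochs=50)

    # Documents cited in similar contexts get similar vectors, which a
    # recommender can exploit via nearest-neighbour search.
    print(model.wv.most_similar("unique_document_id-4854564", topn=3))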

A Citation-Proximity-Based Ground-Truth to Train Text-Based Recommender Systems / Learning to Predict Citation Proximity based on Terms

Problem/Background: Many recommender systems (e.g. for news, web pages, research articles, ...) need to identify related documents for a given input document. For instance, if a user reads a news article, the website might want to recommend related news articles to keep the user reading. Calculating document relatedness is not a trivial task. Often, text similarity is used (e.g. cosine similarity), or relatedness is learned from some ground truth. For instance, a machine-learning-based recommender system for research articles might learn that all articles published in the same journal or by the same author are somewhat related. However, this approach often does not achieve satisfying results.

Solution/Goal of the Project: A solution could be to learn text-based document similarity from citation proximity analysis (CPA, https://en.wikipedia.org/wiki/Co-citation_Proximity_Analysis). You would need to find a document corpus that contains the full text of research articles and their in-text references. You would then train a machine-learning algorithm on the terms and the citation proximity. That is, the algorithm should learn, given the text of two documents as input, how closely together they will be cited. Later, when you have a new, not-yet-cited document, you can predict from its text which other documents would likely be cited in close proximity to it.
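
A minimal sketch of deriving CPA-style ground-truth labels: the closer together two citation markers appear in a full text, the more related the cited documents are assumed to be (the marker offsets below are invented):

    from itertools import combinations

    # Character offsets of each cited document's markers in one full text.
    positions = {"doc_A": [120, 5400], "doc_B": [150], "doc_C": [9800]}

    def cpa_weight(doc1, doc2):
        """Relatedness label: inverse of the smallest distance between markers."""
        gap = min(abs(p - q) for p in positions[doc1] for q in positions[doc2])
        return 1.0 / (1.0 + gap)

    pairs = {(a, b): cpa_weight(a, b) for a, b in combinations(positions, 2)}
    print(pairs)  # doc_A and doc_B are co-cited 30 characters apart -> highest weight

A regressor would then be trained to predict these weights from the two documents' texts alone.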

Taken: The 1-billion Citation Dataset for Machine-learning Citation Styles and Entity Extraction from Citation Strings

Problem/Background: Effective citation parsing is crucial for academic search engines, patent databases, and many other applications in academia, law, and intellectual-property protection. It helps to identify related documents and to calculate the impact of researchers and journals (e.g. the h-index). "Citation parsing" refers to identifying and extracting a reference like [4] in the full text, and the author names, journal, publication year, etc. from the bibliography. For instance, in the following example, the citation parser would have to identify the citation markers [1], [2], [3], and [4], and then extract from the bibliography that, for the first entry, "K. Balog", "N. Takhirov", etc. are the authors.


1 Introduction
Retrieving a list of ‘related documents’ for a given source document – e.g. a web page, patent, or research article – is a common feature of many applications, including recommender systems and search engines (Figure 1 ). Document relatedness is typically calculated based on documents’ text (title, abstract, full-text) [1] and metadata (authors, journal, …) [2], or based on citations/hyperlinks [3], [4].



6 Bibliography
[1]    K. Balog, N. Takhirov, H. Ramampiaro, and K. Nørvåg, “Multi-step Classification Approaches to Cumulative Citation Recommendation,” in Proceedings of the OAIR’13, 2013.
[2]    D. Aumueller, “Retrieving metadata for your local scholarly papers,” 2009.
[3]    B. Gipp and J. Beel, “Citation Proximity Analysis (CPA) - A new approach for identifying related work based on Co-Citation Analysis,” in Proceedings of the 12th International Conference on Scientometrics and Informetrics (ISSI’09), 2009, vol. 2, pp. 571–575.
[4]    S. Liu and C. Chen, “The Effects of Co-citation Proximity on Co-citation Analysis,” in Proceedings of the Conference of the International Society for Scientometrics and Informetrics, 2011.

Over the years, many approaches to reference parsing have been proposed, including regular expressions, knowledge-based approaches, and supervised machine learning. Machine-learning-based solutions, in particular those falling into the category of supervised sequence tagging, are considered the state of the art for reference parsing. Unfortunately, they still suffer from two issues: the lack of sufficiently big and diverse datasets, and problems with generalization to unseen reference formats. Especially for deep learning, much larger datasets would be needed than exist today.

Solution/Goal of the Project: Your goal would be 1) to create a massive citation dataset and 2) to use that dataset to train (deep) machine-learning approaches to parse citation strings.

Methodology: To achieve the goal, you could do the following

  1. Download/parse millions of structured metadata records of academic publications, e.g. from the ACM Digital Library, IEEE, PubMed, ... (they all offer their metadata as BibTeX, EndNote, ...). For instance, you would have millions of entries like this:

    @INPROCEEDINGS{Beel2017g,
    author = {Beel, Joeran and Aizawa, Akiko and Breitinger, Corinna and Gipp, Bela},
    title = {Mr. DLib: Recommendations-as-a-Service (RaaS) for Academia},
    booktitle = {Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL)},
    year = {2017}
    }
  2. Use http://citationstyles.org/ and https://github.com/michel-kraemer/citeproc-java to create millions or even billions of citation strings from the parsed metadata. Citationstyles.org is a collection of thousands of citation styles, and citeproc-java is a framework for converting, e.g., BibTeX into one of those thousands of styles, i.e. you could output the previously parsed metadata in thousands of different citation styles, e.g.
    • J. Beel, A. Aizawa, C. Breitinger, and B. Gipp, “Mr. DLib: Recommendations-as-a-Service (RaaS) for Academia,” in Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL), 2017.
    • Beel, J., Aizawa, A., Breitinger, C. & Gipp, B. Mr. DLib: Recommendations-as-a-Service (RaaS) for Academia. Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL) (2017).
    • Beel, Joeran et al. “Mr. DLib: Recommendations-as-a-Service (RaaS) for Academia.” In Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL), 2017.
    • ...
  3. In addition, it might make sense to create further artificial citations with a knowledge base. For instance, you download a list of journal names (e.g. from http://www.scimagojr.com/) and person names (first name, last name), create random page numbers, etc., and then create billions of new citation strings.
  4. It could also make sense to create the citation strings, build a Word or LaTeX document containing a bibliography with, e.g., 5-30 citation strings, create a PDF out of it, parse the PDF with one of the many PDF-parsing tools to identify the bibliography and citation strings, and then use that data for learning. This would be a more realistic scenario, because the PDF creation and parsing would probably introduce some errors/noise into the citation strings.
  5. Use machine-learning frameworks like scikit-learn or TensorFlow to learn the elements of a citation string (a minimal sketch follows below).
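
A minimal sketch of the data side of steps 2 and 5, with two hand-written style templates standing in for citeproc-java and the CSL style repository: a metadata record is rendered into a citation string while the field labels are carried through, so every token of the output is already annotated for training a sequence tagger:

    record = {
        "authors": "J. Beel, A. Aizawa, C. Breitinger, and B. Gipp",
        "title": "Mr. DLib: Recommendations-as-a-Service (RaaS) for Academia",
        "booktitle": "Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL)",
        "year": "2017",
    }

    STYLES = {  # (field, template) pairs; real styles come from citationstyles.org
        "ieee_like": [("authors", "{},"), ("title", '"{},"'), ("booktitle", "in {},"), ("year", "{}.")],
        "apa_like": [("authors", "{}."), ("title", "{}."), ("booktitle", "{}"), ("year", "({}).")],
    }

    def render_labelled(record, style):
        """Return [(token, field_label), ...] for one rendered citation string."""
        return [(token, field)
                for field, template in STYLES[style]
                for token in template.format(record[field]).split()]

    for style in STYLES:
        print(style, render_labelled(record, style)[:4])

Millions of records rendered in thousands of styles yield a labelled corpus on which a sequence tagger (CRF, BiLSTM, ...) can be trained.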

"Unshredd Me" (Reconstruct Shredded Documents)

Problem/Background: Criminal investigators and others often face the problem that suspects have shredded documents, i.e. destroyed evidence. Hence, the investigators need to restore the shredded documents, which is a lot of work and sometimes impossible.

Solution/Goal of the Project: You develop an "unshredder" tool (website or desktop application) that takes as input a photo of a shredded document and then returns the unshredded documents. To accomplish the project, you probably have to create a dataset of photos showing shredded documents and the original unshredded versions. With this dataset, you can train a machine learning algorithm and then evaluate how well your algorithm works. When doing the project, you should decide if you want to focus on machine-shredded documents (probably easier) or documents that were torn apart.
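
For comparison against a learned model, even a simple non-ML baseline is useful. A minimal sketch that greedily orders vertical strips by how well their edge pixel columns match, assuming clean cuts and grayscale strips (the synthetic "page" is a toy gradient):

    import numpy as np

    def edge_cost(left, right):
        """Sum of absolute differences between the adjoining edge columns."""
        return np.abs(left[:, -1].astype(int) - right[:, 0].astype(int)).sum()

    def reorder(strips):
        """Try each strip as leftmost, greedily chain the best matches, and
        keep the ordering with the lowest total edge cost."""
        best_order, best_cost = None, float("inf")
        for start in range(len(strips)):
            order, remaining, cost = [start], set(range(len(strips))) - {start}, 0
            while remaining:
                nxt = min(remaining, key=lambda j: edge_cost(strips[order[-1]], strips[j]))
                cost += edge_cost(strips[order[-1]], strips[nxt])
                order.append(nxt)
                remaining.remove(nxt)
            if cost < best_cost:
                best_order, best_cost = order, cost
        return best_order

    page = np.tile(np.arange(40, dtype=np.uint8), (64, 1))  # synthetic page
    strips = [page[:, i * 10:(i + 1) * 10] for i in (2, 0, 3, 1)]  # "shredded"
    print(reorder(strips))  # [1, 3, 0, 2], i.e. the original left-to-right order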

"ASEO Me" (Optimize Research Articles for Academic Search Engines)

Problem / Background: Researchers have an interest in their research articles being indexed by academic search engines such as Google Scholar and SemanticScholar, as this increases the articles' visibility in the academic community. In addition, researchers should be interested in how well their articles rank for certain keyword searches. Some years ago, I published an article about "academic search engine optimization" (ASEO) that gave advice on how to optimize research articles to make them easy to index for academic search engines. Nowadays, ASEO is used by many publishers. However, many researchers are not yet aware of the importance of ASEO and/or do not have the skills to perform it.

Solution / Goal of the project: Your goal is to develop a website on which researchers can upload a research article; your website then analyzes the article and 1. predicts how well the article will rank for certain keywords, 2. makes suggestions on how to improve the article's ranking, and 3. (optionally) modifies the article automatically to make it more readable/indexable by academic search engines.

ASEO Reloaded (Academic Search Engine Optimization 2)

Problem / Background: As described in the previous project, I published an article about "academic search engine optimization" that gives advice on how to optimize research articles so that academic search engines can index them better. The article was published some years ago, so not all of the advice may still be sensible, or, due to advances in search-engine ranking, some additional aspects might need to be considered.

Solution / Goal of the project: Your goal is to find out how to optimize research articles for academic search engines. In contrast to the previous project, the focus here is on the research, i.e. you will run experiments to find new ways of optimizing research articles, while the focus of the previous project is more on the application (i.e. enabling a user to upload a paper and get advice).

Deep Information Extraction

Background: Information extraction from documents makes it possible to automatically obtain information such as the document title, author names, or dates directly from the file content. Extracting information is usually done in a sequence of steps. For example, first we might recognize objects such as words, text lines, and blocks within a PDF document; then such objects could be classified into categories (such as "title", "authors", "date", "section title", "paragraph", etc.); and finally we might perform additional tasks such as splitting author lists into individual given names and surnames. Usually, the most important step is the classification of document fragments (blocks of text) into categories. Traditionally this is done with supervised machine learning, which operates on features such as the words or phrases in the text, the formatting of the text, its font, size, and position on the page, etc. These features are developed manually by researchers, which takes time and effort.

Goal: In this project, you will explore the possibility of applying deep neural networks to the problem of classifying such document fragments into categories. One of the strengths of deep networks is that they do not require "smart" hand-crafted features on the input but are able to work very well even with raw input. For example, in image classification we do not have to write code that recognizes basic objects like edges or other geometric shapes in order to provide this information to the network. Instead, it is enough to feed the neural network only the RGB values of the image pixels, and the layers of the network will automatically learn to recognize the basic shapes needed to classify the image accurately.

The project will focus on developing a good alternative representation of a text block, one that results in high classification accuracy without the need to manually engineer features. For example, a text block can be represented as an image cut from the original PDF file, which could let the network automatically learn features such as text size or justification. To capture the position of the block on the page, as well as its neighborhood, it could be useful to also provide an image of the entire page with the block of interest highlighted in some way. It is also possible to add word embeddings as input features, to capture the textual layer of the block as well.
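
A minimal sketch of the image-based variant: a small convolutional network classifying a rendered text-block image into fragment categories. The input size and the four categories are placeholders, and the training data would come from rendered PDFs:

    import tensorflow as tf

    NUM_CLASSES = 4  # e.g. title, authors, section title, paragraph

    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(64, 256, 1)),  # grayscale block image
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.summary()
    # model.fit(block_images, block_labels, ...) once the dataset is prepared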

Why cited?

Background: Referencing other researchers' work is very important in scholarly communication. Usually, we assume that the more citations a paper has, the better it is. On the other hand, there are many reasons why we cite a paper: maybe we use the data or tool described in another publication, maybe we extend someone's work by adding new functionality, or maybe we compare our results to a competitor's. Having such detailed information about why papers are cited might help to assess their quality better.

Goal: The goal of the project is to develop an automated tool that can determine the reason why a paper was cited in another paper. The most natural solution to such a task is supervised machine learning and textual sentiment analysis. A supervised classifier could use features related to the context of a citation (the sentence the citation appears in; these could be words, phrases, word embeddings, etc.). Other features might also be useful, for example how many times the paper is cited, whether it is a self-citation, and in which section the citation appears ("state of the art", "methodology", "evaluation"?). In addition, it might also be interesting to take a large set of citation data and try to discover automatically (or semi-automatically) what the common reasons for citing papers are, without assuming we already know those reasons. This task could be approached with unsupervised machine learning (clustering).
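
A minimal sketch of the supervised variant, classifying the sentence around a citation marker into a citation function; the labelled examples are invented, and a real project needs an annotated corpus:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    contexts = [
        "We use the dataset introduced in [CIT] for all experiments.",
        "Our method extends the model of [CIT] with an attention layer.",
        "Unlike [CIT], our approach does not require labelled data.",
        "We evaluate against the baseline reported in [CIT].",
    ]
    reasons = ["uses_resource", "extends", "contrasts", "compares"]

    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
    clf.fit(contexts, reasons)
    print(clf.predict(["We build on the architecture of [CIT]."]))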

Named Entity Recognition for Computer Science

Background: Named entity recognition refers to finding entities such as people's names, companies, places, or dates in textual documents. Traditionally this is achieved with a supervised classifier that analyses the text represented as a sequence of tokens (words or similar units). Tokens are labeled sequentially based on various features, as well as on their neighborhood in the text.

Goal: In this project, you will build an entity recognizer for the Computer Science domain, able to process, for example, academic papers or other documentation. The recognizer will be able to find entities such as library names, algorithms, complexities, datasets, operating systems, devices, licenses, programming languages, etc. The project will require building a training set of documents with marked entities (this can be automated to some extent, for example by taking known entities from existing knowledge bases and searching for them in the document corpus). The second part will aim at training and evaluating a token tagger. Various features from traditional named entity recognition, as well as word embeddings, could be used.
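
A minimal sketch of a per-token baseline tagger; a real project would use a proper sequence model (e.g. a CRF or BiLSTM), and the one-sentence training set here is invented:

    from sklearn.feature_extraction import DictVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    def token_features(tokens, i):
        return {
            "word": tokens[i].lower(),
            "prev": tokens[i - 1].lower() if i > 0 else "<s>",
            "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>",
            "capitalized": tokens[i][0].isupper(),
        }

    train = [("We implemented quicksort in Haskell using the vector library".split(),
              ["O", "O", "ALGORITHM", "O", "LANGUAGE", "O", "O", "LIBRARY", "O"])]

    X = [token_features(toks, i) for toks, tags in train for i in range(len(toks))]
    y = [tag for _, tags in train for tag in tags]

    tagger = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
    tagger.fit(X, y)

    test = "The algorithm was written in OCaml".split()
    print(list(zip(test, tagger.predict([token_features(test, i)
                                         for i in range(len(test))]))))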

This project can also be extended to extract relations between entities, such as "library uses programming language" or "algorithm has given complexity". This part would also be based on supervised machine learning.

Extraction of Phrases Answering the Five Journalistic W-Questions using ML

Problem / Background: News articles typically answer the five journalistic W-questions (5W) within the first few sentences, i.e., who did what, when, where, and why. The 5W phrases answering these questions describe the main event of an article. Thus, the 5W phrases are used for various applications in automated analysis and processing of news, including news clustering (finding related articles), news aggregation (such as Google News), and summarization.

Solution / Goal of the project: Your goal is to find suitable features that machine-learning (ML) and deep-learning methods can use to extract, for each 5W question, the one phrase from a given article that best describes the main event. You can start by experimenting with the features and methods we implemented in a previous, non-ML-based 5W extractor. After you have devised your method, you should compare its results to our previous system on a gold-standard dataset.


Dr. Andrew Butterfield

Room: ORI G.39
Extension: 2517

Background

My research interests are in the area of Formal Methods: verifying the correctness of hardware and software systems through mathematical modelling and proof. The current focus is on the use of the so-called Unifying Theories of Programming (UTP) paradigm for linking together theories that model different types of programming languages. Also of interest is the use of functional languages, particularly pure lazy ones like Haskell and Clean.

Most of these projects can be scoped to suit Senior Sophister projects or 5th-year MCS/MAI dissertations.

Application Areas

Spaceflight Software

Work for the European Space Agency (ESA) has raised the issue of using formal techniques to reason about the correctness of C code that handles interrupts in hypervisors. Of interest is seeing how high-level models might be connected to tools that analyse code with appropriate formal annotations (e.g. Frama-C).

A set of baseline requirements for a Separation Kernel is now available. Of interest is building formal models of these using Haskell and/or CSP in conjunction with the FDR4 refinement checker.

A connected project is to take the CSP parser associated with FDR4, which is open-source at libcspm, and revise it to support so-called literate programming, similar to what is available in Haskell. The parser is written in Haskell, by the way.

Work Processes

I have been working with John Noll, a colleague in Lero, the Irish Software Research Centre, based in UL. He and colleagues from his former job in Santa Clara developed a language, originally for describing software development processes, called the Process Modelling Language (PML), and built some tools to analyse PML descriptions. I have been collaborating with him to give PML a formal semantics and to develop new tools written in Haskell. The language can be used to describe a wide range of processes, from general business through to so-called clinical healthcare pathways. We are also interested in using it to model software development standards for medical-device software. Projects could range from progressing the analysis tools to using PML to model some real process.

Facebook's infer

Facebook has open-sourced its radical new approach to verifying memory safety in Android and iOS code (see Infer). It would be interesting to explore the use and extension of this technique.

Haskell based Projects

In addition to specific projects listed below, I am willing to supervise projects that use Haskell or similar functional languages (e.g. OCaml, Lisp).

UTP Calculator (a.k.a. "Theory Hacking")

Developing UTP theories involves a lot of test calculations. A rapid theory-prototyping tool, UTPCalc, has been developed in Haskell; it allows the user to code up theory definitions using the Haskell language itself. Possible projects include extending and improving the calculator, and/or applying it to check new or existing UTP theories.

Theorem Prover Tool Support

Of particular interest at the moment is the development of proof-support tools for UTP, most notably the "Unifying Theories of Programming Theorem-Prover", U(TP)2, a proof assistant written in Haskell.

I am willing to supervise projects that either use the tool to build a theory of some interesting language, or help to improve the power and quality of the tool in some way. Example project ideas include:

  • Improved pretty-printing of large expressions and predicates, using nice fonts, and with every component "clickable".
  • Enhancing (induction) proofs through the use of "rippling"
  • Connecting the proof-engine to known secure theorem provers to justify/verify basic proof steps.

Working with the tool requires knowledge of Haskell, as would be obtained by taking module CS3016 in the JS year, ideally followed by CS4012 in the SS year.


Prof Dave Lewis

email with "PROJECT IDEA" in the subject line.

Privacy Canvas:

The Business Model Canvas is a popular tool for the iterative modelling of business ideas. Recently we adapted the affordances of the Business Model Canvas (simplicity, graphical layout, ease of design iteration) to the problem of modelling the ethical issues of a digital application project. This resulted in the Ethics Canvas design and application, which has been used to help teach ethics considerations at undergraduate, postgraduate, and postdoc levels. A similar tool may be useful when considering and teaching privacy and data-protection concerns. This project will refactor or redesign the Ethics Canvas code to offer a canvas-style interface for brainstorming the data-protection issues in a digital application design, in a way suitable for supporting training on this topic in remote groups.

Multilingual GDPR annotator:

With multiple approaches emerging to support compliance with the EU's new General Data Protection Regulation, linking different privacy-policy or privacy-impact-assessment documents back to the GDPR source text is of interest to those needing to demonstrate compliance. This project will provide web-annotation tool support for linking the GDPR text with organisation-specific data-protection documents, and will enable this for different languages. It could then be used for other regulations or standards requiring compliance tracking internationally. The approach should follow a standardized web-annotation model and should build on the linked-data representation of the GDPR text developed in the School. This project would suit a student with strong language skills in a European language in addition to English.

Generic Data Management Artefact Meta-Management:

Data management is becoming an increasingly complex and vital part of any organisation attempting to leverage big data assets. Declarative data objects using standard vocabularies and data manipulation languages provide powerful data management features, but as they become popular these objects themselves must be managed over their useful lifecycle, so they can be indexed, discovered, revised, corrected, etc. This project will explore open vocabularies and tools to provide support for such lifecycle management over a small sample set of artefacts, namely: semantic mappings in SPARQL and their explicit representation in SPIN, data uplift mappings in R2RML, and data protection compliance queries in SPARQL/SPIN.

Open Data for Research Ethics:

Research ethics clearance must be secured for scientific studies within research institutes, but the details and provenance of that clearance are typically not available if the experimental data is later shared with other researchers. This project will explore a linked open data vocabulary to complement existing open science data models (e.g. that of OpenAIRE) to allow the ethics clearance associated with data to be recorded and shared in an interoperable manner between research institutes via an open API.

Asserting Collective Control over the Means of Cognition:

Big web-based companies, often referred to as digital ‘platforms’, are able to leverage personal data on a massive scale for use in targeted advertising and other opaque behavioural influencing activities. Modern machine learning techniques lead to a massive information asymmetry between users and such companies, i.e. an asymmetry between what they know about us and what we know about how they leverage, share and use our data. While data protection regulation aims to redress this balance, it operates only at the level of the rights of individuals, so this power asymmetry may not be greatly reduced for the population of users overall. This project will explore ways in which social media groups can be used to share concerns about the aggregation, sharing and processing of personal data, and to organise collective action around these concerns, up to and including mass transfer of personal data to another platform. Tools to enable mass, collectively organised transfer of data can exploit both the enhanced right to portability users now enjoy and interoperability standards from the World Wide Web Consortium’s working group on the Social Web.

Digital Ethics Observatory:

News stories about Big Data and AI ethics appear in the media daily. However, there are few resources available for those wishing to monitor these fast-moving issues. This project will develop an application that allows news stories to be archived and then annotated by interested volunteers using the Ethics Canvas tool (ethicscanvas.org), to provide an open, searchable index of digital ethics news stories for researchers, journalists and concerned citizens alike.

Data Protection Process Browser Widget:

The EU’s new General Data Protection Regulation offers users across the EU common rights over how their data is processed by organisations. This project will develop and evaluate a web widget that can be integrated into different web sites and offer a simple graphical, process-oriented visualization for exploring the rights offered by a specific service’s privacy policy, based on an existing model developed in the ADAPT Centre for Digital Content Research.

Blockchain for Value-chain Consent Management:

The EU’s new General Data Protection Regulation offers users the right to rectify or erase data previously provided to a service provider. Responses to requests exercising this right must be propagated to any other organisations with whom the user’s data has been shared, and their implementation must be recorded for regulatory compliance purposes. This potentially adds significant complexity to systems for sharing data along business value chains. This project will explore the extent to which existing blockchain platforms can reduce this complexity and the cost involved, especially in order to mitigate the risk of the regulation becoming an excessive burden on data sharing by small and medium enterprises.

Visualising provenance for data and consent lifecycles:

The upcoming General Data Protection Regulation requires companies and organisations to maintain a record of consent and data lifecycles. These lifecycles can be complex, as the same consent and data can be used in several activities, which makes it difficult to track their usage. Visualisations are a great way to display information concisely and can help in navigating complex pathways such as these lifecycles. The project explores ways to visualise provenance traces at a granular level, so as to enable tracing data and consent from a user to all the activities that use them, based on an existing model developed in the ADAPT Centre for Digital Content Research.
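
A minimal sketch of how such a trace might be recorded, using the Python prov package and PROV-style terms (the identifiers are hypothetical; the actual vocabulary would be the ADAPT model mentioned above):

  from prov.model import ProvDocument

  # A toy provenance trace: one user's data and consent record, used by
  # two processing activities. All identifiers are hypothetical.
  doc = ProvDocument()
  doc.add_namespace('ex', 'http://example.org/')

  data = doc.entity('ex:userData')
  consent = doc.entity('ex:consentRecord')
  doc.wasAttributedTo(data, doc.agent('ex:user42'))

  for activity_id in ['ex:newsletterCampaign', 'ex:analyticsRun']:
      activity = doc.activity(activity_id)
      doc.used(activity, data)     # the activity consumed the data...
      doc.used(activity, consent)  # ...under this consent record

  # A visualisation layer would consume traces like this one.
  print(doc.serialize(indent=2))   # PROV-JSON by default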

Integration of Building Data Sets in a Geospatial Context:

Currently, building information is often dispersed and fragmented across different storage devices, file formats and schemas. This data must be integrated in order to support a range of use cases relevant to smart buildings and cities, such as those related to navigation, building control and energy efficiency. In this project you will explore available standards and data sets and, using established methodologies for data uplift, convert these datasets into Linked Data, making them available over the web and linking them to other available data sets, such as geospatial data. You will answer the question: can existing open datasets be used to derive useful information about buildings to support the aforementioned use cases?

Exploratory technologies for supporting data uplift - https://opengogs.adaptcentre.ie/debruync/r2rml
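
To make the uplift step concrete, here is a minimal sketch using Python's rdflib that turns one (hypothetical) row of tabular building data into Linked Data with a GeoSPARQL geometry. In practice the mapping would be written declaratively in R2RML, as in the tool linked above.

  from rdflib import Graph, Literal, Namespace, RDF

  EX = Namespace("http://example.org/building/")            # hypothetical vocabulary
  GEO = Namespace("http://www.opengis.net/ont/geosparql#")  # OGC GeoSPARQL

  g = Graph()
  g.bind("ex", EX)
  g.bind("geo", GEO)

  # One hypothetical record from a tabular building dataset.
  row = {"id": "B123", "use": "Office",
         "footprint": "POLYGON((0 0, 20 0, 20 10, 0 10, 0 0))"}

  building = EX[row["id"]]
  geom = EX[row["id"] + "/geometry"]

  g.add((building, RDF.type, EX.Building))
  g.add((building, EX.currentUse, Literal(row["use"])))
  g.add((building, GEO.hasGeometry, geom))
  g.add((geom, GEO.asWKT, Literal(row["footprint"], datatype=GEO.wktLiteral)))

  print(g.serialize(format="turtle"))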

Conversion of building information geometry into geospatial geometric data:

The Industry Foundation Classes (IFC) is a standard for exchanging building information. Currently, a large part of the standard is dedicated to storing and exchanging data about the geometry of the building and its elements. A complex set of relations is maintained within the IFC schema to support geometry, which when converted to RDF leads to significant overhead in terms of storage as triples. In this project you will explore methods for reducing the size of the geometry of IFC models, in particular through conversion to Geographical Information Systems (GIS) standards such as Well-Known Text (WKT), answering the question: are GIS geometry models a suitable way to store building geometries?

Exploratory technology for working with IFC geometry (removes geometry from an IFC OWL conversion) - https://github.com/pipauwel/IFCtoSimpleBIM
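
To make the size argument concrete, a small sketch (in Python, using the shapely library; the coordinates are invented) of how a building footprint becomes a single compact WKT literal, where an ifcOWL conversion would spread the same shape over many cartesian point, polyline and placement triples:

  from shapely.geometry import Polygon

  # A hypothetical rectangular building footprint, 20m x 10m. In ifcOWL
  # the same shape would be spread over dozens of triples; as WKT it is
  # a single literal that remains queryable.
  footprint = Polygon([(0, 0), (20, 0), (20, 10), (0, 10)])

  print(footprint.wkt)   # POLYGON ((0 0, 20 0, 20 10, 0 10, 0 0))
  print(footprint.area)  # 200.0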

Visualisation of building geometry in a geospatial context:

Open and accessible building information can support multiple use cases relevant to smart buildings and cities. The OSi has a large dataset of building data, which includes geospatial data about building locations, and other properties such as a building's current use and type (its form and function). In this project you will explore an interface for the visualization of the OSi building data, supporting not only the querying of buildings but also interaction with the building geometry through a web interface, e.g. point-and-click selection of buildings (with HTML5 and the three.js WebGL library). You will examine what an appropriate way to visualise building data is, such that it can support users when generating and exploring queries. Exploratory technology for visualising building information is available, in the form of a very simple three.js GIS model that integrates OSi county data.

Online questionnaire tool for GDPR compliance assessment:

The General Data Protection Regulation (GDPR), agreed upon by the European Parliament and Council in April 2016, will replace the EU Data Protection Directive (EU DPD). Organizations dealing with the personal data of EU citizens must ensure that they are compliant with the new requirements of the GDPR before it becomes effective in May 2018. It is important for organizations dealing with personal data to assess their compliance with the GDPR and identify risks before regulatory violations occur, as fines under the GDPR can be up to 4% of a company's global turnover. This project will build an online support tool for GDPR compliance based on assessment questionnaires. The tool will present the important aspects of the GDPR and, based on the answers to the compliance assessment questions, will show whether the organization is fully compliant in each area or needs to do further work to improve compliance.
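
A minimal sketch of the assessment logic, assuming a hypothetical question bank grouped by GDPR aspect (a real tool would need legally grounded questions, weightings and guidance text):

  # Hypothetical questionnaire: questions grouped by GDPR aspect,
  # answered True (requirement met) or False (gap).
  questions = {
      "Consent": ["q1", "q2", "q3"],
      "Data subject rights": ["q4", "q5"],
      "Breach notification": ["q6"],
  }
  answers = {"q1": True, "q2": False, "q3": True,
             "q4": True, "q5": True, "q6": False}

  for aspect, qs in questions.items():
      score = sum(answers[q] for q in qs) / len(qs)
      status = "compliant" if score == 1.0 else "needs work"
      print(f"{aspect}: {score:.0%} ({status})")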


Dr. John Dingliana

https://www.cs.tcd.ie/John.Dingliana/

Room: 02-014 Stack B
Extension: +3680


Some general info

I am a member of the graphics, vision and visualisation research group interested in the areas of:

  • 3D Visualisation
  • Computer Graphics and Virtual Reality
  • Graphical aspects of Mixed and Augmented Reality
  • Stylised Rendering / Non-photorealistic Rendering
  • Physically-based Animation

Suggested Projects:

  1. Topics in Visualization: I am interested in visualization of spatialized 2D and 3D data structures (i.e. data that has a geometric structure). Some possible topics:
    • Multi-modal spatialized data visualization e.g. fused visualization of data from different sources
    • Multi-variate visualization e.g. visualizing a vector field with many variables.
    • Time-varying spatialized data visualization
    • Spatio-temporal visualization
    • Visualization using Virtual and Augmented Reality devices
    • Perception in Visualization
  2. Remote telemetry using head-mounted Virtual Reality/Augmented Reality displays: video and 3D information acquired from a remote sensor will be transmitted live to a user wearing an HMD (an Oculus Rift or Meta AR head-mounted display will be available for use). Instead of the user merely seeing through "the eyes of the camera", the spatial information gathered will be displayed asynchronously in order to minimize motion-sickness effects and allow the user to explore the data independently.
    Potential challenges include the following (each of which could be the focus of a project):
    • reducing latency between acquisition, processing and display of 3D environments on AR/VR displays
    • adaptive level of detail in interactive AR/VR
    • blending different modalities (e.g. fusion of different sensors, or seamless integration of real and virtual)
    • ensuring accuracy/fidelity of the visualization
  3. [TAKEN] Spatial perception in AR: This project will explore how users perceive relative distances of objects (e.g. real vs virtual) in mixed environments. Can users reliably judge which object or feature is closer, do users have an accurate sense of scale, and can users be convinced that a real and virtual object are collocated/connected? In particular there is limited work on up-and-coming "see-through AR" devices such as the Microsoft HoloLens.
    The effort in the project will be in using one or more AR displays to render experimental 3D graphical scenes wherein virtual objects are embedded in the real world; implementing a number of strategies (mostly from the existing literature) to improve spatial perception in such scenes; implementing a testing scenario to compare spatial perception across different strategies; and potentially running a pilot experiment.

  4. Interactive Anatomy Illustration from Real-world Data: In anatomy education, it is quite common to use artistic illustrations, such as those by Frank Netter, who created thousands of images used in most of the anatomy textbooks worldwide. Illustrations are purported to be easier to understand, learn and remember than scanned or photographed images of anatomy. However, such illustrations require a skilled artist who is also well versed in anatomy, and a large amount of effort and time to produce.
     
    This project looks at the challenges of creating art-like illustrations (also called non-photorealistic renderings) directly from 3D data such as MRI scans. The images generated should be created in real time from arbitrary anatomy datasets with little or no pre-processing or parameter tuning. In addition, it should be possible to interactively manipulate the views, for instance by zooming, clipping or rotating the rendered data.
     
    There are a number of previous works that have broadly achieved such visualizations; however, many interesting challenges remain, for instance:
    • implementing such solutions for web or mobile devices
    • reducing the requirement for parameter tuning or artistic input specific to a particular data set
    • generally increasing the quality of automatic illustrations
    • dealing with ambiguous data, e.g. in MRI scans it is often not easy to separate one tissue from another
  5. Abstraction of images and videos: Most forms of illustration or art imply some degree of abstraction, in other words simplifying the image to an optimal degree so that the most significant parts of the scene are retained whilst extraneous elements are removed. In hand-drawn art, the artist makes this choice based on skill and intuition. In computer graphics a number of techniques have been proposed to abstract images, for example by retaining detail in important edges whilst smoothing other areas. This project will examine how Principal Components Analysis (PCA) could be used effectively as an automated means of abstracting images, models and 3D animations (see the sketch after this list).
  6. Superhuman vision using augmented reality [This is an implementation project for a 4th Year FYP and may not be suitable for an MSc Dissertation]: The objective of this project is to address some of the challenges of merging virtual graphical objects with dynamic real-world objects to provide information about the object to the user in augmented reality (AR). Many AR applications already exist that overlay textual and 2D-graphic information for a similar purpose. In addition, various graphical applications in entertainment add augmented objects to real-world environments. This project will deal mainly with graphical (and, where possible, 3D) augmentations of the real world, and the augmentation should be done in real time. Some example application areas include night vision, audio vision, X-ray vision, enhancing visibility of threats in an environment, etc. Microsoft ran a competition for serious application proposals for the HoloLens AR display and some of the winners are listed [HERE]. As input, the shape of the model captured using imaging and depth sensors will be used. The data might then be post-processed or filtered before being blended into the real environment. Data from advanced sensors such as X-ray, audio etc. will likely not be available but might be simulated for the purposes of this project.
  7. Spatial perception in games [This is an implementation project for a 4th Year FYP and may not be suitable for an MSc Dissertation]: The objective of the project is to implement a simple game or several mini-games to test how different rendering styles and display techniques affect user performance at spatial perception tasks. Some simple examples are 3D versions of the classic games Pong or Breakout. Many variants of these have been implemented, but a major challenge for the user is accurately gauging how far away an object is supposed to be in the z-direction (coming out of / going into the screen). Proposed solutions to enhance a sense of depth include shadows, focal blur with distance (depth of field), stereoscopy, size, motion, parallax, etc.
    Prerequisites: students must have taken (or be in the process of taking) a computer graphics module, or have some experience in 3D graphics programming.
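
For project 5, here is a minimal sketch of PCA-based image abstraction in Python: image patches are projected onto their top principal components and reconstructed, which smooths away low-variance detail while keeping dominant structure. The patch size and component count are arbitrary choices for illustration.

  import numpy as np
  from sklearn.decomposition import PCA
  from sklearn.feature_extraction.image import (extract_patches_2d,
                                                reconstruct_from_patches_2d)
  from skimage import data

  # Abstract a grayscale test image by reconstructing its 8x8 patches
  # from only their top 8 principal components.
  image = data.camera() / 255.0
  patches = extract_patches_2d(image, (8, 8))
  flat = patches.reshape(len(patches), -1)

  pca = PCA(n_components=8).fit(flat)
  reduced = pca.inverse_transform(pca.transform(flat))

  abstracted = reconstruct_from_patches_2d(reduced.reshape(patches.shape),
                                           image.shape)
  print(np.abs(image - abstracted).mean())  # how much detail was discarded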

Dr. Ivana Dusparic


Please note: Due to project selection timelines, I am currently accepting MSc/MAI projects only.

I am open to supervising projects developing novel artificial intelligence techniques and applying these techniques in intelligent urban mobility and transport systems, and smart cities in general.

In particular, I am interested in learning-based agents and multi-agent systems, with a particular focus on reinforcement learning, including deep reinforcement learning, transfer learning, multi-agent collaboration and self-aware systems. I am interested in applying these techniques to emerging urban transport models and their impact on cities, e.g. intelligent urban traffic control, car sharing, ride sharing, mobility as a service, multi-modal travel planning, learning-based personalization of travel, etc. I am particularly interested in applications leveraging large amounts of diverse sensor data for learning-based optimization.

I am also open to proposals applying learning-based multi-agent techniques to management and optimization of other large-scale infrastructures - if you have an interest in learning-based optimization and have an application area in mind, let me know!

Fergal Shevlin, Ph.D.

Room: Lloyd UB.73
Email: fshevlin@tcd.ie

Note!

    I feel that project ideas conceived by students are usually the most interesting. If you have any ideas related to the following then let's talk about them to see whether we can specify an appropriate project tailored to your own unique interests and abilities.

    Android Vision

    Projects in the general area of "Computer Vision" (viz. image processing and analysis), implemented on the Android platform for mobile devices. Thus the programming language(s) required would be at least Java, with possibly some C/C++.

    Mathematical Methods

    Projects in the general area of "Mathematical Methods for Computer Graphics, Computer Vision, Robotics, Physical Simulation, and Control" implemented using appropriate method libraries. The most appropriate programming languages are likely to be C/C++ or Python.

Dr Gerard Lacey

Room: TTEC
Extension: 087 2396567

Project

I am currently a part-time academic, as I am also working in a Trinity spin-out, www.surewash.com. My main research areas are computer vision, robotics and augmented reality. My research focus is the development and empirical evaluation of mixed-media solutions to real-world problems.

Problem / Background

Augmented reality (AR) is the overlay of interactive graphics onto live video such that the graphics react to the content of the video image, e.g. selfie filters that track face movement. Mobile phones are becoming one of the main platforms for AR. This project focuses on the tracking of hands in mobile phone images for gesture recognition, content overlay and gaming. General-purpose hand pose tracking is a complex problem, but custom hardware solutions (www.leapmotion.com, www.usens.com) and complex software libraries (www.manomotion.com) are available.

One of the biggest challenges is achieving high-speed and reliable segmentation of the hands against real-world backgrounds and under variable lighting conditions. The next main challenge is the identification of the fingers and matching them to a hand pose model. If the hand-gesture problem can be constrained, this may be simplified and good performance achieved.

Solution / Goal of the Project:

This project will aim to develop a mixed reality application that will allow someone to “try on a glove” using their mobile phone. The goals of this project are to:

  • Reliably segment the hands in live video on a mobile phone
  • Recognise and track the orientation of the hands
  • Render a 3D glove model over the live video image aligned to the hands
  • Develop a solution for finding an accurate measure of hand size
  • Develop a prototype application on iOS or Android
  • Perform user testing and formal evaluation of performance

The application will be developed using www.unity3d.com and www.emgu.com.
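
The project itself would use Unity and Emgu (a C# wrapper for OpenCV), but as an illustration of the first goal, here is a minimal Python/OpenCV sketch of the classic colour-space approach to skin segmentation. The YCrCb thresholds are commonly used textbook values and would need tuning, or replacement by a learned model, for real lighting conditions.

  import cv2
  import numpy as np

  def segment_hands(frame_bgr):
      """Rough skin segmentation via fixed YCrCb thresholds (illustrative only)."""
      ycrcb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YCrCb)
      # Commonly cited skin range in Cr/Cb; fragile under real-world lighting.
      mask = cv2.inRange(ycrcb, np.array([0, 133, 77]), np.array([255, 173, 127]))
      # Morphological clean-up to remove speckle and fill small holes.
      kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
      mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
      mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
      return cv2.bitwise_and(frame_bgr, frame_bgr, mask=mask)

  cap = cv2.VideoCapture(0)  # default camera
  ok, frame = cap.read()
  if ok:
      cv2.imwrite("hands.png", segment_hands(frame))
  cap.release()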


Dr. Rob Brennan

Senior Research Fellow, ADAPT Centre, School of Computer Science and Statistics.
Email: rob.brennan@scss.tcd.ie
My projects are in the areas of data quality, data governance and data value with an emphasis on graph-based linked data or semantic web systems. Please note that I am unlikely to supervise your own project ideas due to current commitments. These projects are only for MSc students.

Projects

TAKEN 1. Extracting Data Governance information and actions from Slack chat channels

Main contact: Dr Alfredo Maldonado (address corrected on 26/9/2017)
Data governance means controlling, optimising and recording the flow of data in an organisation. In the past, data governance systems have focused on formal, centralised authority and control, but new forms of enterprise communication like Slack need to be leveraged to make data governance more streamlined and easier to interact with. However, systems like Slack produce vast amounts of unstructured data that are hard to search or process, especially months or years later. Thus we need a way to extract the most relevant conversations in Slack and turn them into structured data or requests for specific data governance actions, such as a change in a data-sharing policy. This project looks at ways to extract relevant conversations and turn them into data governance actions via an interactive Slack bot that uses machine learning and natural language processing to identify relevant conversations and then interjects in Slack conversations to prompt users to interact with a data governance system.
This project is conducted in collaboration with Collibra Inc., a world-leading provider of data governance systems.
Keywords: Natural Language Processing, Machine Learning, Python, Data Governance
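
A minimal sketch of the classification component, using scikit-learn; the example messages and labels are invented, and a real system would be trained on labelled Slack exports and sit behind the bot's event handler:

  from sklearn.feature_extraction.text import TfidfVectorizer
  from sklearn.linear_model import LogisticRegression
  from sklearn.pipeline import make_pipeline

  # Invented training examples: is a message data-governance relevant?
  messages = [
      "can we share the customer table with the analytics vendor?",
      "who approved access to the HR salary data?",
      "lunch at 1pm anyone?",
      "the deploy is green, merging now",
  ]
  labels = [1, 1, 0, 0]  # 1 = governance-relevant, 0 = not

  clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
  clf.fit(messages, labels)

  new_msg = "should we anonymise the export before sending it?"
  if clf.predict([new_msg])[0] == 1:
      print("bot: this looks like a data-sharing decision -- log it?")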

NO LONGER AVAILABLE 2. Automated Collection and Classification of Data Value Web Content

Main contact: Dr Rob Brennan
Jointly supervised with: Prof. Seamus Lawless
This research aims to automate the collection and classification of discussions of data value (e.g. "How much is your data worth?", "Data is the new oil!") on sites like Gartner or CIO.com. This will complement our traditional survey of academic papers discussing data value management. The project will attempt to identify from the web content: the most important dimensions of data value (e.g. data quality), metrics for measuring them, the different models of data value proposed by authors, and applications of data value models. The research will explore new ways to classify and conceptualise the domain of data value. Ranking dimensions by importance is also an interesting potential challenge. The project may also consider how best to structure the conceptualisation of the domain for different roles or types of consumers.
Keywords: Information Retrieval, Natural Language Processing, Knowledge and Data Engineering

3. Adding W3C Linked Data Support to Open Source Database Profiling Application

Main contact: Dr Rob Brennan
Jointly supervised with: Dr. Judie Attard
The Data Warehousing Institute has estimated that data quality problems cost US businesses more than $600 billion per year. Everywhere we see the rise in importance of data and the analytics based upon it. This project will extend open source tools with support for new types of web data (the W3C's Linked Data) and for sharing or integrating tool execution reports over the web.
Data profiling is an important step in data preparation, integration and quality management. It is essentially a first look at a dataset or database to gather statistics on the distributions and shapes of data values. This project will add support for the W3C's Linked Data technology to an open source data profiling tool. In addition to providing traditional reports and visualisations, we want the tool to be able to export the data profile statistics it collects using the W3C's Data Quality Vocabulary and Data Catalog Vocabulary. These vocabularies allow a tool to write a profile report as Linked Data and hence share the results with other data governance tools in a toolchain. This will be an opportunity to extend the use of these vocabularies beyond pure linked data use cases to include enterprise data sources such as relational databases.
Keywords: Knowledge and Data Engineering, Java programming, Linked Data, Data Quality
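
A minimal sketch of the export step with Python's rdflib, writing one profiling statistic as a Data Quality Vocabulary (DQV) quality measurement attached to a DCAT dataset; the dataset, metric and value are hypothetical:

  from rdflib import Graph, Literal, Namespace, RDF, XSD

  DQV = Namespace("http://www.w3.org/ns/dqv#")
  DCAT = Namespace("http://www.w3.org/ns/dcat#")
  EX = Namespace("http://example.org/")  # hypothetical identifiers

  g = Graph()
  g.bind("dqv", DQV); g.bind("dcat", DCAT); g.bind("ex", EX)

  dataset = EX.customersTable
  measurement = EX.measurement1

  g.add((dataset, RDF.type, DCAT.Dataset))
  # One profiling result, e.g. the proportion of non-null values in a column.
  g.add((measurement, RDF.type, DQV.QualityMeasurement))
  g.add((measurement, DQV.computedOn, dataset))
  g.add((measurement, DQV.isMeasurementOf, EX.completenessMetric))
  g.add((measurement, DQV.value, Literal(0.97, datatype=XSD.double)))

  print(g.serialize(format="turtle"))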

4. Ethical Web Data Integration

Main contact: Dr Rob Brennan
Jointly supervised with: Prof. Declan O'Sullivan
In an era of Big Data and ever more pervasive dataset collection and combination, how do we know the risks, and whether we are doing the right thing? This project will investigate the characteristics and requirements of an ethical data integration process. It will examine how ADAPT's semantic models of the GDPR consent process can be leveraged to inform ethical decision-making and design as part of the data integration process. This work will extend the ADAPT M-Gov mapping framework.
Keywords: Ethics, Knowledge and Data Engineering, Java programming

TAKEN 5. Automatic Identification of the Domain of a Linked Open Data Dataset (New 25/9/2017)

Main contact: Dr Rob Brennan
Jointly supervised with: Dr Jeremy Debattista
As the Web of Data grows, there are more and more datasets that are becoming available on the web [1]. One important challenge in selecting and managing these datasets is to identify the domain (topic area, scope) of a dataset. Typically a dataset aggregator (such as datahub.io) will mandate that minimal dataset metadata is registered along with the dataset but this is often insufficient for dataset selection or classification (such as the dataset types used by the LOD cloud).
The aim of this dissertation topic is to create a process and tools to automatically identify the topical domain of a dataset (using metadata, querying the dataset vocabularies, and clustering using ML algorithms). Thus it will go beyond traditional Semantic Web/Linked Data techniques by combining ontology reasoning or queries with machine-learning approaches. Given an input dataset from datahub.io, LOD Laundromat or the weekly dynamic linked data crawl (http://swse.deri.org/dyldo/data/), datasets should be categorised into a specific topical domain so that consumers can filter this large network according to their needs.
Keywords: Knowledge and Data Engineering, Machine Learning
[1] http://lod-cloud.net
Further Reading
[2] http://dws.informatik.uni-mannheim.de/fileadmin/lehrstuehle/ki/pub/SchmachtenbergBizerPaulheim-AdoptionOfLinkedDataBestPractices.pdf
[3] http://data.dws.informatik.uni-mannheim.de/lodcloud/2014/ISWC-RDB/
[4] http://linkeddatacatalog.dws.informatik.uni-mannheim.de/state/
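
A minimal sketch of the clustering stage, assuming the vocabularies used by each dataset have already been extracted by querying it: treating the vocabulary list as a "document" lets standard text clustering group datasets by topical domain. All dataset names and vocabulary lists below are invented.

  from sklearn.cluster import KMeans
  from sklearn.feature_extraction.text import TfidfVectorizer

  # Each dataset is described by the vocabularies it uses (invented).
  datasets = {
      "ds1": "foaf sioc schema.org",   # social-web flavoured
      "ds2": "foaf rel sioc",          # social-web flavoured
      "ds3": "geo geosparql w3cgeo",   # geographic
      "ds4": "geonames geo spatial",   # geographic
  }

  X = TfidfVectorizer().fit_transform(datasets.values())
  labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

  for name, label in zip(datasets, labels):
      print(name, "-> domain cluster", label)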

6. Automated Selection of Comparable Web Datasets for Quality Assurance (New 25/9/2017)

Main contact: Dr Rob Brennan
Jointly supervised with: Dr Jeremy Debattista
Many open Linked Data datasets suffer from poor quality, and this limits their uptake and utility. There are now a number of linked data quality frameworks, e.g. Luzzu [1], designed to address the need for data quality assessment and the publication of quality metadata. However, in order to apply some quality measures, e.g. "Completeness Quality" [2], it is necessary to have a comparable dataset to test against. For example, the comparable dataset could form a gold standard or benchmark against which other similar data can be compared.
This project will investigate the methods required to (1) identify the requirements for a comparable dataset based on a specific set of quality checks and a dataset to be tested, and (2) then use these requirements to find the best possible dataset to act as a Gold Standard from a pool of open datasets such as datahub.io. Example requirements may include matching the domain, ontology language, presence of specific axiom types, ontology size, ontology structure, data instances present and so on.
Keywords: Knowledge and Data Engineering, Data Quality
[1] http://eis-bonn.github.io/Luzzu/
[2] http://www.semantic-web-journal.net/system/files/swj773.pdf

7. Data Quality Dashboard (New 29/9/2017)

Main contact: Dr Rob Brennan
Jointly supervised with: Dr Jeremy Debattista
The Luzzu data quality assessment framework [1] is a flexible, open source, Java-based toolset for assessing the quality of Linked Data that is now being maintained by the ADAPT Centre at TCD. Luzzu supports semantic reporting of quality assessments using the dataset quality vocabulary [2], the quality problem ontology and the Luzzu metric implementation ontology. However, it is still a command-line tool, and the semantic reports it generates are optimised for machine readability. In this project we will build a data quality dashboard that visualises the semantic outputs of Luzzu and makes it easy for quality managers or data stewards to infer the implications of a data quality assessment task.
Keywords: Knowledge and Data Engineering, Data Quality, User Interface Design
[1] http://eis-bonn.github.io/Luzzu/
[2] http://theme-e.adaptcentre.ie/daq/daq.html

Dr Tim Fernando

Room: ORI LG.17
Extension: 3800

I offer projects on knowledge representation and/or natural language semantics. Apart from programming, some mathematical maturity would be useful to survey the research literature on formal methods in artificial intelligence. Below is a list of specific topics, but I am happy to discuss other interests you may have that are broadly related.

Timelines and the semiotic triangle

    Timelines order events chronologically, while the semiotic triangle relates the world, language and mind. The aim of this project is to design timelines that distinguish between an event E in the world, a linguistic event S that describes E, and a reference point R representing a perspective from which E and S are viewed (following Reichenbachian accounts of tense and aspect).
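
    One way to make the E/R/S distinction concrete in code: encode a few of Reichenbach's standard tense schemas as ordering constraints over the three points and check them against a concrete timeline. A minimal Python sketch follows (the representation is an illustrative choice, not part of the project specification):

      # Reichenbach's points: E = event, R = reference point, S = speech time.
      # Each tense is a set of ordering constraints among them.
      TENSES = {
          "simple past":     [("E", "=", "R"), ("R", "<", "S")],
          "past perfect":    [("E", "<", "R"), ("R", "<", "S")],
          "present perfect": [("E", "<", "R"), ("R", "=", "S")],
          "simple present":  [("E", "=", "R"), ("R", "=", "S")],
      }

      def satisfies(times, constraints):
          """Check a concrete timeline {point: time} against a tense schema."""
          ops = {"<": lambda a, b: a < b, "=": lambda a, b: a == b}
          return all(ops[op](times[x], times[y]) for x, op, y in constraints)

      # "had described": the describing event precedes both R and S.
      timeline = {"E": 1, "R": 2, "S": 3}
      print(satisfies(timeline, TENSES["past perfect"]))  # True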

Finite State Semantics

    How can we apply finite automata and transducers to represent meaning? More specifically, (i) how can we structure semantic information through the successor relation on which strings are based, and (ii) how far can finite state methods process that information? A project within this topic can take a number of directions, including intensionality, temporality, comics, and formal verification tools.
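
    One concrete starting point, sketched in Python below: represent a stretch of time as a string of sets of fluents (in the spirit of the finite-state temporality literature) and combine two descriptions by superposition, i.e. componentwise union of equally long strings, an operation a finite-state transducer can compute. The fluents and strings are invented for illustration.

      # A temporal proposition as a string of sets of fluents: each set is a
      # snapshot, and the successor relation orders the snapshots.
      rain = [{"rain"}, {"rain"}, set()]
      dawn = [set(), {"sun-rising"}, {"sun"}]

      def superpose(s1, s2):
          """Componentwise union of two equally long strings of snapshots."""
          assert len(s1) == len(s2)
          return [a | b for a, b in zip(s1, s2)]

      # "It rained as the sun rose": both descriptions hold of one timeline.
      print(superpose(rain, dawn))
      # e.g. [{'rain'}, {'rain', 'sun-rising'}, {'sun'}]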

Frames and grounded cognition

    This project examines the role of frames in a general theory of concepts investigated by, for example, the DFG Collaborative Research Centre 991: The Structure of Representations in Language, Cognition, and Science. The focus is to test the idea of grounded cognition (e.g., Barsalou) against various formulations of frames. Related to this are notions of force (from Talmy to Copley) and image schema in cognitive semantics.

Constraint satisfaction problems and institutions

    The goal of this project is to formulate Constraint Satisfaction Problems (CSPs) in terms of institutions in the sense of Goguen and Burstall, analyzing relations between CSPs category-theoretically as institution (co)morphisms.

Textual entailments from a temporal perspective

    Computational approaches to recognizing textual entailments need to be refined when dealing with time. Temporal statements typically refer to a temporal period that cannot simply be quantified away by a Priorean past or future operator. The challenge facing this project is how to incorporate that period into approaches such as Natural Logic that have thus far ignored it.

Bounded granularity and incremental change: scales

    This project is for someone particularly interested in linguistic semantics. The project's aim is to examine whether or not granularity can be bounded in accounts of incremental change in natural language semantics involving scales (e.g. Beavers) as well as mereology and mereotopology. Attention will be paid to how to model refinements (and coarsenings) of granularity.

Monadic Second Order Logic for natural language temporality

    This project is for someone particularly interested in logic. A fundamental theorem in logic due to Büchi, Elgot and Trakhtenbrot identifies regular languages with the models definable in Monadic Second-Order Logic (for one binary relation). The aim of this project is to explore applications of this theorem to natural language temporality --- including tense and aspect.
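
    For reference, the theorem can be stated as follows (in LaTeX):

      \textbf{Theorem (B\"uchi--Elgot--Trakhtenbrot).}
      A language $L \subseteq \Sigma^{*}$ is regular if and only if there is a
      sentence $\varphi$ of monadic second-order logic over the signature
      $\langle <, \, (P_a)_{a \in \Sigma} \rangle$ such that
      \[
        L = \{\, w \in \Sigma^{*} \mid w \models \varphi \,\}.
      \]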


Dr. Jason Wyse

Updated 09/10/17. Email: wyseja@tcd.ie

Areas of interest to me are classification, model selection and sequential inference, all with a Bayesian flavour. Generally, projects will involve students reviewing the appropriate literature, implementing methods that extend or explore those in the literature, and evaluating methods critically, i.e. what are the advantages and disadvantages, and how could the methods be improved further? The projects below are suitable for students in the data science strand of the MSc.


Dynamic logistic regression

Logistic regression is a staple of the analyst's toolbox. Usually, analysts will want to carry out a retrospective analysis using logistic regression, i.e. gather all the data, then fit the model. This project will examine dynamic logistic regression, where beliefs are updated in an online fashion as new data arrives. A typical application domain would be streaming data. This project will review the various approaches available for dynamic logistic regression, but will specifically focus on using Sequential Monte Carlo (SMC) to carry out Bayesian sequential inference. In order to complete this project you will need a good knowledge of simulation-based inference, MCMC in particular. You will also need to be very familiar with Bayesian methods and models.
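
A minimal sketch of the SMC idea for streaming observations, in Python with numpy: particles over the coefficient vector are reweighted by the logistic likelihood of each new observation and resampled (with a small jitter) when the effective sample size drops. The prior, jitter scale and thresholds are placeholder choices.

  import numpy as np

  rng = np.random.default_rng(0)
  d, n_particles = 2, 1000

  # Particle approximation of the posterior over coefficients, N(0, I) prior.
  particles = rng.standard_normal((n_particles, d))
  weights = np.full(n_particles, 1.0 / n_particles)

  def update(particles, weights, x, y):
      """Reweight by the logistic likelihood of one observation (x, y)."""
      p = 1.0 / (1.0 + np.exp(-particles @ x))  # P(y = 1 | x, theta)
      weights = weights * np.where(y == 1, p, 1.0 - p)
      weights /= weights.sum()
      ess = 1.0 / np.sum(weights ** 2)          # effective sample size
      if ess < n_particles / 2:                 # degeneracy: resample + jitter
          idx = rng.choice(n_particles, n_particles, p=weights)
          particles = particles[idx] + 0.05 * rng.standard_normal((n_particles, d))
          weights = np.full(n_particles, 1.0 / n_particles)
      return particles, weights

  # Stream two observations through the filter.
  for x, y in [(np.array([1.0, 0.5]), 1), (np.array([1.0, -2.0]), 0)]:
      particles, weights = update(particles, weights, x, y)

  print("posterior mean:", weights @ particles)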

Bayesian logistic regression at scale

Big data has brought us to a limit in terms of our ability to analyse large datasets. Classic models that have been well studied and understood are still preferable to use, but we now face challenges in how these models can be applied to large datasets; in some instances, even computing a likelihood can be extremely challenging. This project will investigate methods and procedures in the statistics literature that are available for logistic regression at scale. These include Markov chain Monte Carlo (MCMC) methods for tall data and methods such as coresets. In order to complete this project you will need a good knowledge of simulation-based inference, MCMC in particular. You will also need to be very familiar with Bayesian methods and models. References:
  • Bardenet, R., Doucet, A. and Holmes, C.C. (2017). On Markov chain Monte Carlo methods for tall data. Journal of Machine Learning Research, 18(47), 1-43.
  • Huggins, J.H., Campbell, T. and Broderick, T. (2017). Coresets for scalable Bayesian logistic regression. arXiv preprint arXiv:1605.06423.



Last updated 19 October 2017.