“Ripe for the Picking? Dataset Maturity Assessment based on Temporal Dynamics of Feature Definitions”

Our paper, recently accepted for publication in the International Journal of Geographical Information Science, investigates a thus-far unexplored aspect of spatial data of particular relevance to VGI usability (with OSM as a case study) – the differences between (geometric) feature definitions within a feature class. I am particularly pleased with this paper, as it is the outcome of a short but intense and very satisfying collaboration with Stephen Maguire, a Master of IT (Spatial) student here at the University of Melbourne. Congratulations, Stephen!

[Figures: feature development over time; London map; Melbourne map]

From the abstract:

Map databases traditionally capture snapshot representations of the world following strict data collection and representation guidelines. The content of these map databases is often assessed using data quality metrics focusing on accuracy, completeness and consistency. The success of volunteered geographic information, supporting evolving representations of the world based on fluid guidelines, has rendered these measures insufficient. In this paper, we address the need to capture the variability in quality of a map database. We propose a new spatial data quality measure — dataset maturity — enabling assessment of the database based on temporal trends in feature definitions, specifically geometry type definitions. The proposed measure can be (1) efficiently used to identify feature definition patterns reflecting community consensus that could be formalised in community guidelines; and (2) deployed to identify regions that would benefit from increased editorial activity to achieve greater map homogeneity. We demonstrate the measure based on the content of the OpenStreetMap database in four regions of the world, and show how the proposed dataset maturity measure captures a distinct quality of the datasets, distinct from data completeness and consistency.
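To give a flavour of the idea, here is a minimal sketch of my own (not the method from the paper): for a given feature class, track the share of each geometry type across yearly snapshots of the database; a share converging towards one geometry type hints at an emerging community consensus. The toy data below are invented purely for illustration.

```r
library(dplyr)

# Hypothetical extract of an OSM-style edit history:
# one row per feature version, with its year and geometry type
features <- tibble::tribble(
  ~year, ~feature_class, ~geometry,
  2010,  "building",     "point",
  2010,  "building",     "polygon",
  2011,  "building",     "point",
  2011,  "building",     "polygon",
  2011,  "building",     "polygon",
  2012,  "building",     "polygon",
  2012,  "building",     "polygon"
)

# Share of each geometry type per feature class and year;
# convergence of one share towards 1 suggests a stabilising definition
geometry_shares <- features %>%
  count(feature_class, year, geometry) %>%
  group_by(feature_class, year) %>%
  mutate(share = n / sum(n)) %>%
  ungroup()

geometry_shares
```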

Read the full pre-print, with supplementary materials, here.

Maguire, S., & Tomko, M. (accepted 2017). Ripe for the Picking? Dataset Maturity Assessment based on Temporal Dynamics of Feature Definitions. International Journal of Geographical Information Science. doi:10.1080/13658816.2017.1287370


Our paper on D-Log accepted to Pervasive and Mobile Computing

A new output from our TRIIBE project is now online here: “D-Log: A WiFi Log-based differential scheme for enhanced indoor localization with single RSSI source and infrequent sampling rate”, by our TRIIBE team (Yongli Ren, Flora Salim, Martin Tomko, Brian Bai, Jeffrey Chan, Kyle Qin and Mark Sanderson). It presents a way to post-process large amounts of single AP+RSSI fixations to better estimate the approximate locations of users in an indoor environment. I will publish a pre-print version here soon.
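To give a feel for the raw material involved (a rough sketch under textbook assumptions, not the D-Log scheme itself): a single RSSI reading from one access point can be turned into an approximate distance via the standard log-distance path loss model, and a log of repeated fixations can then be smoothed, for example with a median.

```r
# Log-distance path loss model: rssi = rssi_at_1m - 10 * n * log10(d),
# hence d = 10^((rssi_at_1m - rssi) / (10 * n)).
# rssi_at_1m and the path loss exponent are assumed, environment-specific values.
rssi_to_distance <- function(rssi, rssi_at_1m = -40, path_loss_exp = 2.5) {
  10^((rssi_at_1m - rssi) / (10 * path_loss_exp))
}

# Hypothetical log of single-AP RSSI fixations (dBm), sampled infrequently
fixations <- c(-62, -65, -80, -63, -64)

# A robust summary of the distance estimates smooths out outliers like -80
median(rssi_to_distance(fixations))
```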



(Re)Starting at The University of Melbourne

“I am looking for PhD students with topics related to computational urban GIScience” is one of the main pieces of content I have updated on this website. Being back in Melbourne is exciting, and I am keen to sink my teeth into some ideas that have been germinating for a while. Please spread the word, or get in touch. Note that only students with an outstanding profile from their Master's studies may be eligible for local funding. But if you really see yourself in the profile outlined here, get in touch anyway.

Exploring patterns of individual transcontinental oscillation between Australia and Europe. A subjective study.

The cryptic title hides a prosaic message: I will be wrapping up here in Zurich by the end of the year, and I will be returning to the University of Melbourne and the Geomatics group at the Department of Infrastructure Engineering from January 2016. I am looking forward to rekindling my existing ties in Melbourne and developing new ones, and I hope to continue my collaboration with my amazing Swiss colleagues.


And I am looking for some great PhD students interested in exploring urban GIScience with me…

Information Retrieval. What are the temporal trends?

Part II of a series that started by looking at GIScience (my core interest).

This is, as far as I know, the first publication of temporal trends in Google Metrics.

For the last three years, Google has been releasing its Google Scholar Metrics. Recently, it released the 2015 batch.

These Metrics provide an insight into the most successful/impactful publication outlets for individual disciplines (and subdisciplines), and also allow one to explore the most cited papers of a venue. The h5 index (for a venue or an author) is the largest number n of papers with at least n citations each, in this case computed over a five-year period.
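In code, the h index is essentially a one-liner (my own sketch of the standard definition):

```r
# h index: the largest n such that n papers have at least n citations each
h_index <- function(citations) {
  s <- sort(citations, decreasing = TRUE)
  sum(s >= seq_along(s))   # count ranks i where the i-th ranked paper has >= i citations
}

h_index(c(10, 8, 5, 4, 3))  # 4: four papers with at least 4 citations each
```

Restricting the citation counts to papers from the last five calendar years gives the h5 variant used by Google.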

There are problems with the way these data are collected (not all venues are monitored, and the coverage may not be 100%; see here). The coverage has been slowly improving over the years. While Computer Science is relatively well covered, some conferences and workshops published in the well-respected Springer Lecture Notes in Computer Science series are not monitored by Google, and the individual volumes cannot easily be sorted into disciplines anyway.

Anyway, this project has been running for three years now, and we can start looking at some trends (without any statistical insights – for that, the series are too short). It is worth noting some separation of the journals into tiers (purely visually). Note that this may not say anything about the quality of a venue itself; maybe its audience is simply smaller or more niche.

It would be worth comparing these trends with those of the sibling discipline of data mining/knowledge discovery, where many venues are used by both communities.

Also note the discussion of the h5 index in Vrettas and Sanderson (2015), suggesting that the size of a venue tends to lead to an over-inflation of its h5 index. I would be happy to include additional venues in this analysis, and to share the data for deeper investigation. I acknowledge the seed list of IR venues from @IR_oldie for this analysis.
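The size effect is easy to convince yourself of with a toy simulation (my own illustration, not the analysis of Vrettas and Sanderson): draw per-paper citation counts for a small and a large venue from the same distribution, and the larger venue will typically end up with the higher h5 index purely through volume.

```r
set.seed(1)

h_index <- function(citations) {
  s <- sort(citations, decreasing = TRUE)
  sum(s >= seq_along(s))
}

# Identical per-paper citation distribution, different 5-year output volumes
small_venue <- rnbinom(100, size = 2, mu = 8)   # 100 papers in 5 years
large_venue <- rnbinom(500, size = 2, mu = 8)   # 500 papers in 5 years

h_index(small_venue)   # typically lower
h_index(large_venue)   # typically higher, with no difference in paper quality
```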

I am looking forward to comments!

[Figure: Google Scholar Metrics trends]

References:

Vrettas, G. and Sanderson, M. (2015). Conferences versus journals in computer science. Journal of the Association for Information Science and Technology. doi:10.1002/asi.23349

Acknowledgment:

Thanks to the R Hadleyverse for rvest, tidyr, stringr, dplyr and ggvis! This was a great little problem for learning these packages!
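For the curious, the scraping side of this exercise is only a few lines of rvest. This is a sketch of the general approach; the URL and the assumption that the ranking is rendered as a plain HTML table may not hold as Google changes the page, and any scraping should respect the site's terms of use.

```r
library(rvest)
library(dplyr)

# Assumed URL of a Google Scholar Metrics category page; subject to change
url <- "https://scholar.google.com/citations?view_op=top_venues&hl=en&vq=eng"

venues <- read_html(url) %>%
  html_node("table") %>%   # assumes the ranking is a plain HTML table
  html_table() %>%
  as_tibble()

head(venues)   # venue name, h5-index and h5-median for the current release
```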

GIScience in Google Metrics. What are the temporal trends?

This is, as far as I know, the first publication of temporal trends in Google Metrics.

For the last three years, Google has been releasing its Google Scholar Metrics. Recently, it released the 2015 batch.

These Metrics provide an insight into the most successful/impactful publication outlets for individual disciplines (and subdisciplines), and also allow one to explore the most cited papers of a venue. The h5 index (for a venue or an author) is the largest number n of papers with at least n citations each, in this case computed over a five-year period.

There are problems with the way these data are collected (not all venues are monitored, and the coverage may not be 100%; see here). The coverage has been slowly improving over the years. GIScience is, however, still only covered in a patchy way. In particular, the conferences (GIScience and COSIT, but also SDH and smaller workshops) are not covered, as the Springer Lecture Notes are not monitored by Google and the individual volumes cannot easily be sorted.

Furthermore, some journals from the field are not covered either: JOSIS is not that new anymore, but together with Spatial Cognition and Computation they have so far had trouble meeting the threshold of at least 100 publications a year (see here again for the coverage parameters). I assume this is the case for IJLBS as well.

Anyway, this project has been running for three years now, and we can start looking at some trends (without any statistical insights – for that, the series are too short). It is worth noting some separation of the journals into tiers (purely visually). Note that this may not say anything about the quality of a venue itself (I consider EPB to be an excellent journal with great content, but maybe its audience is smaller/more niche).

It would be worth comparing this with past work such as that of Kemp, Kuhn and Brox (2013) [here], who performed a Delphi study of GIScience journals.

Also note the discussion of the h5 index in Vrettas and Sanderson (2015), suggesting that the size of a venue tends to lead to an over-inflation of its h5 index. I would be happy to include additional venues in this analysis, and to share the data for deeper investigation.

I am looking forward to comments!

[Figure: GIScience Google Scholar Metrics trends]

References:

Vrettas, G. and Sanderson, M. (2015). Conferences versus journals in computer science. Journal of the Association for Information Science and Technology. doi:10.1002/asi.23349

Acknowledgment:

Thanks to the R Hadleyverse for rvest, tidyr, stringr, dplyr and ggvis! This was a great little problem for learning these packages!

More kudos for AURIN

It is great to observe from a distance how AURIN is growing in recognition in Australia. Yesterday, AURIN became the Merit recipient in the Government category of the Victorian 2015 iAwards. Congratulations to all who helped get this project to where it is now – from vision to realisation. And in particular, to all the users!