Author Archives: Stefan Kasberger

It will be about knowledge production on Wikipedia

So, after some talks and research, the topic of my bachelor thesis has been narrowed down a little bit. Together with Fabian Flöck I will look into the mechanisms of collaborative knowledge production on the web, specifically on Wikipedia. Collaborative online platforms with masses of contributors offer many new phenomena to look at, which have never before been accessible to researchers. Questions like: Who defines what is right? By which criteria is a contribution accepted or reverted? How do the organization of knowledge and its contributors evolve over time? Are there people who “own” knowledge? Why do people agree or disagree? How do disputes evolve? Which criteria lead to better article quality?

How knowledge appears, evolves and gets communicated are questions that stem from my long-standing philosophical struggle with the question of what truth is. Over the last year in particular, in the philosophy courses I took at the IFZ in Graz (Einführung in die Technikphilosophie [Introduction to the Philosophy of Technology] and Technik – Ethik – Politik [Technology – Ethics – Politics]), these questions were rekindled and matured as I engaged with the ideas of Heidegger, Latour and Nietzsche.

The specific research question is of course still open, but it will be something along the lines of how well editors on Wikipedia work together to write encyclopedia articles. I will look for specific patterns (they are everywhere¹) in the evolution of an article and in the contributions of the editors behind it.

Lots of questions are in my head now. So next I will try to dive a little into Wikipedia and the existing literature about it to get an understanding of the current questions and the state of research.

Sources

Thumb
Title: Martin_Heidegger_for_WP
Author: Herbert Wetterauer
Source: https://commons.wikimedia.org/wiki/File:Martin_Heidegger_for_WP.jpg
License: Creative Commons Attribution-Share Alike 3.0 Unported

How I set up my scientific work environment

How do you set up a scientific project on your computer? Here is how I do it, from knowledge management via GitHub to this website.

I love preparing projects; maybe it is my favorite part of every project. Thinking about all the crazy stuff you will do in the future, letting thoughts fly around and connecting all kinds of ideas and perspectives. Being creative about practice. But maybe it is really all about trying to prevent trouble later on.

The process described here always happens at the very beginning, long before I know exactly what I will do in my research, course or thesis. Posts on more specific processes, like literature research or how I use IPython for data analysis, will follow.

Knowledge management

It always starts with organizing knowledge: writing down every thought and trying to find a structure for it. Some thoughts start on paper, some directly in my .org file.

Mostly I separate them into these sections:

  • ToDo: for tasks, of course. Later on, in big projects, there are subsections separated into phases like now, next week, after publication, etc.
  • Timeline: for general planning and scheduling. The planning and scheduling of very important process steps (like research) is kept in the section for that step. This decision is always a trade-off between having a central overview and having the information where you work on something specific.
  • Notes: for all kinds of thoughts and facts.
  • Documentation: for the ongoing documentation of my work, like content for my next blog posts and the schedule for publishing them.
  • Research: for everything research-specific, such as questions around the hypothesis, the data structure, notes on results and so on, but always for one specific problem.
*.org-file for knowledge management
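
To make the section layout concrete, here is a minimal Python sketch that generates an empty Org-mode file with the five sections above as level-1 headings (one `*`) and the ToDo phases as level-2 headings (`**`). The function names `org_skeleton` and `write_skeleton` are hypothetical, and the headings simply mirror the list above; they are an assumption, not a fixed template.

```python
from pathlib import Path

# Sections from the list above. In Org mode, a line starting with "* "
# is a level-1 heading and "** " a level-2 heading.
SECTIONS = {
    "ToDo": ["Now", "Next week", "After publication"],
    "Timeline": [],
    "Notes": [],
    "Documentation": [],
    "Research": [],
}


def org_skeleton(title: str) -> str:
    """Return the text of an Org-mode file with one heading per section."""
    lines = [f"#+TITLE: {title}", ""]
    for section, subsections in SECTIONS.items():
        lines.append(f"* {section}")
        for sub in subsections:
            lines.append(f"** {sub}")
    return "\n".join(lines) + "\n"


def write_skeleton(path: str, title: str) -> None:
    """Write the skeleton, refusing to overwrite an existing notes file."""
    target = Path(path)
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    target.write_text(org_skeleton(title), encoding="utf-8")
```

The overwrite check is deliberate: the .org file quickly fills up with notes, so a setup script should never replace it.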

All this is rather flexible and changes a little from project to project. So why don't I share my *.org file regularly? It is sometimes really messy, and it contains private things, things with security restrictions and notes written in German. Basically, it is very important to me and constantly changing, so keeping it in a publishable state at all times would be far too much effort for me right now.

GitHub

The second step is to create a dedicated folder and initialize it as a git repository, so that it can later be uploaded to GitHub. As usual, I start by creating the folder structure on my local hard drive. It is not always the same, but some things seem to recur:

  • applications: Some applications have their own special files, like project files or templates.
  • code: One subfolder for the source code of each programming language used, like python, shell, c and so on.
  • data: Data collected and created for quantitative analysis. Subfolders are raw (for raw data) and one per file type, like json, csv, shape, etc. (the same separation is used inside the raw folder). For sharing the data later on, file size and file type are crucial. Git compares files line by line, so binary files cannot be diffed or merged meaningfully, and GitHub limits individual files to 100 MB. So for big datasets I recommend other repositories, like Figshare or domain-specific ones.
  • docs: All the literature, notes about it and other related documents, before I save them in my Zotero library. Most documents in here will never be published because of copyright restrictions! I will write a separate post about my literature research process with Zotero and how my citation and notes archive works.
  • images: The content depends on usage, but it is mostly figures created during the analysis. Inside is a raw folder for figures that will not be published and a final folder for the published ones. Pictures for the website or other documentation purposes can also be saved here.
  • reports: All reports, mostly written in LaTeX. Each report gets its own subfolder (unless there is only one), with a subfolder for images.
local folder structure
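
The folder layout above can be created with a few lines of Python. This is a minimal sketch, assuming Python 3.9+ and using `python`, `shell`, `json` and `csv` as example subfolders; the function name `create_project` is hypothetical. Because Git does not track empty folders, the sketch places an empty `.gitkeep` file in each one, which is a common convention rather than a Git feature.

```python
from pathlib import Path

# Example layout following the folders described above. The language and
# file-type subfolders (python, shell, json, csv) are only examples and
# change from project to project.
LAYOUT = [
    "applications",
    "code/python",
    "code/shell",
    "data/raw/json",
    "data/raw/csv",
    "data/json",
    "data/csv",
    "docs",
    "images/raw",
    "images/final",
    "reports/images",
]


def create_project(root: str) -> list[Path]:
    """Create the empty folder structure below `root`; safe to run twice."""
    base = Path(root)
    created = []
    for rel in LAYOUT:
        folder = base / rel
        # parents=True creates intermediate folders such as data/raw,
        # exist_ok=True makes repeated runs harmless.
        folder.mkdir(parents=True, exist_ok=True)
        # Git does not track empty folders, so leave a placeholder file.
        (folder / ".gitkeep").touch()
        created.append(folder)
    return created
```

Calling `create_project("bachelor-thesis")` creates the tree below a new `bachelor-thesis` folder; running it again changes nothing.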

After creating the empty folder structure, I initialize a git repository (git init) and add the README.md (markdown template). In it, I write all the basic information about the thesis: how to participate, the license terms and the requirements. Soon after, the folders get filled with papers, some data and project files of the software used, always separated into raw material and material processed or edited by me. Once the README.md is written, the LICENSE information added and the basic structure created, I upload the first set of files to my GitHub repository. But beware of copyright issues: check whether you own the rights to share your content. If not, a .gitignore file keeps those files out of your git repository.
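
The repository setup can also be scripted. The sketch below writes a README.md skeleton with the sections mentioned above and a .gitignore, then calls `git init` through `subprocess` if Git is installed. The ignore rules (`docs/*.pdf`, `images/raw/`) and the function name `init_repo` are assumptions for illustration; which files must stay private depends on your own copyright situation. The LICENSE file is left out on purpose, since the license has to be chosen by hand.

```python
import shutil
import subprocess
from pathlib import Path

# README sections from the post: basic information, how to participate,
# license terms and requirements.
README_TEMPLATE = """# {title}

Basic information about the project.

## How to participate

## License

## Requirements
"""

# Hypothetical ignore rules: keep copyrighted papers (PDFs in docs/) and
# unpublished raw figures out of the public repository.
GITIGNORE_RULES = [
    "# copyrighted literature, never published",
    "docs/*.pdf",
    "# raw figures that will not be published",
    "images/raw/",
]


def init_repo(root: str, title: str) -> bool:
    """Write README.md and .gitignore (if missing), then run `git init`.

    Returns True if `git init` was run, False if git is not installed.
    """
    base = Path(root)
    base.mkdir(parents=True, exist_ok=True)
    readme = base / "README.md"
    if not readme.exists():
        readme.write_text(README_TEMPLATE.format(title=title), encoding="utf-8")
    gitignore = base / ".gitignore"
    if not gitignore.exists():
        gitignore.write_text("\n".join(GITIGNORE_RULES) + "\n", encoding="utf-8")
    if shutil.which("git") is None:
        return False
    # Re-running `git init` in an existing repository is harmless.
    subprocess.run(["git", "init"], cwd=base, check=True, capture_output=True)
    return True
```

Existing files are never overwritten, so the function can be re-run safely after the README.md has been filled in.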

Root folder of my GitHub-repository with rendered README.md
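
Since GitHub rejects pushes that contain files larger than 100 MB, a quick scan before the first upload can catch oversized data files early. The helper below is a hypothetical sketch (the name `files_over_limit` is not from any existing tool); it walks the project folder, skips Git's internal `.git` folder and reports every file above the limit.

```python
import os
from pathlib import Path

# GitHub blocks files larger than 100 MiB.
GITHUB_FILE_LIMIT = 100 * 1024 * 1024  # bytes


def files_over_limit(root: str, limit: int = GITHUB_FILE_LIMIT) -> list[tuple[Path, int]]:
    """Return sorted (path, size) pairs for files below `root` larger than `limit` bytes."""
    too_big = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into Git's internal storage.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = Path(dirpath) / name
            if not path.is_file():  # skips broken symlinks and the like
                continue
            size = path.stat().st_size
            if size > limit:
                too_big.append((path, size))
    return sorted(too_big)
```

Anything it returns is a candidate for a repository like Figshare rather than GitHub, as described for the data folder.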

openscienceASAP

The last step is to add all the information to an overview page here at openscienceASAP. This page is the central point of contact for the scientific project and connects all the dots, from blog posts via data and source code to people and their social media streams. Before adding the overview page (template), I create a new category as a child of the research category in the backend, so every post can be found via the category (Category Bachelor Thesis Stefan Kasberger). Right now the page is still very empty, but it will be updated over time with the current status.

Overview page of bachelor thesis at openscienceASAP.org

Copyright

And finally, a crucial point is to think about copyright right from the beginning. For me it is very important that my content is easy to reuse, but also that it is cited where reasonable. By default I use the MIT license for source code, release self-created data into the public domain and publish text under CC BY 3.0 AT.

My residence at the GESIS institute in Cologne

For the next four months I will live in Cologne and write my bachelor thesis at GESIS. Why, and what is GESIS? Here are the answers.

GESIS Köln by Stefan Kasberger (CC BY 3.0 AT)

So here I am, living in Cologne for two weeks now. Whenever I told friends that I would go to another country for my bachelor thesis, most of them asked me “Why?”, because it is not that common. So here are my answers:

1. My advisor is Markus Strohmaier (@mstrohm), the scientific director of the Computational Social Science department and head of the Data Science team here. He is a well-respected and active researcher in the field of data science, with a focus on social questions and an interdisciplinary approach. Our connection goes back to some talks about data science, especially network science, in 2012, and his drive to solve problems and his focus on research impressed me from the beginning; unfortunately, that is rare in Austria. So when he unexpectedly asked me whether I wanted to come to Cologne for my bachelor thesis, the answer was easy.

2. The GESIS – Leibniz-Institute for the Social Sciences is the largest infrastructure institution for the social sciences in Germany. It is located in the heart of Cologne, not far from the Kölner Dom and the main train station, which is very pleasant. There I have joined the Data Science team, which addresses problems at the intersection of computer science and sociology. I’m looking forward to learning a lot from the many professionals here, and hopefully this will give me deeper insights into how research is done and organized in 2014.

3. Cologne has always had a very good reputation: liberal, open and very hospitable. So once again I took the opportunity to live in a city outside Austria, and my amazing experiences with city life and German culture during last semester in Berlin made the decision easy.

So it seems that all of the more abstract requirements stated in my first post are fulfilled. Let the games begin!

To be or not to be a scientist

The next four months will be fully dedicated to my first scientific work: writing a bachelor thesis in environmental systems science with a focus on geography.

This is a very important step for me, and here is why. When I thought about my bachelor thesis, it was always about “which method(s) do I want to use?”. This became more and more precise over the last two years, as I learned machine learning and data science methods, mostly on my own, and invested a lot of spare time in them. So I was pretty sure it would be something around machine learning and/or network science, maybe also about computers understanding natural language, because it is fun and gives me a sense of exploration and innovation.

The topic itself is secondary, but still really important. It should deepen my knowledge in some of the areas I’m interested in, like geopolitics, urbanism, poverty, openness, knowledge creation, resources or migration. It should be nothing less than relevant and potentially emancipatory, and it should contribute to a more just society.

And of course, being able to make it open is an important point too. The plan is to share everything regularly along the way. I will blog here frequently about my struggles, experiences and progress and try to get a better understanding of how to open up science. And as always, a dedicated GitHub repository will be created.

Besides my scientific interests, the whole undertaking matters even more for my life as a whole. By dedicating myself intensely to a bachelor thesis (four months full time, with the goal of a publication), I want to get hands-on experience of what the life of a researcher is like nowadays and create something I can build upon in the future. In the end, everything revolves around one question: Do I want to live the life of a researcher in 2014? So far, what I will do after finishing my studies is still an open question: research, a job, or changing my field of practice completely again.

And hopefully my English will improve too. 😉

Open Science / Content Mining Hackathon @ Metalab Wien

On 5 June we invite you to the Metalab to hack together. The aim is to make science easier to access and use: together, collaboratively, openly and self-organized.

Hacking together, that is, solving a problem with partly unconventional methods, is the goal of every hackathon. The Open Science / Content Mining Hackathon is therefore about problems in science and how science can be made more open. How can scientific information, such as publications or data, be made more accessible? How can tools and the like make scientific work easier?

Join in and open up science!

Bring your ideas and find like-minded people for them, whether it is an information visualization, a piece of software or a guide for early-career researchers. This is how ideas and small groups form on site. Everyone who is interested is warmly welcome, and we especially look forward to female participants. There are no limits to project ideas, whether you want to do something on the computer, with design or with Lego bricks. Humanities scholars are just as welcome as free thinkers or programmers. Ideas can also be shared in advance on an Etherpad, which will also be used on site.

Collaborate / Exchange / Learn.

Content Mining group

A first small subgroup around Peter Murray-Rust has already formed. It works on content mining of scientific publications. Peter is a chemist in Cambridge, a current Shuttleworth Fellow and co-founder of the Open Knowledge Foundation's Open Science working group. On site, he and others will continue hacking on his project contentmine, which is about extracting information from vast numbers of scientific publications such as PDFs and websites. Peter will also be a guest at the subsequent OKF-AT meetup, as well as at an FWF lecture on Tuesday.

Event Details

When: 5 June 2014, starts at 9:30, ends at 18:30
Where: metalab Vienna, Rathausstraße 6, 1010 Wien (Google Maps)
Registration: Please register in advance via MeetUp or email (office [ett] openscienceasap dot org).

The space and the event are open to everyone and free of charge, and cheap drinks are available on site. Otherwise, please bring your own food. Some people will probably order something during the day.
The event will be in both English and German. If you want to work on a computer, please bring your own laptop. For further questions, just send an email: office [ett] openscienceasap dot org

Schedule

9:30 Coffee and cake
Relaxed start to the day and getting to know each other
10:00 Welcome and introduction
10:15 Brainstorming:
Collecting ideas (in advance via Etherpad)
11:00 Pitching:
Presenting project ideas
11:15 Hacking:
Defining projects and starting to implement them
13:00 Break
14:00 Hacking
17:45 Show & Tell:
Presenting results
18:00 End
19:00 OKF-AT Open Science Meetup on content mining.

Partners

The hackathon is organized by the Open Knowledge Foundation Österreich and openscienceASAP. Special thanks go to the Metalab for the venue and its support.
