Air-Pocalypse, and a DATA2022 Conference Amidst Wildfires
DATA2022 was a pleasant experience. The journey to the conference will, however, inspire fear in our hearts for years to come! Our flight to Lisbon was canceled due to the strikes and staffing problems at European airports, and the tickets for its replacement got lost somewhere in Lufthansa’s databases. Let’s just say that it took us a combined hour and a half of phone calls to sort everything out, and we arrived at our gate 6 minutes before closing time. Phew! 😅
Energized by adrenaline and full of optimism, we arrived in the capital of a country struggling against a major heatwave. Several wildfires swept across Portugal, forcing the evacuation of hundreds of locals, some just kilometers away from the city, as reported by Reuters.
Fortunately, the conference went smoothly and was quite well organized, offering last-minute online viewing for the attendees who were a bit less lucky than us and didn’t manage to arrive in person. And what a conference it was, covering data quality, machine learning, and automatic detection of cheaters and crime-doers, in both academic and industrial applications! We definitely got a lot of inspiration for the talks we could give at next year’s edition! 🤓🤓🤓
The first session of the event was already an interesting and truly European one: a panel discussion about the advantages and disadvantages of data- vs model-based approaches to AI and Machine Learning, held by experts from Italy, France, Germany, and Poland. Some of the thoughts directly mirrored our own concerns about the famous Happiness 2.0 Model, currently planned for development in late 2022!
It was also reassuring to learn that it’s not just me: a lot of developers’ time really is spent trying to understand code that is already there 😅 Approximately 58% of work time, to be precise. All according to the research of a fellow Czech attendee, future Ph.D. Robert Husák, who managed to win the Best PhD Project award! Warm congratulations from Showmax!
Other talks that sparked our interest:
- Detecting Bots in Social-Networks using Node and Structural Embeddings by Agata Skorupka, who decided to put aside the standard Natural Language Processing algorithms used in tandem with investigation of user metadata, and chose to utilize the benefits of graph theory instead. In essence, the approach maps the network of followers for each profile and applies node-embedding algorithms that assign the graph nodes to points in a low-dimensional space, which makes the data easy to process. As a result, graphs of real human users become visibly different from graphs of bot users. Such an approach is potentially directly transferable to our own investigations, such as the one where we verified whether some users may try to obtain multiple free trials at Showmax (btw, the answer was: mostly no). Moreover, the talk won the Best Presentation award of the conference, so we strongly encourage you to take a look at the publicly available preprint!
- Working Efficiently with Large Geodata Files Using AdHoc Queries by Pascal Bormann, who used techniques that resemble how Loki—the log aggregation system used at Showmax—works internally.
- Clustered Majority Judgement by Elio Masciari, who showed that becoming a dictator is the only possible outcome of a poorly chosen voting system during a cocktail party. We’re not sure how this one relates to Showmax, but we will find a way to apply it.
- A Model Based on 2-tuple Linguistic Model and CRITIC Method for Hotel Classification by Ziwei Shu, who presented a ranking method that eliminates the subjectivity of the rating scale and combines several criteria that people base their opinions on.
- Enhancing the Provision of Labour Market Intelligence Using Machine Learning by Aleksander Bielinski, who mines textual information from a job-search portal to monitor local job-market trends in near real time.
- State-of-the-art GDPR assessment techniques, presented in Compiling Open Datasets in Context of Large Organizations while Protecting User Privacy and Guaranteeing Plausible Deniability by Igor Jakovljevic. It turns out that some of them are actually used at Showmax!
- A Hybrid Approach for Product Classification Based on Image and Text Matching by Christoph Brosch about automatic labeling of products into a large variety of categories, a problem that Showmax also encounters with its detailed taxonomy of tags on assets.
- Modeling of Efficient Graph-Aware Data Storage Using DNA by Asad Usmani – did you know DNA can be used for long-term data storage? The future is now!
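
To make the node-embedding idea from the bot-detection talk a bit more tangible, here is a minimal sketch of how a follower graph can be turned into points in a low-dimensional space. It is our own toy illustration, not the method from the paper: we swap in a plain spectral embedding built with NumPy, and the 8-account graph and the function names `spectral_embedding` and `toy_follower_graph` are made up for the example.

```python
import numpy as np


def spectral_embedding(adjacency: np.ndarray, dim: int = 2) -> np.ndarray:
    """Map each node of a graph to a point in `dim` dimensions.

    Assumption: this is a simple spectral embedding, used here as a
    stand-in for the node-embedding algorithms mentioned in the talk.
    """
    # Treat "A follows B" as an undirected link to keep the math simple.
    a = np.maximum(adjacency, adjacency.T).astype(float)
    degree = a.sum(axis=1)

    # Symmetric normalized Laplacian: L = I - D^(-1/2) A D^(-1/2).
    inv_sqrt = np.zeros_like(degree)
    connected = degree > 0
    inv_sqrt[connected] = 1.0 / np.sqrt(degree[connected])
    laplacian = np.eye(len(a)) - inv_sqrt[:, None] * a * inv_sqrt[None, :]

    # eigh returns eigenvalues in ascending order; the first eigenvector
    # is the trivial one, so the embedding uses the next `dim` vectors.
    _, vectors = np.linalg.eigh(laplacian)
    return vectors[:, 1:dim + 1]


def toy_follower_graph() -> np.ndarray:
    """Hypothetical data: two tight groups of 4 accounts, one link between them."""
    adj = np.zeros((8, 8), dtype=int)
    for group in (range(0, 4), range(4, 8)):
        for i in group:
            for j in group:
                if i != j:
                    adj[i, j] = 1
    adj[3, 4] = 1  # a single follow connecting the two groups
    return adj


embedding = spectral_embedding(toy_follower_graph(), dim=2)
print(embedding.round(2))
```

In this toy graph, the first coordinate takes opposite signs for the two groups, so accounts with similar neighborhoods land close together. A real pipeline would presumably feed such coordinates into a classifier trained on labeled human and bot accounts, which this sketch does not attempt.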
All other great talks and papers can be found on the official webpage of the DATA2022 conference.
Inspiring as the talks were, we live on more than just knowledge. Portuguese cuisine truly delighted us with its emphasis on consumable forms of happiness. The southern sun turned our skin into various shades of cinnamon, and the port wine made all discussions flow more naturally, especially with those conference attendees who met us in our hotel lobby.
We can only hope that we’ll reenter the world of conferencing next year, this time in the role of presenters! Last but not least, we hope that we’ll manage to return to the milder climate of Prague without further flight cancellations! 🤞🤞🤞