Open Password – Friday January 21, 2022
#1019
Conference AI-SDV 2021 – Search – Data analysis – Visualization – Knowledge processing – Bassam Mokbel – Christoph Haxel – Vanessa Lage-Rupprecht – Marc Jacobs – Fraunhofer SCAI – Intuitive mind maps – Verification of user input through knowledge databases – Human-in-the-loop Approach – Competitive Landscape – Social Graph – Angela Bauch – Biomax Informatics – AILANI – centredoc – Deep SEARCH 9 – Harald Jenny – Technology Landscaping – averbis – Francisco Webber – Cortical.io – Semantic Folding – Data Representation for Machine Learning – Dolcera – Klaus Kater – Competitive Intelligence – Insight Apps – Types of Stakeholders – Update Cycles – Blind Spots – Stefan Geißler – Kairntech – Technical Vocabularies – ML Model Training – Data Scientists – Search Technology Inc. – Lighthouse IP – Mazahir Bhagat – Canadian Intellectual Property Office – Patent Landscape Maps – Derwent Technologies – Jay Ven Eman – Access Innovations – Digital Word Processing – Synonyms
dpa – Facebook News – Meta – Christian Röwekamp – Frank Rumpf – dpa-Infocom – Ad Alliance – Podcasts – Mobile 360° – annalect – Podcast User Study – Smartphone app – Smart Speaker – Advertising – Signal in the test – Telegram in the test – Security – Privacy – PSW Group – Usability – Terms and Conditions – WhatsApp
I.
Title:
Experience report AI-SDV 2021: On the fronts of search, data analysis,
visualization and knowledge processing – By Dr. Bassam Mokbel
II.
Cooperations: dpa takes over the curation of Facebook News
III.
Media use: 27% of Germans listen to podcasts every week
IV.
Signal and Telegram in the test:
Overall grade 1 for Signal – Telegram deficits in security and privacy
Experience report AI-SDV 2021
On the fronts of search, data analysis,
visualization and knowledge processing
By Dr. Bassam Mokbel*
This year’s AI-SDV conference (“The Artificial Intelligence Conference on Search, Data and Text Mining, Analytics and Visualization”) took place virtually and offered visitors a two-day program of lectures presenting innovative technologies and applications for search, data analysis, visualization and knowledge processing. Slots for product presentations and networking were also on the agenda. As I understand it, the aim of the event was a professional exchange between researchers, IT managers, developers and experts from the medical and pharmaceutical industries. Service providers and software manufacturers were also meant to be able to make contact with interested customers.
Below I will briefly summarize all the specialist lectures and, where appropriate, add a personal impression of them. I will not summarize the product presentations, but all presentation slides, including the product spotlights, can be viewed on the event website.
_____________________________________________________
Intuitive mind maps and verification of user input through knowledge bases.
_____________________________________________________
The lectures on the first day of the conference began with a welcome from the organizer Christoph Haxel. In the first specialist lecture, entitled “Ping Pong – Playful Knowledge Transfer”, Vanessa Lage-Rupprecht and Marc Jacobs from Fraunhofer SCAI presented a method for building and expanding ontologies for knowledge databases in companies and other institutions in a user-friendly manner. The speakers cited improving decision-making processes and preventing the loss of valuable knowledge when employees leave as the central motivations. The core idea was the consistent use of intuitive mind maps in user interfaces, together with the ubiquitous completion and verification of user input through knowledge bases combined with AI.
I came away with the positive impression that the researchers are taking the human-in-the-loop approach innovatively beyond the previous state of the art in knowledge processing.
_____________________________________________________________________
Connection of heterogeneous data directories and consideration of ontology knowledge in the search function.
_____________________________________________________
In the second specialist lecture, Angela Bauch from Biomax Informatics presented the AI-supported service “AILANI” under the title “AILANI for clinical competitive landscaping”. This enterprise-oriented application offers an intelligent semantic search across numerous knowledge sources and bibliographies in the medical and pharmaceutical sectors, with an AI-based question-answering search interface. The extensive connection of heterogeneous data directories and the consideration of ontology knowledge in the search function are the essential foundations of this service. Integration of customer-specific documents and text recognition in scanned documents are also planned. In particular, “competitive landscaping” was presented as an important use case for the search application: the user wants to gain an overview of market activity in the pharmaceutical domain, identify key opinion leaders, potential competitors and collaborators, and track their activities.
I was particularly interested in the integration of a “social graph”, which visualizes the connection between specialist authors and can thus give literature research an exciting additional dimension.
This was followed by three product presentations for intelligent knowledge databases and search platforms: “AILANI” (Biomax Informatics), “RAPID 5” (centredoc) and “Deep SEARCH 9 Sentinel” (Deep SEARCH 9).
_____________________________________________________
With “Technology Landscaping” integration of many heterogeneous data types and sources.
_____________________________________________________
Harald Jenny, director of the Swiss cooperative company centredoc, presented his experiences from the last seven years of developing the company’s own business intelligence tools under the title “Integrated Artificial Intelligence – A Factory Progress Report”. The focus was on “Technology Landscaping”, which, as I understand it, refers to the creation of a comprehensive overview of a given technology from available data. Jenny highlighted the technological challenges of the individual steps, from generating a correct semantic abstraction of technology-related search queries, through data storage and indexing, to analyzing and displaying the hits. He emphasized the strategically important collaboration with averbis, a German provider of AI-based text mining, which has made an important contribution to the further development of the software.
I was impressed by the integration of many heterogeneous data types and sources.
_____________________________________________________
Data coding reduces the required computing power to a fraction.
_____________________________________________________
Francisco Webber from Cortical.io presented approaches to overcome the current challenges of text analysis under the title “Semantic Folding – efficiency is the new precision”. He described fundamental problems in information technologies such as the increasing stagnation in the further development of processors. However, such further development would be necessary in order to be able to adequately deal with the rapid increase in available data volumes. Elaborate generic text analysis models may be inappropriate in the context of specialized applications because they tend to have a very general understanding of language. On the other hand, the training of many application-specific machine learning models throughout the industry entails an immensely increasing need for computing power and human annotation effort.
Webber presented “Semantic Folding” as a possible solution, an extremely efficient way of representing data for machine learning (ML) on texts. Through sparse and algebraically favorable data coding, aggregation operations are accelerated so that learning algorithms can get by with a fraction of the computing power. Finally, he named numerous promising areas of application for the new technology.
This idea of coding really surprised me, so I intend to follow the scientific publications on the topic and try out the methods myself.
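To make the idea of a sparse, algebraically convenient coding more concrete, here is a minimal Python sketch. It is not Cortical.io's implementation: the word "fingerprints" are tiny hand-made toy data, each one is modeled simply as the set of its active bit positions, and the names `WORD_FINGERPRINTS`, `overlap` and `aggregate` are hypothetical. It only illustrates that similarity and aggregation can be reduced to cheap set operations on sparse binary codes.

```python
from collections import Counter

# Toy sparse binary fingerprints: each word is represented by the set of
# bit positions that are switched on. These values are invented for
# illustration; a real system would derive them from a text corpus.
WORD_FINGERPRINTS = {
    "dog": frozenset({1, 4, 7, 9}),
    "cat": frozenset({1, 4, 8, 9}),
    "car": frozenset({2, 3, 5, 6}),
}


def overlap(a: frozenset, b: frozenset) -> int:
    """Similarity as the number of shared active bits (a set intersection)."""
    return len(a & b)


def aggregate(fingerprints, max_bits: int) -> frozenset:
    """Combine several fingerprints into one sparse fingerprint.

    Counts how often each bit is active and keeps the `max_bits` most
    frequent bits, so the result stays sparse. Ties are broken by bit
    index to keep the result deterministic.
    """
    counts = Counter(bit for fp in fingerprints for bit in fp)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return frozenset(bit for bit, _ in ranked[:max_bits])


if __name__ == "__main__":
    dog, cat, car = (WORD_FINGERPRINTS[w] for w in ("dog", "cat", "car"))
    print("dog/cat overlap:", overlap(dog, cat))
    print("dog/car overlap:", overlap(dog, car))
    print("pets fingerprint:", sorted(aggregate([dog, cat], max_bits=3)))
```

Because only the active positions are stored and compared, the cost of these operations depends on the number of active bits rather than on the full width of the code, which is one plausible reading of how such a representation saves computing power.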
The patent search platform “Dolcera” followed as a further product presentation.
_____________________________________________________
Against the neglect of external data through highly personalized “insight apps”.
_____________________________________________________
In “The secret of successful CI: precise targeting + immediate discovery”, Klaus Kater from Deep SEARCH 9 gave an insight into the company’s in-house search technology for “Competitive Intelligence” (CI) in the context of research and development. In industry, fewer and fewer resources are being devoted to external sources of information, partly because the sheer amount of available data makes processing more difficult. This is a major risk, especially if external influences on one’s own company go unnoticed or are noticed too late. To counteract this, a large amount of information must first be collected from heterogeneous sources. In addition, the right information must reach the appropriate stakeholders in the customer’s organization. For this purpose, Deep SEARCH 9 offers highly individual “Insight Apps”, which are adapted to certain types of stakeholders. Kater also described very fast update cycles in data collection (in the range of hours) as a crucial requirement.
What stayed with me from the lecture was that there are more and more “blind spots” that are not covered by the various search engines or registers. For example, monitoring of clinical studies from Asia is often inadequate because of the language barrier.
_____________________________________________________________________
Machine learning training options “on site”.
____________________________________________________
In “AI support for creating and maintaining vocabularies”, Stefan Geißler from Kairntech described the in-house technologies for collaboratively maintaining various types of specialist vocabulary in thesauri, automatically enriching them with AI & ML methods and enabling user-friendly ML model training. On the one hand, modern AI-based language processing methods offer great potential to support human maintenance of knowledge databases through automatic suggestions and term recognition. On the other hand, domain experts feel the need to be able to easily train ML models for the recognition of specialist vocabulary themselves, without having to rely on data scientists. Kairntech offers such options in its own products so that incorrect or missing term recognition can be corrected by the respective user.
The above approach seems extremely promising to me. I can imagine that this type of interactive data maintenance will become the established norm in industry in the future.
The product presentations that followed were “VantagePoint Version 14” (Search Technology Inc.) and “Lighthouse IP Diamond File” (Lighthouse IP).
_____________________________________________________
Canadian “Patent Landscape Maps” – Conventional searches in patent documentation are almost obsolete.
_____________________________________________________
In “Mapping Canadian Patented Inventions”, Mazahir Bhagat from the Canadian Intellectual Property Office presented methods with which large amounts of patent documents can be compactly visualized in map-like representations. The “Patent Landscape Maps” shown are regularly used in public reports by the Canadian Patent Office. In the past, they were produced using licensed algorithms from the patent information provider Derwent Technologies. They are now to be produced with an in-house implementation based on freely available software. This in-house development enables the processing of larger amounts of data and is being further developed in collaboration with research institutions.
Jay Ven Eman from Access Innovations gave the final lecture of the first conference day under the title “Synonym and AI”. Ven Eman described the difficulties that synonymous terms cause in digital text processing and how these can be overcome with the help of AI. He showed several examples, including from medical terminology, in which search results in public databases differ massively when synonyms are not taken into account. In his opinion, part of the solution is to assign synonyms automatically and suggest them interactively as the search is entered, for example with the help of AI and thesauri as well as Knowledge Organization Systems (KOSs), which must be adapted to the respective industry standards.
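The effect of synonyms on search results can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method from the lecture: the thesaurus entry, the three document titles and the function names `build_synonym_index`, `expand_query` and `search` are all invented, and matching is a plain substring test rather than a real search engine.

```python
# Hypothetical mini-thesaurus: preferred term -> set of synonyms.
THESAURUS = {
    "heart attack": {"myocardial infarction", "cardiac infarction"},
}

# Invented document titles used only for this example.
DOCUMENTS = [
    "Risk factors for myocardial infarction",
    "Recovery after a heart attack",
    "Seasonal influenza vaccination",
]


def build_synonym_index(thesaurus):
    """Map every term (preferred or synonym) to its whole synonym group."""
    index = {}
    for preferred, synonyms in thesaurus.items():
        group = frozenset({preferred, *synonyms})
        for term in group:
            index[term] = group
    return index


def expand_query(term, index):
    """Return the term plus all known synonyms, sorted for stable output."""
    key = term.strip().lower()
    return sorted(index.get(key, {key}))


def search(documents, terms):
    """Return documents that contain at least one of the given terms."""
    return [doc for doc in documents if any(t in doc.lower() for t in terms)]


INDEX = build_synonym_index(THESAURUS)

if __name__ == "__main__":
    query = "heart attack"
    print("without synonyms:", search(DOCUMENTS, [query]))
    print("suggested terms: ", expand_query(query, INDEX))
    print("with synonyms:   ", search(DOCUMENTS, expand_query(query, INDEX)))
```

In this toy setup the plain query misses the document that uses the clinical term, while the expanded query finds both. In an interactive interface, the list returned by `expand_query` would be what is suggested to the user while typing.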
*Dr. Bassam Mokbel is Chief Data Scientist at Symantec (Bielefeld).
Read in the next part: Combination of rule-based search logic and ML-supported similarity search – Transfer learning application “EXTRA Classifier” expanded to include document classification and information extraction – How we can make do with small data when training data is insufficient – Explaining AI to users and reducing their skepticism – Preparation of retrieved document sets in “Spatial Concept Maps” and “Patent Citation Network Maps” – My conclusion
Collaborations
dpa takes over curation of Facebook News
(dpa) On behalf of Meta, the German Press Agency will be responsible for curating Facebook News from April 1st. As part of this offer, selected content from German media companies is published on the platform. Christian Röwekamp (51), who is currently responsible for the editorial department of Germany’s largest news agency, will take over the management of the newly created team. Meta’s contractual partner is the dpa subsidiary dpa-Infocom GmbH.
Under the umbrella of dpa-Infocom, Managing Director Frank Rumpf and Editorial Director Christian Röwekamp are putting together a team of experienced journalists who will exclusively take care of curating Facebook News and will not be involved in any other editorial processes. The dpa ensures that Facebook News users are provided with current information from the German media every day of the year and around the clock.
Media usage
27% of Germans listen to podcasts every week
(Ad Alliance) No other medium has found a permanent place in media usage in such a short time as podcasts. Almost all of Germany (96%) is familiar with podcasts, more than half (59%) have listened to one, and more than a quarter (27%) of Germans listen to podcasts regularly (at least once a week). This is shown by the special podcast evaluation of the Ad Alliance basic study Mobile 360°. These results are also supported by the “Podcast User Study”, a partner study by the technology and data-driven marketing expert annalect.
Podcasts are now firmly anchored in media usage. One indication of this is the increasing usage time: more than half of the podcast listeners surveyed (86%) devote over 60 minutes per week to the medium. The majority (62%) listen to one to three different podcasts regularly, while 19 percent listen to four to five. Podcasts whose individual episodes last 20 to a maximum of 30 minutes are popular. The smartphone app is the preferred means of access (79%). Eleven percent say they use their “Alexa” or similar smart speakers, and among 16 to 19 year olds the proportion is as high as 32 percent.
Podcasts are a medium that is physically very close to the user: in or on the ear. 65 percent of those surveyed stated that they often (41%) or at least occasionally (24%) listen via headphones, i.e. in a focused and isolated manner. 95 percent want to concentrate entirely on the podcast or only do things alongside it that distract them little or not at all. Knowledge formats (61%) come first, followed by news & politics (48%), comedy (40%) and true crime (37%). The older the users are, the greater their interest in knowledge and information formats.
Three quarters of users welcome advertising if it means the offer can be used free of charge. The expectations for podcast advertising are these: it should be short (83%), clearly labeled (81%), compact (65%), appropriate (64%), professionally produced (62%) and, if possible, offer added value (54%).
Signal and Telegram in the test
Overall grade 1 for Signal – Telegram deficits
in security and privacy
(PSW Group) While the Signal messenger service is recommended by greats like Edward Snowden, the Telegram messenger is increasingly being criticized. The IT security experts at PSW GROUP (www.psw-group.de) have tested both messenger services for usability, terms and conditions and security.
Patrycja Schrenk, Managing Director of the PSW GROUP, commented on the results:
“Signal performed excellently. The barriers to entry are even lower and the user-friendliness is at least as high as that of WhatsApp. The entire test team was enthusiastic about the encryption, data storage and general protection mechanisms. A small drawback: The Signal Foundation does not host its servers itself and the English language in the legal texts makes it difficult to understand. Telegram, on the other hand, was less convincing. The messenger offers a variety of exciting functions and has no barriers to entry. However, there are deficits in terms of security and privacy. Telegram is not suitable for beginners who are looking for a secure messenger because it requires too many configurations to be able to use it with any degree of security. For security professionals, however, Telegram simply doesn’t offer enough privacy.”
_____________________________________________________
Signal in detail
____________________________________________________
Signal can be installed and used free of charge. Users with Android or iOS devices can install the messenger. Anyone who has Signal installed on their mobile device can also use the messenger on their desktop. Versions exist for Windows, macOS and, for Debian-based distributions, also for Linux.
Signal can do everything a messenger needs: text and voice messages, sending media and files in all common formats, voice and video calls, stickers and groups are all on board. It is just as intuitive to operate as WhatsApp. The planned integration of a payment function, Signal Payments, can be seen as a practical functional update.
The encryption protocol is considered the “gold standard” in the industry. All communication content, i.e. calls, messages and files, is end-to-end encrypted without the user having to do anything. This means that even less experienced users can easily protect their privacy.
Signal formulates its terms of use simply and clearly, and the data protection declaration is also short and reduced to the essentials. The legal texts should, however, also be made available in German.
____________________________________________________
Telegram in detail
_____________________________________________________
Telegram can be used free of charge on all platforms. Mobile and desktop apps are available as well as web and unofficial apps. Telegram has many functions – Telegram is just as easy to use as any other messenger. The added functionality: Text, voice and video calls are on board, as are photo and video editing tools and an open sticker or GIF platform. “The many functions could possibly be a bit too much for less experienced users, but it is simply a matter of taste. But we got along well in the test,” said Schrenk.
Telegram wants to achieve “tap-proof” communication with end-to-end encryption of message content. The means to this end is the in-house MTProto protocol. Unlike with most other messengers, end-to-end encryption is not active by default on Telegram: chats are only encrypted between the device and the server. The Telegram makers explain in their data protection declaration that chats are stored encrypted on the servers. However, the data can be viewed by Telegram, and also by cybercriminals if they manage to gain access to the servers. In addition, Telegram’s central servers are distributed all over the world. With Telegram, all conversations end up in the cloud, including backups, which take place automatically but whose key lies with the provider.
Communication content is permanently stored in the cloud and is only deleted when users delete messages or their accounts. It can happen that third parties receive all information about the saved account when a telephone number registered with Telegram is reassigned. Schrenk: “Anyone who gives Telegram permission to access the contacts allows Telegram to permanently store all numbers, including first and last names, of the contacts on the servers.”
The legal texts are informative, but the English language makes them difficult to read and the data protection declaration contains various vague wordings that make it difficult to understand.
OpenPassword
Forum and news
for the information industry
in German-speaking countries
New editions of Open Password appear three times a week.
If you would like to subscribe to the email service free of charge, please register at www.password-online.de.
The current edition of Open Password can be accessed on the web immediately after it appears, at www.password-online.de/archiv. This also applies to all previously published editions.
International Cooperation Partner:
Outsell (London)
Business Industry Information Association/BIIA (Hong Kong)