Category: Linguistics
-
Exploring the Boundaries of AI: Apple Investigates Generative AI for Siri
Apple is pushing the boundaries of artificial intelligence (AI) with its latest research and development efforts. According to a report by the New York Times, the tech giant is currently exploring generative AI concepts that could eventually be used to make Siri more powerful and effective. Generative AI is a form of artificial intelligence that […]
-
Web Scraping As A Sourcing Technique For NLP
In this post, we provide a series of web scraping examples and references for people looking to bootstrap text for a language model. The advantage is that a greater number of spoken-language domains can be covered. Newer vocabulary, and possibly very common slang, is picked up through this method, since most corporate language […]
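The post's own examples are cut off here. As a minimal sketch of only the text-extraction step, the following uses Python's standard-library `html.parser` to keep the visible text of a page and drop `<script>`/`<style>` contents. `VisibleTextExtractor` and `html_to_text` are hypothetical names, not taken from the post.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of non-visible tags."""

    SKIP_TAGS = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so a plain set lookup is enough.
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML document, one chunk per line."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

In practice the HTML string might come from `urllib.request.urlopen(url).read().decode("utf-8")`, subject to each site's terms of use and robots.txt. The extracted lines can then be normalized and appended to a training corpus.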
-
Word2Vec Mexican Spanish Model: Lyrics, News Documents
A Corpus That Contains Colloquial Lyrics & News Documents For Mexican Spanish. This experimental dataset was developed by four social science specialists and one industry expert (myself), using samples of Mexico-specific news texts and normalized song lyrics. The intent is to understand how small, phrase-level constituents interact with larger, editorialized style […]
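Word2Vec's skip-gram variant learns word vectors from (center word, context word) pairs drawn from a sliding window. The sketch below shows only that pair-extraction step in plain Python. It is not the training procedure and not the author's pipeline, and `skipgram_pairs` is a hypothetical name. It also uses a fixed window, whereas the original word2vec implementation randomly shrinks the window for each center word.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (center, context) pairs for every token within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs
```

To actually train vectors over lyrics and news sentences, a library such as gensim (`gensim.models.Word2Vec`, with parameters like `vector_size`, `window`, `min_count`, and `sg=1` for skip-gram) would normally be used instead of hand-rolled code.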
-
Tokenizing Text, Finding Word Frequencies Within Corpora
One way to think about tokenization is as finding the smallest possible unit of analysis for computational linguistics tasks. As such, tokenization is one of the first steps (along with normalization) in a typical NLP pipeline or computational linguistics analysis. This process helps break down text into a manner […]
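A minimal sketch of tokenization followed by word-frequency counting, using only the standard library. It assumes that lowercasing is the only normalization step and that a "word" is a run of Unicode word characters. Both are simplifications, and the helper names are hypothetical.

```python
import re
from collections import Counter
from typing import List

# For str patterns, \w is Unicode-aware in Python 3, so accented Spanish
# letters such as á, é, ñ and ü stay inside tokens; punctuation like ¡ ¿ is dropped.
TOKEN_RE = re.compile(r"\w+")


def normalize(text: str) -> str:
    """Minimal normalization (an assumption): lowercase only."""
    return text.lower()


def tokenize(text: str) -> List[str]:
    """Split normalized text into word tokens."""
    return TOKEN_RE.findall(normalize(text))


def word_frequencies(texts: List[str]) -> Counter:
    """Count token frequencies across all documents in a corpus."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts
```

`word_frequencies(corpus).most_common(20)` then gives the twenty most frequent tokens.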
-
Frequency Counts For Named Entities Using spaCy/Python Over MX Spanish News Text
In this post, we review some straightforward Python code that allows a user to process text and retrieve named entities alongside their counts. The main dependencies are spaCy, with a small, compact version of its Spanish language model built for Named Entity Recognition, and the plotting library Matplotlib, if you're looking […]
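A minimal sketch of the counting step. It assumes spaCy and its small Spanish model are installed (`python -m spacy download es_core_news_sm`), and it is not the post's exact code. That model labels entities as `PER`, `LOC`, `ORG` or `MISC`. spaCy is imported inside `extract_entities`, so the counting helper can be used on its own. Both function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def extract_entities(texts: Iterable[str], model: str = "es_core_news_sm") -> Iterator[Tuple[str, str]]:
    """Yield (entity text, entity label) for every entity spaCy finds."""
    import spacy  # imported lazily so count_entities() works without spaCy

    nlp = spacy.load(model)
    for doc in nlp.pipe(texts):  # nlp.pipe processes texts as a stream
        for ent in doc.ents:
            yield ent.text, ent.label_


def count_entities(entities: Iterable[Tuple[str, str]]) -> Counter:
    """Count identical (text, label) pairs, ignoring surrounding whitespace."""
    return Counter((text.strip(), label) for text, label in entities)
```

For example, `count_entities(extract_entities(news_texts)).most_common(10)` returns the ten most frequent entities. The resulting pairs and counts can be passed to Matplotlib for a bar chart.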
-
N-Gram Analysis Over Sensitive Topics Corpus
I was recently able to do some analysis on the SugarBear AI violence corpus, a collection of documents classified by analysts at the SugarBear AI group. Over the past year, the group has been manually classifying thousands of Mexican Spanish news documents that deal with today's new topics: “Coronavirus”, “WFH”, […]
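A minimal sketch of n-gram counting over a corpus. It assumes each document has already been tokenized into a list of strings. The helper names are hypothetical and not the post's code.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return the contiguous n-grams of a token list.

    zip() over n progressively shifted copies of the list stops at the
    shortest one, so only complete n-grams are produced.
    """
    return list(zip(*(tokens[i:] for i in range(n))))


def ngram_counts(docs: Iterable[List[str]], n: int = 2) -> Counter:
    """Count n-grams across all documents in a corpus."""
    counts = Counter()
    for tokens in docs:
        counts.update(ngrams(tokens, n))
    return counts
```

Counting within each document keeps n-grams from spanning document boundaries, which matters when short news items are concatenated.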
-
Using spaCy in Python To Extract Named Entities in Spanish
The spaCy small language model has some difficulty with contemporary news texts that are neither Eurocentric nor US-based. This lack of accuracy with contemporary figures is likely due in part to a less thorough scrape of Wikipedia and to the changes that have taken place in Mexico, Bolivia and other countries with highly variant dialects […]
-
Linguistics In The Enterprise
Why Linguistics (And Linguists) Are Always On The Back Foot In An Enterprise Context. Linguistics is often questioned by practitioners of the natural sciences in both informal and professional settings. Perhaps this is because the phenomena relevant to the natural sciences are more readily observable through instrumental means. The irony is that […]
-
Wearables, Speech Recognition & Musk: How Intel’s Loss Could Be Tesla’s Gain
Despite its famously late arrival to mobile computing, Intel did make strides in wearables before many others, from mid-2013 onward. Much of this may have to do with the company’s strategic diversification, which took place in mid-2013. Hundreds of Millions Poured Into Research & Development. Intel invested at the very […]
-
Use Open Source Speech Recognition
To start, where would you ideally run a quick-and-dirty speech recognition project? On Windows 10 (and this could apply to a Mac as well), the best place is likely an Anaconda environment. Assuming this is your case, I will proceed, since some complications are avoided by how Anaconda interacts with […]
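The post's setup steps are truncated here, so the following is only a sketch under stated assumptions. It assumes the third-party SpeechRecognition package and its pocketsphinx backend (an offline, open-source recognizer) have been installed with pip into the active Anaconda environment. `describe_wav` uses only the standard library to check a file's format before transcription. `transcribe_offline` imports SpeechRecognition lazily. Both helper names are hypothetical.

```python
import wave
from typing import NamedTuple


class WavInfo(NamedTuple):
    channels: int
    sample_rate: int
    sample_width_bytes: int
    duration_s: float


def describe_wav(path: str) -> WavInfo:
    """Read basic properties of a PCM WAV file with the standard library."""
    with wave.open(path, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return WavInfo(wf.getnchannels(), rate, wf.getsampwidth(), frames / rate)


def transcribe_offline(path: str) -> str:
    """Transcribe a WAV file offline with CMU PocketSphinx via SpeechRecognition.

    Assumes `pip install SpeechRecognition pocketsphinx` in the active
    environment; imported here so describe_wav() works without them.
    """
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the entire file
    return recognizer.recognize_sphinx(audio)  # default model is US English
```

Keeping recognition offline avoids API keys and network calls, which suits a quick-and-dirty experiment. Accuracy will generally trail cloud services.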