Linguistics is the science of language as it relates to human cognition.
Metaphysical considerations about the properties of organic systems may seem far removed from the low-level details of language data. The general idea, however, is that the language faculty is ‘perfect’: it has nearly exact, recurrent properties that, while not wholly describable in formal logic notation, are better described by such formal systems than by statistical methods that try to mimic language competence through prediction.
Language As A Discrete Object
Human language is quirky relative to other organic systems because it combines discrete properties with infinite, principled variety. Language differs from other capacities, like the ability to recognise emotion in facial expressions (where a computer can outperform a human): its expressive power is provably infinite, yet human competence, the store of representative information concerning a given language, easily outperforms any computer model.
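The idea that a discrete system can have infinite expressive power can be sketched with a single recursive rule. This is only an illustration: the function name, the core sentence and the embedding frame below are hypothetical choices, not part of any particular theory.

```python
from itertools import islice


def embedded_sentences(core="it rains", frame="Mary thinks that"):
    """Yield ever-longer sentences by re-applying one embedding rule.

    Both `core` and `frame` are illustrative assumptions; any clause and
    any clause-embedding frame would make the same point.
    """
    sentence = core
    while True:
        yield sentence
        # One discrete rule, applied again: S -> frame + S
        sentence = f"{frame} {sentence}"


# The generator never terminates on its own, so take a finite prefix.
first_three = list(islice(embedded_sentences(), 3))
print(first_three)
```

Every output is a whole, discrete sentence, yet there is no longest one: each application of the rule produces a new string.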
The core elements of human language are discrete too: there is no ‘half a sentence’, since at some level the stored model assigns each linguistic element one whole interpretation.
Objects in language are discrete and deterministic when their true meanings are clear to a speaker, although interesting ambiguities point to varied interpretations. Speakers’ experiences of language are noisy and chaotic, but for the most part a restricted set of properties in the human mind defines what a possible steady state for a human language is.
Trends Within The Field: Stochastic vs Discrete Linguistics
Corpus Linguistics and Theoretical Linguistics have often been thought to be in opposition when investigating linguistic phenomena. They are often regarded as distinct ways to view the same object: the competence of a linguistic system. However, each is better seen as a valid, complementary approach that models distinct phenomena of performance and competence.
In theoretical linguistics, we’re concerned with the discrete study of linguistic competence. Several biological and psychological arguments underlie two related questions: “what are the general conditions that the human language faculty should be expected to satisfy in order to execute a language?” and “how do these conditions define the language faculty?” These two questions, sourced from Howard Lasnik’s foreword in the Minimalist Program, are the domain of theoretical linguistics.
Corpus linguistics, by contrast, is the broad characterisation of language in text: the sourcing, curation and stochastic analysis of domain-specific text. It is an analysis of language in the context of performance.
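As a rough illustration of what a stochastic analysis of text can look like at its simplest, the sketch below computes relative token frequencies. The tokenisation rule and the sample sentence are assumptions made for the example; real corpus work involves far more careful sourcing and curation.

```python
import re
from collections import Counter


def token_frequencies(text):
    """Return each token's share of all tokens in `text`.

    Tokenisation here is deliberately crude (lowercase runs of letters
    and apostrophes); that choice is an assumption for illustration.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = sum(counts.values())
    return {token: n / total for token, n in counts.items()}


# Hypothetical one-sentence "corpus".
sample = "the cat saw the dog and the dog saw the cat"
freqs = token_frequencies(sample)
for token, share in sorted(freqs.items(), key=lambda kv: -kv[1]):
    print(f"{token}\t{share:.2f}")
```

The result describes how language was used in one particular text, which is a fact about performance rather than about the underlying competence.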
Computational Rules & The Lexicon
There is a firm partition between the functional and substantive parts of a language.
The substantive words are what are commonly called ‘nouns’, ‘verbs’ or ‘adjectives’. These elements describe the world, but the relations between them, nuance that is less salient but necessary for interpretation, are relegated to the functional elements of language.
In computational linguistics, functional words are often referred to as ‘stop words’, though that term can also be given an application-specific definition covering high-frequency vocabulary that recurs within a corpus but does not signal a topic in the text. Loosely defined, a function word tells you how the substantive words relate.
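A minimal sketch of the stop-word idea follows. The word list is a small hypothetical sample, not a standard inventory; applications typically define their own.

```python
# Hypothetical, deliberately tiny list of function words.
FUNCTION_WORDS = {"the", "a", "of", "on", "and", "is", "that", "to"}


def split_tokens(tokens, stop_words=FUNCTION_WORDS):
    """Separate tokens into substantive and functional lists."""
    substantive = [t for t in tokens if t.lower() not in stop_words]
    functional = [t for t in tokens if t.lower() in stop_words]
    return substantive, functional


substantive, functional = split_tokens("the cat sat on the mat".split())
print(substantive)  # the words that describe the world
print(functional)   # the words that relate them
```

Dropping the functional list keeps the topic of the sentence but loses the relations between its parts, which is why the choice of list depends on the application.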
A lexicon is a repository of word information.
The repository contains all the unprincipled details of a word that define it uniquely relative to all other words.
In a lexicon, these unique details are idiosyncratic: there is no in-depth explanation of why and how these details emerge. Rather, the details are assumed a priori under any theory of language.
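One way to picture a lexicon is as a lookup table that stores only what cannot be derived by rule. The sketch below is an assumption-laden illustration: the entry fields, the two sample words and the simplified regular past-tense rule are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LexicalEntry:
    """Idiosyncratic information stored for one word (illustrative fields)."""
    form: str
    category: str
    irregular_forms: dict = field(default_factory=dict)


# Hypothetical two-word lexicon.
LEXICON = {
    "go": LexicalEntry("go", "verb", {"past": "went"}),
    "walk": LexicalEntry("walk", "verb"),
}


def past_tense(lemma, lexicon=LEXICON):
    """Use the stored irregular form if there is one, else a general rule."""
    entry = lexicon[lemma]
    stored = entry.irregular_forms.get("past")
    # Simplified regular rule, assumed for the example: add "-ed".
    return stored if stored is not None else lemma + "ed"


print(past_tense("go"), past_tense("walk"))
```

Nothing in the table explains why "go" has the past form "went"; the entry simply records it, while the regular case is left to a rule outside the lexicon.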
Any language can be represented as a set of principles instantiated with specific parameters. There are multiple modular components in such a system handling different cognitive tasks, like Semantic Interpretation or Grammatical Inference.
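The principle-plus-parameter idea can be sketched with head direction, a common textbook illustration that is not drawn from this text. The function and dictionary names are hypothetical, and the Japanese example uses a simplified romanisation.

```python
def build_phrase(head, complement, head_initial):
    """Principle: a phrase combines a head with its complement.

    Parameter: `head_initial` fixes whether the head comes first.
    """
    return [head, complement] if head_initial else [complement, head]


# Hypothetical parameter settings for two languages.
ENGLISH = {"head_initial": True}
JAPANESE = {"head_initial": False}

english_vp = build_phrase("read", "books", **ENGLISH)
japanese_vp = build_phrase("yomu", "hon-o", **JAPANESE)
print(english_vp, japanese_vp)
```

The same function serves both languages; only the parameter value differs, which is the sense in which a language is a set of principles instantiated with specific parameters.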
The semantic or interpretive module for a language is called its Logical Form (May 1977), while its Phonetic Form parallels this module in the spoken sense. Additionally, interfacing with both modules are grammatical principles, or Deep Structure, that essentially prove a language string obeys the principles of that language.