The Natural Programming Project is working on making programming languages and environments easier to learn, more effective, and less error-prone. We take a human-centered approach: we first study how people perform their tasks and then design languages and environments around people's natural tendencies. We focus on all kinds of programmers, including professional programmers, novices who are trying to become experts, and end users who program to support other jobs or hobbies, such as multimedia authoring, simulations, teaching, prototyping, and other activities supported by computing.
Speech technology could allow everyone to participate in today's information revolution and help bridge the language barrier. Unfortunately, building speech processing systems requires significant resources. With some 6900 languages in the world, speech processing has traditionally been prohibitively expensive for all but the most economically viable languages. Despite recent improvements in speech processing, supporting a new language remains a skilled job that requires significant effort from trained individuals. SPICE aims to overcome both limitations, the resource cost and the need for expertise, by providing an interactive language creation and evaluation toolkit that allows anyone to develop speech processing models, collect appropriate data for model building, and evaluate the results, enabling iterative improvement.
NGramJ is a Java-based library containing two types of n-gram-based applications. Its major focus is to provide robust, state-of-the-art language recognition.
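The description above does not say how NGramJ scores a text, so the following is only a generic sketch of n-gram-based language recognition, not NGramJ's API or algorithm. It counts overlapping character trigrams and picks the language whose trigram profile has the highest cosine similarity with the input. The function names and the two toy training sentences are hypothetical; a real recognizer would build its profiles from much larger corpora.

```python
from collections import Counter
import math


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams, padding each word with spaces."""
    counts: Counter = Counter()
    for word in text.lower().split():
        padded = f" {word} "
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    return counts


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())  # missing grams count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def guess_language(text: str, profiles: dict[str, Counter], n: int = 3) -> str:
    """Return the language whose profile is most similar to the text's n-grams."""
    query = char_ngrams(text, n)
    return max(profiles, key=lambda lang: cosine(query, profiles[lang]))


if __name__ == "__main__":
    # Hypothetical toy "training" texts; real profiles need far more data.
    samples = {
        "english": "the quick brown fox jumps over the lazy dog",
        "portuguese": "o rato roeu a roupa do rei de roma",
    }
    profiles = {lang: char_ngrams(text) for lang, text in samples.items()}
    print(guess_language("the lazy dog", profiles))  # prints english
```

Cosine similarity is used here because it ignores overall text length; other n-gram recognizers use rank-based or probabilistic scores instead.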
Stanford CoreNLP provides a set of natural language analysis tools. It can give the base forms of words and their parts of speech; recognize whether they are names of companies, people, and so on; normalize dates, times, and numeric quantities; mark up the structure of sentences in terms of phrases and word dependencies; indicate which noun phrases refer to the same entities; indicate sentiment; and extract open-class relations between mentions.
Libtextcat is a library with functions that implement the classification technique described in Cavnar & Trenkle, "N-Gram-Based Text Categorization" [1]. It was primarily developed for language guessing, a task on which it is known to perform with near-perfect accuracy.
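The Cavnar & Trenkle technique that libtextcat implements can be sketched roughly as follows. This is an illustrative Python reading of the method, not libtextcat's C API or its fingerprint file format. Each text becomes a rank-ordered profile of its most frequent character n-grams (lengths 1 to 5, words padded with "_"), and a document is assigned to the category whose profile minimizes the "out-of-place" distance: the sum of rank differences, with a fixed maximum penalty for n-grams missing from the category profile. The profile size of 300, the alphabetical tie-breaking, and all function names are assumptions chosen for the sketch.

```python
from collections import Counter


def ngram_profile(text: str, max_n: int = 5, top_k: int = 300) -> list[str]:
    """Rank-ordered profile: the top_k most frequent 1..max_n-grams.

    Words are padded with '_'; ties are broken alphabetically so the
    ranking is deterministic (an assumption of this sketch).
    """
    counts: Counter = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(doc_profile: list[str], cat_profile: list[str]) -> int:
    """Sum of rank differences; n-grams absent from the category get max penalty."""
    cat_rank = {gram: rank for rank, gram in enumerate(cat_profile)}
    max_penalty = len(cat_profile)
    distance = 0
    for rank, gram in enumerate(doc_profile):
        if gram in cat_rank:
            distance += abs(rank - cat_rank[gram])
        else:
            distance += max_penalty
    return distance


def categorize(text: str, categories: dict[str, list[str]]) -> str:
    """Pick the category whose profile has the smallest out-of-place distance."""
    doc = ngram_profile(text)
    return min(categories, key=lambda name: out_of_place(doc, categories[name]))


if __name__ == "__main__":
    # Hypothetical toy category samples; real profiles come from large corpora.
    categories = {
        "english": ngram_profile("the quick brown fox jumps over the lazy dog"),
        "portuguese": ngram_profile("o rato roeu a roupa do rei de roma"),
    }
    print(categorize("the dog sleeps over there", categories))
```

Because only ranks are compared, the measure is insensitive to document length, which is one reason the approach works with short inputs.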
S. Aluisio and R. Aires. Relatórios Técnicos do ICMC, 107. Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo, São Carlos-SP, (March 2000)
G. Angelova. Proceedings of the 13th International Conference on Conceptual Structures (ICCS 2005), volume 3596 of Lecture Notes in Computer Science, pages 367-380. Springer, (2005)
L. Antiqueira and M. Nunes. 3rd Workshop on MSc dissertation and PhD thesis in Artificial Intelligence (WTDIA'06), in the Proceedings of the International Joint Conference IBERAMIA-SBIA-SBRN, Ribeirão Preto, Brazil, ICMC-USP, (October 2006)
L. Antiqueira, M. Nunes, O. Oliveira Jr., and L. Costa. Anais do XXV Congresso da Sociedade Brasileira de Computação (III Workshop em Tecnologia da Informação e da Linguagem Humana - TIL), pages 2089-2098. São Leopoldo-RS, Brazil, (2005)