Re: Read corpus documents sentence by sentence instead of linewise

2015-05-21 Thread Stephan Ewen
If you want the input to be chunked by sentence, you can try splitting at the period character. You can do this with the DelimitedInputFormat by setting its delimiter. readAsText actually uses a special case of the delimited input format that splits at line breaks.

Greetings, Stephan
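
To illustrate the idea behind a delimiter-based input format, here is a minimal sketch in plain Python. It is not Flink code and does not use the DelimitedInputFormat API; the function name `read_delimited` and its parameters are made up for this example. It only shows the principle: read the input in chunks and emit a record at every delimiter rather than at every line break.

```python
import io
from typing import Iterator, TextIO


def read_delimited(stream: TextIO, delimiter: str = ".", chunk_size: int = 4096) -> Iterator[str]:
    """Yield records from `stream`, split at `delimiter` instead of at line breaks.

    Hypothetical helper for illustration only; not part of any Flink API.
    """
    if not delimiter:
        raise ValueError("delimiter must be non-empty")
    buffer = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        # Everything before the last delimiter is a complete record;
        # the remainder stays in the buffer until more input arrives.
        *complete, buffer = buffer.split(delimiter)
        for record in complete:
            # Collapse line breaks and repeated whitespace inside a record.
            record = " ".join(record.split())
            if record:
                yield record
    # Emit whatever is left after the final delimiter, if anything.
    tail = " ".join(buffer.split())
    if tail:
        yield tail


corpus = io.StringIO("The cat sat on\nthe mat. It purred. Then it\nslept")
sentences = list(read_delimited(corpus, delimiter=".", chunk_size=8))
print(sentences)
```

Note that splitting at every period is only an approximation of sentence boundaries: abbreviations such as "e.g." and decimal numbers would be split as well, so a real corpus may need extra handling.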

Read corpus documents sentence by sentence instead of linewise

2015-05-20 Thread Felix Schüler
Hi! We have implemented a transformer that computes a co-occurrence matrix for words within a given window. This matrix will then be used for unsupervised learning of vector representations for words (we are basically implementing this: http://nlp.stanford.edu/projects/glove/). Right now, we have implemen