Great that you have LDA up and running. In my experience NMF works better on smaller datasets, while LDA scales better to larger ones. As for getting phrases included, there are three things you can try: 1. instead of a bag-of-words matrix, build the matrix from bigrams and trigrams; 2. identify phrases first and include them as single terms in the bag-of-words matrix (see my RAKE algorithm implementation, https://github.com/aneesha/RAKE); and 3. try this add-on to LDA, which labels topics using phrases (https://github.com/xiaohan2012/chowmein). I will be publishing a blog post soon on labeling topics derived from topic models.
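
For the first option, here is a rough sketch of what a bigram and trigram count matrix looks like. It uses only the standard library; the helper names `ngrams` and `ngram_bag`, the whitespace tokenization, and the two sample documents are assumptions made for illustration, not part of any library. If you are using scikit-learn, `CountVectorizer(ngram_range=(2, 3))` should give you a comparable matrix.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_bag(docs, sizes=(2, 3)):
    """Build a sorted n-gram vocabulary and a document-term count matrix."""
    counts = []
    for doc in docs:
        # Naive tokenization (assumption): lowercase and split on whitespace.
        tokens = doc.lower().split()
        bag = Counter()
        for n in sizes:
            bag.update(ngrams(tokens, n))
        counts.append(bag)
    # The vocabulary is the union of all n-grams seen in any document.
    vocab = sorted(set().union(*counts))
    # One row per document, one column per n-gram; missing n-grams count as 0.
    matrix = [[bag[term] for term in vocab] for bag in counts]
    return vocab, matrix


# Made-up sample documents, purely for illustration.
docs = [
    "topic models find latent topics",
    "topic models label latent topics with phrases",
]
vocab, matrix = ngram_bag(docs)
```

Each row of `matrix` can then stand in for the usual bag-of-words row when fitting LDA or NMF, so the topics come out as phrases such as "latent topics" rather than single words.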