Fun with MacRuby

To get ready for 2010, I'm taking some time off to relax and spend time with my family in Florida.

During my free time, I've been reading, catching up on movies and TV shows, and working on the MacRuby book that I am writing for O'Reilly.

I wrote a bunch of small apps and played with various APIs, and every single time I was amazed by all the goodies Apple makes available to developers. My most recent discovery is very simple, but I wanted to share it with you.

I often type text in English, French and Spanish, and I even mix the languages from time to time. Snow Leopard comes with a great spellchecker that auto-detects the language I'm typing in and is correct most of the time. It's a very impressive feature, and I was wondering whether, as a MacRuby developer, I could use one of Apple's libraries to detect what language is being used. I dug through the documentation but didn't find anything, so I started looking at some header files and found the API to use :)

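framework 'Foundation'

class String
  # Ask Core Foundation for its best guess at the language of the whole string.
  # Returns a language code such as "fr" or "en", or nil when no guess can be made.
  def language
    CFStringTokenizerCopyBestStringLanguage(self, CFRangeMake(0, self.size))
  end
end
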
puts "Bonne année!".language
# => "fr"
puts "Happy new year!".language
# => "en"
puts "¡Feliz año nuevo!".language
# => "es"
puts "Felice anno nuovo!".language
# => "it"
puts "أعياد سعيدة".language
# => "ar"
puts "明けましておめでとうございます。".language
# => "ja"
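The method returns a bare language code ("fr", "ja", and so on). If you want something friendlier to show to users, here is a minimal plain-Ruby sketch that maps those codes to readable names. LANGUAGE_NAMES and language_name are hypothetical helpers, not part of any Apple API, and the table only covers the codes that show up in this post.

```ruby
# Hypothetical lookup table: it only covers the codes seen in this post.
# Extend it with any other codes CFStringTokenizer can return.
LANGUAGE_NAMES = {
  "ar" => "Arabic",
  "de" => "German",
  "en" => "English",
  "es" => "Spanish",
  "fr" => "French",
  "it" => "Italian",
  "ja" => "Japanese"
}.freeze

# Returns a readable name for a language code. Returns nil when the code is
# nil (no guess was made) or is not in the table.
def language_name(code)
  LANGUAGE_NAMES[code]
end

puts language_name("fr")  # prints French
```

Under MacRuby, combining it with the String#language method above, `language_name("Bonne année!".language)` should give "French", assuming the detector returns "fr" as it does in the example.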

The documentation says that the result is not guaranteed to be accurate and that the function typically needs 200-400 characters to reliably guess the language of a string (see the CFStringTokenizer documentation).
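Given that caveat, you may want to skip the guess entirely for short strings. Here is a minimal sketch of that idea, assuming a 200-character cutoff taken from the low end of the documented range. guess_language and MIN_RELIABLE_LENGTH are hypothetical names, and the detector is passed in as a callable so the helper runs in plain Ruby.

```ruby
# Assumption: 200 is the low end of the 200-400 character range the
# CFStringTokenizer documentation gives for a reliable guess.
MIN_RELIABLE_LENGTH = 200

# Hypothetical helper: calls the detector only when the text is long enough
# for the guess to be worth trusting; otherwise returns nil.
# `detector` is any callable that takes a String and returns a language code.
def guess_language(text, detector, min_length = MIN_RELIABLE_LENGTH)
  return nil if text.length < min_length
  detector.call(text)
end

# Stand-in detector for plain Ruby. Under MacRuby you could pass
# ->(s) { s.language } instead.
fake_detector = ->(_text) { "en" }

p guess_language("Happy new year!", fake_detector)       # prints nil
p guess_language("Happy new year! " * 20, fake_detector) # prints "en"
```

Passing the detector in, rather than calling String#language directly, keeps the length check testable without the Core Foundation bridge.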

Probably not the most useful piece of code, but really cool nonetheless :)

Happy new year!


  1. #1 by Sven - December 30th, 2009 at 02:22

    This is really cool stuff. Thanks for sharing!

    puts "Frohes neues Jahr!".language
    # => "de"


  2. #2 by /pseudocipher - December 30th, 2009 at 02:30

    Sweet! Any idea when the MacRuby book will be out?

    • #3 by Matt Aimonetti - December 30th, 2009 at 08:21

      I don't really have a deadline yet. I'd say sometime next year, hopefully between the 0.6 and 1.0 releases, but since the content is under a Creative Commons license, some chapters should be online soon.

  3. #4 by Abdulaziz Al-Shetwi - December 30th, 2009 at 03:34

    Nice feature!
    I love it ..

    puts "شكراً جزيلاً".language

    You might also think about adding translate:

    puts "شكراً جزيلاً".translate

    Hope you enjoy your vacation!

    • #5 by Matt Aimonetti - December 30th, 2009 at 08:24

      Thanks Abdulaziz, I had to use Google translate since OS X doesn’t offer a translation API (yet) :)

  4. #6 by Jamie Orchard-Hays - December 30th, 2009 at 08:53

    Cool stuff. Are you guys aware that you have no left margin for this blog in Safari? I had to open this in Camino to read.

  5. #9 by r4ito - December 30th, 2009 at 09:26

    That’s cool, too bad I don’t own a Mac :(

    Just wanted to say that I also see no left margin here, using Firefox 3.5.6 on GNU/Linux; here's a screenshot.

    Feliz año nuevo! (Happy new year!)

  6. #10 by Aaron Patterson - December 31st, 2009 at 11:29

    Awesome! I totally tricked it though:

    puts "注文".language


    puts "class Foo; end".language

    Reports “en”, but should be “ruby”. ;-)

    • #11 by Matt Aimonetti - December 31st, 2009 at 11:43

      hehe maybe you should work for Apple and make their language analyzer even better, here is what the doc says:

      The result is not guaranteed to be accurate. Typically, the function requires 200-400 characters to reliably guess the language of a string.

      CFStringTokenizer recognizes the following languages:

      ar (Arabic), bg (Bulgarian), cs (Czech), da (Danish), de (German), el (Greek), en (English), es (Spanish), fi (Finnish), fr (French), he (Hebrew), hr (Croatian), hu (Hungarian), is (Icelandic), it (Italian), ja (Japanese), ko (Korean), nb (Norwegian Bokmål), nl (Dutch), pl (Polish), pt (Portuguese), ro (Romanian), ru (Russian), sk (Slovak), sv (Swedish), th (Thai), tr (Turkish), uk (Ukrainian), zh-Hans (Simplified Chinese), zh-Hant (Traditional Chinese).

      Maybe they’ll add Ruby language support to the next version ;)

      In any case, feel free to hack around CF’s string tokenizer and come up with some crazy ideas :)

      • #12 by Aaron Patterson - December 31st, 2009 at 18:37

        Yeah, I hand-crafted that example because I knew how to trick it. There is no way to disambiguate the example I gave. :-(

        Maybe we can make it support Ruby via Ripper or something!

  7. #13 by Can Koozies - January 3rd, 2010 at 19:02

    So cool!
    I admire u !
