
Topic Summary

Posted by: អរិយវង្ស
« on: February 11, 2020, 01:50:55 PM »

Vandami Bhante  _/\_ _/\_ _/\_

I kana think the OCR test is 90% acceptable.
 _/\_ _/\_ _/\_
Posted by: Dhammañāṇa
« on: February 10, 2020, 06:53:59 PM »

A short view on it: It seems to identify text very well, at least for the two fonts in good original quality. Large spaces cause it to insert stray special characters like i or /, and sometimes the Khmer zero becomes a Latin 0 (they differ only in size, and Khmer also uses Latin numerals).
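The digit mix-up could probably be cleaned up after OCR. A minimal sketch in Python, assuming that a number containing at least one Khmer digit was meant to be written entirely in Khmer digits (the function name `fix_mixed_digits` is only made up for illustration, it is not part of Tesseract):

```python
import re

# Map ASCII digits 0-9 to the Khmer digits U+17E0..U+17E9.
LATIN_TO_KHMER = {ord(str(d)): chr(0x17E0 + d) for d in range(10)}
DIGIT_RUN = re.compile(r"[0-9\u17E0-\u17E9]+")
KHMER_DIGIT = re.compile(r"[\u17E0-\u17E9]")


def fix_mixed_digits(text):
    """Rewrite Latin digits as Khmer digits inside runs that mix both scripts."""

    def repl(match):
        run = match.group(0)
        # Assumption: only runs that already contain a Khmer digit are touched;
        # purely Latin numbers (which Khmer text also uses) stay as they are.
        if KHMER_DIGIT.search(run):
            return run.translate(LATIN_TO_KHMER)
        return run

    return DIGIT_RUN.sub(repl, text)
```

With this, a string such as ៥00០ would come out with all four digits in Khmer, while a year like 2020 is left alone. Stray characters like i or / would need a separate rule.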

It all sounds very fine, and as ever Nyom Moritz thinks it's good and usable, Sadhu.

What does Nyom Cheav Villa think about the test OCR (character recognition)?
Posted by: Moritz
« on: February 09, 2020, 11:31:34 AM »

Just a short test with the open-source "Tesseract" software, which the software at nextspell.com etc. is probably also based on.

This is the result of OCR with Tesseract, for the attached image file:

Quote
បុព្វកាជា
+

មុននឹងអោយទេវតាជួយអ្នក .  ត្រូវអ្នកជ្ជយខ្លួនអ្នកជាមុនសិន = (ពាក្យ
អ្នកបស្ចិមប្រទេស ,

ព្រះធម៌នែព្រះសម្មាសម្ពុទ្ធ = ប្រាកដជាទ្រ[្រង់នូវមនុស្សលោក ៗ ប៉ុន្តែ
មុននឹងអោយ[្រះធម៌នែព្រះពុទ្ធអង្គជួយអ្នក/ ត្រូវអ្នកសិក្សារៀនស្វត្រព្រះធម៌រិន័យ
ជាពិសេលគឺការគោរពប្រតិបត្តិតាមព្រះឱវាទជាមុនសិន /==។

អាស្រ័យដ្ធចនេះហើយបានជា ទូលព្រះបង្គំ ខ្ញុំព្រះករុណា = អាត្មាភាព
បានដកស្រង់រៀបចំ = បោះពុម្ពសៀវភៅថ្វាយបង្គំព្រះជាថ្មីឡើងរិញ = = ដើម្បីចែកជា
ធម្មទាន  ។

សៀវភៅវដែលព្រះករុណា = អស់លោកអ្នកកំពុងអាននេះ  ខ្ញុំបានបញ្ចូល

ធម៌ខ្លះបន្ថែមពីគណៈកម្មការរៀបចំមុន = = ដ្ធចជារបៀបធ្វើវិសាខ = = និងមាឃបូជា
យំកិញ្ចា ក៏សុ សរសើរព្រះរតនរ្រែសង្ខេប [្រះរាជជីវប្រវត្តិសម្តេចប៉ាន និង
បុរាណភាសិត ពុទ្ធភាសិត  ។

តាំងតែរី ព.ស. ២៥៩៣៨ គ.ស. ១9៩៩9  ដែលប្រទេសជាតិ
ផ្តល់អោយមាន  និកាយឆម្មយុត្ត = សម្តេចព្រះសង្ឃរាជ បួរ គ្រី បាន
ព្រះរាជានុញ្ញាតអោយបោះពុម្ពចម្លងតាមច្បាប់ដើម = និងបញ្ចូលបន្ថែមខ្លះ ៗ តម្រូវ

ទៅតាមការខ្វះខាត ចំនួនបីលើក លើកទីមួយ ចំនួន ៥00០ ច្បាប់ លើកទីពីរ

 

ចំនួន ៥០0០ ច្បាប់ និងលើកទីបីនេះចំនួន ៥, ០00០ ច្បាប់ ។
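For reference, a run like the one quoted above could be reproduced roughly like this. It is only a sketch in Python, assuming the `tesseract` command-line program and its Khmer language data (`khm`) are installed; the file name `page.png` and the helper names are made up for illustration:

```python
import subprocess


def build_tesseract_command(image_path, lang="khm"):
    # "stdout" as the output base makes tesseract print the recognized text
    # instead of writing it to a file; -l selects the language data.
    return ["tesseract", image_path, "stdout", "-l", lang]


def run_ocr(image_path, lang="khm"):
    # Requires the tesseract binary and the matching traineddata file.
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        encoding="utf-8",
        check=True,
    )
    return result.stdout


# Example call (hypothetical file name):
# print(run_ocr("page.png"))
```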

If there are no licensing issues which make it difficult, it would be possible to install the Tesseract software on the new server and also give it a web interface, maybe only accessible for members, because it might be quite resource-hungry if serving many.
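One way to keep the load in check would be to allow only a few OCR jobs at the same time. A rough sketch in Python, not tied to any particular web framework; the class name `OcrJobLimiter` and the limit of two parallel jobs are assumptions for illustration only:

```python
import threading


class OcrJobLimiter:
    """Allow at most max_parallel OCR jobs to run at once."""

    def __init__(self, max_parallel=2):
        self._slots = threading.BoundedSemaphore(max_parallel)

    def run(self, job, *args):
        # Refuse immediately instead of queueing when all slots are busy,
        # so the server does not pile up heavy work.
        if not self._slots.acquire(blocking=False):
            raise RuntimeError("OCR service busy, please try again later")
        try:
            return job(*args)
        finally:
            self._slots.release()
```

A web frontend would wrap each upload in `limiter.run(...)` and show a "busy" message when the error is raised. Restricting access to members would be a separate check done by the forum login.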

Also, the dataset for Khmer recognition used here could then be trained further here by user input of training data and corrections.

_/\_ _/\_ _/\_
Posted by: Dhammañāṇa
« on: January 10, 2020, 11:51:07 AM »

Sadhu for effort!

(Sure, most open and free things are laid out for gains, that's how the so-called "free" thing works. It's surely not easy, as told again and again, to come across real generosity on the internet.)

Not sure, but it might be that https://kheng.info is also part of Nyom Danh Hong's undertaking and, if seeing right, they all might have come from generosity formerly given by NGOs, like http://www.khmeros.info and the Christian undertakings by Nyom sungkhum (Nathan), https://www.sbbic.org/

Once financial gains are no longer seen, such undertakings break apart, and things are mostly sold off to Google, Windows... whatever.

Of course 99,95% of people don't understand anything but just consume, not knowing toward whom or what they increase debts.

As normal people are not free of sensual desires and also seldom having faith in precepts, all kinds of things are done with skills gained out and with support of faith.
Posted by: Moritz
« on: January 10, 2020, 11:33:28 AM »

In Facebook messenger two days ago:
Quote from: Moritz
Dear Mr. Danh Hong,

my name is Moritz Raguschat. I am from Germany.

I have heard/read about your great work of creating OCR and spell-checking software for the benefit of the public, offered at nextspell.com.

The way I came to know of it, and the reason I am contacting you, is via a Buddhist monk, the venerable Johann, residing near Phnom Aural currently, who was wondering about possible use of this software for the Sangha. (See request in the online forum/online "Wat" at http://forum.sangham.net/index.php/topic,9657.msg21395.html#msg21395)

Since abstaining from taking what is not given, and not using Facebook, Bhante was wondering about possible ways to contact. So I am writing to you here in Facebook. It would be great if you could come directly into contact via the online forum/online monastery http://forum.sangham.net.

Many thanks, and may you have a good day!

_/\_

Moritz


Quote from: Danh Hong
*thumb*

Ok. Thank you

Not sure yet what else to say, possibly not made clear enough and too much other worries to think about (in both heads involved here possibly ^-^).
I will try again later.


Meanwhile, I have learnt about the software: it seems to be based on the open-source OCR machine-learning software "Tesseract", which could also easily be installed on the new server, with a web frontend built for it. I could do that. It would just need time, like many other things.

The open-source Tesseract software has already had some Khmer recognition for a longer time, but was possibly not trained very well yet.

The KhmerOCR project led by Mr. Danh Hong was for the purpose of improving its accuracy by feeding it more training data.
Not sure if that improved training data has been integrated back into the original Tesseract as open-source data, or might be kept private for now.

I will try to ask for it, if it could be given to use for the Sangha.

_/\_ _/\_ _/\_
Posted by: Moritz
« on: January 08, 2020, 09:10:15 PM »

Vandami Bhante _/\_ _/\_ _/\_

I am trying to get in contact with Mr. Danh Hong via Facebook.

_/\_ _/\_ _/\_
Posted by: Dhammañāṇa
« on: January 08, 2020, 03:17:04 PM »

Any way to get in contact and speak about the matter of lack for the Sangha?
Posted by: Dhammañāṇa
« on: January 06, 2020, 09:23:04 AM »

Aramika   *

One or more posts have been cut out of this topic and either opened as a new topic, "Independence", or attached to it.
Posted by: Dhammañāṇa
« on: January 05, 2020, 06:04:58 PM »

This https://m.facebook.com/khmerocr/posts or this https://m.facebook.com/danh.hong2016 might be places to reach out to him.
Posted by: Dhammañāṇa
« on: January 05, 2020, 04:56:59 PM »

Nyom Danh Hong, possibly the foremost specialist in Khmer digital writing, has an online portal for OCR (picture into text), spell-check and word-break (inserting zero-width spaces to split words)

https://www.nextspell.com/
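To illustrate what the word-break output is good for: once zero-width spaces (U+200B) are in the text, other software can split it into words very simply. A small Python sketch; it only shows the splitting side and does not reproduce how nextspell.com finds the word boundaries, and the function names are made up:

```python
ZWSP = "\u200b"  # zero-width space, invisible when displayed


def split_words(text):
    # Split at the invisible markers and drop empty pieces.
    return [word for word in text.split(ZWSP) if word]


def join_words(words):
    # Put the markers back between the words.
    return ZWSP.join(words)
```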

My person tried to find a contact to address him. Not sure, since in Cambodia all are satisfied with Google and Facebook dependency, it might be that he charges for his services because there is no real funding for a lot of work (nobody here actually understands but uses).