In this paper, we first present the overall architecture of the system and introduce its main functional modules. Next, for data preprocessing, we design a method that collects click-stream data at the application-server layer and preprocesses it in real time. For data mining (that is, data analysis), we study and implement an extended attribute-oriented induction algorithm for data generalization analysis, and we design and implement a hybrid-dimensional association rule mining algorithm for association analysis. Finally, using these algorithms, we design and implement an intelligent decision support system (IDSS) on the e-business website of Jiangsu Changjiang Electronic Group Corp.
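The generalization step can be illustrated with a minimal sketch of the basic, non-extended attribute-oriented induction algorithm: each attribute with more distinct values than a threshold is climbed one level at a time up its concept hierarchy, an attribute that stays above the threshold (for example, one with no hierarchy) is dropped, and identical generalized tuples are merged with a count. The paper's extension is not described here, so this is not its method; the function name `attribute_oriented_induction`, the threshold, and the sample visit records and hierarchies are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Row = Dict[str, str]

def attribute_oriented_induction(
    rows: List[Row],
    hierarchies: Dict[str, Dict[str, str]],
    threshold: int,
) -> Tuple[List[str], Counter]:
    """Basic attribute-oriented induction (not the paper's extended version)."""
    current = [dict(r) for r in rows]  # work on copies of the input rows
    kept: List[str] = []
    for attr in rows[0]:
        hierarchy = hierarchies.get(attr)
        # Climb the concept hierarchy one level at a time while the attribute
        # has too many distinct values and a higher concept still exists.
        while hierarchy and len({r[attr] for r in current}) > threshold:
            climbed = False
            for r in current:
                parent = hierarchy.get(r[attr])
                if parent is not None:
                    r[attr] = parent
                    climbed = True
            if not climbed:
                break
        # Attributes that still have too many distinct values are removed.
        if len({r[attr] for r in current}) <= threshold:
            kept.append(attr)
    # Merge identical generalized tuples and keep a count for each.
    counts = Counter(tuple(r[a] for a in kept) for r in current)
    return kept, counts

# Hypothetical click-stream records and concept hierarchies.
visits = [
    {"city": "Nanjing", "product": "breaker", "session": "s1"},
    {"city": "Suzhou", "product": "contactor", "session": "s2"},
    {"city": "Hangzhou", "product": "relay", "session": "s3"},
    {"city": "Ningbo", "product": "timer", "session": "s4"},
    {"city": "Nanjing", "product": "relay", "session": "s5"},
]
hierarchies = {
    "city": {"Nanjing": "Jiangsu", "Suzhou": "Jiangsu",
             "Hangzhou": "Zhejiang", "Ningbo": "Zhejiang"},
    "product": {"breaker": "switchgear", "contactor": "switchgear",
                "relay": "control", "timer": "control"},
}
kept, counts = attribute_oriented_induction(visits, hierarchies, threshold=2)
print(kept)  # ['city', 'product'] ("session" has no hierarchy and is dropped)
for key, n in counts.items():
    print(key, n)
```

Climbing every value one level per pass keeps the sketch short; a fuller version would also handle hierarchies of uneven depth.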
Then, after the model was planned as a whole and designed in detail, a prototype network access behavior analysis system was implemented using network monitoring and data mining techniques. The prototype provides the basic functions: capturing the raw traffic generated by user access, preprocessing the behavior data, defining, mining, and maintaining behavior patterns, and comparing current behavior against historical patterns. Finally, the paper makes a preliminary study of pattern updating and anomaly identification and proposes several directions for further improving the model.
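The comparison of current behavior with historical patterns can be sketched, under simplifying assumptions, as a comparison of two frequency profiles. Here a profile is assumed to be a count of (destination host, port) pairs, and a cosine similarity below a hypothetical threshold of 0.5 flags the current window as anomalous. The names `profile`, `cosine_similarity`, and `is_anomalous` and the sample traffic are illustrative only, not the prototype's actual pattern definition.

```python
import math
from collections import Counter
from typing import Hashable, Iterable

def profile(events: Iterable[Hashable]) -> Counter:
    """Frequency profile of access events, e.g. (host, port) pairs."""
    return Counter(events)

def cosine_similarity(p: Counter, q: Counter) -> float:
    """Cosine of the angle between two count vectors (0.0 if either is empty)."""
    dot = sum(p[k] * q[k] for k in p.keys() & q.keys())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)

def is_anomalous(history, current, threshold: float = 0.5) -> bool:
    """Flag the current window if it is too dissimilar to the historical pattern."""
    return cosine_similarity(profile(history), profile(current)) < threshold

# Hypothetical captured traffic, already reduced to (host, port) events.
history = [("intranet", 80)] * 8 + [("mail", 25)] * 2
usual_day = [("intranet", 80)] * 4 + [("mail", 25)]
odd_day = [("unknown-host", 22)] * 5

print(is_anomalous(history, usual_day))  # False
print(is_anomalous(history, odd_day))    # True
```

Cosine similarity ignores overall traffic volume and compares only the mix of destinations, which is one reasonable choice when activity levels vary from day to day.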
This paper first discusses the steps of preprocessing in web log mining, namely data abstraction, data cleaning, user and session identification, and path completion, and gives an algorithm for each step; in particular, it proposes a preprocessing (filtering) algorithm for log data that contains frame pages. Second, it discusses techniques for building an adaptive website, including cluster mining of log data, learning user access patterns, and transforming and presenting the site structure. It proposes a single-user log access model, together with an online learning algorithm for the user model, an index-page synthesis algorithm, and algorithms for site-structure transformation and presentation.
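The cleaning, user-identification, and session-identification steps can be sketched as follows, assuming common heuristics rather than the paper's own algorithms: non-200 responses and requests for images, style sheets, and scripts are discarded, a user is identified by the (IP address, user agent) pair, and a new session starts after 30 minutes of inactivity. The frame handling is a hypothetical rule that folds child-frame requests into the preceding frameset page using a caller-supplied `frame_children` map, and path completion is omitted. All names and sample records are illustrative.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Set, Tuple

IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")

@dataclass
class LogRecord:
    ip: str
    agent: str
    time: datetime
    url: str
    status: int

def clean(records: List[LogRecord]) -> List[LogRecord]:
    """Data cleaning: keep successful page requests, drop embedded resources."""
    return [r for r in records
            if r.status == 200 and not r.url.lower().endswith(IGNORED_SUFFIXES)]

def collapse_frames(urls: List[str], frame_children: Dict[str, Set[str]]) -> List[str]:
    """Fold child-frame requests into the frameset page that loaded them (hypothetical rule)."""
    out: List[str] = []
    frameset = None
    for u in urls:
        if frameset is not None and u in frame_children[frameset]:
            continue  # part of the same frameset page view
        out.append(u)
        frameset = u if u in frame_children else None
    return out

def sessionize(records, frame_children, timeout=timedelta(minutes=30)):
    """User identification by (IP, agent), then timeout-based session identification."""
    by_user: Dict[Tuple[str, str], List[LogRecord]] = defaultdict(list)
    for r in clean(records):
        by_user[(r.ip, r.agent)].append(r)
    sessions = []
    for user, recs in by_user.items():
        recs.sort(key=lambda r: r.time)  # stable sort: equal times keep log order
        current = [recs[0]]
        for prev, r in zip(recs, recs[1:]):
            if r.time - prev.time > timeout:  # inactivity gap closes the session
                sessions.append((user, collapse_frames([x.url for x in current], frame_children)))
                current = []
            current.append(r)
        sessions.append((user, collapse_frames([x.url for x in current], frame_children)))
    return sessions

t0 = datetime(2024, 1, 1, 9, 0)
log = [
    LogRecord("10.0.0.1", "Mozilla", t0, "/index.html", 200),
    LogRecord("10.0.0.1", "Mozilla", t0 + timedelta(seconds=1), "/menu.html", 200),
    LogRecord("10.0.0.1", "Mozilla", t0 + timedelta(seconds=1), "/main.html", 200),
    LogRecord("10.0.0.1", "Mozilla", t0 + timedelta(minutes=2), "/logo.gif", 200),
    LogRecord("10.0.0.2", "Mozilla", t0 + timedelta(minutes=3), "/index.html", 404),
    LogRecord("10.0.0.1", "Mozilla", t0 + timedelta(minutes=5), "/products.html", 200),
    LogRecord("10.0.0.1", "Opera", t0 + timedelta(minutes=10), "/products.html", 200),
    LogRecord("10.0.0.1", "Mozilla", t0 + timedelta(minutes=50), "/contact.html", 200),
]
frames = {"/index.html": {"/menu.html", "/main.html"}}
for user, pages in sessionize(log, frames):
    print(user, pages)
```

With this sample log, the Mozilla user yields two sessions (the 45-minute gap before `/contact.html` exceeds the timeout), the frameset view of `/index.html` counts as a single page, and the same IP with a different agent is treated as a separate user.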
The research results include the following. Some rules are given for selecting a measuring method according to the surface shape of the product and the modeling method. Several measurement-data preprocessing problems are studied, including noise removal, CMM probe-radius compensation, edge-data extraction, merging of sub-regions (measured data patches), and local data mending or re-sampling. To ensure that part CAD models are built and assembled with accurate dimensions and shapes in reverse engineering (RE) modeling of products with complex shapes, three model-correction techniques are presented: model-based correction, drawing-based correction, and physical-model-based correction.
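Probe-radius compensation can be illustrated in two dimensions with a minimal sketch: a CMM records the positions of the probe-tip center, so each point is shifted by the probe radius along the local unit normal toward the material. The sketch assumes an ordered planar scan line, estimates the normal from a central difference of neighbouring points, and uses a hypothetical `material_on_left` flag to say on which side of the travel direction the material lies. Real compensation works with 3D surface normals, and this is not necessarily how the paper does it.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def compensate_probe_radius(
    centers: List[Point], r: float,
    material_on_left: bool = True, closed: bool = True,
) -> List[Point]:
    """Shift recorded probe-center points by r toward the material (2D sketch)."""
    n = len(centers)
    sign = 1.0 if material_on_left else -1.0
    result: List[Point] = []
    for i, (x, y) in enumerate(centers):
        if closed:  # central difference, wrapping around a closed contour
            prev_pt, next_pt = centers[i - 1], centers[(i + 1) % n]
        else:  # one-sided differences at the two ends of an open scan line
            prev_pt, next_pt = centers[max(i - 1, 0)], centers[min(i + 1, n - 1)]
        tx, ty = next_pt[0] - prev_pt[0], next_pt[1] - prev_pt[1]
        length = math.hypot(tx, ty)
        if length == 0.0:
            raise ValueError(f"cannot estimate a normal at point {i}")
        nx, ny = -ty / length, tx / length  # unit normal to the left of travel
        result.append((x + sign * r * nx, y + sign * r * ny))
    return result

# Counterclockwise scan around a disk of radius R with a probe of radius r:
# the recorded centers lie on radius R + r, and the disk is on the left.
R, r = 10.0, 1.0
scan = [((R + r) * math.cos(2 * math.pi * k / 36),
         (R + r) * math.sin(2 * math.pi * k / 36)) for k in range(36)]
surface = compensate_probe_radius(scan, r)
print(max(abs(math.hypot(px, py) - R) for px, py in surface))  # rounding error only
```

For evenly spaced points on a circle, the central difference is exactly tangent, so the compensated points fall back on radius 10 up to floating-point rounding; on noisy data the normal estimate is usually smoothed first.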